This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

include/polly/
  CodeGen/BlockGenerators.h
  ScopDetection.h
  Support/ScopHelper.h
  TempScopInfo.h
lib/
  Analysis/ScopDetection.cpp
  Analysis/TempScopInfo.cpp
  CodeGen/BlockGenerators.cpp
  CodeGen/CodeGeneration.cpp
  Support/SCEVValidator.cpp
  Support/ScopHelper.cpp
test/
  Isl/CodeGen/inner_scev_sdiv_2.ll
  Isl/CodeGen/loop_with_conditional_entry_edge_split_hard_case.ll
  Isl/CodeGen/phi_loop_carried_float.ll
  Isl/CodeGen/phi_loop_carried_float_escape.ll
  Isl/CodeGen/phi_scalar_simple_1.ll
  Isl/CodeGen/phi_scalar_simple_2.ll
  ScopDetect/keep_going_expansion.ll
  ScopDetect/multidim_indirect_access.ll
  ScopDetect/non-affine-loop-condition-dependent-access_2.ll
  ScopDetect/non-affine-loop-condition-dependent-access_3.ll
  ScopDetect/phi_with_multi_exiting_edges.ll
  ScopDetect/phi_with_multi_exiting_edges_2.ll
  ScopDetect/simple_non_single_entry.ll
  ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_2.ll
  ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll

Differential D11870

[Polly] Allow PHI nodes in exit blocks
Abandoned, Public

Authored by Meinersbur on Aug 8 2015, 11:29 AM.


Details

Reviewers

grosser

Summary

In case there are dependences from multiple scalars inside the region to a single PHI node in the exit block of the region, it is unclear which value needs to be stored in the PHI node.

Example:

bb:
  br i1 %cond, label %stmtA, label %stmtB

stmtA:
  br label %merge

stmtB:
  br label %merge

merge:
  %val = phi i64 [ %valA, %stmtA ], [ %valB, %stmtB ]
  ret i64 %val

For the region (bb -> merge), the value that needs to be stored in %val is %valA if the PHI node is reached through %stmtA, and %valB if it is reached through %stmtB. Hence, at code generation time we cannot just store the value of a specific scalar in the PHI node, but need to "choose" between the values of multiple scalars. For this we use the same approach as for PHI nodes in general: we add virtual writes to a PHI-node location and then read back the original value in the exiting block.
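A minimal C sketch of this "virtual write, then reload" idea for the example above, assuming integer values and using the hypothetical function name exit_phi_demoted. It only illustrates the concept; it is not Polly's actual code generation:

```c
#include <stdbool.h>

/*
 * Models:  %val = phi i64 [ %valA, %stmtA ], [ %valB, %stmtB ]
 * Each incoming statement writes the value flowing along its own edge into
 * one shared location (standing in for the PHI-node location); the merge
 * block reads that location back instead of evaluating the PHI.
 * exit_phi_demoted is a hypothetical name used only for illustration.
 */
long exit_phi_demoted(bool cond, long valA, long valB) {
    long val_phi_location; /* the virtual PHI-node location */

    if (cond) {
        /* stmtA: virtual write of the value for the stmtA -> merge edge */
        val_phi_location = valA;
    } else {
        /* stmtB: virtual write of the value for the stmtB -> merge edge */
        val_phi_location = valB;
    }

    /* merge: reload the location; this replaces the PHI node */
    return val_phi_location;
}
```

Because each statement writes only the value belonging to its own edge, the reload yields %valA or %valB depending on which edge was taken, which is the information a plain copy of one scalar would lose.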

This patch removes the check in ScopDetection for PHI nodes in the exit node and teaches TempScopInfo to generate access information for them. The PHI nodes are later found and handled by CodeGeneration to reload the writes into a scalar.

Modification of the IR before CodeGeneration is avoided because we do not want the IR changed if there are Scop-level optimizations to be applied or we do analysis only.

Event Timeline

Meinersbur updated this revision to Diff 31589. Aug 8 2015, 11:29 AM

Meinersbur retitled this revision from to [Polly] Make TempScopInfo handle PHI nodes in exit block.

Meinersbur updated this object.

Meinersbur added a reviewer: grosser.

Meinersbur added a project: Restricted Project.

Meinersbur added a subscriber: pollydev.

This all seems like a really huge hack in every level of Polly just to allow these PHI nodes. If we really want them in the region, why not split the basic block first and change the region tree? That transformation is easily reversible and would remove code from all main Polly parts.

In D11870#219807, @jdoerfert wrote:

This all seems like a really huge hack in every level of Polly just to allow these PHI nodes. If we really want them in the region, why not split the basic block first and change the region tree? That transformation is easily reversible and would remove code from all main Polly parts.

This is how I already tried to argue with Tobias. There is actually code in IndependentBlock that does this already, but it is never called.

According to Tobias, we are actively working towards making no changes to the IR before code generation, in case Polly is used for analysis only.

In D11870#219867, @Meinersbur wrote:

In D11870#219807, @jdoerfert wrote:

This all seems like a really huge hack in every level of Polly just to allow these PHI nodes. If we really want them in the region, why not split the basic block first and change the region tree? That transformation is easily reversible and would remove code from all main Polly parts.

This is how I already tried to argue with Tobias. There is actually code in IndependentBlock that does this already, but it is never called.

According to Tobias, we are actively working towards making no changes to the IR before code generation, in case Polly is used for analysis only.

Ok, some thoughts from me:

I would argue that we can modify the CFG if it is trivially reversible.
I am still not sure why we need the PHI nodes in the region. Why not make code gen able to handle PHI nodes in the exit? Is there a benefit in this way I do not see or a major problem in the other one?
I would like good reasons to change 4-5 layers of Polly code and add special cases all over the place.

Can we discuss this on Wednesday or in a thread?

First, thank you Michael for pushing all these changes out. I have not yet managed to look into all of them, but will do so tomorrow (evening?).

Hi Johannes,

thanks for joining the discussion. Michael and I already discussed this topic on the phone, but it would probably have been better to either do this via email or at least to send out meeting notes. Sorry for having missed this.

Let me first give you the missing context.

Michael found two weeks ago that Polly does not detect a lot of SCoPs if executed late in the pass pipeline, and that in many cases Polly bails out due to PHI nodes in the exit basic blocks. The reason why there are so many PHI nodes in exit basic blocks is that LLVM's -lcssa pass introduces PHI nodes after loops for each value that is defined inside a loop and used afterwards.

For example:

define i64 @foo() {
start:
  br label %loop

loop:
  %indvar = phi i64 [0, %start], [%indvar.next, %loop]
  %indvar.next = add i64 %indvar, 1
  %val = add i64 %indvar, %indvar
  %cmp = icmp eq i64 %indvar.next, 100
  br i1 %cmp, label %end, label %loop

end:
  ret i64 %val
}

opt -lcssa

define i64 @foo() {
start:
  br label %loop

loop:                                             ; preds = %loop, %start
  %indvar = phi i64 [ 0, %start ], [ %indvar.next, %loop ]
  %indvar.next = add i64 %indvar, 1
  %val = add i64 %indvar, %indvar
  %cmp = icmp eq i64 %indvar.next, 100
  br i1 %cmp, label %end, label %loop

end:                                              ; preds = %loop
  %val.lcssa = phi i64 [ %val, %loop ]
  ret i64 %val.lcssa
}

As you can see, we now have a new %val.lcssa PHI node in the code, which prevents us from detecting the SCoP.

This is case 1) of a PHI node in the exit node. It has the important property that only a _single_ edge from inside the region goes into the PHI node in the exit node.

However, there _may_ also be cases where more than one edge goes from inside
the scop to the exit block.

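define i64 @foo(i1 %skip, i1 %cond, i64 %x, i64 %a, i64 %b) {
entry:
  %X = add i64 %x, 1                ; placeholder computation
  br i1 %skip, label %exit, label %scop_start

scop_start:
  br i1 %cond, label %stmtA, label %stmtB

stmtA:
  %A = add i64 %a, 1                ; placeholder computation
  br label %exit

stmtB:
  %B = add i64 %b, 1                ; placeholder computation
  br label %exit

exit:
  %val = phi i64 [ %A, %stmtA ], [ %B, %stmtB ], [ %X, %entry ]
  ret i64 %val
}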

This case 2) is more complicated, as the value that needs to be written into %val can come from multiple places inside the SCoP. To my understanding, the main reason for the changes to TempScop and ScopInfo is the need to model certain data dependences to support this second case.

Now, are there alternatives?

A) Split the block "before" Polly

B) Do not handle case 2), but just focus on case 1)

I would argue that we can modify the CFG if it is trivially reversible.

Both of you already suggested option A). For option A) we need to realize that we actually cannot split before Polly, but only in between ScopDetection and TempScop/ScopInfo. At an earlier point we had a large set of transformations in between the two passes, and unfortunately some of them triggered surprising bugs because they invalidated ScopDetection unexpectedly. (I remember that some seemingly unrelated IR changes invalidated the BasicBlock alias analysis.) To be fully safe, we now _always_ rerun ScopDetection before TempScopInfo to ensure the SCoP we look at is still valid. At some point, I would like to get rid of this additional verification pass, and the easiest way to get confidence that nothing breaks is to just _not_ touch anything in between. (There are also issues with multiple SCoPs that affect each other, but these may possibly be resolved with an updated detection->modeling->transformation order when we move to the new pass manager.) Another reason for avoiding transformations is that I would like Polly to be usable as an analysis only. Many of the concepts in the pass manager rely on analyses not touching the IR, and I am sure Chandler is going to exploit a lot more of this freedom in the new pass manager. Hence, I would like to not introduce a hack that may cause trouble (or debugging costs) later on.

I am still not sure why we need the PHI nodes in the region. Why not make code gen able to handle PHI nodes in the exit? Is there a benefit in this way I do not see or a major problem in the other one?

Support for case 2) is why this cannot be handled in codegen only. I did not find the time to look through all the patches, but this does not seem to be clear from the proposed commit messages at least.

I would like good reasons to change 4-5 layers of Polly code and add special cases all over the place.

Right. When Michael and I discussed this two weeks ago, I was also concerned about the amount of changes and the additional code complexity. My suggestion at that point was to just focus on case 1), which seems to be the common case, and to worry about case 2) only if we find cases where supporting it actually matters. To my understanding we have not yet found such cases, but Michael was convinced he can handle both cases without introducing too much complexity, such that even without such supporting cases it is better to just implement it once and for all rather than to wait for use cases that actually make it necessary.

I did not yet manage to look at the patches in detail, but the change seems to be at least non-trivial now. Michael, did you happen to come across a motivating example that would justify adding support for case 2)?

AFAIU Johannes would have preferred to have this conversation at the weekly phone conference, but here we go...

In D11870#220048, @grosser wrote:

...
As you can see, we now have a new %val.lcssa PHI node in the code, which prevents us from detecting the SCoP.

This is case 1) of a PHI node in the exit node. It has the important property that only a _single_ edge from inside the region goes into the PHI node in the exit node.

This is because there is an LCSSA pass before, and LICM tries to preserve that property. That is, there will nearly always be PHI nodes in the exit block if the top-most element is a loop.

However, there _may_ also be cases where more than one edge goes from inside
the scop to the exit block.

During my first experiments with LICM I saw them very often. They nearly always occur if there is a reduction-to-scalar in the SCoP. In my first designs of de-LICM I did not consider that a loop body may read and write the same value, which typically occurs in reductions. During one of our Wednesday phone calls, Tobias showed me several code snippets that I should consider and that show exactly this behavior.

A typical example is a simple reduction:

for (size_t i = 0; i<n; i+=1)
  sum += A[i];
use(sum);

Polly has functionality for detecting reductions. If we say that we do not support such kind of loops because of multi-incoming-edge in the exit node, reduction detection becomes somewhat useless.

Additionally, it is very difficult to explain to users of Polly why Polly does not recognize their loop.

This case 2) is more complicated, as the value that needs to be written into %val can come from multiple places inside the SCoP. To my understanding, the main reason for the changes to TempScop and ScopInfo is the need to model certain data dependences to support this second case.

Now, are there alternatives?

A) Split the block "before" Polly

B) Do not handle case 2), but just focus on case 1)

I would argue that we can modify the CFG if it is trivially reversible.

[...]. At an earlier point we had a large set of transformations in between the two passes, and unfortunately some of them triggered surprising bugs because they invalidated ScopDetection unexpectedly. (I remember that some seemingly unrelated IR changes invalidated the BasicBlock alias analysis.) To be fully safe, we now _always_ rerun ScopDetection before TempScopInfo to ensure the SCoP we look at is still valid. At some point, I would like to get rid of this additional verification pass, and the easiest way to get confidence that nothing breaks is to just _not_ touch anything in between. [...]

Isn't CodeGeneration the larger problem? We cannot prevent it from modifying the code, and multiple CodeGeneration runs have to be serialized, so one of them may touch another SCoP.

I am still not sure why we need the PHI nodes in the region. Why not make code gen able to handle PHI nodes in the exit? Is there a benefit in this way I do not see or a major problem in the other one?

Support for case 2) is why this cannot be handled in codegen only. I did not find the time to look through all the patches, but this does not seem to be clear from the proposed commit messages at least.

A CodeGen-only solution cannot handle the situation with two incoming edges. For the passes before CodeGen, the use in the PHI node is just an exposed value, since it is used outside the region. To select which of the values the PHI node will have, we also need to know which exiting edge was taken. This information is not preserved in the SCoP.

I would like good reasons to change 4-5 layers of Polly code and add special cases all over the place.

Right. When Michael and I discussed this two weeks ago, I was also concerned about the amount of changes and the additional code complexity. My suggestion at that point was to just focus on case 1), which seems to be the common case, and to worry about case 2) only if we find cases where supporting it actually matters. To my understanding we have not yet found such cases, but Michael was convinced he can handle both cases without introducing too much complexity, such that even without such supporting cases it is better to just implement it once and for all rather than to wait for use cases that actually make it necessary.

I did not yet manage to look at the patches in detail, but the change seems to be at least non-trivial now. Michael, did you happen to come across a motivating example that would justify adding support for case 2)?

I have multiple arguments on this point:

CodeGeneration is already complicated enough. This approach does not change _anything_ in BlockGenerator. Changes there would have to consider all the generators (Scalar, Vector, (GPU?)), making such a solution even harder.

The largest change (http://reviews.llvm.org/D11867) is needed because simplifyRegion and executeScopConditionally did not properly update RegionInfo. They did not take care of setRegionFor, sometimes did not set the correct region entries/exits, or made assumptions that do not always hold (there can be PHI nodes in blocks with a single incoming edge). Sorry, I have no test cases; those problems were usually only triggered in preliminary modifications, e.g. when I added more invocations of RegionInfo::verifyAnalysis. I requested adding a check of RegionInfo::getRegionInfo to the verification, so these changes are necessary in any case.

I would not use the number of "layers" touched to evaluate the complexity of a patch. Renaming llvm::BasicBlock to something else would touch all the layers but would not change the complexity at all. Unless the solution we come up with is to reinstate IndependentBlock::splitExitBlock, there is no way around modifying ScopDetection. That change is the very same as in the preliminary patch Johannes sent me to look at, and he himself proposes to change CodeGeneration. That leaves two passes that this patch set modifies: ScopInfo and TempScopInfo (which will become just one, because I am already working on merging them).

I never argued that any change would be "trivial". As far as I remember, it was something like "not too complicated".

I'd also argue that this change is not too complicated. All it does is "redefine" the PHI nodes of the exit block to be inside the region. Since there was no abstraction for that yet, the individual places had to be changed. There are only 5 of them, and all of them follow this very same idea. In this sense no "special cases" are added; there is just a redefinition, which is easy to understand.

Adding handling for PHI nodes in the exit block in BlockGenerator would be a special case only for such PHI nodes. In this patch we make the existing code handle the new situation.

IMHO supporting only case 1) but not case 2) would be an ugly workaround of the symptom. I'd very much prefer to make the problem go away altogether.

The examples (all -O3 -polly-position=before-vectorizer):

for (size_t i = 0; i<n; i+=1)
  sum += A[i];
use(sum);

for (long long j = 0; j<n; j+=1) {
  A[j] = B[j] + 1;
  t = 7;
}
use(t);

float x = 0;
if (n > 2) {
  for (int i = 0; i<n; i+=1) {
    A[i] = B[i] + 1;
  }
  x = 3;
}
return x;

Not all of them have any parallelism in the classic sense (I had to add at least one loop, otherwise ScopDetection would bail out).

Just one of these patterns has to appear and the loop is not recognized anymore. Maybe inner loops still are, but those could again contain one of these patterns, e.g.:

for (long long i = 0; i<n; i+=1) {
  for (long long j = 0; j<n; j+=1) {
    A[i] += B[j];
  }
  t = 7;
}
use(t);

(I should add some of these to test cases)

Motivated enough?

Some comments on the actual patch.

include/polly/TempScopInfo.h
272	We use references for Regions here if they cannot be null.
284	We use references for Regions here if they cannot be null.
lib/Analysis/TempScopInfo.cpp
179	I do not understand this condition. Why is it needed and what is the implication?
300	Can we remove the `buildExitPHIAccessFunctions` part if we add here something like: if (&R == &SR) buildAccessFunctions(R, R.getExit(), /* only PHIs */ true); and an `onlyPHIs` parameter in the buildAccessFunctions method?
355	typo
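A rough sketch of what such an `onlyPHIs` flag could do, in plain C with hypothetical stand-in types (Inst, Block) and a hypothetical count_visited_instructions in place of the real buildAccessFunctions. It relies on the LLVM rule that all PHI nodes are grouped at the top of a basic block, so the walk can stop at the first non-PHI:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for LLVM instructions/blocks. */
typedef enum { INST_PHI, INST_OTHER } InstKind;

typedef struct {
    InstKind kind;
} Inst;

typedef struct {
    const Inst *insts;
    size_t num_insts;
} Block;

/*
 * Hypothetical stand-in for buildAccessFunctions(R, BB, onlyPHIs): returns
 * how many instructions would be visited. With only_phis set, the walk
 * stops at the first non-PHI, since no PHI can follow it in a valid block.
 */
size_t count_visited_instructions(const Block *bb, bool only_phis) {
    size_t visited = 0;
    for (size_t i = 0; i < bb->num_insts; ++i) {
        if (only_phis && bb->insts[i].kind != INST_PHI)
            break; /* end of the PHI group */
        ++visited; /* real code would build access functions here */
    }
    return visited;
}
```

For the exit block, calling this with only_phis set would model only the exit PHIs as part of the region, while all other exit-block instructions stay outside it.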

In D11870#220094, @Meinersbur wrote:

AFAIU Johannes would have preferred to have this conversation at the weekly phone conference, but here we go...

I think it does not hurt to get started. Also, like this we actually document our discussion for other interested people.

However, there _may_ also be cases where more than one edge goes from inside
the scop to the exit block.

During my first experiments with LICM I saw them very often. They nearly always occur if there is a reduction-to-scalar in the SCoP. In my first designs of de-LICM I did not consider that a loop body may read and write the same value, which typically occurs in reductions. During one of our Wednesday phone calls, Tobias showed me several code snippets that I should consider and that show exactly this behavior.

A typical example is a simple reduction:
for (size_t i = 0; i<n; i+=1)
  sum += A[i];
use(sum);

Here is the CFG I get when translating this into a function that actually compiles and running it with "polly-clang /tmp/test.c -O3 -mllvm -polly -mllvm -polly-position=before-vectorizer -mllvm -polly-show"

Polly has functionality for detecting reductions. If we say that we do not support such kind of loops because of multi-incoming-edge in the exit node, reduction detection becomes somewhat useless.

Additionally, it is very difficult to explain to users of Polly why Polly does not recognize their loop.

You are right, in the above case we get these multiple exit edges from the region that go into a PHI node. Now, even if we did not support them, Polly would still detect the SCoP 'for.body -> for.cond.cleanup', but it would miss the conditional branch. My current feeling is that such conditional branches around a single loop are probably not so important, as most optimizations I can think of will be done on the loop itself. Even without support for 2) we would still detect the loop.
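For reference, a hand-written C approximation of the rotated shape such a reduction takes at -O3 (an assumption about the general shape, not actual compiler output; rotated_reduction is a hypothetical name). The loop is guarded by n > 0, so the value after the loop reaches the exit block along two edges:

```c
#include <stddef.h>

/*
 * Approximates the rotated form of
 *     for (size_t i = 0; i < n; i += 1) sum += A[i];
 * 'sum_exit' stands in for the PHI node in the exit block.
 */
float rotated_reduction(size_t n, const float *A) {
    float sum_exit;
    if (n > 0) {            /* guard introduced by loop rotation */
        float sum = 0;
        size_t i = 0;
        do {                /* rotated loop body ('for.body') */
            sum += A[i];
            i += 1;
        } while (i < n);
        sum_exit = sum;     /* edge: loop exit -> exit block */
    } else {
        sum_exit = 0;       /* edge: guard -> exit block */
    }
    return sum_exit;
}
```

If the region includes the guard, both edges start inside it (case 2); if only the loop itself forms the SCoP, a single edge from inside remains (case 1).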

Now, maybe including the branch allows us to derive some context information about the loop? E.g. that it is executed at least once?

Looking for an example where we would actually lose loops, I came up with the following:

float foo(long n, float *A) {
  float sum = 0;
  for (long i = 0; i<n; i+=1)
    sum += A[i];
  for (long i = 0; i<n; i+=1)
    sum += A[i];
  return sum;
}

If we do not support the multi-from-scop-edge PHI nodes, we will not be able to form a single scop and consequently will not be able to fuse the loops. I think this is a good motivating example, no?
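To make the fusion argument concrete, a hypothetical C sketch of foo() before and after fusing its two loops (foo_original and foo_fused are illustrative names). It uses long instead of float on purpose: with float, fusion reorders the additions of the reduction, which is only acceptable if reassociation is allowed; with integers both versions give identical results:

```c
/* The two loops of foo(), unchanged apart from the element type. */
long foo_original(long n, const long *A) {
    long sum = 0;
    for (long i = 0; i < n; i += 1)
        sum += A[i];
    for (long i = 0; i < n; i += 1)
        sum += A[i];
    return sum;
}

/* What fusion could produce once both loops are in a single SCoP. */
long foo_fused(long n, const long *A) {
    long sum = 0;
    for (long i = 0; i < n; i += 1) {
        sum += A[i]; /* body of the first loop */
        sum += A[i]; /* body of the second loop */
    }
    return sum;
}
```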

This case 2) is more complicated, as the value that needs to be written into %val can come from multiple places inside the SCoP. To my understanding, the main reason for the changes to TempScop and ScopInfo is the need to model certain data dependences to support this second case.

Now, are there alternatives?

A) Split the block "before" Polly

B) Do not handle case 2), but just focus on case 1)

I would argue that we can modify the CFG if it is trivially reversible.

[...]. At an earlier point we had a large set of transformations in between the two passes, and unfortunately some of them triggered surprising bugs because they invalidated ScopDetection unexpectedly. (I remember that some seemingly unrelated IR changes invalidated the BasicBlock alias analysis.) To be fully safe, we now _always_ rerun ScopDetection before TempScopInfo to ensure the SCoP we look at is still valid. At some point, I would like to get rid of this additional verification pass, and the easiest way to get confidence that nothing breaks is to just _not_ touch anything in between. [...]

Isn't CodeGeneration the larger problem? We cannot prevent it from modifying the code, and multiple CodeGeneration runs have to be serialized, so one of them may touch another SCoP.

Sure it is. Still, this does not make the other problem go away.

I am still not sure why we need the PHI nodes in the region. Why not make code gen able to handle PHI nodes in the exit? Is there a benefit in this way I do not see or a major problem in the other one?

Support for case 2) is why this cannot be handled in codegen only. I did not find the time to look through all the patches, but this does not seem to be clear from the proposed commit messages at least.

A CodeGen-only solution cannot handle the situation with two incoming edges. For the passes before CodeGen, the use in the PHI node is just an exposed value, since it is used outside the region. To select which of the values the PHI node will have, we also need to know which exiting edge was taken. This information is not preserved in the SCoP.

Right, that is what you explained to me earlier, and I do understand this now. However, this was not clear from the proposed commit messages. I think it will become clear to Johannes as well.

I would like good reasons to change 4-5 layers of Polly code and add special cases all over the place.

Right. When Michael and I discussed this two weeks ago, I was also concerned about the amount of changes and the additional code complexity. My suggestion at that point was to just focus on case 1), which seems to be the common case, and to worry about case 2) only if we find cases where supporting it actually matters. To my understanding we have not yet found such cases, but Michael was convinced he can handle both cases without introducing too much complexity, such that even without such supporting cases it is better to just implement it once and for all rather than to wait for use cases that actually make it necessary.

I did not yet manage to look at the patches in detail, but the change seems to be at least non-trivial now. Michael, did you happen to come across a motivating example that would justify adding support for case 2)?

I have multiple arguments to this point:

CodeGeneration is already complicated enough. This approach does not change _anything_ in BlockGenerator. Changes there would have to consider all the generators (Scalar, Vector, (GPU?)), making such a solution even harder.

Sorry that this has not become clear. I think the solution you have chosen is the best possible, given that I asked to not touch the IR beforehand. I know you put quite some thought into why you chose the approach you did. It might be worth sharing these considerations in your commit messages.

The largest change (http://reviews.llvm.org/D11867) is needed because simplifyRegion and executeScopConditionally did not properly update RegionInfo. They did not take care of setRegionFor, sometimes did not set the correct region entries/exits, or made assumptions that do not always hold (there can be PHI nodes in blocks with a single incoming edge). Sorry, I have no test cases; those problems were usually only triggered in preliminary modifications, e.g. when I added more invocations of RegionInfo::verifyAnalysis. I requested adding a check of RegionInfo::getRegionInfo to the verification, so these changes are necessary in any case.

Yes, this is a very good point. I did not look at all patches in detail yet, but it seems you did a lot of cleaning/bug-fixing on the way. So probably many of the changes you propose are not only needed for the PHI node handling, but are by themselves beneficial and don't really increase code complexity just for PHI nodes. I already started to go through them, such that we can get them committed quickly.

I never argued that any change would be "trivial". As far as I remember, it was something like "not too complicated".

That's what I wrote ;):

"Michael was convinced he can handle both cases without introducing too much complexity"

The solution not being trivial just means that it requires some thought and motivation.

I'd also argue that this change is not too complicated. All it does is "redefine" the PHI nodes of the exit block to be inside the region. Since there was no abstraction for that yet, the individual places had to be changed. There are only 5 of them, and all of them follow this very same idea. In this sense no "special cases" are added; there is just a redefinition, which is easy to understand.

Adding handling for PHI nodes in the exit block in BlockGenerator would be a special case only for such PHI nodes. In this patch we make the existing code handle the new situation.

I did not look into the details of your patches. I obviously dislike any complexity increase, but my current feeling is that it will be hard to go with less complexity and that the complexity added is in the end not too bad.

I think the first point to address is to clarify the motivation of these patches and the implementation choices. (Which we are doing here). I think we are on a good way.

IMHO supporting only case 1) but not case 2) would be an ugly workaround of the symptom. I'd very much prefer to make the problem go away altogether.

OK.

The examples (all -O3 -polly-position=before-vectorizer):

for (size_t i = 0; i<n; i+=1)
  sum += A[i];
use(sum);
or
for (long long j = 0; j<n; j+=1) {
  A[j] = B[j] + 1;
  t = 7;
}
use(t);
or
float x = 0;
if (n > 2) {
  for (int i = 0; i<n; i+=1) {
    A[i] = B[i] + 1;
  }
  x = 3;
}
return x;
Not all of them have any parallelism in the classic sense (I had to add at least one loop, otherwise ScopDetection would bail out).

Just one of these patterns has to appear and the loop is not recognized anymore. Maybe inner loops still are, but those could again contain one of these patterns, e.g.:
for (long long i = 0; i<n; i+=1) {
  for (long long j = 0; j<n; j+=1) {
    A[i] += B[j];
  }
  t = 7;
}
use(t);
(I should add some of these to test cases)

Motivated enough?

All these examples will just lose the outermost condition if we do not support 2). As long as we do not know of any optimization/transformation for which including this outer condition is indeed useful, just detecting the inner loops still seems OK to me.

Now, I think the loop fusion example I gave above is a good motivation.

Best,
Tobias

In D11870#220194, @grosser wrote:
In D11870#220094, @Meinersbur wrote:
A typical example is a simple reduction:
for (size_t i = 0; i<n; i+=1)
  sum += A[i];
use(sum);
Here is the CFG I get when translating this into a function that actually compiles and running it with "polly-clang /tmp/test.c -O3 -mllvm -polly -mllvm -polly-position=before-vectorizer -mllvm -polly-show"

Thanks for the nice graphs.

Polly has functionality for detecting reductions. If we say that we do not support such kind of loops because of multi-incoming-edge in the exit node, reduction detection becomes somewhat useless.

Additionally, it is very difficult to explain to users of Polly why Polly does not recognize their loop.

You are right, in the above case we get these multiple exit edges from the region that go into a PHI node. Now, even if we did not support them, Polly would still detect the SCoP 'for.body -> for.cond.cleanup', but it would miss the conditional branch. My current feeling is that such conditional branches around a single loop are probably not so important, as most optimizations I can think of will be done on the loop itself. Even without support for 2) we would still detect the loop.

Mmh, you are right. I only looked at whether the PHI nodes have more than one edge and whether the region could be made smaller, not at where the edges come from.

I think Polly could support such cases easily if ScopDetection::isValidExit did not just look for the existence of a PHI node, but instead counted the number of edges from the region. If it is exactly one, we are good to go. I'd still expect some problems in how CodeGeneration currently simplifies loops.
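A minimal sketch of the proposed check in C, using a hypothetical, simplified CFG model (blocks as integer ids, region membership as a bool array, a PHI described by its list of incoming blocks). The names count_region_incoming and exit_phis_have_one_region_edge are illustrative; this is not the real ScopDetection::isValidExit:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an exit-block PHI: the ids of its incoming blocks. */
typedef struct {
    const int *incoming_blocks;
    size_t num_incoming;
} PhiNode;

/* Counts how many of the PHI's incoming edges start inside the region. */
size_t count_region_incoming(const PhiNode *phi, const bool *in_region) {
    size_t count = 0;
    for (size_t i = 0; i < phi->num_incoming; ++i)
        if (in_region[phi->incoming_blocks[i]])
            ++count;
    return count;
}

/* Accept the exit only if every PHI has exactly one edge from the region. */
bool exit_phis_have_one_region_edge(const PhiNode *phis, size_t num_phis,
                                    const bool *in_region) {
    for (size_t p = 0; p < num_phis; ++p)
        if (count_region_incoming(&phis[p], in_region) != 1)
            return false;
    return true;
}
```

With the case 2) example from earlier in the thread (stmtA and stmtB inside the region, entry outside), the PHI has two incoming edges from the region and is rejected; an LCSSA PHI like %val.lcssa has exactly one and is accepted.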

Looking for an example where we would actually lose loops, I came up with the following:

float foo(long n, float *A) {
  float sum = 0;
  for (long i = 0; i<n; i+=1)
    sum += A[i];
  for (long i = 0; i<n; i+=1)
    sum += A[i];
  return sum;
}

If we do not support the multi-from-scop-edge PHI nodes, we will not be able to form a single scop and consequently will not be able to fuse the loops. I think this is a good motivating example, no?

Yes. Thank you for coming up with a better example than me.

I am still not sure why we need the PHI nodes in the region. Why not make code generation able to handle PHI nodes in the exit? Is there a benefit to this approach that I do not see, or a major problem with the other one?

Support for case 2) is why this cannot be handled in codegen only. I did not find the time to look through all the patches, but this does not seem to be clear from the proposed commit messages, at least.

A CodeGen-only solution cannot handle the situation with two incoming edges. For the passes before CodeGen, the use in the PHI node is just an exposed value, since it is used outside the region. To select which of the values the PHI node will take, we also need to know which exiting edge was taken. This information is not preserved in the scop.

Right, that is what you explained to me earlier, and I do understand it now. However, this was not clear from the proposed commit messages. I think it will become clear to Johannes as well.

An addendum: The handling of scalars gets around this by adding writes to the scop statement. This creates output dependencies (WAW) between them and ensures that the last write is the value we look for. Without these output dependencies, isl is free to reorder the statements such that the wrong definition might be the last one.
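A toy simulation of this argument, assuming every statement writes its incoming value to a per-PHI memory location at its end. The names `Stmt` and `runSchedule` are hypothetical, not Polly code. Consider a hypothetical CFG in which `stmtA` branches either to `stmtB` or directly to `merge`, with `%val = phi [%valA, %stmtA], [%valB, %stmtB]`: on the path through `stmtB`, both statements write the location, and only the WAW dependence keeps `stmtA`'s write before `stmtB`'s.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a statement that, when executed, writes the value it
// contributes to an exit PHI into that PHI's virtual memory location.
struct Stmt {
  std::string Name;
  bool Executes; // Whether control flow reaches this statement.
  int Value;     // Incoming value this statement contributes to the exit PHI.
};

// Runs the statements in the given schedule order and returns what is left in
// the PHI location afterwards, i.e. the value reloaded after the scop.
int runSchedule(const std::vector<Stmt> &Schedule) {
  int PhiLocation = -1; // -1 stands for "never written" (undef).
  for (const Stmt &S : Schedule)
    if (S.Executes)
      PhiLocation = S.Value; // A later write overwrites an earlier one (WAW).
  return PhiLocation;
}
```

With `stmtA` writing 1 and `stmtB` writing 2, the original order `{stmtA, stmtB}` leaves 2 (the value of `%valB`, which is correct on this path), whereas the reversed order `{stmtB, stmtA}`, which isl could choose if the output dependence were not modeled, leaves the stale 1.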

I would like good reasons to change 4-5 layers of Polly code and add special cases all over the place.

Right. When Michael and I discussed this two weeks ago, I was also concerned about the amount of changes and the additional code complexity. My suggestion at this point was to just focus on case 1), which seems to be the common case, and to worry about case 2) only if we find cases where supporting it actually matters. To my understanding we did not yet find such cases, but Michael was convinced he can handle both cases without introducing too much complexity, such that even without such supporting cases it is better to just implement it once and for all rather than to wait for use cases that make it actually necessary.

I did not yet manage to look at the patches in detail, but the change seems to be at least non-trivial now. Michael, did you happen to come across a motivating example that would justify adding support for case 2)?

I have multiple arguments to this point:

CodeGeneration is already complicated enough. This approach does not change _anything_ in BlockGenerator. Changes there would have to consider all the generators (Scalar, Vector, (GPU?)), making such a solution even harder.

Sorry that this has not become clear. I think the solution you have chosen is the best possible given that I asked to not touch the IR beforehand. I know you put quite some thought into why you chose the approach you did. It might be worth sharing these considerations in your commit messages.

We still have to convince Johannes.

I never argued that any change would be "trivial". As far as I remember, it was something like "not too complicated".

That's what I wrote ;):

"Michael was convinced he can handle both cases without introducing too much complexity"

The solution not being trivial just means it requires some thought and motivation.

Sorry for the strawman.

I did not look into the details of your patches. I obviously dislike any complexity increase, but my current feeling is that it will be hard to go with less complexity and that the complexity added is in the end not too bad.

As the implementer, I am probably the one who should dislike increased complexity the most. The least complex solution would be modifying the IR. This one should be the second least complex solution that does the job.

Now, I think the loop fusion example I gave above is a good motivation.

Thanks again for it :-)


Meinersbur added a comment.

In http://reviews.llvm.org/D11870#220194, @grosser wrote:

In http://reviews.llvm.org/D11870#220094, @Meinersbur wrote:

You are right, in the above case we get these multiple exiting edges from the region that go into a PHI node. Now, even if we did not support them, Polly would still detect the scop 'for.body -> for.cond.cleanup', but it would miss the conditional branch. My current feeling is that such conditional branches around a single loop are probably not so important, as most optimizations I can think of will be done on the loop itself. Even without support for 2) we would still detect the loop.

Mmh, you are right. I only looked at whether the PHI nodes have more than one edge and whether the region could be made smaller, not at where the edges come from.

I think Polly could support such cases easily if ScopDetection::isValidExit did not just look for the existence of a PHI node, but instead counted the number of edges from the region. If it is exactly one, we are good to go. I'd still expect some problems in how CodeGeneration currently simplifies loops.

This is precisely what I was trying to suggest earlier. I assume it requires some changes, but those should be mostly local to the region simplification in the code generation.

Actually, if this change is simple, it might be worth adding it (together with a test case) as an intermediate step.

Right, that is what you explained to me earlier, and I do understand it now. However, this was not clear from the proposed commit messages. I think it will become clear to Johannes as well.

An addendum: The handling of scalars gets around this by adding writes to the scop statement. This creates output dependencies (WAW) between them and ensures that the last write is the value we look for. Without these output dependencies, isl is free to reorder the statements such that the wrong definition might be the last one.

Johannes had some comments here. I'll let you guys figure this one out.

Best,
Tobias

Hi Michael,

I think we should add a comment and an example that explains why we need to model exiting PHI nodes. Here is another one that might be useful:

test.ll (839 B attachment)

Here is some possible text:

In case there are dependences from multiple scalars inside the region to a single PHI node in the exit block of the region, it is unclear which value needs to be stored in the PHI node.

Example:

bb:
  br %cond, stmtA, stmtB

stmtA:
  br merge

stmtB:
  br merge

merge:
  %val = phi [%valA, %stmtA], [%valB, %stmtB]
  ret %val

Now thinking more about this patch, I wonder if we are not modeling slightly too much. Would it not be sufficient to model the _writes_ to the PHI node location, but not to emit a ScopStmt for the ExitBlock node, nor the reads in this PHI node, nor the scalar write in this PHI node? Instead, we would just iterate during code generation through all exit PHI nodes, add a read from the scalar location that belongs to them, and replace the multiple exiting edges in this PHI node with the scalar we just read.

Also, I think it makes sense to merge all patches and just track the entire patch in this Phabricator review. This review has the most discussion happening, and none of these patches can be committed (or even reasoned about) in isolation.

Best,
Tobias

lib/Analysis/TempScopInfo.cpp
304	they belong
314	This does not seem to be needed, if we only model the writes to the exit PHI nodes, but not their reads.
352	This does not seem to be needed, if we only model the writes to the exit PHI nodes, but not their reads.
356	point
lib/Support/SCEVValidator.cpp
494	Is this really needed? To my understanding, there is no way a SCEV expression that is used in the scop will reference any of the PHI nodes in the exit block.

In D11870#223181, @grosser wrote:

I think we should add a comment and an example that explains why we need to model exiting PHI nodes. Here is another one that might be useful:

test.ll (839 B attachment)

Thanks for the support.

Now thinking more about this patch, I wonder if we are not modeling slightly too much. Would it not be sufficient to model the _writes_ to the PHI node location, but not to emit a ScopStmt for the ExitBlock node, nor the reads in this PHI node, nor the scalar write in this PHI node? Instead, we would just iterate during code generation through all exit PHI nodes, add a read from the scalar location that belongs to them, and replace the multiple exiting edges in this PHI node with the scalar we just read.

Yes, I think it would be sufficient to model only the writes, such that the write accesses cause output dependencies (WAW) and the correct value is written last. We already discussed this and Johannes did not agree with me.

However, implementing this would add a lot of special, potentially buggy code just for these PHIs, while we can handle such PHIs like any other with already working and tested code. Also, what would be the advantage of omitting a single read MemoryAccess at the end?

lib/Analysis/TempScopInfo.cpp
300	I refactored this part; it has less specialized code now.
314	refactored
lib/Support/SCEVValidator.cpp
494	I am not sure enough to remove this condition. Are you? Could it be the induction variable of a parent loop?

My hope was that the code actually does not need to change so much, but that we can just drop some of what is proposed for inclusion today. We would need no changes to ScopInfo, fewer changes to TempScopInfo and hopefully no copying of PHIs in CodeGeneration. This idea was meant as a hint of how this code could possibly be simplified. If this does not notably reduce complexity, then this idea is probably not so useful.

Best,
Tobias

PS. In your comments you said you addressed some of the comments. Could you upload a revised version of this patch for me to look at (and possibly play around with)?

lib/Support/SCEVValidator.cpp
494	I am pretty certain none of these conditions is needed. I propose to drop them if we cannot find a test case which requires them. My reasoning: Any value that is part of a SCEV needs to dominate the location at which this SCEV is evaluated. The values in the exit block of a scop do not dominate any of the values inside the scop. There is one very funny case which I did not think fully through, which is a scop in the backedge of a larger loop, where the exit of this scop is the header of the larger loop and where some of the PHI nodes in this header are again used in the scop. However, if such a case actually can be created, I am not yet convinced that what you do is right here. Before we put some random instructions in, I propose to create a test case first or, if this fails, to just place an assert that warns us in case we encounter such a piece of code.
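Tobias' dominance argument can be checked on toy CFGs. The sketch below computes dominator sets with the textbook iterative data-flow algorithm (not how LLVM's `DominatorTree` computes them); `CFG`, `computeDominators` and `dominates` are hypothetical helpers independent of LLVM. In the diamond from the summary, the exit `merge` dominates none of the blocks inside the region. In the "funny case", where the scop forms the body of a larger loop and its exit is that loop's header `H`, the header does dominate the scop's blocks, so PHI nodes defined in `H` could in principle be used inside the scop.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy CFG: maps each block to its successor blocks.
using CFG = std::map<std::string, std::vector<std::string>>;
using DomSets = std::map<std::string, std::set<std::string>>;

// Textbook iterative data-flow computation of dominator sets:
//   Dom(Entry) = {Entry}
//   Dom(B)     = {B} union (intersection of Dom(P) over all predecessors P)
DomSets computeDominators(const CFG &G, const std::string &Entry) {
  std::set<std::string> All;
  std::map<std::string, std::vector<std::string>> Preds;
  for (const auto &KV : G) {
    All.insert(KV.first);
    for (const auto &S : KV.second) {
      All.insert(S);
      Preds[S].push_back(KV.first);
    }
  }

  // Start from the "every block dominates every block" approximation.
  DomSets Dom;
  for (const auto &B : All)
    Dom[B] = (B == Entry) ? std::set<std::string>{Entry} : All;

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (const auto &B : All) {
      if (B == Entry)
        continue;
      std::set<std::string> New = All;
      for (const auto &P : Preds[B]) {
        std::set<std::string> Meet; // New intersected with Dom(P).
        for (const auto &X : New)
          if (Dom[P].count(X) != 0)
            Meet.insert(X);
        New = Meet;
      }
      New.insert(B);
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// True if block A dominates block B.
bool dominates(const DomSets &Dom, const std::string &A, const std::string &B) {
  return Dom.at(B).count(A) != 0;
}
```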


grosser added a comment.
Now thinking more about this patch, I wonder if we are not modeling slightly too much. Would it not be sufficient to model the _writes_ to the PHI node location, but not to emit a ScopStmt for the ExitBlock node, nor the reads in this PHI node, nor the scalar write in this PHI node? Instead, we would just iterate during code generation through all exit PHI nodes, add a read from the scalar location that belongs to them, and replace the multiple exiting edges in this PHI node with the scalar we just read.

That would mean 3 changes:

Remove the ScopDetection exit phi stuff (as before)

Create write accesses to the PHI location at the predecessor blocks of the exit and link them to the operand of the PHI for that predecessor.

Right. To my understanding this is already available in Michael's changes.
The only change to TempScopInfo that needs to remain is the following:

void TempScopInfo::buildExitPHIAccessFunctions(Region *R) {
  AccFuncSetType Functions;
  BasicBlock *ExitBB = R->getExit();
  assert(!R->contains(ExitBB));

  for (auto I = ExitBB->begin(); isa<PHINode>(I); ++I) {
    // TODO: Maybe we can ignore trivial (one predecessor) PHI nodes
    auto *PHI = cast<PHINode>(I);
    buildPHIAccesses(PHI, *R, Functions, nullptr);
  }
}

After code generation (which should already allocate and write the operands to unique locations per exit PHI), iterate over all exit PHI nodes and replace the incoming edges that are in the SCoP with the reloaded value.
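A minimal sketch of this post-pass on a toy representation of a PHI node as (incoming block, value) pairs. `rewriteExitPHI`, the block name `reload.bb` and the value name `%val.reload` are hypothetical; in LLVM the equivalent edits would presumably use `PHINode::removeIncomingValue` and `PHINode::addIncoming`, and the real CFG after code generation has more blocks than this toy model.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy PHI node: list of (incoming block, incoming value) pairs.
using Incoming = std::vector<std::pair<std::string, std::string>>;

// Hypothetical post-pass: drop every incoming edge of an exit PHI that comes
// from inside the SCoP and add a single edge carrying the value reloaded from
// the PHI's memory location in block ReloadBB.
Incoming rewriteExitPHI(const Incoming &PHI, const std::set<std::string> &Scop,
                        const std::string &ReloadBB,
                        const std::string &ReloadedVal) {
  Incoming Result;
  bool HadScopEdge = false;
  for (const auto &In : PHI) {
    if (Scop.count(In.first) != 0)
      HadScopEdge = true;   // This value now comes from the PHI location.
    else
      Result.push_back(In); // Edges from outside the SCoP stay untouched.
  }
  if (HadScopEdge)
    Result.emplace_back(ReloadBB, ReloadedVal);
  return Result;
}
```

For `%val = phi [%valA, %stmtA], [%valB, %stmtB], [%init, %outside]` with `stmtA` and `stmtB` inside the SCoP, the result keeps the `%outside` edge and replaces the two SCoP edges with one edge from `reload.bb` carrying `%val.reload`.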

Interesting. So you say the writes to unique values will already happen? Then the question is really only how the PHI node survives the splitting and how we can replace its operands. That sounds too simple to be true?

Tobias

Meinersbur retitled this revision from [Polly] Make TempScopInfo handle PHI nodes in exit block to [Polly] Allow PHI nodes in exit blocks. Aug 13 2015, 10:48 AM

Merged all 4 patches, fixed updating the RegionInfo in case the exit block is also the entry of a different region.

Added Tobias' second example as a test case.

Without D12014 and fixing inner_scev.ll, this breaks some test-suite programs. Will try to fix the latter and possibly undo the change in SCEVValidator before committing.

Herald added a subscriber: sanjoy. Aug 13 2015, 10:48 AM

Meinersbur mentioned this in D11868: [Polly] Make CodeGeneration handle PHI nodes in exit block. Aug 13 2015, 10:52 AM

Meinersbur mentioned this in D11869: [Polly] Make ScopInfo handle PHI nodes in exit block. Aug 13 2015, 10:55 AM

Meinersbur mentioned this in D11871: [Polly] Allow PHI nodes in exit block.

My hope was that the code actually does not need to change so much, but that we can just drop some of what is proposed for inclusion today. We would need no changes to ScopInfo,

It's just 3 lines w/o comments/assertions!

fewer changes to TempScopInfo

I don't think so.

and hopefully no copying of PHIs in CodeGeneration.

But a lot of special handling. Why is copying PHIs bad? We copy a lot of instructions in CodeGeneration.

This idea was meant as a hint of how this code could possibly be simplified. If this does not notably reduce complexity, then this idea is probably not so useful.

I have a solution ready now, and think this is more complicated. Is it worth investigating? If yes, why?

PS. In your comments you said you addressed some of the comments. Could you upload a revised version of this patch for me to look at (and possibly play around with)?

Sorry, I was hunting down bugs in test-suite cases that I wanted to fix in the new upload. It was harder than expected.

lib/Analysis/ScopInfo.cpp
862 ↗	(On Diff #32074)	Not going to commit this

In D11870#223533, @grosser wrote:

Create write accesses to the PHI location at the predecessor blocks of the exit and link them to the operand of the PHI for that predecessor.

Right. To my understanding this is already available in Michael's changes.
The only change to TempScopInfo that needs to remain is the following:

void TempScopInfo::buildExitPHIAccessFunctions(Region *R) {
  AccFuncSetType Functions;
  BasicBlock *ExitBB = R->getExit();
  assert(!R->contains(ExitBB));

  for (auto I = ExitBB->begin(); isa<PHINode>(I); ++I) {
    // TODO: Maybe we can ignore trivial (one predecessor) PHI nodes
    auto *PHI = cast<PHINode>(I);
    buildPHIAccesses(PHI, *R, Functions, nullptr);
  }
}

I removed that function in the most recent patch.

After code generation (which should already allocate and write the operands to unique locations per exit PHI), iterate over all exit PHI nodes and replace the incoming edges that are in the SCoP with the reloaded value.

Interesting. So you say the writes to unique values will already happen?

By TempScopInfo::buildPHIAccesses, invoked on the PHI node.

Then the question is really only how the PHI node survives the splitting and how we can replace its operands. That sounds too simple to be true?

The previous exit node has been moved by simplifyRegion behind polly.merge_new_and_old. It's not really relevant, since we just have to look at the exposed value list.

One can make this work by just "behaving" as if the last read for the PHI node was there. I still think it's more complicated because the PHI node is referenced in IRAccess (e.g. as "BaseAddress"), and code using it may assume it is in the region.

Do not create a ScopStmt for the exit block anymore. IRAccesses for it are still generated but not used.

This passes the regression tests, but not the test-suite (maybe for the same reasons as inner_scev.ll).

Hi Michael,

very nice! This patch has now become very small and easy to understand. Thank you for refactoring it once again. I added just a few minor comments, but nothing serious. Feel free to incorporate them following your own judgement.

Best,
Tobias

include/polly/CodeGen/BlockGenerators.h
395	space after 'by'
lib/Analysis/TempScopInfo.cpp
319	To avoid spreading the knowledge about how to handle the exit nodes, you could give this function a parameter "OnlyPHINodes" and then set it to true when calling it for the exit node. This way, this piece of code does not need to know about exit-node specialities and the comment above buildAccessFunctions can explain everything.
362	Would this be a good place for the large comment that explains why we need to model access functions in the exit node?
lib/CodeGen/BlockGenerators.cpp
455	I do not think this change is needed. If I drop this, no test case fails, and we also will never iterate over the instructions of an exit node, hence 'Base' will always be contained in the region.
458	It seems you are also running this code on ExitingBlocks that do _not_ result from region simplification. PHI nodes in these BasicBlocks never need this code to be run, no? If we keep a note that we simplified the region (and that the exiting node is now not modeled) and only then run this code, the code below could possibly be a little bit shorter, and we would also not need to reason about whether this code actually does the right thing for ExitNodes that do not result from simplification.
test/Isl/CodeGen/OpenMP/single_loop.ll
46 ↗	(On Diff #32156)	This change is unrelated. It seems to revert 244954 which was needed to adjust this test case to a recent change in upstream LLVM.

This revision is now accepted and ready to land. Aug 15 2015, 1:40 AM

I looked into the solution Tobias and I talked about. It is entirely feasible to let the regular Scalar and Escape magic in the codegen handle these PHI nodes.

My patch is available here (http://reviews.llvm.org/D12051), but it does not yet pass all unit tests (I have to change them as they are changed here, I guess) and 3 lnt tests. 2 of the lnt tests are unrelated errors; the cause of the third is not yet known. One error was actually discovered a few days ago (the sdiv parameter problem), and for the other one I will write a test case soon. Anyway, my point is that we do not need that much special code in all Polly passes after all.


Johannes, did you have a look at Michael's latest patch? He implemented a codegen-only approach, which indeed seems to be optimal in terms of code complexity. I would be interested to read your opinion (in a review?).

Best,
Tobias


Only briefly, but I can look at it properly tonight. I saw that it is codegen-only, but it still treats exit PHI nodes differently from everything else, while they are the same. Anyway, I'll write something more tonight.


I added some comments to this commit. Additionally, I would like to include at least the test cases from http://reviews.llvm.org/D12051.

Because we actually have two "codegen-only" solutions to this problem now, there are two new questions I will only ask but not answer:

Should we always model the exit PHI node operands, or only if we actually need to (i.e., if we split the region exit later and work on the PHI nodes)?
Should we use the general scalar codegen and escaping scalar system for these PHI nodes, or should we handle them separately? There is actually a middle ground where the operands are stored automatically by the ordinary codegen into a PHI operand alloca, and only the merging needs to be done explicitly later on or delegated to the scalar escape system.

include/polly/CodeGen/BlockGenerators.h
395	The comment is not what I would have expected, as it does not describe the function but somehow tries to justify its existence. Maybe something like: PHI nodes in the exiting block have been in the region exit block before region simplification (a pre-transformation run before code generation). As the region exit is not part of the region, they have not been modeled in the SCoP, hence general code generation did not consider these PHI nodes. However, they have to be "replicated" in the optimized version of the SCoP, and these results need to be merged with the original PHI nodes in order to get the correct live-out scalar values.
lib/CodeGen/BlockGenerators.cpp
455	I think we have two options here. Either we treat the exit PHI nodes as "almost regular" PHI nodes, in which case we need some special handling here, or we add the special handling later in the pipeline and skip them here. As this patch chose the latter option, I am not so sure that we can drop the above code; however, we probably could if we later use the location created for the PHI operands here in the special handling of the exit PHI nodes.
458	I think Tobias' idea is good, but it is only one possibility. At least three possibilities come to mind: 1) We could track whether we simplified a region. 2) We could only model the exit PHI nodes if we will simplify the region. 3) We could always run the code and distinguish here what to do. I am voting for 1) or 2), while I slightly favour 2). Regarding the code here, it looks like the handleOutsideUses code somewhere and does basically the same thing, hence we should probably try to reuse a (maybe generalized) version of handleOutsideUses here instead of copying it.
test/ScopDetect/multidim_indirect_access.ll
2–3	We do not have independent blocks anymore. This is not for this commit to change but we should just remove the one run of this test case and the comment here.
7–8	Is this comment still valid afterwards?

Only briefly, but I can look at it properly tonight. I saw that it is codegen-only, but it still treats exit PHI nodes differently

One of my goals was to _not_ treat such PHIs differently, which I relaxed since Tobias thought it was modeling too much.

It's not codegen-only. There are two changes in TempScopInfo.

D12051 has even more special handling for exit nodes PHIs in TempScopInfo.

from everything else, while they are the same.

There are big differences in codegen. D12051 tries to handle exit node PHIs while creating blocks. This one identifies the PHIs afterwards as escaping uses.

lib/Analysis/TempScopInfo.cpp
319	I don't see any advantage of that. The unit of cohesion is the class, not the method.
lib/CodeGen/BlockGenerators.cpp
455	"Base" is not the instruction to copy (which is "Inst"), but the virtual address accessed. In the case of scalars, the PHI node itself is abused as the address and can well be the one in the exit node. I know for certain that this condition hits. However, it might still be a mistake. The code below is responsible for writing the value to the "virtual PHI address" for this incoming block. Not writing it means an undef value.
458	polly::simplifyRegion is not about "modeling exit nodes". It is just to ensure that there is a single exiting edge. Adding such thing would take away the general purpose of that function and I still think it belongs closer to RegionInfo than to polly. Having robust functions that do work in general cases is a good thing.
test/Isl/CodeGen/OpenMP/single_loop.ll
46 ↗	(On Diff #32156)	I just noticed that something broke and I fixed it, assuming one of my changes induced some metadata reordering. Presumably I updated Polly, but not LLVM.

In D11870#225077, @Meinersbur wrote:

One of my goals was to _not_ treat such PHIs differently, which I relaxed since Tobias thought it was modeling too much.

Question now is if we want to treat them with the machinery we have or with handleExitingPHIs afterwards.

It's not codegen-only. There are two changes in TempScopInfo.

Agreed, but that is only relevant if these changes could be avoided somehow.

D12051 has even more special handling for exit nodes PHIs in TempScopInfo.

You mean the part where only those exit PHI nodes are modeled that we actually need to model?

from everything else while they are the same.

There are big differences in codegen. D12051 tries to handle exit node PHIs while creating block. This one identifies the PHIs afterwards as escaping uses.

I would phrase it differently:

D12051 handles exit node PHIs the same way other escaping users are handled. As with all other escaping values, it demotes the escaping value when it occurs in the new SCoP and waits for the SCoP finalization to reload and merge the values correctly.

When I think about it, I am actually a little puzzled how/when/where the values are stored in the PHI operand alloca here. Where do we generate the store instructions for the exit PHI operands?

Btw, how many lnt tests fail? (Some are unrelated, I know, but it would be good to know.)

lib/CodeGen/BlockGenerators.cpp
455	The code below is responsible to write the value to the "virtual PHI address" for this incoming block. That's what I was hinting at with: we probably could [remove this code] if we later use the location created for the phi operands here in the special handling of the exit PHI nodes.

Dear all,

I think this patch is already very close to optimal. Johannes seems to have some last ideas on how the exit PHI-node generation could be improved, which I don't yet fully understand. As he has written the PHI node generation, he probably knows this better than me anyhow. To resolve the deadlock of multiple patch submissions, I propose to continue with this review and to only use Johannes' patch to explain/illustrate ideas. It would be great if you guys could work out together what the final submission should look like.

The blocking srem issue seems to be close to be resolved, based on the work both of you did, so it would be nice to get this in quickly, too.

Best,
Tobias

include/polly/CodeGen/BlockGenerators.h
395	The comment Johannes suggested here seems indeed useful.
lib/Analysis/TempScopInfo.cpp
319	Sure, but if I want to learn about how exit-node PHI nodes are code generated, it is easier if I can look at one place and one comment, rather than having to look through the whole class to find the places where they need special handling. Anyhow, this is just a minor stylistic comment. If you feel strongly about this, feel free to leave it as it is.
lib/CodeGen/BlockGenerators.cpp
458	@Meinersbur: I was just proposing that polly::simplifyRegion reports whether it simplified the exit node (or that we detect it otherwise). This still seems to be general purpose. From this information we can then derive that we only need to run this code if the simplification actually happened, and we can get rid of the code that makes sure we do not do anything if we hit PHI nodes that are modeled as part of the region. Again, this is getting very detailed. If you feel strongly, I am fine with leaving the code as it is and possibly improving it later.

The last diff was from a while ago, so I thought I'd update it.

This is based on Johannes' patch D12066. It passes regression tests and all lnt tests except those that fail already on trunk:
http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly/builds/2719/
Note that for some reason when I run lnt locally on trunk, I get 4 failed tests. I will try running on the buildbots.

There are some unnecessary changes, out-of-date comments and one fix that I will commit separately. I suggest waiting for the next diff before commenting on individual lines.

In D11870#225083, @jdoerfert wrote:

In D11870#225077, @Meinersbur wrote:

One of my goals was to _not_ treat such PHIs differently, which I relaxed since Tobias thought it was modeling too much.

Question now is if we want to treat them with the machinery we have or with handleExitingPHIs afterwards.

IMHO it is a good idea to keep the complexity out of the general case when those only occur at the end.

D12051 has even more special handling for exit nodes PHIs in TempScopInfo.

You mean the part where only those exit PHI nodes are modeled that we actually need to model?

Never mind, I see they are about equivalent. I think changing buildPHIAccesses might be preferred over changing SCEVInRegionDependences.

When I think about it, I am actually a little puzzled how/when/where the values are stored in the PHI operand alloca here. Where do we generate the store instructions for the exit PHI operands?

In generateScalarStores when it encounters a PHI MemoryAccess.

Meinersbur updated this revision to Diff 32398.
Meinersbur added a comment.

The last diff was from a while ago, so I thought I'd update it.

This is based on Johannes' patch http://reviews.llvm.org/D12066. It passes regression tests and all lnt tests except those that fail already on trunk:
http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly/builds/2719/
Note that for some reason when I run lnt locally on trunk, I get 4 failed tests. I will try running on the buildbots.

Do not worry about these 4 crashes. They are issues in LLVM and should hopefully be fixed quickly.

Best,
Tobias

Meinersbur added inline comments.Aug 18 2015, 4:17 AM

lib/CodeGen/BlockGenerators.cpp
452–453	The change has been dropped as it is not necessary anymore.
458	I ensured that there is a distinct block just for such PHIs. That block will be empty if there is nothing to be done. I may have dropped that part prematurely in the last patch.
lib/CodeGen/Utils.cpp
24 ↗	(On Diff #32398)	I made this function public because I was using it in handleExitingPHIs. That use is optional, hence this doesn't need to become public.
lib/Support/SCEVValidator.cpp
494	It actually is needed, or an lnt test will fail. However, I can instead pre-check whether the PHI is in the exit block, as in D12051.
test/ScopDetect/multidim_indirect_access.ll
2–3	OK, I will update this test case.

In D11870#226538, @grosser wrote:

Do not worry about these 4 crashes. They are issues in LLVM and should hopefully be fixed quickly.

OK, thanks for letting me know. Do you know why?

Meinersbur added a comment.

In http://reviews.llvm.org/D11870#226538, @grosser wrote:

Do not worry about these 4 crashes. They are issues in LLVM and should hopefully be fixed quickly.

OK, thanks for letting me know. Do you know why?

I attached one of the problematic commits. The other one has meanwhile already been reverted.

Tobias

Attachment: msg-1439-534.txt (733 B)

Rebase to trunk

Still not the clean version I want to have.

Fails one lnt test:
FAIL: MultiSource/Benchmarks/mafft/pairlocalalign.execution_time (499 of 1494)

I would suggest either simplifying this a lot or using http://reviews.llvm.org/D12051, as it, e.g., does not change the CFG at all (that is the major difference I see at the moment, plus a few smaller design choices). If you have good reasons why we should pursue this patch, please tell me.

The problem with pairalign is shared by both patches, so we need to fix it anyway.

include/polly/CodeGen/BlockGenerators.h
426	This change is unrelated and adds a lot of diff lines.
include/polly/Support/ScopHelper.h
93	This change is unrelated and adds a lot of diff lines.
150	I think this and the other big splitting change are actually not necessary. Maybe you could compare the output with http://reviews.llvm.org/D12051 ?
lib/CodeGen/BlockGenerators.cpp
486	I wouldn't bet on it. Textual order != execution order; I had to fix a bug like this the other week.

Clean-up and rebase to trunk, which includes the patch for scalar dependencies (r246414).

All 1494 LNT tests passed.

Addressed comments

There is some more room for generating less code:

If the original region has just a single exiting node, most of the mechanism here is not necessary and a shortcut path could be taken.
If the escaping value is in a .phiops alloca, it is reloaded and then stored again into a .s2a alloca to satisfy the existing code.

I suggest working on these post-commit, if requested.
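The first point can be made concrete with a toy check. This is a sketch in plain C++ that assumes a region is just a set of block names and the CFG a list of edges; `exitingBlocks` and `canTakeShortcut` are hypothetical helpers, not Polly functions. For the diamond from the summary the region has two exiting blocks, so the general mechanism is needed; a region with a single exiting block could skip it.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>; // (from, to)

// Exiting blocks are region blocks with at least one successor outside the
// region.
std::set<std::string> exitingBlocks(const std::set<std::string> &Region,
                                    const std::vector<Edge> &Edges) {
  std::set<std::string> Result;
  for (const Edge &E : Edges)
    if (Region.count(E.first) && !Region.count(E.second))
      Result.insert(E.first);
  return Result;
}

// With exactly one exiting block, an exit PHI has a single in-region incoming
// block, so its scalar could be used directly instead of going through
// virtual PHI writes and reloads.
bool canTakeShortcut(const std::set<std::string> &Region,
                     const std::vector<Edge> &Edges) {
  return exitingBlocks(Region, Edges).size() == 1;
}
```

For the region {bb, stmtA, stmtB} with stmtA and stmtB both branching to merge, `canTakeShortcut` returns false. If merge were part of the region and only fell through to a hypothetical `merge.split`, it would return true.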

The function createExitingBlockForExitNodePHIs is relatively central to my idea of an implementation as it preserves already stored references to the exit node and its PHI instructions and therefore avoids surprises. In the code generation part of Polly we can normalize the IR to avoid additional condition-checking in the code generation itself. Otherwise, we should go for D12051 instead.

include/polly/CodeGen/BlockGenerators.h
426	It is required because getNewValue gets called in a context without ScopStmt available. All that's needed of ScopStmt in the body is the region. However, to address your concern, I changed it to a Scop object to reduce the impact in other parts. On request I can also commit it separately.
include/polly/Support/ScopHelper.h
150	IMHO it is a cleaner approach than expecting PHI nodes outside of the region to have a specific form. Also, there are existing references to the PHI nodes in the polly::Scop structure, which we should preserve.
lib/Analysis/TempScopInfo.cpp
166	This condition has switched places with if (UseParent == ParentBB) continue; to ensure that, if the exit block contains a use of a PHI (i.e. an intra-block use-def), AnyCrossStmtUse is set and a scalar access is generated.
352	I don't understand this one
lib/CodeGen/BlockGenerators.cpp
486	Can you elaborate on the bug? Do you mean the iteration order? There is a lot of other code in LLVM's core (e.g. BasicBlock::getFirstNonPHI or SimplifyCFG) that assumes that instruction lists are iterated in execution order, so I guess we can assume this as well.
lib/Support/SCEVValidator.cpp
494	Addendum: I tried this and it changed the output (test case non-affine-loop-condition-dependent-access_3.ll) because the pre-check would be overly pessimistic in case the PHI results in a constant. Hence I prefer this solution. LNT passes both variants, though.
test/ScopDetect/multidim_indirect_access.ll
6	The error message changes with this patch
7–8	I reformulated this in a previous commit such that it is not implementation-specific anymore.
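Why the order of the two checks in TempScopInfo matters can be illustrated with a toy model. The sketch below is hypothetical plain C++, not the actual TempScopInfo code: blocks are identified by name, `anyCrossStmtUse` and its `ExitCheckFirst` flag are illustrative, and a PHI in the exit block is modeled as a definition whose block is `ExitBB`. With the same-block check first, a use of that PHI inside the exit block is skipped and no scalar access results; with the exit check first, it is flagged.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if any use of a value defined in DefBB requires a scalar
// access. ExitCheckFirst selects the new (true) or old (false) check order.
bool anyCrossStmtUse(const std::string &DefBB,
                     const std::vector<std::string> &UseBBs,
                     const std::string &ExitBB, bool ExitCheckFirst) {
  for (const std::string &UseBB : UseBBs) {
    // New order: a use in the exit block always needs a scalar access.
    if (ExitCheckFirst && UseBB == ExitBB)
      return true;
    // Intra-block use-def: no scalar access needed.
    if (UseBB == DefBB)
      continue;
    // Any other block, including the exit block under the old order.
    return true;
  }
  return false;
}
```

The only case where the two orders disagree is a definition in the exit block used in that same block, which is exactly the exit-PHI case this patch adds.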

rebase to r246926

Hi Michael,

nice that this patch now works flawlessly!

I still have a couple of things I would like to check in more detail, but am a little low on time at the moment. As none of these is likely to affect the general structure of the patch, I propose that, after cross-checking with Johannes, we commit this as is and then enhance it in-tree, if needed.

Some smaller comments inline.

Best,
Tobias

include/polly/CodeGen/BlockGenerators.h
426	Committing it separately would be nice.
include/polly/Support/ScopHelper.h
150	Similar to Johannes, I would also prefer to not split here. However, as splitting seems to cause other issues, I am fine leaving the splitting in for now and taking the larger but known correct patch. After this patch is committed, we can always improve on what we have at the moment. Maybe this also untangles the current two-patch review, as we can get a first version in and then discuss additional improvements individually.
lib/Analysis/ScopDetection.cpp
972	Nice!
1019	Nice!
lib/Analysis/TempScopInfo.cpp
166	OK, but please ensure that at least one test case covers this case.
lib/Support/SCEVValidator.cpp
494	Interesting. It would be good to add a test case to ensure we do not regress here. With the current patch, I can uncomment this code and no test starts failing.

jdoerfert added inline comments. Sep 7 2015, 8:30 AM

include/polly/Support/ScopHelper.h
150	@grosser I don't get your comment. If splitting causes issues then we take the other patch that does not split. If we agree splitting is not what we want we also take the other patch. What is the reason we should go with a larger patch for now?

Meinersbur marked 2 inline comments as done. Sep 8 2015, 5:15 AM

Meinersbur added inline comments.

include/polly/CodeGen/BlockGenerators.h
426	kk, will commit this before the main commit
include/polly/Support/ScopHelper.h
150	If it is not done by this function, simplifyRegion will at the latest also split the exit block, i.e. D12051 also "splits". However, createExitingBlockForExitNodePHIs does this in a more predictable fashion: It splits unconditionally, i.e. follow-up code has fewer cases to handle. It keeps the original PHI node references inside the scop. The polly::Scop data structure already stores references to these PHIs. D12051 has (IMHO) fragile code to search for where the PHIs have been moved to. If you do not share my opinion of a more structured approach, I'd agree to let Johannes do it his way.
lib/Analysis/TempScopInfo.cpp
166	I'll think of one
lib/Support/SCEVValidator.cpp
494	I'll think of one

Meinersbur marked 2 inline comments as done.

@grosser I don't get your comment. If splitting causes issues then we take the other patch that does not split. If we agree splitting is not what we want we also take the other patch. What is the reason we should go with a larger patch for now?

If not done using this function. simplifyRegion at latest will also split the exit block, i.e. D12051 also "splits".

We currently split some blocks but in D12051 there is no additional
splitting going on, hence I do not agree with the sentence above.
Especially, since we talk about the ~250 lines of additional
splitting code here.

However, createExitingBlockForExitNodePHIs does this in a more predictable fashion:

It splits unconditionally, i.e. follow-up code has less cases to handle.

True, but it also adds yet another function in another source file that
does some region "simplification".

It keeps the original PHI node references inside the scop. The polly::Scop data structure already stores references to these PHIs.

SCoPs always "somehow" reference these PHIs, in your commit as well as
in D12051. In the latter they are handled by representing the operands
as scalars that escape into the PHI node (which isn't a part of the
SCoP).

D12051 has (IMHO) fragile code to search for where the PHIs have been moved to.

What part are you referring to? There is no search going on, so I
again disagree with the sentence above. Please be more specific if you
believe there is something fragile or a search.

If you do not share my opinion of a more structured approach, I'd agree to let Johannes do it his way.

This is just condescending. While you believe that your approach is more
structured I think it should be up for discussion. I presented in the
review of this patch arguments against some of the decisions and a
counter proposal in D12051. However, except the claims above I haven't
seen any reasons against D12051 and for this patch.


In D11870#241317, @jdoerfert wrote:

Meinersbur marked 2 inline comments as done.

@grosser I don't get your comment. If splitting causes issues then we take the other patch that does not split. If we agree splitting is not what we want we also take the other patch. What is the reason we should go with a larger patch for now?

If not done using this function. simplifyRegion at latest will also split the exit block, i.e. D12051 also "splits".

We currently split some blocks but in D12051 there is no additional
splitting going on, hence I do not agree with the sentence above.

The block is split by either simplifyRegionExit or createExitingBlockForExitNodePHIs, but still at most once. Hence, there is no additional splitting.

Especially, since we talk about the ~250 lines of additional
splitting code here.

Most of these lines are comments.

Of course, any code needs to justify its existence, and I elaborated why I think this code does. With this patch, simplifyRegionExit becomes dead code (i.e. it is replaced), so maybe we should remove it and call simplifyRegionEntry directly instead (I still think the function simplifyRegion is more general-purpose and belongs in RegionInfo).

However, createExitingBlockForExitNodePHIs does this in a more predictable fashion:

It splits unconditionally, i.e. follow-up code has less cases to handle.

True, but it also adds yet another function in another source file that
does some region "simplification".

If it is better then I don't see the amount of code as an argument against it.

It keeps the original PHI node references inside the scop. The polly::Scop data structure already stores references to these PHIs.

SCoPs always "somehow" reference these PHIs, in your commit as well as
in D12051. In the latter they are handled by representing the operands
as scalars that escape into the PHI node (which isn't a part of the
SCoP).

It matters where these references point. TempScopInfo created them as belonging to a block or subregion, but in order to simplify the region, the PHIs are scattered into and out of the region. That's what I tried to avoid.

D12051 has (IMHO) fragile code to search for where the PHIs have been moved to.

What part are you referring to? There is no search going on, so I
again disagree with the sentence above. Please be more specific if you
believe there is something fragile or a search.

When I wrote this, I was thinking of the function getSingleInRegionPHIOperandPHI, which I cannot find anymore in the latest update. A description of the major changes would have been nice. I only got back from my holidays today and am still tired from traveling.

If you do not share my opinion of a more structured approach, I'd agree to let Johannes do it his way.

This is just condescending. While you believe that your approach is more
structured I think it should be up for discussion.

I am sorry for not having made clear that it is just my opinion about how "structured" a solution is, of course heavily influenced by the fact that I wrote this one and put more thoughts into it.

I presented in the
review of this patch arguments against some of the decisions and a
counter proposal in D12051. However, except the claims above I haven't
seen any reasons against D12051 and for this patch.

I also haven't seen arguments for D12051 and against D11870 that convinced me. We should just accept that we can have different opinions.

As you seem to strongly prefer D12051 which also has become much better, and it is more important to me to have a working solution than to discuss which one, I hereby retract this diff.

If you are interested in discussing your and my arguments, I invite you to a discussion during or after Wednesday's phone call.

Meinersbur mentioned this in D13762: [Polly] Ensure unique implicit reads/writes at beginning/end of ScopStmts. Nov 1 2015, 9:32 AM

Revision Contents

Path: Size
include/polly/CodeGen/BlockGenerators.h: 18 lines
include/polly/ScopDetection.h: 7 lines
include/polly/Support/ScopHelper.h: 49 lines
include/polly/TempScopInfo.h: 17 lines
lib/Analysis/ScopDetection.cpp: 17 lines
lib/Analysis/TempScopInfo.cpp: 61 lines
lib/CodeGen/BlockGenerators.cpp: 82 lines
lib/CodeGen/CodeGeneration.cpp: 65 lines
lib/Support/SCEVValidator.cpp: 6 lines
lib/Support/ScopHelper.cpp: 132 lines
test/Isl/CodeGen/inner_scev_sdiv_2.ll: 2 lines
test/Isl/CodeGen/loop_with_conditional_entry_edge_split_hard_case.ll: 6 lines
test/Isl/CodeGen/phi_loop_carried_float.ll: 5 lines
test/Isl/CodeGen/phi_loop_carried_float_escape.ll: 7 lines
test/Isl/CodeGen/phi_scalar_simple_1.ll: 2 lines
test/Isl/CodeGen/phi_scalar_simple_2.ll: 2 lines
test/ScopDetect/keep_going_expansion.ll: 2 lines
test/ScopDetect/multidim_indirect_access.ll: 7 lines
test/ScopDetect/non-affine-loop-condition-dependent-access_2.ll: 4 lines
test/ScopDetect/non-affine-loop-condition-dependent-access_3.ll: 4 lines
test/ScopDetect/phi_with_multi_exiting_edges.ll: 2 lines
test/ScopDetect/phi_with_multi_exiting_edges_2.ll: 35 lines
test/ScopDetect/simple_non_single_entry.ll: 2 lines
test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_2.ll: 2 lines
test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll: 2 lines

Diff 34106

include/polly/CodeGen/BlockGenerators.h

… unchanged lines omitted …	public:
/// GlobalMap.		/// GlobalMap.
Value *getOrCreateAlloca(MemoryAccess &Access);		Value *getOrCreateAlloca(MemoryAccess &Access);

/// @brief Finalize the code generation for the SCoP @p S.		/// @brief Finalize the code generation for the SCoP @p S.
///		///
/// This will initialize and finalize the scalar variables we demoted during		/// This will initialize and finalize the scalar variables we demoted during
/// the code generation.		/// the code generation.
///		///
		/// @see handleExitingPHIs(Scop &)
/// @see createScalarInitialization(Scop &)		/// @see createScalarInitialization(Scop &)
/// @see createScalarFinalization(Region &)		/// @see createScalarFinalization(Region &)
void finalizeSCoP(Scop &S);		void finalizeSCoP(Scop &S);

/// @brief An empty destructor		/// @brief An empty destructor
virtual ~BlockGenerator(){};		virtual ~BlockGenerator(){};

protected:		protected:
… unchanged lines omitted …	protected:
/// @brief Handle users of @p Inst outside the SCoP.		/// @brief Handle users of @p Inst outside the SCoP.
///		///
/// @param R The current SCoP region.		/// @param R The current SCoP region.
/// @param Inst The current instruction we check.		/// @param Inst The current instruction we check.
/// @param InstCopy The copy of the instruction @p Inst in the optimized		/// @param InstCopy The copy of the instruction @p Inst in the optimized
/// SCoP.		/// SCoP.
void handleOutsideUsers(const Region &R, Instruction *Inst, Value *InstCopy);		void handleOutsideUsers(const Region &R, Instruction *Inst, Value *InstCopy);

		/// @brief Handle PHIs in the exiting block.
		///
		/// PHI nodes in the exiting block have been in the region exit block before
		/// region simplification (a pre transformation run before code generation).
		grosser: space after 'by'
		jdoerfert: The comment is not what I would have expected, as it does not describe the function but rather tries to justify its existence. Maybe something like: PHI nodes in the exiting block have been in the region exit block before region simplification (a pre transformation run before code generation). As the region exit is not part of the region, they have not been modeled in the SCoP, hence general code generation did not consider these PHI nodes. However, they have to be "replicated" in the optimized version of the SCoP and these results need to be merged with the original PHI nodes in order to get the correct live-out scalar values.
		grosser: The comment Johannes suggested here seems indeed useful.
		/// As the region exit is not part of the region they have not been modeled in
		/// the SCoP, hence general code generation did not consider these PHI nodes.
		/// However, they have to be "replicated" in the optimized version of the SCoP
		/// and these results need to be merged with the original PHI nodes in order
		/// to get the correct live out scalar values.
		///
		/// @param S The current SCoP.
		void handleExitingPHIs(Scop &S);

/// @brief Initialize the memory of demoted scalars.		/// @brief Initialize the memory of demoted scalars.
///		///
/// @param S The scop for which to generate the scalar initializers.		/// @param S The scop for which to generate the scalar initializers.
void createScalarInitialization(Scop &S);		void createScalarInitialization(Scop &S);

/// @brief Promote the values of demoted scalars after the SCoP.		/// @brief Promote the values of demoted scalars after the SCoP.
///		///
/// If a scalar value was used outside the SCoP we need to promote the value		/// If a scalar value was used outside the SCoP we need to promote the value
/// stored in the memory cell allocated for that scalar and combine it with		/// stored in the memory cell allocated for that scalar and combine it with
/// the original value in the non-optimized SCoP.		/// the original value in the non-optimized SCoP.
void createScalarFinalization(Region &R);		void createScalarFinalization(Region &R);

/// @brief Get the new version of a value.		/// @brief Get the new version of a value.
///		///
/// Given an old value, we first check if a new version of this value is		/// Given an old value, we first check if a new version of this value is
/// available in the BBMap or GlobalMap. In case it is not and the value can		/// available in the BBMap or GlobalMap. In case it is not and the value can
/// be recomputed using SCEV, we do so. If we can not recompute a value		/// be recomputed using SCEV, we do so. If we can not recompute a value
/// using SCEV, but we understand that the value is constant within the scop,		/// using SCEV, but we understand that the value is constant within the scop,
/// we return the old value. If the value can still not be derived, this		/// we return the old value. If the value can still not be derived, this
/// function will assert.		/// function will assert.
///		///
/// @param Stmt The statement to code generate.		/// @param S The parent SCoP.
/// @param Old The old Value.		/// @param Old The old Value.
/// @param BBMap A mapping from old values to their new values		/// @param BBMap A mapping from old values to their new values
/// (for values recalculated within this basic block).		/// (for values recalculated within this basic block).
/// @param LTS A mapping from loops virtual canonical induction		/// @param LTS A mapping from loops virtual canonical induction
/// variable to their new values		/// variable to their new values
/// (for values recalculated in the new ScoP, but not		/// (for values recalculated in the new ScoP, but not
/// within this basic block).		/// within this basic block).
/// @param L The loop that surrounded the instruction that referenced		/// @param L The loop that surrounded the instruction that referenced
/// this value in the original code. This loop is used to		/// this value in the original code. This loop is used to
/// evaluate the scalar evolution at the right scope.		/// evaluate the scalar evolution at the right scope.
///		///
/// @returns o The old value, if it is still valid.		/// @returns o The old value, if it is still valid.
/// o The new value, if available.		/// o The new value, if available.
/// o NULL, if no value is found.		/// o NULL, if no value is found.
Value *getNewValue(ScopStmt &Stmt, const Value *Old, ValueMapT &BBMap,		Value *getNewValue(Scop &S, const Value *Old, ValueMapT &BBMap,
LoopToScevMapT &LTS, Loop *L) const;		LoopToScevMapT &LTS, Loop *L) const;

void copyInstScalar(ScopStmt &Stmt, const Instruction *Inst, ValueMapT &BBMap,		void copyInstScalar(ScopStmt &Stmt, const Instruction *Inst, ValueMapT &BBMap,
LoopToScevMapT &LTS);		LoopToScevMapT &LTS);

/// @brief Get the innermost loop that surrounds an instruction.		/// @brief Get the innermost loop that surrounds an instruction.
///		///
/// @param Inst The instruction for which we get the loop.		/// @param Inst The instruction for which we get the loop.
… unchanged lines omitted …
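The merge that the `handleExitingPHIs` comment describes can be sketched as a toy model. This is plain C++, not Polly's code generator; `LiveOuts` and `mergeLiveOuts` are hypothetical. It assumes each exit PHI is identified by name, with one value from the original region and one from the optimized copy, and that the merge picks according to which of the two versions ran (Polly keeps the original region next to the generated one and branches between them).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Exit PHI name -> live-out value produced by one version of the region.
using LiveOuts = std::map<std::string, int>;

// Models the merge in the exit block: for every PHI of the original region,
// select the value of whichever version executed.
LiveOuts mergeLiveOuts(bool RanOptimized, const LiveOuts &Original,
                       const LiveOuts &Optimized) {
  LiveOuts Merged;
  for (const auto &Entry : Original) {
    // std::map::at throws std::out_of_range if the optimized version never
    // produced a value for this exit PHI, i.e. the PHI was not replicated.
    int OptimizedValue = Optimized.at(Entry.first);
    Merged[Entry.first] = RanOptimized ? OptimizedValue : Entry.second;
  }
  return Merged;
}
```

The lookup failing loudly is the point of the model: without replication in the optimized version, the merge has no correct value to select for that PHI.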

include/polly/ScopDetection.h

… unchanged lines omitted …	private:

/// @brief Check if all basic blocks in the region are valid.		/// @brief Check if all basic blocks in the region are valid.
///		///
/// @param Context The context of scop detection.		/// @param Context The context of scop detection.
///		///
/// @return True if all blocks in R are valid, false otherwise.		/// @return True if all blocks in R are valid, false otherwise.
bool allBlocksValid(DetectionContext &Context) const;		bool allBlocksValid(DetectionContext &Context) const;

/// @brief Check whether the exit block of a region is valid.
///
/// @param Context The context of scop detection.
///
/// @return True if the exit of R is valid, false otherwise.
bool isValidExit(DetectionContext &Context) const;

/// @brief Check if a region is a Scop.		/// @brief Check if a region is a Scop.
///		///
/// @param Context The context of scop detection.		/// @param Context The context of scop detection.
///		///
/// @return True if R is a Scop, false otherwise.		/// @return True if R is a Scop, false otherwise.
bool isValidRegion(DetectionContext &Context) const;		bool isValidRegion(DetectionContext &Context) const;

/// @brief Check if a call instruction can be part of a Scop.		/// @brief Check if a call instruction can be part of a Scop.
… unchanged lines omitted …

include/polly/Support/ScopHelper.h

	//===------ Support/ScopHelper.h -- Some Helper Functions for Scop. -------===//			//===------ Support/ScopHelper.h -- Some Helper Functions for Scop. -------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// Small functions that help with LLVM-IR.			// Small functions that help with LLVM-IR.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef POLLY_SUPPORT_IRHELPER_H			#ifndef POLLY_SUPPORT_IRHELPER_H
	#define POLLY_SUPPORT_IRHELPER_H			#define POLLY_SUPPORT_IRHELPER_H

				#include "llvm/ADT/ArrayRef.h"

	namespace llvm {			namespace llvm {
	class Type;			class Type;
	class Instruction;			class Instruction;
	class LoopInfo;			class LoopInfo;
	class Loop;			class Loop;
	class ScalarEvolution;			class ScalarEvolution;
	class SCEV;			class SCEV;
	class Value;			class Value;
	… unchanged lines omitted …
	///			///
	/// This wrapper will internally call the SCEVExpander but also makes sure that			/// This wrapper will internally call the SCEVExpander but also makes sure that
	/// all additional features not represented in SCEV (e.g., SDiv/SRem are not			/// all additional features not represented in SCEV (e.g., SDiv/SRem are not
	/// black boxes but can be part of the function) will be expanded correctly.			/// black boxes but can be part of the function) will be expanded correctly.
	///			///
	/// The parameters are the same as for the creation of a SCEVExpander as well			/// The parameters are the same as for the creation of a SCEVExpander as well
	/// as the call to SCEVExpander::expandCodeFor:			/// as the call to SCEVExpander::expandCodeFor:
	///			///
	/// @param S The current Scop.			/// @param S The current Scop.
	/// @param SE The Scalar Evolution pass.			/// @param SE The Scalar Evolution pass.
	/// @param DL The module data layout.			/// @param DL The module data layout.
	/// @param Name The suffix added to the new instruction names.			/// @param Name The suffix added to the new instruction names.
	/// @param E The expression for which code is actually generated.			/// @param E The expression for which code is actually generated.
	/// @param Ty The type of the resulting code.			/// @param Ty The type of the resulting code.
	/// @param IP The insertion point for the new code.			/// @param IP The insertion point for the new code.
	llvm::Value *expandCodeFor(Scop &S, llvm::ScalarEvolution &SE,			llvm::Value *expandCodeFor(Scop &S, llvm::ScalarEvolution &SE,
	const llvm::DataLayout &DL, const char *Name,			const llvm::DataLayout &DL, const char *Name,
	const llvm::SCEV *E, llvm::Type *Ty,			const llvm::SCEV *E, llvm::Type *Ty,
	llvm::Instruction *IP);			llvm::Instruction *IP);

				/// @brief Move a set of incoming edges to a new successor.
				///
				/// Similar to llvm::SplitPredecessors, but instead of creating a new
				/// predecessor, this creates a new successor.
				/// Preserves LCSSA because all PHIs will exist in both BasicBlocks. A PHI node
				/// is created in the successor for every PHI in the original block, even if
				/// there is only one incoming edge. PHI nodes in the original block are never
				/// removed, even if they have fewer than two incoming edges left.
				/// We generally cannot update RegionInfo since it could invalidate regions,
				/// depending on the list of edges.
				///
				/// Before: After:
				///
				/// NonPred[0] NonPred[0]
				/// \ / \ /
				/// /---->Entry /-->Entry /
				/// \| \| \| \ /
				/// \--BlockInRegion \| Entry.split
				/// \| \|
				/// \-------BlockInRegion
				///
				/// If Entry is split, neither "Entry" nor "Entry.split" can be made the
				/// region's entry node. "Entry" cannot because there is an edge from an
				/// outside-of-region block NonPred[0] to another region block, violating the
				/// single entry requirement of a region. The new block "Entry.split" cannot be
				/// the entry block because there is an exiting edge to "Entry", which would
				/// then be an alternative to the region's other exit block.
				/// There is a similar situation with the exit block conflicting with other
				/// regions.
				/// Hence, the caller of this function should make sure such cases do not happen
				/// and update RegionInfo accordingly.
				///
				/// @param BB Block to split.
				/// @param NonPreds Set of incoming edges to move to the new successor.
				/// @param Suffix Suffix to append to the name of the new block.
				/// @param DT DominatorTree to update.
				/// @param LI LoopInfo to update.
				/// @param SE ScalarEvolution to update.
				///
				/// @return The newly created successor of BB or null if the block could not be
				/// split.
				llvm::BasicBlock *
				splitBlockNonPredecessors(llvm::BasicBlock *BB,
				llvm::ArrayRef<llvm::BasicBlock *> NonPreds,
				llvm::StringRef Suffix, llvm::DominatorTree *DT,
				llvm::LoopInfo *LI, llvm::ScalarEvolution *SE);
	}			}
	#endif			#endif
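To make the documented behavior of `splitBlockNonPredecessors` concrete, here is a toy model over a hypothetical string-based CFG (`ToyPHI`, `ToyBlock` and `splitNonPredecessors` are illustrative, not LLVM or Polly types). It only mirrors the edge and PHI bookkeeping: edges from `NonPreds` move to a new successor, every PHI gets a counterpart there with an incoming value from the original block, and the original PHIs stay in place with the moved incoming entries removed. DominatorTree, LoopInfo and ScalarEvolution updates, as well as rewriting later users to the new PHIs, are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct ToyPHI {
  std::string Name;
  std::map<std::string, std::string> Incoming; // predecessor -> value
};

struct ToyBlock {
  std::string Name;
  std::vector<std::string> Preds;
  std::vector<ToyPHI> PHIs;
};

// Moves the edges NonPreds -> BB to a new successor BB + Suffix.
ToyBlock splitNonPredecessors(ToyBlock &BB,
                              const std::vector<std::string> &NonPreds,
                              const std::string &Suffix) {
  ToyBlock Succ;
  Succ.Name = BB.Name + Suffix;
  Succ.Preds.push_back(BB.Name); // BB now branches to the new successor
  for (const std::string &P : NonPreds) {
    // Redirect the edge P -> BB to P -> Succ.
    BB.Preds.erase(std::remove(BB.Preds.begin(), BB.Preds.end(), P),
                   BB.Preds.end());
    Succ.Preds.push_back(P);
  }
  for (ToyPHI &OldPHI : BB.PHIs) {
    // Every PHI gets a counterpart, even if it ends up with one incoming edge.
    ToyPHI NewPHI;
    NewPHI.Name = OldPHI.Name + Suffix;
    NewPHI.Incoming[BB.Name] = OldPHI.Name; // value arriving through BB
    for (const std::string &P : NonPreds) {
      auto It = OldPHI.Incoming.find(P);
      if (It != OldPHI.Incoming.end()) {
        NewPHI.Incoming[P] = It->second; // move the incoming value
        OldPHI.Incoming.erase(It);       // old PHI is kept, just loses P
      }
    }
    Succ.PHIs.push_back(NewPHI);
  }
  return Succ;
}
```

Applied to a region exit with in-region predecessors `stmtA`, `stmtB` and an outside predecessor `outside`, passing `{"outside"}` leaves the original block with only in-region predecessors and its original PHI. This matches the rationale given above for createExitingBlockForExitNodePHIs of keeping existing PHI references inside the SCoP.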

include/polly/TempScopInfo.h

… unchanged lines omitted …	class TempScopInfo : public RegionPass {
/// @param R The SCoP region.		/// @param R The SCoP region.
/// @param SR A subregion of @p R.		/// @param SR A subregion of @p R.
void buildAccessFunctions(Region &R, Region &SR);		void buildAccessFunctions(Region &R, Region &SR);

/// @brief Build the access functions for the basic block @p BB		/// @brief Build the access functions for the basic block @p BB
///		///
/// @param R The SCoP region.		/// @param R The SCoP region.
/// @param BB A basic block in @p R.		/// @param BB A basic block in @p R.
		/// @param OnlyPHIs Process only the PHI nodes of the block. Used
		jdoerfert: We use references for Regions here if they cannot be null.
		/// for the region's exit block.
/// @param NonAffineSubRegion The non affine sub-region @p BB is in.		/// @param NonAffineSubRegion The non affine sub-region @p BB is in.
void buildAccessFunctions(Region &R, BasicBlock &BB,		void buildAccessFunctions(Region &R, BasicBlock &BB, bool OnlyPHIs,
Region *NonAffineSubRegion = nullptr);		Region *NonAffineSubRegion = nullptr);

		/// @brief Determine whether an instruction logically belongs to the Scop.
		///
		/// In addition to the instructions in the scop's region, the PHI nodes in the
		/// region's exit block are also considered to belong to the region. When the
		/// region is simplified (single exiting node), the PHIs are necessarily moved
		/// (or replicated) into the region.
		///
		jdoerfert: We use references for Regions here if they cannot be null.
		/// @param R The region representing the scop.
		/// @param Inst The instruction to query.
		///
		/// @return True iff the instruction belongs to the scop.
		bool scopContains(llvm::Region &R, llvm::Instruction *Inst) const;

public:		public:
static char ID;		static char ID;
explicit TempScopInfo() : RegionPass(ID), TempScopOfRegion(nullptr) {}		explicit TempScopInfo() : RegionPass(ID), TempScopOfRegion(nullptr) {}
~TempScopInfo();		~TempScopInfo();

/// @brief Get the temporary Scop information in LLVM IR for this region.		/// @brief Get the temporary Scop information in LLVM IR for this region.
///		///
/// @return The Scop information in LLVM IR represent.		/// @return The Scop information in LLVM IR represent.
… 19 lines hidden …

lib/Analysis/ScopDetection.cpp

… 809 lines hidden …	Region *ScopDetection::expandRegion(Region &R) {
while (ExpandedRegion) {		while (ExpandedRegion) {
DetectionContext Context(		DetectionContext Context(
ExpandedRegion, AA, NonAffineSubRegionMap[ExpandedRegion.get()],		ExpandedRegion, AA, NonAffineSubRegionMap[ExpandedRegion.get()],
BoxedLoopsMap[ExpandedRegion.get()], false /* verifying */);		BoxedLoopsMap[ExpandedRegion.get()], false /* verifying */);
DEBUG(dbgs() << "\t\tTrying " << ExpandedRegion->getNameStr() << "\n");		DEBUG(dbgs() << "\t\tTrying " << ExpandedRegion->getNameStr() << "\n");
// Only expand when we did not collect errors.		// Only expand when we did not collect errors.

// Check the exit first (cheap)		// Check the exit first (cheap)
if (isValidExit(Context) && !Context.Log.hasErrors()) {		if (!Context.Log.hasErrors()) {
// If the exit is valid check all blocks		// If the exit is valid check all blocks
// - if true, a valid region was found => store it + keep expanding		// - if true, a valid region was found => store it + keep expanding
// - if false, .tbd. => stop (should this really end the loop?)		// - if false, .tbd. => stop (should this really end the loop?)
if (!allBlocksValid(Context) || Context.Log.hasErrors())		if (!allBlocksValid(Context) || Context.Log.hasErrors())
break;		break;

// Store this region, because it is the greatest valid (encountered so		// Store this region, because it is the greatest valid (encountered so
// far).		// far).
… 125 lines hidden …	for (BasicBlock::iterator I = BB->begin(), E = --BB->end(); I != E; ++I)
return false;		return false;

if (!hasAffineMemoryAccesses(Context))		if (!hasAffineMemoryAccesses(Context))
return false;		return false;

return true;		return true;
}		}

bool ScopDetection::isValidExit(DetectionContext &Context) const {

// PHI nodes are not allowed in the exit basic block.
if (BasicBlock *Exit = Context.CurRegion.getExit()) {
BasicBlock::iterator I = Exit->begin();
if (I != Exit->end() && isa<PHINode>(*I))
return invalid<ReportPHIinExit>(Context, /*Assert=*/true, I);
}

return true;
}

bool ScopDetection::isValidRegion(DetectionContext &Context) const {		bool ScopDetection::isValidRegion(DetectionContext &Context) const {
grosser: Nice!
Region &CurRegion = Context.CurRegion;		Region &CurRegion = Context.CurRegion;

DEBUG(dbgs() << "Checking region: " << CurRegion.getNameStr() << "\n\t");		DEBUG(dbgs() << "Checking region: " << CurRegion.getNameStr() << "\n\t");

if (CurRegion.isTopLevelRegion()) {		if (CurRegion.isTopLevelRegion()) {
DEBUG(dbgs() << "Top level region is invalid\n");		DEBUG(dbgs() << "Top level region is invalid\n");
return false;		return false;
}		}
… 28 lines hidden …	bool ScopDetection::isValidRegion(DetectionContext &Context) const {
// to insert alloca instruction there when translate scalar to array.		// to insert alloca instruction there when translate scalar to array.
if (CurRegion.getEntry() ==		if (CurRegion.getEntry() ==
&(CurRegion.getEntry()->getParent()->getEntryBlock()))		&(CurRegion.getEntry()->getParent()->getEntryBlock()))
return invalid<ReportEntry>(Context, /*Assert=*/true, CurRegion.getEntry());		return invalid<ReportEntry>(Context, /*Assert=*/true, CurRegion.getEntry());

if (!DetectUnprofitable && !hasMoreThanOneLoop(&CurRegion))		if (!DetectUnprofitable && !hasMoreThanOneLoop(&CurRegion))
invalid<ReportUnprofitable>(Context, /*Assert=*/true, &CurRegion);		invalid<ReportUnprofitable>(Context, /*Assert=*/true, &CurRegion);

if (!isValidExit(Context))
return false;

grosser: Nice!
if (!allBlocksValid(Context))		if (!allBlocksValid(Context))
return false;		return false;

// We can probably not do a lot on scops that only write or only read		// We can probably not do a lot on scops that only write or only read
// data.		// data.
if (!DetectUnprofitable && (!Context.hasStores || !Context.hasLoads))		if (!DetectUnprofitable && (!Context.hasStores || !Context.hasLoads))
invalid<ReportUnprofitable>(Context, /*Assert=*/true, &CurRegion);		invalid<ReportUnprofitable>(Context, /*Assert=*/true, &CurRegion);

… 160 lines hidden …

lib/Analysis/TempScopInfo.cpp

… 154 lines hidden …	bool TempScopInfo::buildScalarDependences(Instruction *Inst, Region *R,

for (User *U : Inst->users()) {		for (User *U : Inst->users()) {
Instruction *UI = dyn_cast<Instruction>(U);		Instruction *UI = dyn_cast<Instruction>(U);

// Ignore the strange user		// Ignore the strange user
if (UI == 0)		if (UI == 0)
continue;		continue;

		// Check whether or not the use is in the SCoP.
		// Uses outside the SCoP are called "exposed" and their value must be
		// reloaded after the SCoP's execution to be used by the original user.
		if (!scopContains(*R, UI)) {
		Meinersbur (author): This condition has switched places with `if (UseParent == ParentBB) continue;` to ensure that, if the exit block has a use of a PHI (i.e. an intra-block use-def), AnyCrossStmtUse is set and a scalar access is generated.
		grosser: OK, but please ensure that at least one test case covers this case.
		Meinersbur (author): I'll think of one.
		AnyCrossStmtUse = true;
		continue;
		}

BasicBlock *UseParent = UI->getParent();		BasicBlock *UseParent = UI->getParent();

// Ignore the users in the same BB (statement)		// Ignore the users in the same BB (statement)
		// Use-def chains within the same statement do not need to be modelled as
		// memory location.
if (UseParent == ParentBB)		if (UseParent == ParentBB)
continue;		continue;

// Do not build scalar dependences inside a non-affine subregion.		// Do not build scalar dependences inside a non-affine subregion.
		jdoerfert: I do not understand this condition. Why is it needed and what is the implication?
if (NonAffineSubRegion && NonAffineSubRegion->contains(UseParent))		if (NonAffineSubRegion && NonAffineSubRegion->contains(UseParent))
continue;		continue;

// Check whether or not the use is in the SCoP.
if (!R->contains(UseParent)) {
AnyCrossStmtUse = true;
continue;
}

// If the instruction can be synthesized and the user is in the region		// If the instruction can be synthesized and the user is in the region
// we do not need to add scalar dependences.		// we do not need to add scalar dependences.
if (canSynthesizeInst)		if (canSynthesizeInst)
continue;		continue;

// No need to translate these scalar dependences into polyhedral form,		// No need to translate these scalar dependences into polyhedral form,
// because synthesizable scalars can be generated by the code generator.		// because synthesizable scalars can be generated by the code generator.
if (canSynthesize(UI, LI, SE, R))		if (canSynthesize(UI, LI, SE, R))
… 14 lines hidden …	bool TempScopInfo::buildScalarDependences(Instruction *Inst, Region *R,
}		}

if (ModelReadOnlyScalars) {		if (ModelReadOnlyScalars) {
for (Value *Op : Inst->operands()) {		for (Value *Op : Inst->operands()) {
if (canSynthesize(Op, LI, SE, R))		if (canSynthesize(Op, LI, SE, R))
continue;		continue;

if (Instruction *OpInst = dyn_cast<Instruction>(Op))		if (Instruction *OpInst = dyn_cast<Instruction>(Op))
if (R->contains(OpInst))		if (scopContains(*R, OpInst))
continue;		continue;

if (isa<Constant>(Op))		if (isa<Constant>(Op))
continue;		continue;

IRAccess ScalarAccess(IRAccess::READ, Op, ZeroOffset, 1, true, Op);		IRAccess ScalarAccess(IRAccess::READ, Op, ZeroOffset, 1, true, Op);
AccFuncMap[Inst->getParent()].push_back(		AccFuncMap[Inst->getParent()].push_back(
std::make_pair(ScalarAccess, Inst));		std::make_pair(ScalarAccess, Inst));
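To illustrate why the SCoP-membership check now runs before the same-block check in buildScalarDependences, here is a minimal sketch in plain C++. It is not Polly code: blocks are integer ids, the hypothetical `ToyUser::InScop` flag stands in for the result of `scopContains(*R, UI)`, the non-affine-subregion and synthesizability filters are left out, and any remaining use in another block is assumed to count as a cross-statement use. The two orders only differ for a user in the defining block that lies outside the SCoP, such as a non-PHI instruction in the exit block that uses an exit-block PHI.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model, not Polly's data structures: one user of the
// defining instruction, identified by the id of its basic block.
struct ToyUser {
  int Block;   // block that contains the user
  bool InScop; // stands in for scopContains(*R, UI)
};

// Old order: uses in the defining block are skipped before the SCoP check.
bool anyCrossStmtUseOld(int DefBlock, const std::vector<ToyUser> &Users) {
  for (const ToyUser &U : Users) {
    if (U.Block == DefBlock)
      continue;  // same statement: no scalar access needed
    return true; // other statement, or outside the SCoP
  }
  return false;
}

// New order: a use outside the SCoP is "exposed" and always counts,
// even if it sits in the same block as the definition.
bool anyCrossStmtUseNew(int DefBlock, const std::vector<ToyUser> &Users) {
  for (const ToyUser &U : Users) {
    if (!U.InScop)
      return true; // value must be reloaded after the SCoP
    if (U.Block == DefBlock)
      continue;    // same statement: no scalar access needed
    return true;   // use in another statement of the SCoP
  }
  return false;
}
```

With the old order, a use of an exit-block PHI by a later instruction of the same exit block is skipped, so no scalar access would be generated for it; with the new order it is treated as an exposed use.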
… 62 lines hidden …	TempScopInfo::buildIRAccess(Instruction *Inst, Loop *L, Region *R,
return IRAccess(Type, BasePointer->getValue(), AccessFunction, Size, IsAffine,		return IRAccess(Type, BasePointer->getValue(), AccessFunction, Size, IsAffine,
Subscripts, Sizes, Val);		Subscripts, Sizes, Val);
}		}

void TempScopInfo::buildAccessFunctions(Region &R, Region &SR) {		void TempScopInfo::buildAccessFunctions(Region &R, Region &SR) {

if (SD->isNonAffineSubRegion(&SR, &R)) {		if (SD->isNonAffineSubRegion(&SR, &R)) {
for (BasicBlock *BB : SR.blocks())		for (BasicBlock *BB : SR.blocks())
buildAccessFunctions(R, *BB, &SR);		buildAccessFunctions(R, *BB, false, &SR);
return;		return;
}		}

for (auto I = SR.element_begin(), E = SR.element_end(); I != E; ++I)		for (auto I = SR.element_begin(), E = SR.element_end(); I != E; ++I)
if (I->isSubRegion())		if (I->isSubRegion())
buildAccessFunctions(R, *I->getNodeAs<Region>());		buildAccessFunctions(R, *I->getNodeAs<Region>());
else		else
buildAccessFunctions(R, *I->getNodeAs<BasicBlock>());		buildAccessFunctions(R, *I->getNodeAs<BasicBlock>(), false);
		jdoerfert: Can we remove the `buildExitPHIAccessFunctions` part if we add something like `if (&R == &SR) buildAccessFunctions(R, *R.getExit(), /* only PHIs */ true);` here, and an `onlyPHIs` parameter to the buildAccessFunctions method?
		Meinersbur (author): I refactored this part; it has less specialized code now.
}		}

void TempScopInfo::buildAccessFunctions(Region &R, BasicBlock &BB,		void TempScopInfo::buildAccessFunctions(Region &R, BasicBlock &BB,
		bool OnlyPHIs,
		grosser: they belong
Region *NonAffineSubRegion) {		Region *NonAffineSubRegion) {
AccFuncSetType Functions;		AccFuncSetType Functions;
Loop *L = LI->getLoopFor(&BB);		Loop *L = LI->getLoopFor(&BB);

// The set of loops contained in non-affine subregions that are part of R.		// The set of loops contained in non-affine subregions that are part of R.
const ScopDetection::BoxedLoopsSetTy *BoxedLoops = SD->getBoxedLoops(&R);		const ScopDetection::BoxedLoopsSetTy *BoxedLoops = SD->getBoxedLoops(&R);

for (BasicBlock::iterator I = BB.begin(), E = --BB.end(); I != E; ++I) {		for (BasicBlock::iterator I = BB.begin(), E = --BB.end(); I != E; ++I) {
Instruction *Inst = I;		Instruction *Inst = I;

		grosser: This does not seem to be needed, if we only model the writes to the exit PHI nodes, but not their reads.
		Meinersbur (author): refactored
		// The PHI nodes of the exit block are processed as if they were in the
		// scop's region. We need to create memory accesses for the PHI's arguments
		// such that we can demote it to an alloca location.
		// A block always starts with its PHI nodes, hence once we encounter a
		// non-PHI, the subsequent instructions can be skipped.
		grosser: To avoid spreading the knowledge about how to handle the exit nodes, you could give this function a parameter "OnlyPHINodes" and then set it to true when calling it for the exit node. This way, this piece of code does not need to know about exit-node specialities and the comment above buildAccessFunctions can explain everything.
		Meinersbur (author): I don't see any advantage of that. The unit of cohesion is the class, not the method.
		grosser: Sure, but if I want to learn how exit-node PHI nodes are code generated, it is easier if I can look at one place and one comment, rather than having to look through the whole class to find the places where they need special handling. Anyhow, this is just a minor stylistic comment. If you feel strongly about this, feel free to leave it as it is.
		if (OnlyPHIs && !isa<PHINode>(Inst))
		break;

if (isa<LoadInst>(Inst) || isa<StoreInst>(Inst))		if (isa<LoadInst>(Inst) || isa<StoreInst>(Inst))
Functions.push_back(		Functions.push_back(
std::make_pair(buildIRAccess(Inst, L, &R, BoxedLoops), Inst));		std::make_pair(buildIRAccess(Inst, L, &R, BoxedLoops), Inst));

if (isIgnoredIntrinsic(Inst))		if (isIgnoredIntrinsic(Inst))
continue;		continue;

if (PHINode *PHI = dyn_cast<PHINode>(Inst))		if (PHINode *PHI = dyn_cast<PHINode>(Inst))
… 12 lines hidden …	void TempScopInfo::buildAccessFunctions(Region &R, BasicBlock &BB,

if (Functions.empty())		if (Functions.empty())
return;		return;

AccFuncSetType &Accs = AccFuncMap[&BB];		AccFuncSetType &Accs = AccFuncMap[&BB];
Accs.insert(Accs.end(), Functions.begin(), Functions.end());		Accs.insert(Accs.end(), Functions.begin(), Functions.end());
}		}
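The `OnlyPHIs` early exit relies on the LLVM invariant that all PHI nodes of a block are grouped at its start. The following is a minimal sketch of the resulting scan, using a hypothetical `ToyInst` that only records an opcode name (this is not the LLVM API); like the loop above, it stops before the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; only the opcode name matters here.
struct ToyInst {
  std::string Opcode; // e.g. "phi", "add", "ret"
};

// Returns the opcodes that a buildAccessFunctions-style scan would visit.
// Like the loop in the patch, it stops before the terminator. With OnlyPHIs
// set, it also stops at the first non-PHI, which is safe because LLVM places
// all PHI nodes at the beginning of a block.
std::vector<std::string> visitedOpcodes(const std::vector<ToyInst> &BB,
                                        bool OnlyPHIs) {
  std::vector<std::string> Visited;
  for (std::size_t I = 0; I + 1 < BB.size(); ++I) {
    if (OnlyPHIs && BB[I].Opcode != "phi")
      break; // no PHI can follow a non-PHI
    Visited.push_back(BB[I].Opcode);
  }
  return Visited;
}
```

For an exit block `phi, phi, add, ret`, the scan with `OnlyPHIs` set visits only the two PHIs, while the regular scan also visits `add`.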

		bool TempScopInfo::scopContains(Region &R, llvm::Instruction *Inst) const {
		if (R.contains(Inst))
		grosser: This does not seem to be needed, if we only model the writes to the exit PHI nodes, but not their reads.
		Meinersbur (author): I don't understand this one.
		return true;

		return (Inst->getParent() == R.getExit()) && isa<PHINode>(Inst);
		jdoerfert: typo
		}
		grosser: point
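The `scopContains` predicate can be sketched outside LLVM as follows. `ToyBlock`, `ToyInst` and `ToyRegion` are hypothetical stand-ins for `llvm::BasicBlock`, `llvm::Instruction` and `llvm::Region`, reduced to what the predicate needs; the block names are only illustrative.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins for llvm::BasicBlock, llvm::Instruction and
// llvm::Region, reduced to what the predicate needs.
struct ToyBlock {
  std::string Name;
};

struct ToyInst {
  const ToyBlock *Parent; // block containing the instruction
  bool IsPHI;
};

struct ToyRegion {
  std::set<const ToyBlock *> Blocks; // blocks inside the region
  const ToyBlock *Exit;              // first block after the region

  bool contains(const ToyInst &I) const { return Blocks.count(I.Parent) != 0; }
  const ToyBlock *getExit() const { return Exit; }
};

// Same logic as TempScopInfo::scopContains: region members belong to the
// SCoP, and so do PHI nodes in the region's exit block.
bool scopContains(const ToyRegion &R, const ToyInst &I) {
  if (R.contains(I))
    return true;
  return I.Parent == R.getExit() && I.IsPHI;
}
```

For a region `{bb, stmtA, stmtB}` exiting to `merge`, a PHI in `merge` counts as part of the SCoP, while a non-PHI instruction in `merge` does not.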

TempScop *TempScopInfo::buildTempScop(Region &R) {		TempScop *TempScopInfo::buildTempScop(Region &R) {
TempScop *TScop = new TempScop(R, AccFuncMap);		TempScop *TScop = new TempScop(R, AccFuncMap);

buildAccessFunctions(R, R);		buildAccessFunctions(R, R);

		grosser: Would this be a good place for the large comment that explains why we need to model access functions in the exit node?
		// Simple instructions outside of the scop's region that depend on a scalar
		// inside the scop (an escaping value) can be handled at code generation by
		// just replacing the operand with the scop's value at exit time. (More
		// exactly, a PHI node in the merge block will select between the original
		// scalar and the generated one, depending on whether the original or the
		// generated version has been executed.)
		//
		// If the escaping value is used by a PHI instruction in the region's exit
		// block, this is not directly possible. The escaping value depends on which
		// edge is taken from the region. This property is not preserved/modeled in
		// SCoPs; the generated code always has just one exiting edge.
		// We therefore model these exit node PHIs like other PHIs in the region:
		// The PHI value is demoted to a memory location and the incoming blocks write
		// the incoming values to that location such that the value can be loaded from
		// there instead. The dependences from these memory accesses ensure that
		// either only one of them is stored, or that they are at least stored in the
		// correct order after code generation.
		// Usually there is also a read access for the PHI, but ScopInfo does not
		// create a statement for the exit block, so this read must be handled
		// explicitly in the code generator.
		buildAccessFunctions(R, *R.getExit(), true);

return TScop;		return TScop;
}		}
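The demotion described in the comment above can be simulated in a few lines. This is a minimal sketch, not Polly's code generator: the hypothetical `PhiSlot` plays the role of the alloca created for an exit-block PHI, each incoming block performs the "virtual write" of its incoming value, and the code after the SCoP reads the slot back instead of depending on which exiting edge was taken. `runOriginal`, `runDemoted` and `runLoopDemoted` are likewise illustrative names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the alloca that an exit-block PHI is demoted to.
struct PhiSlot {
  int Value = 0;
  bool Written = false;

  void write(int V) {
    Value = V;
    Written = true;
  }
  int read() const {
    assert(Written && "PHI slot read before any incoming value was written");
    return Value;
  }
};

// Original semantics:  merge: %val = phi [%valA, %stmtA], [%valB, %stmtB]
int runOriginal(bool Cond, int ValA, int ValB) { return Cond ? ValA : ValB; }

// Demoted form: each incoming block writes its incoming value to the slot,
// and the code after the SCoP (single exiting edge) reads the slot back.
int runDemoted(bool Cond, int ValA, int ValB) {
  PhiSlot Slot;
  if (Cond)
    Slot.write(ValA); // stmtA: virtual write for edge stmtA -> merge
  else
    Slot.write(ValB); // stmtB: virtual write for edge stmtB -> merge
  return Slot.read(); // explicit read after the SCoP
}

// If several writes execute (e.g. inside a loop), preserving their original
// order means the last written value is the one read back.
int runLoopDemoted(const std::vector<int> &IncomingPerIteration) {
  PhiSlot Slot;
  for (int V : IncomingPerIteration)
    Slot.write(V);
  return Slot.read();
}
```

Because the read happens once after the region, the generated code no longer needs to know which edge reached the exit block; only the order of the writes matters, which is what the dependences between the virtual writes are meant to preserve.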

TempScop *TempScopInfo::getTempScop() const { return TempScopOfRegion; }		TempScop *TempScopInfo::getTempScop() const { return TempScopOfRegion; }

void TempScopInfo::print(raw_ostream &OS, const Module *) const {		void TempScopInfo::print(raw_ostream &OS, const Module *) const {
if (TempScopOfRegion)		if (TempScopOfRegion)
TempScopOfRegion->print(OS, SE, LI);		TempScopOfRegion->print(OS, SE, LI);
… 55 lines hidden …

lib/CodeGen/BlockGenerators.cpp

… 86 lines hidden …	BlockGenerator::BlockGenerator(PollyIRBuilder &B, LoopInfo &LI,
ScalarAllocaMapTy &PHIOpMap,		ScalarAllocaMapTy &PHIOpMap,
EscapeUsersAllocaMapTy &EscapeMap,		EscapeUsersAllocaMapTy &EscapeMap,
ValueToValueMap &GlobalMap,		ValueToValueMap &GlobalMap,
IslExprBuilder *ExprBuilder)		IslExprBuilder *ExprBuilder)
: Builder(B), LI(LI), SE(SE), ExprBuilder(ExprBuilder), DT(DT),		: Builder(B), LI(LI), SE(SE), ExprBuilder(ExprBuilder), DT(DT),
EntryBB(nullptr), PHIOpMap(PHIOpMap), ScalarMap(ScalarMap),		EntryBB(nullptr), PHIOpMap(PHIOpMap), ScalarMap(ScalarMap),
EscapeMap(EscapeMap), GlobalMap(GlobalMap) {}		EscapeMap(EscapeMap), GlobalMap(GlobalMap) {}

Value *BlockGenerator::getNewValue(ScopStmt &Stmt, const Value *Old,		Value *BlockGenerator::getNewValue(Scop &S, const Value *Old, ValueMapT &BBMap,
ValueMapT &BBMap, LoopToScevMapT &LTS,		LoopToScevMapT &LTS, Loop *L) const {
Loop *L) const {
// We assume constants never change.		// We assume constants never change.
// This avoids map lookups for many calls to this function.		// This avoids map lookups for many calls to this function.
if (isa<Constant>(Old))		if (isa<Constant>(Old))
return const_cast<Value *>(Old);		return const_cast<Value *>(Old);

if (Value *New = GlobalMap.lookup(Old)) {		if (Value *New = GlobalMap.lookup(Old)) {
if (Old->getType()->getScalarSizeInBits() <		if (Old->getType()->getScalarSizeInBits() <
New->getType()->getScalarSizeInBits())		New->getType()->getScalarSizeInBits())
… 9 lines hidden …	if (SE.isSCEVable(Old->getType()))
if (const SCEV *Scev = SE.getSCEVAtScope(const_cast<Value *>(Old), L)) {		if (const SCEV *Scev = SE.getSCEVAtScope(const_cast<Value *>(Old), L)) {
if (!isa<SCEVCouldNotCompute>(Scev)) {		if (!isa<SCEVCouldNotCompute>(Scev)) {
const SCEV *NewScev = apply(Scev, LTS, SE);		const SCEV *NewScev = apply(Scev, LTS, SE);
ValueToValueMap VTV;		ValueToValueMap VTV;
VTV.insert(BBMap.begin(), BBMap.end());		VTV.insert(BBMap.begin(), BBMap.end());
VTV.insert(GlobalMap.begin(), GlobalMap.end());		VTV.insert(GlobalMap.begin(), GlobalMap.end());
NewScev = SCEVParameterRewriter::rewrite(NewScev, SE, VTV);		NewScev = SCEVParameterRewriter::rewrite(NewScev, SE, VTV);

Scop &S = *Stmt.getParent();
const DataLayout &DL =		const DataLayout &DL =
S.getRegion().getEntry()->getParent()->getParent()->getDataLayout();		S.getRegion().getEntry()->getParent()->getParent()->getDataLayout();
auto IP = Builder.GetInsertPoint();		auto IP = Builder.GetInsertPoint();

assert(IP != Builder.GetInsertBlock()->end() &&		assert(IP != Builder.GetInsertBlock()->end() &&
"Only instructions can be insert points for SCEVExpander");		"Only instructions can be insert points for SCEVExpander");
Value *Expanded =		Value *Expanded =
expandCodeFor(S, SE, DL, "polly", NewScev, Old->getType(), IP);		expandCodeFor(S, SE, DL, "polly", NewScev, Old->getType(), IP);

BBMap[Old] = Expanded;		BBMap[Old] = Expanded;
return Expanded;		return Expanded;
}		}
}		}

// A scop-constant value defined by a global or a function parameter.		// A scop-constant value defined by a global or a function parameter.
if (isa<GlobalValue>(Old) || isa<Argument>(Old))		if (isa<GlobalValue>(Old) || isa<Argument>(Old))
return const_cast<Value *>(Old);		return const_cast<Value *>(Old);

// A scop-constant value defined by an instruction executed outside the scop.		// A scop-constant value defined by an instruction executed outside the scop.
if (const Instruction *Inst = dyn_cast<Instruction>(Old))		if (const Instruction *Inst = dyn_cast<Instruction>(Old))
if (!Stmt.getParent()->getRegion().contains(Inst->getParent()))		if (!S.getRegion().contains(Inst->getParent()))
return const_cast<Value *>(Old);		return const_cast<Value *>(Old);

// The scalar dependence is neither available nor SCEVCodegenable.		// The scalar dependence is neither available nor SCEVCodegenable.
llvm_unreachable("Unexpected scalar dependence in region!");		llvm_unreachable("Unexpected scalar dependence in region!");
return nullptr;		return nullptr;
}		}

void BlockGenerator::copyInstScalar(ScopStmt &Stmt, const Instruction *Inst,		void BlockGenerator::copyInstScalar(ScopStmt &Stmt, const Instruction *Inst,
ValueMapT &BBMap, LoopToScevMapT &LTS) {		ValueMapT &BBMap, LoopToScevMapT &LTS) {
// We do not generate debug intrinsics as we did not investigate how to		// We do not generate debug intrinsics as we did not investigate how to
// copy them correctly. At the current state, they just crash the code		// copy them correctly. At the current state, they just crash the code
// generation as the meta-data operands are not correctly copied.		// generation as the meta-data operands are not correctly copied.
if (isa<DbgInfoIntrinsic>(Inst))		if (isa<DbgInfoIntrinsic>(Inst))
return;		return;

Instruction *NewInst = Inst->clone();		Instruction *NewInst = Inst->clone();

// Replace old operands with the new ones.		// Replace old operands with the new ones.
for (Value *OldOperand : Inst->operands()) {		for (Value *OldOperand : Inst->operands()) {
Value *NewOperand =		Value *NewOperand = getNewValue(*Stmt.getParent(), OldOperand, BBMap, LTS,
getNewValue(Stmt, OldOperand, BBMap, LTS, getLoopForInst(Inst));		getLoopForInst(Inst));

if (!NewOperand) {		if (!NewOperand) {
assert(!isa<StoreInst>(NewInst) &&		assert(!isa<StoreInst>(NewInst) &&
"Store instructions are always needed!");		"Store instructions are always needed!");
delete NewInst;		delete NewInst;
return;		return;
}		}

… 14 lines hidden …	Value *BlockGenerator::generateLocationAccessed(

isl_ast_expr *AccessExpr = isl_id_to_ast_expr_get(NewAccesses, MA.getId());		isl_ast_expr *AccessExpr = isl_id_to_ast_expr_get(NewAccesses, MA.getId());

if (AccessExpr) {		if (AccessExpr) {
AccessExpr = isl_ast_expr_address_of(AccessExpr);		AccessExpr = isl_ast_expr_address_of(AccessExpr);
return ExprBuilder->create(AccessExpr);		return ExprBuilder->create(AccessExpr);
}		}

return getNewValue(Stmt, Pointer, BBMap, LTS, getLoopForInst(Inst));		return getNewValue(*Stmt.getParent(), Pointer, BBMap, LTS,
		getLoopForInst(Inst));
}		}

Loop *BlockGenerator::getLoopForInst(const llvm::Instruction *Inst) {		Loop *BlockGenerator::getLoopForInst(const llvm::Instruction *Inst) {
return LI.getLoopFor(Inst->getParent());		return LI.getLoopFor(Inst->getParent());
}		}

Value *BlockGenerator::generateScalarLoad(ScopStmt &Stmt, const LoadInst *Load,		Value *BlockGenerator::generateScalarLoad(ScopStmt &Stmt, const LoadInst *Load,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
const Value *Pointer = Load->getPointerOperand();		const Value *Pointer = Load->getPointerOperand();
Value *NewPointer =		Value *NewPointer =
generateLocationAccessed(Stmt, Load, Pointer, BBMap, LTS, NewAccesses);		generateLocationAccessed(Stmt, Load, Pointer, BBMap, LTS, NewAccesses);
Value *ScalarLoad = Builder.CreateAlignedLoad(		Value *ScalarLoad = Builder.CreateAlignedLoad(
NewPointer, Load->getAlignment(), Load->getName() + "_p_scalar_");		NewPointer, Load->getAlignment(), Load->getName() + "_p_scalar_");
return ScalarLoad;		return ScalarLoad;
}		}

void BlockGenerator::generateScalarStore(ScopStmt &Stmt, const StoreInst *Store,		void BlockGenerator::generateScalarStore(ScopStmt &Stmt, const StoreInst *Store,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
const Value *Pointer = Store->getPointerOperand();		const Value *Pointer = Store->getPointerOperand();
Value *NewPointer =		Value *NewPointer =
generateLocationAccessed(Stmt, Store, Pointer, BBMap, LTS, NewAccesses);		generateLocationAccessed(Stmt, Store, Pointer, BBMap, LTS, NewAccesses);
Value *ValueOperand = getNewValue(Stmt, Store->getValueOperand(), BBMap, LTS,		Value *ValueOperand = getNewValue(*Stmt.getParent(), Store->getValueOperand(),
getLoopForInst(Store));		BBMap, LTS, getLoopForInst(Store));

Builder.CreateAlignedStore(ValueOperand, NewPointer, Store->getAlignment());		Builder.CreateAlignedStore(ValueOperand, NewPointer, Store->getAlignment());
}		}

void BlockGenerator::copyInstruction(ScopStmt &Stmt, const Instruction *Inst,		void BlockGenerator::copyInstruction(ScopStmt &Stmt, const Instruction *Inst,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {

// First check for possible scalar dependences for this instruction.		// First check for possible scalar dependences for this instruction.
generateScalarLoads(Stmt, Inst, BBMap);		generateScalarLoads(Stmt, Inst, BBMap);

// Terminator instructions control the control flow. They are explicitly		// Terminator instructions control the control flow. They are explicitly
// expressed in the clast and do not need to be copied.		// expressed in the clast and do not need to be copied.
if (Inst->isTerminator())		if (Inst->isTerminator())
return;		return;

Loop *L = getLoopForInst(Inst);		Loop *L = getLoopForInst(Inst);
if ((Stmt.isBlockStmt() || !Stmt.getRegion()->contains(L)) &&		if ((Stmt.isBlockStmt() || !Stmt.getRegion()->contains(L)) &&
canSynthesize(Inst, &LI, &SE, &Stmt.getParent()->getRegion())) {		canSynthesize(Inst, &LI, &SE, &Stmt.getParent()->getRegion())) {
Value *NewValue = getNewValue(Stmt, Inst, BBMap, LTS, L);		Value *NewValue = getNewValue(*Stmt.getParent(), Inst, BBMap, LTS, L);
BBMap[Inst] = NewValue;		BBMap[Inst] = NewValue;
return;		return;
}		}

if (const LoadInst *Load = dyn_cast<LoadInst>(Inst)) {		if (const LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
Value *NewLoad = generateScalarLoad(Stmt, Load, BBMap, LTS, NewAccesses);		Value *NewLoad = generateScalarLoad(Stmt, Load, BBMap, LTS, NewAccesses);
// Compute NewLoad before its insertion in BBMap to make the insertion		// Compute NewLoad before its insertion in BBMap to make the insertion
// deterministic.		// deterministic.
… 195 lines hidden …	assert(Stmt.isBlockStmt() && BB == Stmt.getBasicBlock() &&
"function in the RegionGenerator");		"function in the RegionGenerator");

for (MemoryAccess *MA : Stmt) {		for (MemoryAccess *MA : Stmt) {
if (!MA->isScalar() || MA->isRead())		if (!MA->isScalar() || MA->isRead())
continue;		continue;

Value *Val = MA->getAccessValue();		Value *Val = MA->getAccessValue();
auto Address = getOrCreateAlloca(MA);		auto Address = getOrCreateAlloca(MA);

Val = getNewScalarValue(Val, R, BBMap);		Val = getNewScalarValue(Val, R, BBMap);
		Meinersbur (author): The change has been dropped as it is not necessary anymore.
Builder.CreateStore(Val, Address);		Builder.CreateStore(Val, Address);
}		}
		grosser: I do not think this change is needed. If I drop it, no test case fails, and we will also never iterate over the instructions of an exit node, hence 'Base' will always be contained in the region.
		jdoerfert: I think we have two options here. Either we treat the exit PHI nodes as "almost regular" PHI nodes, in which case we need some special handling here, or we add the special handling later in the pipeline and skip them here. As this patch chose the latter option, I am not so sure that we can drop the above code; however, we probably could if we later use the location created for the PHI operands here in the special handling of the exit PHI nodes.
		Meinersbur (author): "Base" is not the instruction to copy (which is "Inst"), but the virtual address accessed. In case of scalars, the PHI node itself is abused as the address and can well be the one in the exit node. I know for certain that this condition hits. However, it might still be a mistake. The code below is responsible for writing the value to the "virtual PHI address" for this incoming block. Not writing it results in an undef value.
		jdoerfert:
		> The code below is responsible for writing the value to the "virtual PHI address" for this incoming block.
		That's what I was hinting at with: we probably could [remove this code] if we later use the location created for the PHI operands here in the special handling of the exit PHI nodes.
}		}

		void BlockGenerator::handleExitingPHIs(Scop &S) {
		grosser: It seems you are also running this code on exiting blocks that do _not_ result from region simplification. PHI nodes in these blocks never need this code to be run, do they? If we keep a note that we simplified the region (and that the exiting node is now not modeled) and only run this code in that case, the code below could possibly be a little shorter, and we would not need to reason about whether it does the right thing for exiting blocks that do not result from simplification.
		Meinersbur: polly::simplifyRegion is not about "modeling exit nodes". It only ensures that there is a single exiting edge. Adding such a feature would take away the general purpose of that function, and I still think it belongs closer to RegionInfo than to Polly. Having robust functions that work in general cases is a good thing.
		grosser: @Meinersbur: I was just proposing that polly::simplifyRegion reports whether it simplified the exit node (or that we detect this otherwise). This still seems general-purpose. From this information we can derive that we only need to run this code if the simplification actually happened, and we can get rid of the code that makes sure we do nothing when we hit PHI nodes that are modeled as part of the region. Again, this is getting very detailed. If you feel strongly, I am fine leaving the code as it is and possibly improving it later.
		Meinersbur: I ensured that there is a distinct block just for such PHIs. That block will be empty if there is nothing to be done. I may have dropped that part prematurely in the last patch.
		jdoerfert: I think Tobias' idea is good, but it is only one possibility. At least three come to mind:
		1) We could track whether we simplified a region.
		2) We could model the exit PHI nodes only if we will simplify the region.
		3) We could always run the code and distinguish here what to do.
		I am voting for 1) or 2), and slightly favour 2). Regarding the code here, it looks like the handleOutsideUsers code and does basically the same thing, so we should probably reuse a (possibly generalized) version of handleOutsideUsers here instead of copying it.
		  Region &R = S.getRegion();

		  // The exiting block of the __unoptimized__ region.
		  BasicBlock *OrigExitingBB = R.getExitingBlock();
		  // The merge block __just after__ the region and the optimized region.
		  BasicBlock *MergeBB = R.getExit();

		  // The exiting block of the __optimized__ region.
		  BasicBlock *OptExitingBB = *(pred_begin(MergeBB));
		  if (OptExitingBB == OrigExitingBB)
		    OptExitingBB = *(++pred_begin(MergeBB));

		  EntryBB = &MergeBB->getParent()->getEntryBlock();
		  Builder.SetInsertPoint(OptExitingBB->getTerminator());
		  // TODO: Refactor with same functionality to find the optimized exiting BB in
		  // createScalarFinalization.

		  // Due to region simplification, the exit node PHIs are now in the region's
		  // exiting node. CodeGeneration creates this new exiting node unconditionally,
		  // so that it contains only such PHIs.
		  // Because there is no ScopStmt for this BB, we iterate manually over its
		  // instructions and process them as if they were inside the region.
		  for (Instruction &Inst : *OrigExitingBB) {
		    if (Inst.isTerminator())
		      break;
		    assert(isa<PHINode>(Inst));

		jdoerfert: I wouldn't bet on it. Textual order != execution order; I had to fix a bug like this the other week.
		Meinersbur: Can you elaborate on the bug? Do you mean the iteration order? There is a lot of other code in LLVM's core (e.g. BasicBlock::getFirstNonPHI or SimplifyCFG) that assumes that instruction lists are iterated in execution order, so I guess we can assume this as well.
		    // Get the PHI's value.
		    Value *Val;
		    if (canSynthesize(&Inst, &LI, &SE, &R)) {
		      LoopToScevMapT EmptyLoopMap;
		      ValueMapT EmptyValueMap;
		      Val = getNewValue(S, &Inst, EmptyValueMap, EmptyLoopMap, nullptr);
		    } else {
		      Value *Address = getOrCreatePHIAlloca(&Inst);
		      Val = Builder.CreateLoad(Address, Address->getName() + ".reload");
		    }

		    // Store the value as scalar (and update ScalarMap), as expected for
		    // escaping values.
		    Value *S2a = getOrCreateScalarAlloca(&Inst);
		    Builder.CreateStore(Val, S2a);

		    // Update the set of escaping values.
		    handleOutsideUsers(R, &Inst, Val);
		  }
		}

void BlockGenerator::createScalarInitialization(Scop &S) {		void BlockGenerator::createScalarInitialization(Scop &S) {
Region &R = S.getRegion();		Region &R = S.getRegion();
// The split block __just before__ the region and optimized region.		// The split block __just before__ the region and optimized region.
BasicBlock *SplitBB = R.getEnteringBlock();		BasicBlock *SplitBB = R.getEnteringBlock();
BranchInst *SplitBBTerm = cast<BranchInst>(SplitBB->getTerminator());		BranchInst *SplitBBTerm = cast<BranchInst>(SplitBB->getTerminator());
assert(SplitBBTerm->getNumSuccessors() == 2 && "Bad region entering block!");		assert(SplitBBTerm->getNumSuccessors() == 2 && "Bad region entering block!");

// Get the start block of the __optimized__ region.		// Get the start block of the __optimized__ region.
[80 lines not shown]	for (const auto &EscapeMapping : EscapeMap) {

// Replace all uses of the demoted instruction with the merge PHI.		// Replace all uses of the demoted instruction with the merge PHI.
for (Instruction *EUser : EscapeUsers)		for (Instruction *EUser : EscapeUsers)
EUser->replaceUsesOfWith(EscapeInst, MergePHI);		EUser->replaceUsesOfWith(EscapeInst, MergePHI);
}		}
}		}

void BlockGenerator::finalizeSCoP(Scop &S) {		void BlockGenerator::finalizeSCoP(Scop &S) {
		Region &R = S.getRegion();

		handleExitingPHIs(S);
createScalarInitialization(S);		createScalarInitialization(S);
createScalarFinalization(S.getRegion());		createScalarFinalization(R);
}		}

VectorBlockGenerator::VectorBlockGenerator(BlockGenerator &BlockGen,		VectorBlockGenerator::VectorBlockGenerator(BlockGenerator &BlockGen,
std::vector<LoopToScevMapT> &VLTS,		std::vector<LoopToScevMapT> &VLTS,
isl_map *Schedule)		isl_map *Schedule)
: BlockGenerator(BlockGen), VLTS(VLTS), Schedule(Schedule) {		: BlockGenerator(BlockGen), VLTS(VLTS), Schedule(Schedule) {
assert(Schedule && "No statement domain provided");		assert(Schedule && "No statement domain provided");
}		}

Value *VectorBlockGenerator::getVectorValue(ScopStmt &Stmt, const Value *Old,		Value *VectorBlockGenerator::getVectorValue(ScopStmt &Stmt, const Value *Old,
ValueMapT &VectorMap,		ValueMapT &VectorMap,
VectorValueMapT &ScalarMaps,		VectorValueMapT &ScalarMaps,
Loop *L) {		Loop *L) {
if (Value *NewValue = VectorMap.lookup(Old))		if (Value *NewValue = VectorMap.lookup(Old))
return NewValue;		return NewValue;

int Width = getVectorWidth();		int Width = getVectorWidth();

Value *Vector = UndefValue::get(VectorType::get(Old->getType(), Width));		Value *Vector = UndefValue::get(VectorType::get(Old->getType(), Width));

for (int Lane = 0; Lane < Width; Lane++)		for (int Lane = 0; Lane < Width; Lane++)
Vector = Builder.CreateInsertElement(		Vector = Builder.CreateInsertElement(
Vector, getNewValue(Stmt, Old, ScalarMaps[Lane], VLTS[Lane], L),		Vector,
		getNewValue(*Stmt.getParent(), Old, ScalarMaps[Lane], VLTS[Lane], L),
Builder.getInt32(Lane));		Builder.getInt32(Lane));

VectorMap[Old] = Vector;		VectorMap[Old] = Vector;

return Vector;		return Vector;
}		}

Type *VectorBlockGenerator::getVectorPtrTy(const Value *Val, int Width) {		Type *VectorBlockGenerator::getVectorPtrTy(const Value *Val, int Width) {
[514 lines not shown]	void RegionGenerator::addOperandToPHI(ScopStmt &Stmt, const PHINode *PHI,

Value *OpCopy = nullptr;		Value *OpCopy = nullptr;
if (StmtR->contains(IncomingBB)) {		if (StmtR->contains(IncomingBB)) {
assert(RegionMaps.count(BBCopy) &&		assert(RegionMaps.count(BBCopy) &&
"Incoming PHI block did not have a BBMap");		"Incoming PHI block did not have a BBMap");
ValueMapT &BBCopyMap = RegionMaps[BBCopy];		ValueMapT &BBCopyMap = RegionMaps[BBCopy];

Value *Op = PHI->getIncomingValueForBlock(IncomingBB);		Value *Op = PHI->getIncomingValueForBlock(IncomingBB);
OpCopy = getNewValue(Stmt, Op, BBCopyMap, LTS, getLoopForInst(PHI));		OpCopy =
		getNewValue(*Stmt.getParent(), Op, BBCopyMap, LTS, getLoopForInst(PHI));
} else {		} else {

if (PHICopy->getBasicBlockIndex(BBCopy) >= 0)		if (PHICopy->getBasicBlockIndex(BBCopy) >= 0)
return;		return;

Value *PHIOpAddr = getOrCreatePHIAlloca(const_cast<PHINode *>(PHI));		Value *PHIOpAddr = getOrCreatePHIAlloca(const_cast<PHINode *>(PHI));
OpCopy = new LoadInst(PHIOpAddr, PHIOpAddr->getName() + ".reload",		OpCopy = new LoadInst(PHIOpAddr, PHIOpAddr->getName() + ".reload",
BlockMap[IncomingBB]->getTerminator());		BlockMap[IncomingBB]->getTerminator());
[20 lines not shown]

lib/CodeGen/CodeGeneration.cpp

[87 lines not shown]	DEBUG({
F.print(errs());		F.print(errs());
errs() << "\n== The errors ==\n";		errs() << "\n== The errors ==\n";
verifyFunction(F, &errs());		verifyFunction(F, &errs());
});		});

return true;		return true;
}		}

		// Create a dedicated block for exit node PHIs.
		// This special block is made the exiting block, so that the PHIs become
		// part of the region. The exit block itself keeps its identity and has
		// a single predecessor. The dedicated PHI block is returned.
		BasicBlock *createExitingBlockForExitNodePHIs(Region *R) {
		  BasicBlock *ExitBB = R->getExit();

		  SmallVector<BasicBlock *, 4> NonPreds;
		  for (BasicBlock *P : predecessors(ExitBB))
		    if (!R->contains(P))
		      NonPreds.push_back(P);

		  // Preds[0]  Preds[1]  otherBB //
		  //     \        |   _______/   //
		  //      \       |  /           //
		  //          ExitBB             //
		  BasicBlock *NewExitBlock =
		      splitBlockNonPredecessors(ExitBB, NonPreds, ".exit", DT, LI, SE);
		  // Preds[0]  Preds[1]  otherBB //
		  //     \      /         /      //
		  //      ExitBB         /       //
		  //          \         /        //
		  //         NewExitBlock        //

		  // Set the region of the newly created block.
		  Region *RegionOfExit = RI->getRegionFor(ExitBB);
		  RI->setRegionFor(NewExitBlock, RegionOfExit);

		  // If there was a region with ExitBB as entry block, change it to
		  // NewExitBlock.
		  while (RegionOfExit && !RegionOfExit->isTopLevelRegion() &&
		         RegionOfExit->getEntry() == ExitBB) {
		    RegionOfExit->replaceEntry(NewExitBlock);
		    RegionOfExit = RegionOfExit->getParent();
		  }

		  // Change the exit node of R, but not its subregions because they can keep
		  // ExitBB as their exit block.
		  // This ensures that the region of ExitBB is always R itself.
		  R->replaceExit(NewExitBlock);
		  RI->setRegionFor(ExitBB, R);

		  // Make NewExitBlock the new exit block of all other regions that previously
		  // had ExitBB as exit block.
		  for (BasicBlock *PotentialExiting : predecessors(NewExitBlock)) {
		    if (PotentialExiting == NewExitBlock)
		      continue;

		    Region *OtherR = RI->getRegionFor(PotentialExiting);
		    while (OtherR && !OtherR->isTopLevelRegion() &&
		           OtherR->getExit() == ExitBB) {
		      OtherR->replaceExit(NewExitBlock);
		      OtherR = OtherR->getParent();
		    }
		  }

		  assert(R->contains(ExitBB));
		  assert(!R->contains(NewExitBlock));
		  assert(R->getExit() == NewExitBlock);
		  assert(R->getExitingBlock() == ExitBB);

		  return ExitBB; // The block containing the PHIs
		}

// CodeGeneration adds a lot of BBs without updating the RegionInfo		// CodeGeneration adds a lot of BBs without updating the RegionInfo
// We make all created BBs belong to the scop's parent region without any		// We make all created BBs belong to the scop's parent region without any
// nested structure to keep the RegionInfo verifier happy.		// nested structure to keep the RegionInfo verifier happy.
void fixRegionInfo(Function *F, Region *ParentRegion) {		void fixRegionInfo(Function *F, Region *ParentRegion) {
for (BasicBlock &BB : *F) {		for (BasicBlock &BB : *F) {
if (RI->getRegionFor(&BB))		if (RI->getRegionFor(&BB))
continue;		continue;

[14 lines not shown]	bool runOnScop(Scop &S) override {
SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();		SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
DL = &S.getRegion().getEntry()->getParent()->getParent()->getDataLayout();		DL = &S.getRegion().getEntry()->getParent()->getParent()->getDataLayout();
RI = &getAnalysis<RegionInfoPass>().getRegionInfo();		RI = &getAnalysis<RegionInfoPass>().getRegionInfo();
Region *R = &S.getRegion();		Region *R = &S.getRegion();
assert(!R->isTopLevelRegion() && "Top level regions are not supported");		assert(!R->isTopLevelRegion() && "Top level regions are not supported");

Annotator.buildAliasScopes(S);		Annotator.buildAliasScopes(S);

		createExitingBlockForExitNodePHIs(R);
simplifyRegion(R, DT, LI, RI);		simplifyRegion(R, DT, LI, RI);
assert(R->isSimple());		assert(R->isSimple());
BasicBlock *EnteringBB = S.getRegion().getEnteringBlock();		BasicBlock *EnteringBB = S.getRegion().getEnteringBlock();
assert(EnteringBB);		assert(EnteringBB);
PollyIRBuilder Builder = createPollyIRBuilder(EnteringBB, Annotator);		PollyIRBuilder Builder = createPollyIRBuilder(EnteringBB, Annotator);

IslNodeBuilder NodeBuilder(Builder, Annotator, this, DL, LI, SE, DT, S);		IslNodeBuilder NodeBuilder(Builder, Annotator, this, DL, LI, SE, DT, S);

[69 lines not shown]

lib/Support/SCEVValidator.cpp

[477 lines not shown]	for (size_t i = 0; i < Expr->getNumOperands(); ++i)
return true;		return true;

return false;		return false;
}		}

bool visitUnknown(const SCEVUnknown *Expr) {		bool visitUnknown(const SCEVUnknown *Expr) {
Instruction *Inst = dyn_cast<Instruction>(Expr->getValue());		Instruction *Inst = dyn_cast<Instruction>(Expr->getValue());

		// For non-simple regions, the value of a PHI depends on the control flow
		// when exiting the region. Hence, we consider this a region dependency.
		if (Inst && isa<PHINode>(Inst) && Inst->getParent() == R->getExit() &&
		!R->getExitingBlock())
		return true;

// Return true when Inst is defined inside the region R.		// Return true when Inst is defined inside the region R.
if (Inst && R->contains(Inst))		if (Inst && R->contains(Inst))
return true;		return true;
		grosser: Is this really needed? To my understanding, there is no way a SCEV expression that is used in the SCoP will reference any of the PHI nodes in the exit block.
		Meinersbur: I am not sure enough to remove this condition. Are you? Could it be the induction variable of a parent loop?
		grosser: I am pretty certain none of these conditions is needed. I propose to drop them if we cannot find a test case that requires them. My reasoning: any value that is part of a SCEV needs to dominate the location at which this SCEV is evaluated. The values in the exit block of a SCoP do not dominate any of the values inside the SCoP. There is one very funny case which I did not think fully through: a SCoP in the backedge of a larger loop, where the exit of this SCoP is the header of the larger loop and where some of the PHI nodes in this header are again used in the SCoP. However, if such a case can actually be created, I am not yet convinced that what you do here is right. Before we put some random instructions in, I propose to first create a test case or, if this fails, to just place an assert that warns us in case we encounter such a piece of code.
		Meinersbur: It actually is needed, or the LNT tests will fail. However, I can instead pre-check whether the PHI is in the exit block, as in D12051.
		grosser: Interesting. It would be good to add a test case to ensure we do not regress here. With the current patch, I can comment out this code and no test starts failing.
		Meinersbur: I'll think of one.
		Meinersbur: Addendum: I tried this and it changed the output (test case non-affine-loop-condition-dependent-access_3.ll), because the pre-check would be overly pessimistic in case the PHI results in a constant. Hence I prefer this solution. LNT passes with both variants, though.

return false;		return false;
}		}

private:		private:
const Region *R;		const Region *R;
};		};

[104 lines not shown]

lib/Support/ScopHelper.cpp

	[355 lines not shown]
	};			};

	Value *polly::expandCodeFor(Scop &S, ScalarEvolution &SE, const DataLayout &DL,			Value *polly::expandCodeFor(Scop &S, ScalarEvolution &SE, const DataLayout &DL,
	const char *Name, const SCEV *E, Type *Ty,			const char *Name, const SCEV *E, Type *Ty,
	Instruction *IP) {			Instruction *IP) {
	ScopExpander Expander(S.getRegion(), SE, DL, Name);			ScopExpander Expander(S.getRegion(), SE, DL, Name);
	return Expander.expandCodeFor(E, Ty, IP);			return Expander.expandCodeFor(E, Ty, IP);
	}			}

				BasicBlock *polly::splitBlockNonPredecessors(
				    BasicBlock *BB, ArrayRef<BasicBlock *> NonPreds, llvm::StringRef Suffix,
				    DominatorTree *DT, LoopInfo *LI, ScalarEvolution *SE) {
				  // Do not attempt to split blocks that cannot be split.
				  if (!BB->canSplitPredecessors())
				    return nullptr;

				  // Currently we cannot handle landing pads.
				  if (BB->isLandingPad())
				    return nullptr;

				  // Before:
				  //
				  //     NonPreds[0]  //
				  //       \   /      //
				  //        BB        //
				  //       /  \       //

				  // Create a new basic block and insert it right after the original block.
				  auto OldName = BB->getName();
				  BasicBlock *NewBB =
				      splitBlock(BB, BB->getFirstInsertionPt(), DT, LI, nullptr);
				  NewBB->setName(OldName + Suffix);

				  // Create new PHIs in the new block.
				  for (auto I = BB->begin(); isa<PHINode>(I); ++I) {
				    PHINode *OldPHI = cast<PHINode>(&*I);
				    PHINode *NewPHI = PHINode::Create(OldPHI->getType(), NonPreds.size() + 1,
				                                      OldPHI->getName() + ".np",
				                                      NewBB->getFirstInsertionPt());

				    // Replace uses of the original PHI by the new one.
				    OldPHI->replaceAllUsesWith(NewPHI);

				    // Add an edge from the original block (this adds a use).
				    NewPHI->addIncoming(OldPHI, BB);

				    // Move the incoming values of the non-predecessor edges.
				    for (BasicBlock *Edge : NonPreds) {
				      BasicBlock *RedirectedEdge = Edge == BB ? NewBB : Edge;
				      auto i = OldPHI->getBasicBlockIndex(RedirectedEdge);
				      assert(i >= 0);
				      Value *Val = OldPHI->getIncomingValue(i);
				      NewPHI->addIncoming(Val, RedirectedEdge);
				      OldPHI->removeIncomingValue(i, false);

				      if (SE)
				        SE->forgetValue(OldPHI);
				    }
				  }

				  // If there were no edges to move, there is nothing more to do.
				  if (NonPreds.empty())
				    return NewBB;

				  // Move all non-predecessor edges to the new BB.
				  for (BasicBlock *Edge : NonPreds) {
				    BasicBlock *RedirectedEdge = Edge == BB ? NewBB : Edge;
				    assert(!isa<IndirectBrInst>(RedirectedEdge->getTerminator()) &&
				           "Cannot split an edge from an IndirectBrInst");
				    RedirectedEdge->getTerminator()->replaceUsesOfWith(BB, NewBB);
				  }

				  if (DT) {
				    // DominatorTree::splitBlock assumes that it is the predecessor which is
				    // new. In order to reuse that function, we recreate the situation as if it
				    // were.
				    auto BBNode = DT->getNode(BB);
				    auto NewBBNode = DT->getNode(NewBB);
				    DT->changeImmediateDominator(NewBBNode, BBNode->getIDom());
				    DT->eraseNode(BB);

				    DT->splitBlock(BB);
				  }

				  if (LI) {
				    BasicBlock *OldBB = BB;
				    Loop *L = LI->getLoopFor(OldBB);
				    while (L) {
				      assert(L->contains(OldBB));

				      bool OldBBReachableFromInside = false;
				      for (BasicBlock *Pred : predecessors(OldBB)) {
				        if (Pred == NewBB || L->contains(Pred)) {
				          OldBBReachableFromInside = true;
				          break;
				        }
				      }

				      bool NewBBReachableFromOutside = false;
				      for (auto Pred : make_range(pred_begin(NewBB), pred_end(NewBB))) {
				        if (Pred == OldBB)
				          continue;

				        if (Pred != NewBB && !L->contains(Pred)) {
				          NewBBReachableFromOutside = true;
				          break;
				        }
				      }

				      if (NewBBReachableFromOutside || !OldBBReachableFromInside) {
				        // Remove BB from the loop and make the NewBB the next loop header.
				        // Since NewBB dominates OldBB, it can be the new header.
				        if (L->getHeader() == OldBB)
				          L->moveToHeader(NewBB);
				      }

				      if (!OldBBReachableFromInside) {
				        // OldBB cannot be reached from the loop header anymore, i.e. it has
				        // been excluded from the loop.
				        L->removeBlockFromLoop(BB);
				        LI->changeLoopFor(BB, L->getParentLoop());
				      }

				      L = L->getParentLoop();
				    }
				  }

				  // After:
				  //
				  //     NonPreds[0]  //
				  //       \   /      //
				  //     BB   /       //
				  //      \  /        //
				  //     NewBB        //
				  //      /  \        //

				  return NewBB;
				}
				No newline at end of file

test/Isl/CodeGen/inner_scev_sdiv_2.ll

	; RUN: opt %loadPolly -S -polly-no-early-exit -polly-detect-unprofitable -polly-codegen < %s | FileCheck %s			; RUN: opt %loadPolly -S -polly-no-early-exit -polly-detect-unprofitable -polly-codegen < %s
	;			;
	; The SCEV expression in this test case refers to a sequence of sdiv			; The SCEV expression in this test case refers to a sequence of sdiv
	; instructions, which are part of different bbs in the SCoP. When code			; instructions, which are part of different bbs in the SCoP. When code
	; generating the parameter expressions, the code that is generated by the SCEV			; generating the parameter expressions, the code that is generated by the SCEV
	; expander has still references to the in-scop instructions, which is invalid.			; expander has still references to the in-scop instructions, which is invalid.
	;			;
	; CHECK: polly.split_new_and_old:			; CHECK: polly.split_new_and_old:
	; CHECK-NOT: = sdiv i64 0, -4			; CHECK-NOT: = sdiv i64 0, -4
	[38 lines not shown]

test/Isl/CodeGen/loop_with_conditional_entry_edge_split_hard_case.ll

	[13 lines not shown]
	; }			; }
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @jd(i32 %b, i32* %A) {			define void @jd(i32 %b, i32* %A) {
	entry:			entry:
	br label %while.begin			br label %while.begin

	; CHECK-LABEL: while.begin.region_exiting:			; CHECK-LABEL: while.begin:
	; CHECK: br label %polly.merge_new_and_old			; CHECK: br label %polly.merge_new_and_old

	; CHECK-LABEL: while.begin:			; CHECK-LABEL: while.begin.exit:
	while.begin:			while.begin:
	; CHECK: %call = call i32 @f()			; CHECK: %call = call i32 @f()
	%call = call i32 @f()			%call = call i32 @f()
	; CHECK: %tobool = icmp eq i32 %call, 0			; CHECK: %tobool = icmp eq i32 %call, 0
	%tobool = icmp eq i32 %call, 0			%tobool = icmp eq i32 %call, 0
	; CHECK: br i1 %tobool, label %while.end, label %polly.split_new_and_old			; CHECK: br i1 %tobool, label %while.end, label %polly.split_new_and_old
	br i1 %tobool, label %while.end, label %if			br i1 %tobool, label %while.end, label %if

	[30 lines not shown]

test/Isl/CodeGen/phi_loop_carried_float.ll

	; RUN: opt %loadPolly -S -polly-no-early-exit -polly-detect-unprofitable -polly-codegen < %s | FileCheck %s			; RUN: opt %loadPolly -S -polly-no-early-exit -polly-detect-unprofitable -polly-codegen < %s | FileCheck %s
	;			;
	; float f(float *A, int N) {			; float f(float *A, int N) {
	; float tmp = 0;			; float tmp = 0;
	; for (int i = 0; i < N; i++)			; for (int i = 0; i < N; i++)
	; tmp += A[i];			; tmp += A[i];
	; }			; }
	;			;
	; CHECK: bb:			; CHECK: bb:
	; CHECK-NOT: %tmp7{{[.*]}} = alloca float			; CHECK-NOT: %tmp7{{[.*]}} = alloca float
	; CHECK-DAG: %tmp.0.s2a = alloca float			; CHECK-DAG: %tmp.0.s2a = alloca float
	; CHECK-NOT: %tmp7{{[.*]}} = alloca float			; CHECK-NOT: %tmp7{{[.*]}} = alloca float
	; CHECK-DAG: %tmp.0.phiops = alloca float			; CHECK-DAG: %tmp.0.phiops = alloca float
	; CHECK-NOT: %tmp7{{[.*]}} = alloca float			; CHECK-NOT: %tmp7{{[.*]}} = alloca float

	; CHECK-LABEL: exit:			; CHECK-LABEL: polly.merge_new_and_old:
				; CHECK-NEXT: br label %exit.exit

				; CHECK-LABEL: exit.exit:
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	; CHECK-LABEL: polly.start:			; CHECK-LABEL: polly.start:
	; CHECK-NEXT: store float 0.000000e+00, float* %tmp.0.phiops			; CHECK-NEXT: store float 0.000000e+00, float* %tmp.0.phiops

	; CHECK-LABEL: polly.merge:			; CHECK-LABEL: polly.merge:
	; CHECK-NEXT: br label %polly.merge_new_and_old			; CHECK-NEXT: br label %polly.merge_new_and_old

	[43 lines not shown]

test/Isl/CodeGen/phi_loop_carried_float_escape.ll

	; RUN: opt %loadPolly -S -polly-no-early-exit -polly-detect-unprofitable \			; RUN: opt %loadPolly -S -polly-no-early-exit -polly-detect-unprofitable \
	; RUN: -polly-analyze-read-only-scalars=false -polly-codegen < %s | FileCheck %s			; RUN: -polly-analyze-read-only-scalars=false -polly-codegen < %s | FileCheck %s

	; RUN: opt %loadPolly -S -polly-no-early-exit -polly-detect-unprofitable \			; RUN: opt %loadPolly -S -polly-no-early-exit -polly-detect-unprofitable \
	; RUN: -polly-analyze-read-only-scalars=true -polly-codegen < %s | FileCheck %s			; RUN: -polly-analyze-read-only-scalars=true -polly-codegen < %s | FileCheck %s
	;			;
	; float f(float *A, int N) {			; float f(float *A, int N) {
	; float tmp = 0;			; float tmp = 0;
	; for (int i = 0; i < N; i++)			; for (int i = 0; i < N; i++)
	; tmp += A[i];			; tmp += A[i];
	; return tmp;			; return tmp;
	; }			; }

	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK-NEXT: %tmp.0.merge = phi float [ %tmp.0.final_reload, %polly.merge ], [ %tmp.0, %bb8 ]			; CHECK-NEXT: %tmp.0.merge = phi float [ %tmp.0.final_reload, %polly.merge ], [ %tmp.0, %exit ]
	; CHECK-NEXT: br label %exit			; CHECK-NEXT: br label %exit.exit

				; CHECK-LABEL: exit.exit:
				; CHECK-NEXT: ret float %tmp.0.merge

	; CHECK-LABEL: polly.start:			; CHECK-LABEL: polly.start:
	; CHECK-NEXT: store float 0.000000e+00, float* %tmp.0.phiops			; CHECK-NEXT: store float 0.000000e+00, float* %tmp.0.phiops

	; CHECK-LABEL: polly.merge:			; CHECK-LABEL: polly.merge:
	; CHECK-NEXT: %tmp.0.final_reload = load float, float* %tmp.0.s2a			; CHECK-NEXT: %tmp.0.final_reload = load float, float* %tmp.0.s2a
	; CHECK-NEXT: br label %polly.merge_new_and_old			; CHECK-NEXT: br label %polly.merge_new_and_old

	[43 lines not shown]

test/Isl/CodeGen/phi_scalar_simple_1.ll

	[16 lines not shown]
	; CHECK-DAG: %x.addr.1.s2a = alloca i32			; CHECK-DAG: %x.addr.1.s2a = alloca i32
	; CHECK-DAG: %x.addr.1.phiops = alloca i32			; CHECK-DAG: %x.addr.1.phiops = alloca i32
	; CHECK-DAG: %x.addr.0.s2a = alloca i32			; CHECK-DAG: %x.addr.0.s2a = alloca i32
	; CHECK-DAG: %x.addr.0.phiops = alloca i32			; CHECK-DAG: %x.addr.0.phiops = alloca i32
	%tmp = sext i32 %N to i64			%tmp = sext i32 %N to i64
	br label %for.cond			br label %for.cond

	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK: %x.addr.0.merge = phi i32 [ %x.addr.0.final_reload, %polly.merge ], [ %x.addr.0, %for.cond ]			; CHECK: %x.addr.0.merge = phi i32 [ %x.addr.0.final_reload, %polly.merge ], [ %x.addr.0, %for.end6 ]
	; CHECK: ret i32 %x.addr.0.merge			; CHECK: ret i32 %x.addr.0.merge

	; CHECK-LABEL: polly.start:			; CHECK-LABEL: polly.start:
	; CHECK-NEXT: store i32 %x, i32* %x.addr.0.phiops			; CHECK-NEXT: store i32 %x, i32* %x.addr.0.phiops

	; CHECK-LABEL: polly.merge:			; CHECK-LABEL: polly.merge:
	; CHECK: %x.addr.0.final_reload = load i32, i32* %x.addr.0.s2a			; CHECK: %x.addr.0.final_reload = load i32, i32* %x.addr.0.s2a

	[60 lines not shown]

test/Isl/CodeGen/phi_scalar_simple_2.ll

	[18 lines not shown]
	; CHECK-DAG: %x.addr.1.phiops = alloca i32			; CHECK-DAG: %x.addr.1.phiops = alloca i32
	; CHECK-DAG: %x.addr.0.s2a = alloca i32			; CHECK-DAG: %x.addr.0.s2a = alloca i32
	; CHECK-DAG: %x.addr.0.phiops = alloca i32			; CHECK-DAG: %x.addr.0.phiops = alloca i32
	%tmp = sext i32 %N to i64			%tmp = sext i32 %N to i64
	%tmp1 = sext i32 %c to i64			%tmp1 = sext i32 %c to i64
	br label %for.cond			br label %for.cond

	; CHECK-LABEL: polly.merge_new_and_old:			; CHECK-LABEL: polly.merge_new_and_old:
	; CHECK: %x.addr.0.merge = phi i32 [ %x.addr.0.final_reload, %polly.merge ], [ %x.addr.0, %for.cond ]			; CHECK: %x.addr.0.merge = phi i32 [ %x.addr.0.final_reload, %polly.merge ], [ %x.addr.0, %for.end7 ]
	; CHECK: ret i32 %x.addr.0.merge			; CHECK: ret i32 %x.addr.0.merge

	; CHECK-LABEL: polly.start:			; CHECK-LABEL: polly.start:
	; CHECK-NEXT: store i32 %x, i32* %x.addr.0.phiops			; CHECK-NEXT: store i32 %x, i32* %x.addr.0.phiops

	; CHECK-LABEL: polly.merge:			; CHECK-LABEL: polly.merge:
	; CHECK: %x.addr.0.final_reload = load i32, i32* %x.addr.0.s2a			; CHECK: %x.addr.0.final_reload = load i32, i32* %x.addr.0.s2a

	[73 lines not shown]

test/ScopDetect/keep_going_expansion.ll

[36 lines not shown]	for.end9: ; preds = %for.body4
%2 = load i32, i32* %arrayidx11, align 4		%2 = load i32, i32* %arrayidx11, align 4
%idxprom12 = sext i32 %n to i64		%idxprom12 = sext i32 %n to i64
%arrayidx13 = getelementptr inbounds i32, i32* %B, i64 %idxprom12		%arrayidx13 = getelementptr inbounds i32, i32* %B, i64 %idxprom12
%3 = load i32, i32* %arrayidx13, align 4		%3 = load i32, i32* %arrayidx13, align 4
%add = add nsw i32 %3, %2		%add = add nsw i32 %3, %2
ret i32 %add		ret i32 %add
}		}

; CHECK: Valid Region for Scop: for.body => for.cond2.preheader		; CHECK: Valid Region for Scop: for.body => for.body4

test/ScopDetect/multidim_indirect_access.ll

	; RUN: opt %loadPolly -polly-detect-unprofitable -polly-detect -analyze < %s | FileCheck %s			; RUN: opt %loadPolly -polly-detect-unprofitable -polly-detect -analyze < %s | FileCheck %s
	;			;
	; The outer loop of this function will correctly not be recognized with the			; The outer loop of this function will correctly not be recognized with the
				jdoerfert: We do not have independent blocks anymore. This is not for this commit to change, but we should just remove the one run of this test case and the comment here.
				Meinersbur: OK, I will update this test case.
	; message:			; message:
	;			;
	; Non affine access function: (sext i32 %tmp to i64)			; SCEV of PHI node refers to SSA names in region
				Meinersbur: The error message changes with this patch.
	;			;
	; The access A[x] might mistakenly be treated as a multidimensional access with			; The access A[x] might mistakenly be treated as a multidimensional access with
				jdoerfert: Is this comment still valid afterwards?
				Meinersbur: I reformulated this in a previous commit such that it is no longer implementation-specific.
	; dimension size x. This test will check that we correctly invalidate the			; dimension size x. This test will check that we correctly invalidate the
	; region and do not detect an outer SCoP.			; region and do not detect an outer SCoP.
	;			;
	; FIXME:
	; We should detect the inner region but the PHI node in the exit blocks
	; prohibits that.
	;
	; void f(int *A, long N) {			; void f(int *A, long N) {
	; int j = 0;			; int j = 0;
	; while (N > j) {			; while (N > j) {
	; int x = A[0];			; int x = A[0];
	; int i = 1;			; int i = 1;
	; do {			; do {
	; A[x] = 42;			; A[x] = 42;
	; A += x;			; A += x;
	; } while (i++ < N);			; } while (i++ < N);
	; }			; }
	; }			; }
	;			;
				; CHECK: Valid Region for Scop: bb1 => bb0
	; CHECK-NOT: Valid Region for Scop: bb0 => bb13			; CHECK-NOT: Valid Region for Scop: bb0 => bb13
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(i32* %A, i64 %N) {			define void @f(i32* %A, i64 %N) {
	bb:			bb:
	br label %bb0			br label %bb0


test/ScopDetect/non-affine-loop-condition-dependent-access_2.ll

	; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine-branches \			; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine-branches \
	; RUN: -polly-allow-nonaffine-loops=false -polly-detect-unprofitable \			; RUN: -polly-allow-nonaffine-loops=false -polly-detect-unprofitable \
	; RUN: -analyze < %s | FileCheck %s --check-prefix=REJECTNONAFFINELOOPS			; RUN: -analyze < %s | FileCheck %s --check-prefix=REJECTNONAFFINELOOPS
	; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine-branches \			; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine-branches \
	; RUN: -polly-allow-nonaffine-loops=true -polly-detect-unprofitable \			; RUN: -polly-allow-nonaffine-loops=true -polly-detect-unprofitable \
	; RUN: -analyze < %s | FileCheck %s --check-prefix=ALLOWNONAFFINELOOPS			; RUN: -analyze < %s | FileCheck %s --check-prefix=ALLOWNONAFFINELOOPS
	; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine \			; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine \
	; RUN: -polly-allow-nonaffine-branches -polly-allow-nonaffine-loops=true \			; RUN: -polly-allow-nonaffine-branches -polly-allow-nonaffine-loops=true \
	; RUN: -polly-detect-unprofitable -polly-detect-unprofitable -analyze < %s \			; RUN: -polly-detect-unprofitable -polly-detect-unprofitable -analyze < %s \
	; RUN: | FileCheck %s --check-prefix=ALLOWNONAFFINELOOPSANDACCESSES			; RUN: | FileCheck %s --check-prefix=ALLOWNONAFFINELOOPSANDACCESSES
	;			;
	; Here we have a non-affine loop (in the context of the loop nest)			; Here we have a non-affine loop (in the context of the loop nest)
	; and also a non-affine access (A[k]). While we can always detect the			; and also a non-affine access (A[k]). While we can always detect the
	; innermost loop as a SCoP of depth 1, we have to reject the loop nest if not			; innermost loop as a SCoP of depth 1, we have to reject the loop nest if not
	; both, non-affine loops as well as non-affine accesses are allowed.			; both, non-affine loops as well as non-affine accesses are allowed.
	;			;
	; REJECTNONAFFINELOOPS: Valid Region for Scop: bb15 => bb26			; REJECTNONAFFINELOOPS: Valid Region for Scop: bb15 => bb13
	; REJECTNONAFFINELOOPS-NOT: Valid			; REJECTNONAFFINELOOPS-NOT: Valid
	; ALLOWNONAFFINELOOPS: Valid Region for Scop: bb15 => bb26			; ALLOWNONAFFINELOOPS: Valid Region for Scop: bb15 => bb13
	; ALLOWNONAFFINELOOPS-NOT: Valid			; ALLOWNONAFFINELOOPS-NOT: Valid
	; ALLOWNONAFFINELOOPSANDACCESSES: Valid Region for Scop: bb11 => bb29			; ALLOWNONAFFINELOOPSANDACCESSES: Valid Region for Scop: bb11 => bb29
	;			;
	; void f(int *A) {			; void f(int *A) {
	; for (int i = 0; i < 1024; i++)			; for (int i = 0; i < 1024; i++)
	; for (int j = 0; j < 1024; j++)			; for (int j = 0; j < 1024; j++)
	; for (int k = i *j; k < 1024; k++)			; for (int k = i *j; k < 1024; k++)
	; A[k] += A[i] + A[j];			; A[k] += A[i] + A[j];

test/ScopDetect/non-affine-loop-condition-dependent-access_3.ll

	; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine-branches \			; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine-branches \
	; RUN: -polly-allow-nonaffine-loops=false -polly-detect-unprofitable \			; RUN: -polly-allow-nonaffine-loops=false -polly-detect-unprofitable \
	; RUN: -analyze < %s | FileCheck %s --check-prefix=REJECTNONAFFINELOOPS			; RUN: -analyze < %s | FileCheck %s --check-prefix=REJECTNONAFFINELOOPS
	; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine-branches \			; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine-branches \
	; RUN: -polly-allow-nonaffine-loops=true -polly-detect-unprofitable \			; RUN: -polly-allow-nonaffine-loops=true -polly-detect-unprofitable \
	; RUN: -analyze < %s | FileCheck %s --check-prefix=ALLOWNONAFFINELOOPS			; RUN: -analyze < %s | FileCheck %s --check-prefix=ALLOWNONAFFINELOOPS
	; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine \			; RUN: opt %loadPolly -basicaa -polly-detect -polly-allow-nonaffine \
	; RUN: -polly-allow-nonaffine-branches -polly-allow-nonaffine-loops=true \			; RUN: -polly-allow-nonaffine-branches -polly-allow-nonaffine-loops=true \
	; RUN: -polly-detect-unprofitable -analyze < %s | FileCheck %s \			; RUN: -polly-detect-unprofitable -analyze < %s | FileCheck %s \
	; RUN: --check-prefix=ALLOWNONAFFINELOOPSANDACCESSES			; RUN: --check-prefix=ALLOWNONAFFINELOOPSANDACCESSES
	;			;
	; Here we have a non-affine loop (in the context of the loop nest)			; Here we have a non-affine loop (in the context of the loop nest)
	; and also a non-affine access (A[k]). While we can always detect the			; and also a non-affine access (A[k]). While we can always detect the
	; innermost loop as a SCoP of depth 1, we have to reject the loop nest if not			; innermost loop as a SCoP of depth 1, we have to reject the loop nest if not
	; both, non-affine loops as well as non-affine accesses are allowed.			; both, non-affine loops as well as non-affine accesses are allowed.
	;			;
	; REJECTNONAFFINELOOPS: Valid Region for Scop: bb15 => bb26			; REJECTNONAFFINELOOPS: Valid Region for Scop: bb15 => bb13
	; REJECTNONAFFINELOOPS-NOT: Valid			; REJECTNONAFFINELOOPS-NOT: Valid
	; ALLOWNONAFFINELOOPS: Valid Region for Scop: bb15 => bb26			; ALLOWNONAFFINELOOPS: Valid Region for Scop: bb15 => bb13
	; ALLOWNONAFFINELOOPS-NOT: Valid			; ALLOWNONAFFINELOOPS-NOT: Valid
	; ALLOWNONAFFINELOOPSANDACCESSES: Valid Region for Scop: bb11 => bb29			; ALLOWNONAFFINELOOPSANDACCESSES: Valid Region for Scop: bb11 => bb29
	;			;
	; void f(int *A) {			; void f(int *A) {
	; for (int i = 0; i < 1024; i++)			; for (int i = 0; i < 1024; i++)
	; for (int j = 0; j < 1024; j++)			; for (int j = 0; j < 1024; j++)
	; for (int k = 0; k < i * j; k++)			; for (int k = 0; k < i * j; k++)
	; A[k] += A[i] + A[j];			; A[k] += A[i] + A[j];

test/ScopDetect/phi_with_multi_exiting_edges.ll

	; RUN: opt %loadPolly -polly-detect-unprofitable -polly-detect -analyze -S < %s | FileCheck %s			; RUN: opt %loadPolly -polly-detect-unprofitable -polly-detect -analyze -S < %s | FileCheck %s
	;			;
	; XFAIL: *
	;
	; Region with an exit node that has a PHI node multiple incoming edges from			; Region with an exit node that has a PHI node multiple incoming edges from
	; inside the region. Motivation for supporting such cases in Polly.			; inside the region. Motivation for supporting such cases in Polly.
	;			;
	; float test(long n, float A[static const restrict n]) {			; float test(long n, float A[static const restrict n]) {
	; float sum = 0;			; float sum = 0;
	; for (long i = 0; i < n; i += 1)			; for (long i = 0; i < n; i += 1)
	; sum += A[i];			; sum += A[i];
	; for (long i = 0; i < n; i += 1)			; for (long i = 0; i < n; i += 1)

test/ScopDetect/phi_with_multi_exiting_edges_2.ll

This file was added.

				; RUN: opt %loadPolly -polly-detect-unprofitable -polly-detect -analyze -S < %s | FileCheck %s

				define float @foo(float* %A, i64 %param) {
				entry:
				br label %entry.split

				entry.split:
				%branchcond = icmp slt i64 %param, 64
				br i1 %branchcond, label %loopA, label %loopB

				loopA:
				%indvarA = phi i64 [0, %entry.split], [%indvar.nextA, %loopA]
				%indvar.nextA = add i64 %indvarA, 1
				%valA = load float, float* %A
				%sumA = fadd float %valA, %valA
				store float %valA, float* %A
				%cndA = icmp eq i64 %indvar.nextA, 100
				br i1 %cndA, label %next, label %loopA

				loopB:
				%indvarB = phi i64 [0, %entry.split], [%indvar.nextB, %loopB]
				%indvar.nextB = add i64 %indvarB, 1
				%valB = load float, float* %A
				%sumB = fadd float %valB, %valB
				store float %valB, float* %A
				%cndB = icmp eq i64 %indvar.nextB, 100
				br i1 %cndB, label %next, label %loopB

				next:
				%result = phi float [%sumA, %loopA], [%sumB, %loopB]
				ret float %result

				}

				; CHECK: Valid Region for Scop: entry.split => next
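In this new test, the exit block `next` holds a PHI node `%result` that receives `%sumA` from `loopA` and `%sumB` from `loopB`, which is exactly the multiple-incoming-scalar case described in the summary. As a rough illustration of the summary's approach (virtual writes to a PHI-node location on each incoming path, read back in the exit block), here is a hand-written C analogue of `@foo`. It is a sketch, not code from the patch: Polly performs this on LLVM IR, and the local `result_slot` is a hypothetical stand-in for the demoted PHI location.

```c
/* Hand-written C analogue of @foo from phi_with_multi_exiting_edges_2.ll.
 * Not part of the patch. `result_slot` is a hypothetical stand-in for the
 * memory location that the exit-block PHI %result is demoted to. */
float foo(float *A, long param) {
  float result_slot; /* demoted storage for %result in block `next` */

  if (param < 64) { /* %branchcond = icmp slt i64 %param, 64 */
    long indvarA = 0;
    float sumA;
    do { /* loopA: exits once %indvar.nextA == 100, i.e. 100 iterations */
      float valA = *A;    /* %valA = load float, float* %A */
      sumA = valA + valA; /* %sumA = fadd float %valA, %valA */
      *A = valA;          /* store float %valA, float* %A */
      indvarA += 1;       /* %indvar.nextA = add i64 %indvarA, 1 */
    } while (indvarA != 100);
    result_slot = sumA; /* virtual write on the edge loopA -> next */
  } else {
    long indvarB = 0;
    float sumB;
    do { /* loopB: same shape as loopA */
      float valB = *A;
      sumB = valB + valB;
      *A = valB;
      indvarB += 1;
    } while (indvarB != 100);
    result_slot = sumB; /* virtual write on the edge loopB -> next */
  }

  return result_slot; /* next: %result is read back from the slot */
}
```

Both paths leave `*A` unchanged and return twice its value, but they do so through different scalars, so the exit block cannot simply use `%sumA` or `%sumB`; it has to read whichever value the path actually taken wrote.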

test/ScopDetect/simple_non_single_entry.ll

%exitcond = icmp eq i64 %indvar.next, %N		%exitcond = icmp eq i64 %indvar.next, %N
br i1 %exitcond, label %return, label %for.i		br i1 %exitcond, label %return, label %for.i

return:		return:
fence seq_cst		fence seq_cst
ret void		ret void
}		}

; CHECK: Valid Region for Scop: next => for.i.head1		; CHECK: Valid Region for Scop: next => for.i

test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_2.ll

	;			;
	; Here we have a non-affine loop (in the context of the loop nest)			; Here we have a non-affine loop (in the context of the loop nest)
	; and also a non-affine access (A[k]). While we can always model the			; and also a non-affine access (A[k]). While we can always model the
	; innermost loop as a SCoP of depth 1, we can overapproximate the			; innermost loop as a SCoP of depth 1, we can overapproximate the
	; innermost loop in the whole loop nest and model A[k] as a non-affine			; innermost loop in the whole loop nest and model A[k] as a non-affine
	; access.			; access.
	;			;
	; INNERMOST: Function: f			; INNERMOST: Function: f
	; INNERMOST: Region: %bb15---%bb26			; INNERMOST: Region: %bb15---%bb13
	; INNERMOST: Max Loop Depth: 1			; INNERMOST: Max Loop Depth: 1
	; INNERMOST: p0: {0,+,{0,+,1}<nuw><nsw><%bb11>}<nuw><nsw><%bb13>			; INNERMOST: p0: {0,+,{0,+,1}<nuw><nsw><%bb11>}<nuw><nsw><%bb13>
	; INNERMOST: p1: {0,+,{0,+,-1}<nw><%bb11>}<nw><%bb13>			; INNERMOST: p1: {0,+,{0,+,-1}<nw><%bb11>}<nw><%bb13>
	; INNERMOST: p2: {0,+,4}<nuw><nsw><%bb11>			; INNERMOST: p2: {0,+,4}<nuw><nsw><%bb11>
	; INNERMOST: p3: {0,+,4}<nuw><nsw><%bb13>			; INNERMOST: p3: {0,+,4}<nuw><nsw><%bb13>
	; INNERMOST: p4: {0,+,{0,+,4}<nuw><nsw><%bb11>}<%bb13>			; INNERMOST: p4: {0,+,{0,+,4}<nuw><nsw><%bb11>}<%bb13>
	; INNERMOST: Alias Groups (0):			; INNERMOST: Alias Groups (0):
	; INNERMOST: n/a			; INNERMOST: n/a

test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll

	;			;
	; Here we have a non-affine loop (in the context of the loop nest)			; Here we have a non-affine loop (in the context of the loop nest)
	; and also a non-affine access (A[k]). While we can always model the			; and also a non-affine access (A[k]). While we can always model the
	; innermost loop as a SCoP of depth 1, we can overapproximate the			; innermost loop as a SCoP of depth 1, we can overapproximate the
	; innermost loop in the whole loop nest and model A[k] as a non-affine			; innermost loop in the whole loop nest and model A[k] as a non-affine
	; access.			; access.
	;			;
	; INNERMOST: Function: f			; INNERMOST: Function: f
	; INNERMOST: Region: %bb15---%bb26			; INNERMOST: Region: %bb15---%bb13
	; INNERMOST: Max Loop Depth: 1			; INNERMOST: Max Loop Depth: 1
	; INNERMOST: Context:			; INNERMOST: Context:
	; INNERMOST: [p_0, p_1, p_2] -> { : p_0 >= 0 and p_0 <= 2147483647 and p_1 >= 0 and p_1 <= 4096 and p_2 >= 0 and p_2 <= 4096 }			; INNERMOST: [p_0, p_1, p_2] -> { : p_0 >= 0 and p_0 <= 2147483647 and p_1 >= 0 and p_1 <= 4096 and p_2 >= 0 and p_2 <= 4096 }
	; INNERMOST: Assumed Context:			; INNERMOST: Assumed Context:
	; INNERMOST: [p_0, p_1, p_2] -> { : }			; INNERMOST: [p_0, p_1, p_2] -> { : }
	; INNERMOST: p0: {0,+,{0,+,1}<nuw><nsw><%bb11>}<nuw><nsw><%bb13>			; INNERMOST: p0: {0,+,{0,+,1}<nuw><nsw><%bb11>}<nuw><nsw><%bb13>
	; INNERMOST: p1: {0,+,4}<nuw><nsw><%bb11>			; INNERMOST: p1: {0,+,4}<nuw><nsw><%bb11>
	; INNERMOST: p2: {0,+,4}<nuw><nsw><%bb13>			; INNERMOST: p2: {0,+,4}<nuw><nsw><%bb13>

Diff 34106
