For schedule generation we assumed that the reverse post-order traversal used by the domain generation is sufficient; however, it is not. Once a loop is discovered, we have to traverse it completely before we can generate the schedule for any block/region that is only reachable through a loop-exiting block. To this end, we add a "loop stack" that keeps track of loops we discovered during the traversal but have not yet traversed completely. We will never visit a basic block (or region) outside the most recent (and thus smallest) loop in the loop stack, but instead queue such blocks (or regions) in a waiting list. If the waiting list is not empty and might contain blocks from the most recent loop in the loop stack, the next block/region to visit is drawn from there, otherwise from the reverse post-order iterator.
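To make the description concrete, here is a minimal, self-contained sketch of that traversal. It is not the patch itself: Node stands in for Polly's region nodes, and getNodeLoop, getParentLoop, loopContains, numBlocks and emitSchedule are placeholder helpers, not existing Polly or LLVM interfaces.

  #include <deque>
  #include <map>
  #include <vector>

  struct Loop;
  struct Node;

  Loop *getNodeLoop(Node *N);              // innermost loop containing N (may be null)
  Loop *getParentLoop(Loop *L);            // surrounding loop (null for top level)
  bool loopContains(Loop *Outer, Loop *L); // Outer == L or Outer contains L; false if L is null
  unsigned numBlocks(Loop *L);             // number of blocks belonging to L (incl. sub-loops)
  void emitSchedule(Node *N);              // append N's schedule "in sequence"

  void buildScheduleSketch(const std::vector<Node *> &RPOT) {
    std::vector<Loop *> LoopStack;         // loops discovered but not yet traversed completely
    std::deque<Node *> Waiting;            // nodes deferred because they leave the current loop
    std::map<Loop *, unsigned> NumVisited; // how many nodes of each loop were visited

    size_t Idx = 0;
    while (Idx < RPOT.size() || !Waiting.empty()) {
      // Prefer the waiting list whenever its front node fits into the innermost
      // open loop; otherwise continue with the reverse post-order iterator.
      bool FrontFits =
          !Waiting.empty() &&
          (LoopStack.empty() ||
           loopContains(LoopStack.back(), getNodeLoop(Waiting.front())));
      Node *N;
      if (FrontFits || Idx == RPOT.size()) {
        N = Waiting.front();
        Waiting.pop_front();
      } else {
        N = RPOT[Idx++];
      }

      Loop *L = getNodeLoop(N);
      if (!LoopStack.empty() && !loopContains(LoopStack.back(), L)) {
        Waiting.push_back(N); // N lies outside the innermost unfinished loop: defer it.
        continue;
      }
      if (L && (LoopStack.empty() || LoopStack.back() != L))
        LoopStack.push_back(L); // discovered a new (inner) loop

      emitSchedule(N);

      // Count N for every loop containing it; completely visited loops are
      // popped, which finally allows the deferred loop-exiting nodes to run.
      for (Loop *Cur = L; Cur; Cur = getParentLoop(Cur))
        ++NumVisited[Cur];
      while (!LoopStack.empty() &&
             NumVisited[LoopStack.back()] == numBlocks(LoopStack.back()))
        LoopStack.pop_back();
    }
  }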
Details
- Reviewers: Meinersbur, grosser
- Commits:
  - rGb6ee445f69da: Merged: https://llvm.org/svn/llvm-project/polly/trunk@259354
  - rGc2fd8b411df9: ScopInfo: Correct schedule construction
  - rPLO259354: ScopInfo: Correct schedule construction
  - rL259709: Merged: https://llvm.org/svn/llvm-project/polly/trunk@259354
  - rL259354: ScopInfo: Correct schedule construction
Event Timeline
I did not see any LNT compile-time changes for this patch. There were two execution-time regressions, which I believe are either noise or cases where the previously wrong schedules happened to produce better results.
Hi Johannes,
thanks for working on this. I have two in-line comments.
Tobias
lib/Analysis/ScopInfo.cpp:3472
This piece of code looks a little complex. At the very least it requires some in-source comments. From my point of view, this code would possibly be easier to understand if we found a way to (more explicitly) walk down the loop tree and just do the ReversePostOrderTraversal to process the statements within the loop bodies. We had such a tree walk earlier, but it was removed when you generalized the schedule construction. Would adding back such a tree walk be difficult due to the generality of the CFGs we allow?

One of the reasons this might be difficult is if we aim to support irreducible control flow (http://llvm.org/PR25909). Because LoopInfo works with natural loops, which cannot be formed from irregular control flow, we do not correctly model irregular control flow before or after this patch, and I believe we should also not try to do so. Do you have any intentions to support such control flow with your patch?
test/ScopInfo/wrong_schedule.ll:66
Could you drop the control flow outside the scop? Most of it seems unnecessary to reproduce the test case and makes it harder to see the actual issue when looking at the IR itself.
lib/Analysis/ScopInfo.cpp:3472
I could outline it and/or add a comment, is that OK? I am unsure how you want to achieve a reverse post-order traversal of all blocks while staying in loops first more explicitly. What we had before used the Region::element_XXX() iterator. I was not aware of any guarantees on the traversal order when you use this iterator (it does not even have a comment). If you know of a tree traversal that gives such a guarantee, we could probably use it.

I am unsure why you talk about irreducible control flow now. The patch comes with a reducible CFG test case for which we generated a plainly wrong schedule. The irreducible case will be broken as long as LoopInfo does not offer information about irreducible loops; there is nothing I can do about that in the ScopInfo anyway (except write my own loop info ...).
Hi Johannes,
thanks for the quick reply
lib/Analysis/ScopInfo.cpp:3472
I do not yet have a good idea regarding the tree traversal, but I would like to think about it a little. Hence, I try to understand how general the scops are that you expect to handle. Can you point me to a test case for "traverse loops (even if they are not covered by a sub-region)"?

I just talked about irreducible control flow to understand if you plan to support it. It seems we agree that we do not want to handle it (but somehow detect it in the scop detection). From this I conclude that we could theoretically walk over the loop info tree and then for each loop enumerate the basic blocks it contains (which might allow ...).

Documentation will clearly help me to understand this issue. Especially if it contains some difficult cases you thought about while implementing.
lib/Analysis/ScopInfo.cpp:3472
Take your time. This is a bug that is in Polly but (for some reason) does not cause too much trouble. Generally all CFGs should be fine iff LoopInfo and ScalarEvolution play along. As LoopInfo *breaks* for irreducible CFGs and ScalarEvolution for some other cases (e.g., piecewise AddRecs), we should not detect them as SCoPs. I think the example below should illustrate some interesting problems:

  pre_header:
    store
    br loop_header

  loop_header:
    store
    switch [loop_body1, loop_body2, loop_exit1, loop_exit2]

  loop_body1:
    store
    br loop_header

  loop_body2:
    store
    br loop_header

  loop_exit1:
    store
    br after_loop

  loop_exit2:
    store
    br after_loop

  after_loop:
    store
    br ...

We have to guarantee the blocks are visited in the following order (/ means "or"):

  pre_header
  loop_header
  loop_body1/loop_body2 [not loop_exit1/loop_exit2]
  loop_body1/loop_body2 [not loop_exit1/loop_exit2]
  loop_exit1/loop_exit2
  loop_exit1/loop_exit2
  after_loop

While reverse post order can traverse the CFG that way, it is not unique and might choose:

  pre_header
  loop_header
  loop_body1/loop_body2 [not loop_exit1/loop_exit2]
  loop_exit1/loop_exit2 [BAD!]
  loop_body1/loop_body2
  loop_exit1/loop_exit2
  after_loop
For reference, test/ScopInfo/multiple_exiting_blocks_two_loop.ll is a test case that contains a scop with a loop that does not have a corresponding region.
Johannes, I looked through the code and I have some difficulties getting a picture of what the existing code does, both in detail and on a high level. I started to think through it, but if you could add some documentation to the current code that would clearly help.
Some questions I have:
What is LSchedulePair.second/NumVisited used for? It seems when the condition NumVisited == L->getNumBlocks() holds you assume all basic blocks of the loop have been processed. Then, you can start to walk up the loop tree and add the other loop dimensions?
I tried to understand the implementation of mapToDimension, which is simple in general but seems to do some additional stuff for certain corner cases that do not occur in our tests. Is the following an equivalent implementation? Under which conditions can Domain become 'NULL' (I checked and it indeed becomes NULL)? I removed the check for isl_union_set_is_empty, as this never happens in any test case, and instead assert. Is this correct?

  mapToDimension(__isl_take isl_union_set *Domain, int N) {
    assert(N > 0);

    if (!Domain)
      return nullptr;
    assert(!isl_union_set_is_empty(Domain));

    struct MapToDimensionDataTy Data;
    auto *Space = isl_union_set_get_space(Domain);
    auto *PwAff = isl_union_pw_multi_aff_empty(Space);
    Data = {N, PwAff};

    auto Res = isl_union_set_foreach_set(Domain, &mapToDimension_AddSet, &Data);
    assert(Res == isl_stat_ok);

    isl_union_set_free(Domain);
    return isl_multi_union_pw_aff_from_union_pw_multi_aff(Data.Res);
  }
How does the content of LoopSchedules change throughout the walk? Can you give some invariants? E.g., when buildSchedule returns, what does LoopSchedules contain?
It seems LSchedule can become nullptr, as the assert added in the following patch fires:

  @@ -3464,6 +3468,7 @@ void Scop::buildSchedule(
     isl_schedule *LSchedule = LSchedulePair.first;
     unsigned NumVisited = LSchedulePair.second;
  +  assert(LSchedule);
     while (L && NumVisited == L->getNumBlocks()) {
This is a little surprising. What does it mean if LSchedule is nullptr? When does this happen?
Maybe you can consider these questions when documenting the code. Thank you.
I can do that.
Some questions I have:
What is LSchedulePair.second/NumVisited used for? It seems when the condition NumVisited == L->getNumBlocks() holds you assume all basic blocks of the loop have been processed. Then, you can start to walk up the loop tree and add the other loop dimensions?
Wait. This was in there for months and is a core part of the schedule generation. Maybe we should talk about two different things here: the existing code and what I propose to change to fix it. Apparently neither is clear.
The second component of the schedule pair counts how many blocks of a loop have been visited. If this number is equal to the number of blocks in the loop, we add this loop to the schedule tree. This is similar to what we did before (in the old schedule generation) but extended to the case that the loop is not perfectly covered by regions.
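A minimal sketch of this bookkeeping, using the names visible in the review (LoopSchedules, the pair's second component, Loop::getNumBlocks()); NumBlocksOfRN is a placeholder for the number of blocks the current region node contributes, and the integration step is only described in comments:

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/Analysis/LoopInfo.h"
  #include "isl/schedule.h"
  #include <utility>
  using namespace llvm;

  // Sketch only, not the patch verbatim.
  static void countVisitedBlocks(
      DenseMap<Loop *, std::pair<isl_schedule *, unsigned>> &LoopSchedules,
      Loop *L, unsigned NumBlocksOfRN) {
    auto &LSchedulePair = LoopSchedules[L]; // default-constructed as <nullptr, 0>
    LSchedulePair.second += NumBlocksOfRN;
    if (LSchedulePair.second == L->getNumBlocks()) {
      // All blocks of L have been visited: wrap L's schedule (the first pair
      // component) in a band for L's dimension and integrate it into the
      // parent loop's entry, which may in turn become complete as well.
    }
  }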
I tried to understand the implementation of mapToDimension, which is simple in general but seems to do some additional stuff for certain corner cases that do not occur in our tests. Is the following an equivalent implementation? Under which conditions can Domain become 'NULL' (I checked and it indeed becomes NULL)? I removed the check for isl_union_set_is_empty, as this never happens in any test case, and instead assert. Is this correct?
- Except for the first conditional, mapToDimension was implemented by you.
- In the current implementation N should always be positive.
- I am not 100% sure why the empty check was in there... maybe it is not needed anymore, maybe it occurs somewhere in LNT. If LNT is green I'm fine with you removing that check.
  mapToDimension(__isl_take isl_union_set *Domain, int N) {
    assert(N > 0);

    if (!Domain)
      return nullptr;
    assert(!isl_union_set_is_empty(Domain));

    struct MapToDimensionDataTy Data;
    auto *Space = isl_union_set_get_space(Domain);
    auto *PwAff = isl_union_pw_multi_aff_empty(Space);
    Data = {N, PwAff};

    auto Res = isl_union_set_foreach_set(Domain, &mapToDimension_AddSet, &Data);
    assert(Res == isl_stat_ok);

    isl_union_set_free(Domain);
    return isl_multi_union_pw_aff_from_union_pw_multi_aff(Data.Res);
  }
How does the content of LoopSchedules change throughout the walk? Can you give some invariants? E.g., when buildSchedule returns, what does LoopSchedules contain?
Initially, it is empty; LSchedulePair for every loop will be <nullptr, 0>. Then we build up isl_schedules and add the number of blocks visited, changing both components, until all blocks of a loop have been visited and its schedule is integrated into the parent loop's schedule. During the recursive construction, LoopSchedules is built not completely but almost "bottom up": while the schedules for the innermost loops are constructed, the schedules of outer loops can be constructed as well, but inner loops are traversed completely before outer ones and their schedules are then integrated into the outer loop.
It seems LSchedule can become nullptr, as the assert added in the following patch fires:

  @@ -3464,6 +3468,7 @@ void Scop::buildSchedule(
     isl_schedule *LSchedule = LSchedulePair.first;
     unsigned NumVisited = LSchedulePair.second;
  +  assert(LSchedule);
     while (L && NumVisited == L->getNumBlocks()) {

This is a little surprising. What does it mean if LSchedule is nullptr? When does this happen?
Initially (when a loop is discovered) this is a nullptr, as there is no schedule for the loop yet. This nullptr is then combined "in sequence" with some real schedule and the result is stored back in the first component of LSchedulePair.
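A small sketch of that "in sequence" growth, in the guarded form described later for r256209; StmtSchedule is a placeholder for the schedule piece of the block just visited, not an actual variable from the patch:

  #include "isl/schedule.h"

  // Sketch only: append a new schedule piece to the (possibly still missing)
  // schedule collected for the current loop.
  static isl_schedule *appendInSequence(isl_schedule *LSchedule,
                                        isl_schedule *StmtSchedule) {
    if (!LSchedule)    // first piece for this loop: nothing to combine yet
      return StmtSchedule;
    return isl_schedule_sequence(LSchedule, StmtSchedule);
  }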
Maybe you can consider these questions when documenting the code. Thank you.
I will add comments to the whole thing as it is apparently not clear how schedules are actually built.
Thank you.
Some questions I have:
What is LSchedulePair.second/NumVisited used for? It seems when the condition NumVisited == L->getNumBlocks() holds you assume all basic blocks of the loop have been processed. Then, you can start to walk up the loop tree and add the other loop dimensions?
Wait. This was in there for months and is a core part of the schedule generation. Maybe we should talk about two different things here: the existing code and what I propose to change to fix it. Apparently neither is clear.
Right. Let's start with the old code, then the new code is most likely a lot easier to understand.
The second component of the schedule pair counts how many blocks of a loop have been visited. If this number is equal to the number of blocks in the loop, we add this loop to the schedule tree. This is similar to what we did before (in the old schedule generation) but extended to the case that the loop is not perfectly covered by regions.
I tried to understand the implementation of mapToDimension, which is simple in general but seems to do some additional stuff for certain corner cases that do not occur in our tests. Is the following an equivalent implementation? Under which conditions can Domain become 'NULL' (I checked and it indeed becomes NULL)? I removed the check for isl_union_set_is_empty, as this never happens in any test case, and instead assert. Is this correct?
- Except for the first conditional, mapToDimension was implemented by you.
- In the current implementation N should always be positive.
- I am not 100% sure why the empty check was in there... maybe it is not needed anymore, maybe it occurs somewhere in LNT. If LNT is green I'm fine with you removing that check.
LNT is green. I committed an improved version of this change in r256208.
How does the content of LoopSchedules change throughout the walk? Can you give some invariants? E.g., when buildSchedule returns, what does LoopSchedules contain?
Initially, it is empty; LSchedulePair for every loop will be <nullptr, 0>. Then we build up isl_schedules and add the number of blocks visited, changing both components, until all blocks of a loop have been visited and its schedule is integrated into the parent loop's schedule. During the recursive construction, LoopSchedules is built not completely but almost "bottom up": while the schedules for the innermost loops are constructed, the schedules of outer loops can be constructed as well, but inner loops are traversed completely before outer ones and their schedules are then integrated into the outer loop.
OK, now I see. Adding this in the documentation would clearly be useful.
It seems LSchedule can become nullptr, as the assert added in the following patch fires:

  @@ -3464,6 +3468,7 @@ void Scop::buildSchedule(
     isl_schedule *LSchedule = LSchedulePair.first;
     unsigned NumVisited = LSchedulePair.second;
  +  assert(LSchedule);
     while (L && NumVisited == L->getNumBlocks()) {

This is a little surprising. What does it mean if LSchedule is nullptr? When does this happen?
Initially (when a loop is discovered) this is a nullptr, as there is no schedule for the loop yet. This nullptr is then combined "in sequence" with some real schedule and the result is stored back in the first component of LSchedulePair.
OK. I refactored the code slightly in r256209 to not perform any computation on the nullptr. This makes the code more readable for me.
Best,
Tobias
Hi Johannes,
did you still plan to add documentation about the schedule generation that clarifies the points you explained right before Christmas?
All the best and a happy new year,
Tobias
Hi Johannes,
did you still plan to add documentation about the schedule generation that clarifies the points you explained right before Christmas?
Done in the newest version of the patch.
Hi Johannes,
thanks for the detailed comments. They clearly help to both review the new code and to understand more of the existing code. I have a couple of follow up questions inline.
Best,
Tobias
include/polly/ScopInfo.h:1417
Could you add one more sentence to clarify for the reader how the region tree traversal works on an abstract level? To my understanding, for each call of buildSchedule we walk over a given region @R. @R consists of RegionNodes that are either BBs or themselves (non-trivial) Regions, which are all connected through control flow possibly containing loops. By walking over the AST in reverse post order we ensure which condition precisely?

Also, can we state something about the loops processed in a region? Do we assume reducible control flow? Can/do we exploit the additional properties this gives us?
include/polly/ScopInfo.h:1433
Nice. This comment clearly helps me to better understand the algorithm used. Now I need to relate the actual implementation to the above description. One thing I am not fully sure how it works is the LoopSchedules map (and the new LoopStack). Can we give an invariant on the state of these when a recursive call to buildSchedule returns? Specifically, why do we need to return a full map of schedules? Will there possibly be always exactly one schedule in the map, such that we could change

  void buildSchedule(
      Region *R,
      DenseMap<Loop *, std::pair<isl_schedule *, unsigned>> &LoopSchedules) {
  }

to

  isl_schedule *buildSchedule(Region *R) {
    DenseMap<Loop *, std::pair<isl_schedule *, unsigned>> LocalLoopSchedules;
    ...
  }
lib/Analysis/ScopInfo.cpp:3437
This comment clarifies a question I was wondering about before. For me this would become even clearer if we split off the non-recursive from the recursive part of the code. To see if this actually works, I implemented such a patch. If you agree this is useful I could commit it ahead of time.

lib/Analysis/ScopInfo.cpp:3447
queued

lib/Analysis/ScopInfo.cpp:3472
These different cases, iterators and lists look confusing to me. I wonder if this code would look clearer if we just added all RNs to a work list, to which we append the RNs that need to be delayed? Moving all RNs to a work list clearly slows down the common case, but I doubt we would see this in any profile.
test/ScopInfo/wrong_schedule.ll:68
This test case still needs some simplification (either done manually or you could use my bugpoint patch).
Hi Johannes,
I played around with this code a little bit more and it seems we can pull out the tree traversal to remove the single-non-affine-region special case and generally simplify the code. This helps me to better understand that we either walk the tree (in case we have an affine region node) or build schedule information for a ScopStmt.
I committed a refactoring in r256931. It is different from yours, but I think it makes your point even clearer. I hope you agree.
include/polly/ScopInfo.h:1417

Could you add one more sentence to clarify for the reader how the region tree traversal works on an abstract level?

I do not follow completely. What abstract level except "recursively in reverse post-order wrt. sub-regions" is there?

@R consists of RegionNodes that are either BBs or themselves (non-trivial) Regions, which are all connected through control flow possibly containing loops.

Yes.

By walking over the AST in reverse post order we ensure which condition precisely?

That we visit predecessors of a block before the block itself (same as for the domain generation). This is necessary since we can then add the schedule for a loop or block in sequence to what we already have and do not need to find a place where to insert it.

Also, can we state something about the loops processed in a region?

I do not understand. What should I state about the loops?

Do we assume reducible control flow?

LoopInfo does not work on irreducible control flow => we will currently crash or do undefined things on irreducible control flow. If LoopInfo would work (in some sense), this code would probably work too (but I do not give a guarantee, as it heavily depends on the way irreducible loops are modeled).

Can/do we exploit the additional properties this gives us?

None crossed my mind so far, but I do not know if there are none.
include/polly/ScopInfo.h:1433

Can we give an invariant on the state of these when a recursive call to buildSchedule returns?

Mh, invariants I can think of:
...
I can add them if it helps.

Will there possibly be always exactly one schedule in the map?

There is always "one active loop", thus one could rewrite it somehow to only pass the information for that one. However, we still need to pass the current schedule for that active loop as well as the number of visited blocks to the recursive calls, and we need to update them during the recursion. One local map plus a std::pair<isl_schedule *, unsigned> & argument should do the trick, but I do not see how this is more efficient or easier.
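For comparison, the variant mentioned here could have a signature roughly like the following; this is a hypothetical sketch, not the committed interface:

  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/Analysis/RegionInfo.h"
  #include "isl/schedule.h"
  #include <utility>
  using namespace llvm;

  // Hypothetical alternative: keep one local map in the top-level caller and
  // pass only the active loop's state (schedule so far, blocks visited) down
  // the recursion by reference.
  void buildSchedule(Region *R, Loop *ActiveLoop,
                     std::pair<isl_schedule *, unsigned> &ActiveLoopState);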
lib/Analysis/ScopInfo.cpp:3437
Done.

lib/Analysis/ScopInfo.cpp:3472
But a work list doesn't magically remove the different cases and iterators. The same logic is needed (if it is even that simple to decide whether we need to delay a region node in case we do not yet build the schedule).
Hi Johannes,
for our discussion, here is a code snippet that explains my worklist approach. It basically replaces the deque you already added anyway.
lib/Analysis/ScopInfo.cpp:3447
queued

lib/Analysis/ScopInfo.cpp:3472
What do you think of the following piece of code? It passes all tests for me and at least avoids the need for LastRNWaiting as well as explicit iterators.

  ReversePostOrderTraversal<Region *> RTraversal(R);
  std::deque<RegionNode *> WorkList(RTraversal.begin(), RTraversal.end());

  while (!WorkList.empty()) {
    RegionNode *RN = WorkList.front();
    WorkList.pop_front();

    Loop *L = getRegionNodeLoop(RN, LI);
    if (!getRegion().contains(L))
      L = OuterScopLoop;

    Loop *LastLoop = LoopStack.back();
    if (LastLoop != L) {
      if (!LastLoop->contains(L)) {
        WorkList.push_back(RN);
        continue;
      }
      LoopStack.push_back(L);
    }
Hi Johannes,
I just wanted to put down my latest findings here:
- The simplified code I proposed is incorrect, as the elements we push back are appended to the end of the worklist, which results in an invalid work list when leaving the loop.
- I believe your patch makes the LoopSchedules map unnecessary, as we now always process a stack of loops. Consequently, we could use a single stack of the following type (a usage sketch follows below):
  struct BuildScheduleInfo {
    Loop *L;
    isl_schedule *Schedule;
    unsigned *NumBBsProcessed;
  };

  SmallVectorImpl<struct BuildScheduleInfo *> &LoopStack
instead of the LoopSchedules map. I think using such a stack is preferable, as it indicates nicely that there is only one active loop at a time.
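To make the proposal concrete, here is a usage sketch under the assumption that an entry is pushed when a loop is first seen and popped once all of its blocks were processed; finishCompletedLoops is a hypothetical helper, the entries are kept by value for simplicity, and the actual schedule integration is only described in comments:

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/Analysis/LoopInfo.h"
  #include "isl/schedule.h"
  using namespace llvm;

  struct BuildScheduleInfo {
    Loop *L;
    isl_schedule *Schedule;   // schedule pieces of L collected so far
    unsigned NumBBsProcessed; // blocks of L already visited
  };

  // Hypothetical helper: called after a block of the innermost open loop has
  // been scheduled. Pops every loop that is now complete and hands its
  // schedule to the next-outer entry.
  static void finishCompletedLoops(SmallVectorImpl<BuildScheduleInfo> &LoopStack) {
    while (!LoopStack.empty() &&
           LoopStack.back().NumBBsProcessed == LoopStack.back().L->getNumBlocks()) {
      BuildScheduleInfo Finished = LoopStack.pop_back_val();
      if (LoopStack.empty())
        break; // outermost level: Finished.Schedule would be the final result
      // Wrap Finished.Schedule in a band for Finished.L and sequence it into
      // LoopStack.back().Schedule. One possible accounting: also count the
      // finished loop's blocks towards the enclosing loop.
      LoopStack.back().NumBBsProcessed += Finished.L->getNumBlocks();
    }
  }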
After having thought about this code now for a while I think I understood all the corner cases. Thanks for your patience!
Could you add one more sentence to clarify for the reader how the region tree traversal works on an abstract level?
To my understanding, for each call of buildSchedule we walk over a given region @R. @R consists of RegionNodes that are either BBs or themselves (non-trivial) Regions, which are all connected through control flow possibly containing loops. By walking over the AST in reverse post order we ensure which condition precisely?
Also, can we state something about the loops processed in a region? Do we assume reducible control flow? Can/do we exploit the additional properties this gives us?