For schedule generation we assumed that the reverse post order traversal used by the domain generation is sufficient; however, it is not. Once a loop is discovered, we have to traverse it completely before we can generate the schedule for any block/region that is only reachable through a loop-exiting block. To this end, we add a "loop stack" that keeps track of loops we discovered during the traversal but have not yet traversed completely. We never visit a basic block (or region) outside the most recent (thus smallest) loop in the loop stack, but instead queue such blocks (or regions) in a waiting list. If the waiting list is not empty and might contain blocks from the most recent loop in the loop stack, the next block/region to visit is drawn from there; otherwise it is drawn from the reverse post order iterator.
Details
- Reviewers: Meinersbur, grosser
- Commits:
  - rGb6ee445f69da: Merged: https://llvm.org/svn/llvm-project/polly/trunk@259354
  - rGc2fd8b411df9: ScopInfo: Correct schedule construction
  - rPLO259354: ScopInfo: Correct schedule construction
  - rL259709: Merged: https://llvm.org/svn/llvm-project/polly/trunk@259354
  - rL259354: ScopInfo: Correct schedule construction

Diff Detail
- Repository: rL LLVM

Event Timeline
I did not see any LNT compile time changes for this patch. There were two execution time regressions, which I believe are either noise or cases where the old, wrong schedules happened to produce a better result.
Hi Johannes,
thanks for working on this. I have two in-line comments.
Tobias
lib/Analysis/ScopInfo.cpp:3472 (On Diff #43332):
This piece of code looks a little complex. At the very least it requires some in-source comments. From my point of view, this code would possibly be easier to understand if we found a way to (more explicitly) walk down the loop tree and just do the ReversePostOrderTraversal to process the statements within the loop bodies. We had such a tree walk earlier, but it was removed when you generalized the schedule construction. Would adding back such a tree walk be difficult due to the generality of the CFGs we allow?

One reason this might be difficult is if we aim to support irreducible control flow (http://llvm.org/PR25909). Because LoopInfo works with natural loops, which cannot be formed from irregular control flow, we do not correctly model irregular control flow before or after this patch, and I believe we should also not try to do so. Do you have any intention to support such control flow with your patch?

test/ScopInfo/wrong_schedule.ll:66 (On Diff #43332):
Could you drop the control flow outside the scop? Most of it seems unnecessary to reproduce the test case and makes it harder to see the actual issue when looking at the IR itself.
lib/Analysis/ScopInfo.cpp:3472 (On Diff #43332):
I could outline it and/or add a comment; is that OK? I am unsure how you want to achieve a reverse post order traversal of all blocks, while finishing loops first, more explicitly. What we had before used the Region::element_XXX() iterator, and I was not aware of any guarantees on the traversal order when you use this iterator (it does not even have a comment). If you know of any tree traversal that will guarantee such an order, we can probably use it.

I am unsure why you talk about irreducible control flow now. The patch comes with a reducible CFG test case for which we generated a plainly wrong schedule. The irreducible case will be broken as long as LoopInfo does not offer information about an irreducible loop; there is nothing I can do about that in the ScopInfo anyway (except write our own loop info ...).
Hi Johannes,
thanks for the quick reply
lib/Analysis/ScopInfo.cpp:3472 (On Diff #43332):
I do not yet have a good idea regarding the tree traversal, but would like to think about it a little. Hence, I try to understand how general the scops are that you expect to handle. Can you point me to a test case for "traverse loops (even if they are not covered by a sub-region)"?

I just talked about irreducible control flow to understand whether you plan to support it. It seems we agree that we do not want to handle it (but should somehow detect it in the scop detection). From this I conclude that we could theoretically walk over the loop info tree and then, for each loop, enumerate the basic blocks it contains (which might allow

Documentation will clearly help me to understand this issue, especially if it contains some difficult cases you thought about while implementing.
lib/Analysis/ScopInfo.cpp:3472 (On Diff #43332):
Take your time. This is a bug that is in Polly but (for some reason) does not cause too much trouble.

Generally, all CFGs should be fine iff LoopInfo and ScalarEvolution play along. As LoopInfo *breaks* for irreducible CFGs, and ScalarEvolution for some other cases (e.g., piecewise AddRecs), we should not detect them as SCoPs.

I think the example below should illustrate some interesting problems:

```
pre_header:
  store
  br loop_header

loop_header:
  store
  switch [loop_body1, loop_body2, loop_exit1, loop_exit2]

loop_body1:
  store
  br loop_header

loop_body2:
  store
  br loop_header

loop_exit1:
  store
  br after_loop

loop_exit2:
  store
  br after_loop

after_loop:
  store
  br ...
```

We have to guarantee the blocks are visited in the following order ("/" means "or"):

```
pre_header
loop_header
loop_body1/loop_body2   [not loop_exit1/loop_exit2]
loop_body1/loop_body2   [not loop_exit1/loop_exit2]
loop_exit1/loop_exit2
loop_exit1/loop_exit2
after_loop
```

While reverse post order can traverse the CFG that way, it is not unique and might choose:

```
pre_header
loop_header
loop_body1/loop_body2   [not loop_exit1/loop_exit2]
loop_exit1/loop_exit2   [BAD!]
loop_body1/loop_body2
loop_exit1/loop_exit2
after_loop
```
For reference test/ScopInfo/multiple_exiting_blocks_two_loop.ll is a test case that contains a scop with a loop that does not have a corresponding region.
Johannes, I looked through the code and I have some difficulties getting a picture of what the existing code does, both in detail and on a high level. I started to think it through, but if you could add some documentation to the current code that would clearly help.
Some questions I have:
What is LSchedulePair.second/NumVisited used for? It seems when the condition NumVisited == L->getNumBlocks() holds you assume all basic blocks of the loop have been processed. Then, you can start to walk up the loop tree and add the other loop dimensions?
I tried to understand the implementation of mapToDomain, which is simple in general but seems to do some additional stuff for certain corner cases that do not occur in our tests. Is the following an equivalent implementation? Under which conditions can Domain become 'NULL' (I checked, and it indeed becomes NULL)? I removed the check for isl_union_set_is_empty, as this never happens in any test case, and instead assert. Is this correct?
```cpp
mapToDimension(__isl_take isl_union_set *Domain, int N) {
  assert(N > 0);

  if (!Domain)
    return nullptr;
  assert(!isl_union_set_is_empty(Domain));

  struct MapToDimensionDataTy Data;
  auto *Space = isl_union_set_get_space(Domain);
  auto *PwAff = isl_union_pw_multi_aff_empty(Space);
  Data = {N, PwAff};

  auto Res = isl_union_set_foreach_set(Domain, &mapToDimension_AddSet, &Data);
  assert(Res == isl_stat_ok);

  isl_union_set_free(Domain);
  return isl_multi_union_pw_aff_from_union_pw_multi_aff(Data.Res);
}
```
How does the content of LoopSchedules change throughout the walk? Can you give some invariants? E.g., when buildSchedule returns, what does LoopSchedules contain?
It seems LSchedule can become nullptr, as the following patch asserts:
```
@@ -3464,6 +3468,7 @@ void Scop::buildSchedule(
   isl_schedule *LSchedule = LSchedulePair.first;
   unsigned NumVisited = LSchedulePair.second;
+  assert(LSchedule);
   while (L && NumVisited == L->getNumBlocks()) {
```
This is a little surprising. What does it mean if LSchedule is nullptr? When does this happen?
Maybe you can consider these questions when documenting the code. Thank you.
I can do that.
Some questions I have:
What is LSchedulePair.second/NumVisited used for? It seems when the condition NumVisited == L->getNumBlocks() holds you assume all basic blocks of the loop have been processed. Then, you can start to walk up the loop tree and add the other loop dimensions?
Wait. This was in there for months and is a core part of the schedule generation. Maybe we should talk about two different things here: the existing code and what I propose to change to fix it. Apparently neither is clear.
The second component of the schedule pair counts how many blocks of a loop have been visited. If this number is equal to the number of blocks in the loop, we add this loop to the schedule tree. This is similar to what we did before (in the old schedule generation) but extended to the case that the loop is not perfectly covered by regions.
I tried to understand the implementation of mapToDomain, which is simple in general but seems to do some additional stuff for certain corner cases that do not occur in our tests. Is the following an equivalent implementation? Under which conditions can Domain become 'NULL' (I checked, and it indeed becomes NULL)? I removed the check for isl_union_set_is_empty, as this never happens in any test case, and instead assert. Is this correct?
- Except for the first conditional, mapToDimension was implemented by you.
- In the current implementation N should always be positive.
- I am not 100% sure why the empty check was in there... maybe it is not needed anymore, maybe it occurs somewhere in LNT. If LNT is green, I'm fine with you removing that check.
```cpp
mapToDimension(__isl_take isl_union_set *Domain, int N) {
  assert(N > 0);

  if (!Domain)
    return nullptr;
  assert(!isl_union_set_is_empty(Domain));

  struct MapToDimensionDataTy Data;
  auto *Space = isl_union_set_get_space(Domain);
  auto *PwAff = isl_union_pw_multi_aff_empty(Space);
  Data = {N, PwAff};

  auto Res = isl_union_set_foreach_set(Domain, &mapToDimension_AddSet, &Data);
  assert(Res == isl_stat_ok);

  isl_union_set_free(Domain);
  return isl_multi_union_pw_aff_from_union_pw_multi_aff(Data.Res);
}
```
How does the content of LoopSchedules change throughout the walk? Can you give some invariants? E.g., when buildSchedule returns, what does LoopSchedules contain?
Initially, it is empty: nothing is in there, and LSchedulePair for every loop will be <nullptr, 0>. Then we build up isl_schedules and add the number of blocks visited, changing both components, until all blocks of a loop have been visited and its schedule is integrated into the parent loop's schedule. During the recursive construction, LoopSchedules is built almost (though not completely) "bottom up": while the schedules for the innermost loops are constructed, the schedules of outer loops can be constructed as well, but inner loops should be traversed completely before outer ones, and their schedules will then be integrated into the outer loops.
It seems LSchedule can become nullptr, as the following patch asserts:
```
@@ -3464,6 +3468,7 @@ void Scop::buildSchedule(
   isl_schedule *LSchedule = LSchedulePair.first;
   unsigned NumVisited = LSchedulePair.second;
+  assert(LSchedule);
   while (L && NumVisited == L->getNumBlocks()) {
```

This is a little surprising. What does it mean if LSchedule is nullptr? When does this happen?
Initially (when a loop is discovered) this is a nullptr, as there is no schedule for the loop yet. This nullptr is combined "in sequence" with some real schedule and then stored back in the first component of LSchedulePair.
Maybe you can consider these questions when documenting the code. Thank you.
I will add comments to the whole thing, as it is apparently not clear how schedules are actually built.
Thank you.
Some questions I have:
What is LSchedulePair.second/NumVisited used for? It seems when the condition NumVisited == L->getNumBlocks() holds you assume all basic blocks of the loop have been processed. Then, you can start to walk up the loop tree and add the other loop dimensions?
Wait. This was in there for months and is a core part of the schedule generation. Maybe we should talk about two different things here: the existing code and what I propose to change to fix it. Apparently neither is clear.
Right. Let's start with the old code, then the new code is most likely a lot easier to understand.
The second component of the schedule pair counts how many blocks of a loop have been visited. If this number is equal to the number of blocks in the loop, we add this loop to the schedule tree. This is similar to what we did before (in the old schedule generation) but extended to the case that the loop is not perfectly covered by regions.
I tried to understand the implementation of mapToDomain, which is simple in general but seems to do some additional stuff for certain corner cases that do not occur in our tests. Is the following an equivalent implementation? Under which conditions can Domain become 'NULL' (I checked, and it indeed becomes NULL)? I removed the check for isl_union_set_is_empty, as this never happens in any test case, and instead assert. Is this correct?
- Except for the first conditional, mapToDimension was implemented by you.
- In the current implementation N should always be positive.
- I am not 100% sure why the empty check was in there... maybe it is not needed anymore, maybe it occurs somewhere in LNT. If LNT is green, I'm fine with you removing that check.
LNT is green. I committed an improved version of this change in r256208.
How does the content of LoopSchedules change throughout the walk? Can you give some invariants? E.g., when buildSchedule returns, what does LoopSchedules contain?
Initially, it is empty: nothing is in there, and LSchedulePair for every loop will be <nullptr, 0>. Then we build up isl_schedules and add the number of blocks visited, changing both components, until all blocks of a loop have been visited and its schedule is integrated into the parent loop's schedule. During the recursive construction, LoopSchedules is built almost (though not completely) "bottom up": while the schedules for the innermost loops are constructed, the schedules of outer loops can be constructed as well, but inner loops should be traversed completely before outer ones, and their schedules will then be integrated into the outer loops.
OK, now I see. Adding this in the documentation would clearly be useful.
It seems LSchedule can become nullptr, as the following patch asserts:
```
@@ -3464,6 +3468,7 @@ void Scop::buildSchedule(
   isl_schedule *LSchedule = LSchedulePair.first;
   unsigned NumVisited = LSchedulePair.second;
+  assert(LSchedule);
   while (L && NumVisited == L->getNumBlocks()) {
```

This is a little surprising. What does it mean if LSchedule is nullptr? When does this happen?
Initially (when a loop is discovered) this is a nullptr, as there is no schedule for the loop yet. This nullptr is combined "in sequence" with some real schedule and then stored back in the first component of LSchedulePair.
OK. I re-factored the code slightly in r256209 to not perform any computation on the nullptr. This makes the code more readable for me.
Best,
Tobias
How does the content of LoopSchedules change throughout the walk? Can you give some invariants? E.g., when buildSchedule returns, what does LoopSchedules contain?
Initially, it is empty: nothing is in there, and LSchedulePair for every loop will be <nullptr, 0>. Then we build up isl_schedules and add the number of blocks visited, changing both components, until all blocks of a loop have been visited and its schedule is integrated into the parent loop's schedule. During the recursive construction, LoopSchedules is built almost (though not completely) "bottom up": while the schedules for the innermost loops are constructed, the schedules of outer loops can be constructed as well, but inner loops should be traversed completely before outer ones, and their schedules will then be integrated into the outer loops.
OK, now I see. Adding this in the documentation would clearly be useful.
Hi Johannes,
did you still plan to add documentation about the schedule generation that clarifies the points you explained right before Christmas?
All the best and a happy new year,
Tobias
Hi Johannes,
did you still plan to add documentation about the schedule generation that clarifies the points you explained right before Christmas?
Done in the newest version of the patch.
Hi Johannes,
thanks for the detailed comments. They clearly help to both review the new code and to understand more of the existing code. I have a couple of follow up questions inline.
Best,
Tobias
include/polly/ScopInfo.h:1410 (On Diff #44034):
Could you add one more sentence to clarify for the reader how the region tree traversal works on an abstract level? To my understanding, for each call of buildSchedule we walk over a given region @R. @R consists of RegionNodes that are either BBs or themselves (non-trivial) Regions, which are all connected through control flow possibly containing loops. By walking over the AST in reverse post-order we ensure which condition, precisely? Also, can we state something about the loops processed in a region? Do we assume reducible control flow? Can/do we exploit the additional properties this gives us?

include/polly/ScopInfo.h:1426 (On Diff #44034):
Nice. This comment clearly helps me to better understand the algorithm used. Now I need to relate the actual implementation to the above description. One thing I am not fully sure about is how the LoopSchedules map (and the new LoopStack) works. Can we give an invariant on the state of these when a recursive call to buildSchedule returns? Specifically, why do we need to return a full map of schedules? Will there possibly always be exactly one schedule in the map, such that we could change

```cpp
void buildSchedule(
    Region *R,
    DenseMap<Loop *, std::pair<isl_schedule *, unsigned>> &LoopSchedules) {
}
```

to

```cpp
isl_schedule *buildSchedule(Region *R) {
  DenseMap<Loop *, std::pair<isl_schedule *, unsigned>> LocalLoopSchedules;
  ...
}
```
lib/Analysis/ScopInfo.cpp:3448 (On Diff #44034):
This comment clarifies a question I was wondering about before. For me this would become even more clear if we split off the non-recursive from the recursive part of the code. To try if this actually works, I implemented such a patch. If you agree this is useful, I could commit it ahead of time.

lib/Analysis/ScopInfo.cpp:3458 (On Diff #44034):
queued

lib/Analysis/ScopInfo.cpp:3493 (On Diff #44034):
These different cases, iterators, and lists look confusing to me. I wonder if this code would look clearer if we just added all RNs into a work list, to which we append RNs that need to be delayed. Moving all RNs to a work list clearly slows down the common case, but I have doubts we would see this in any profile.

test/ScopInfo/wrong_schedule.ll:67 (On Diff #44034):
This patch still needs some simplifications (either manually or you could use my bugpoint patch).
Hi Johannes,
I played around with this code a little bit more and it seems we can pull out the tree traversal to remove the single-non-affine-region special case and generally simplify the code. This helps me to better understand that we _either_ walk the tree in case we have an affine-region-node or we build schedule information for a ScopStmt.
I committed a refactoring in r256931. It is different from yours, but I think it makes your point even more clear. I hope you agree.
include/polly/ScopInfo.h:1414 (On Diff #44107):
- I do not follow completely. What abstract level except "recursively in reverse post-order wrt. sub-regions" is there?
- Yes.
- That we visit the predecessors of a block before the block itself (same as for the domain generation). This is necessary since we can then add the schedule for a loop or block in sequence to what we already have and do not need to find a place to insert it.
- I do not understand. What should I state about the loops?
- LoopInfo does not work on irreducible control flow => we will currently crash or do undefined things on irreducible control flow. If LoopInfo worked (in some sense), this code would probably work too (but I do not guarantee it, as that heavily depends on the way irreducible control loops are modeled).
- None crossed my mind so far, but I do not know if there are none.

include/polly/ScopInfo.h:1430 (On Diff #44107):
Mh, invariants I can think of:
I can add them if it helps.

There is always "one active loop", thus one could rewrite it somehow to only pass the information for that one. However, we still need to pass the current schedule for that active loop, as well as the number of visited blocks, to the recursive calls, and we need to update them during the recursion. One local map plus a std::pair<isl_schedule *, unsigned> & argument should do the trick, but I do not see how this is more efficient or easier.
lib/Analysis/ScopInfo.cpp:3438 (On Diff #44107):
Done.

lib/Analysis/ScopInfo.cpp:3498 (On Diff #44107):
But a work list doesn't magically remove the different cases and iterators. The same logic is needed (if it is even that simple to decide whether we need to delay a region node in case we have not yet built the schedule).
Hi Johannes,
for our discussion, here is a code snippet that explains my worklist approach. It basically replaces the deque you already added anyway.
lib/Analysis/ScopInfo.cpp:3463 (On Diff #44107):
queued

lib/Analysis/ScopInfo.cpp:3498 (On Diff #44107):
What do you think of the following piece of code? It passes all tests for me and at least avoids the need for LastRNWaiting as well as explicit iterators.

```cpp
ReversePostOrderTraversal<Region *> RTraversal(R);
std::deque<RegionNode *> WorkList(RTraversal.begin(), RTraversal.end());

while (!WorkList.empty()) {
  RegionNode *RN = WorkList.front();
  WorkList.pop_front();

  Loop *L = getRegionNodeLoop(RN, LI);
  if (!getRegion().contains(L))
    L = OuterScopLoop;

  Loop *LastLoop = LoopStack.back();
  if (LastLoop != L) {
    if (!LastLoop->contains(L)) {
      WorkList.push_back(RN);
      continue;
    }
    LoopStack.push_back(L);
  }
```
Hi Johannes,
I just wanted to put down my latest findings here:
- The simplified code I proposed is incorrect, as the elements we push back are pushed to the end of the worklist, which will result in an invalid work list when leaving the loop.
- I believe your patch makes the LoopSchedules map unnecessary, as we now always process a stack of loops. Consequently, we could use a single stack of type:

```cpp
struct BuildScheduleInfo {
  Loop *L;
  isl_schedule *Schedule;
  unsigned NumBBsProcessed;
};

SmallVectorImpl<struct BuildScheduleInfo *> &LoopStack
```

instead of the LoopSchedules map. I think using such a stack is preferable, as it indicates nicely that there is only one active loop at a time.
After having thought about this code now for a while I think I understood all the corner cases. Thanks for your patience!