This is an archive of the discontinued LLVM Phabricator instance.

Differential D24716

[Polly] DeLICM/DePRE (WIP)
ClosedPublic

Authored by Meinersbur on Sep 19 2016, 2:27 AM.

Details

Reviewers

grosser

Commits

rG9e52c39f0a25: [DeLICM] Map values hoisted by LICM back to the array.
rPLO295713: [DeLICM] Map values hoisted by LICM back to the array.
rL295713: [DeLICM] Map values hoisted by LICM back to the array.

Summary

Implement the -polly-delicm pass. The pass intends to undo the effects of Loop Invariant Code Motion (LICM) and GVN's Partial Redundancy Elimination (Load PRE), which add additional scalar dependencies into scops.

DeLICM/DePRE will try to map those scalars back to the array elements they were promoted from, as long as the array element is unused.
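
To illustrate the kind of scalar the summary talks about, here is a minimal sketch at the C++ source level. It is not taken from the patch: the function names `sumInArray` and `sumInScalar` are hypothetical, and the real pass works on LLVM IR and Polly's SCoP representation rather than on source code. The first function accumulates directly into `A[0]`; the second shows the form after LICM-style promotion, where the array element lives in a scalar for the duration of the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "before" form: the reduction reads and writes A[0] in
// every iteration, so the only dependence is through the array element.
void sumInArray(std::vector<double> &A, const std::vector<double> &B) {
  assert(!A.empty() && "A[0] is the accumulator");
  for (std::size_t i = 0; i < B.size(); ++i)
    A[0] += B[i];
}

// Hypothetical "after LICM" form: A[0] is loaded once before the loop,
// accumulated in the scalar Tmp, and stored back once after the loop.
// Tmp is the additional scalar dependence; mapping it back to A[0] is
// only possible because A[0] is otherwise unused while Tmp is live.
void sumInScalar(std::vector<double> &A, const std::vector<double> &B) {
  assert(!A.empty() && "A[0] is the accumulator");
  double Tmp = A[0];
  for (std::size_t i = 0; i < B.size(); ++i)
    Tmp += B[i];
  A[0] = Tmp;
}
```

Both functions compute the same result; the difference is only in where the intermediate value is held during the loop.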

This is work in progress. The patch is not as tidy as it could be, not all functions have been commented yet, most test cases fail and some refactoring is still planned.

Diff Detail

Repository: rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden.

grosser added inline comments. Jan 21 2017, 2:00 AM

include/polly/DeLICM.h
61 ↗	(On Diff #85103)	InclRedef=true RETURNS a
67 ↗	(On Diff #85103)	ARE ignored
70 ↗	(On Diff #85103)	"as where" -- Not sure what this means. Do you want to say: "Whether to include the definition's timepoint to the set of well-defined elements" You can probably also drop the "Whether to" at the beginning of the sentence.
72 ↗	(On Diff #85103)	enabling THIS OPTION adds
97 ↗	(On Diff #85103)	RESULTS in:
102 ↗	(On Diff #85103)	IS overwritten THE FIRST TIME by Write "when before timepoint" : when what is before timepoint? Can you add the object for clarity? Is it "the write at timepoint [i] is before timepoint 0"?
108 ↗	(On Diff #85103)	ARE ignored
111 ↗	(On Diff #85103)	"an overwrite timepoints": the numerus seems to not match
146 ↗	(On Diff #85103)	The result IS
159 ↗	(On Diff #85103)	ARE ignored
164 ↗	(On Diff #85103)	load READS
166 ↗	(On Diff #85103)	loads use
167 ↗	(On Diff #85103)	THAT it had
175 ↗	(On Diff #85103)	written IS used
176 ↗	(On Diff #85103)	IS considered
lib/Transform/DeLICM.cpp
23 ↗	(On Diff #85103)	i.e.,
24 ↗	(On Diff #85103)	isl rational
40 ↗	(On Diff #85103)	i.e., strictly speaking
59 ↗	(On Diff #85103)	isl reference (isl is written lowercase) follow isl syntax
234 ↗	(On Diff #85103)	I would place the "." after the ")"
260 ↗	(On Diff #85103)	"." after ")"
300 ↗	(On Diff #85103)	IN the result
302 ↗	(On Diff #85103)	IN the result
307 ↗	(On Diff #85103)	isl max
334 ↗	(On Diff #85103)	i.e.,
357 ↗	(On Diff #85103)	does not
359 ↗	(On Diff #85103)	i.e.,
504 ↗	(On Diff #85103)	are ignored.
508 ↗	(On Diff #85103)	comment flow seems wrong (below as well)
556 ↗	(On Diff #85103)	AND does not need
615 ↗	(On Diff #85103)	comment flow seems wrong
654 ↗	(On Diff #85103)	compute_divs can sometimes make a map a lot more complex. Did you have a case where this is necessary? We should also ask Sven at some point if he can provide a simplify function for isl that does all simplifications in the right order.
682 ↗	(On Diff #85103)	"." at the end?
700 ↗	(On Diff #85103)	"st."?
712 ↗	(On Diff #85103)	of the contents <sth missing here?> any array elements in any zone
715 ↗	(On Diff #85103)	one OF three
716 ↗	(On Diff #85103)	so A transformation
719 ↗	(On Diff #85103)	Leftover documentation from the 'Known' part?
735 ↗	(On Diff #85103)	problems problem?
736 ↗	(On Diff #85103)	known ? Is this a leftover?
737 ↗	(On Diff #85103)	Last sentence is incomplete
842 ↗	(On Diff #85103)	Comment flow broken.
853 ↗	(On Diff #85103)	ARE the new lifetimes required for proposed UNUSED in existing?
899 ↗	(On Diff #85103)	? at the end
921 ↗	(On Diff #85103)	isl
946 ↗	(On Diff #85103)	a reference a counter????
953 ↗	(On Diff #85103)	I doubt this has any performance impact. I would prefer to keep the state local and not pollute global space with such caching things.
1080 ↗	(On Diff #85103)	IS as
1119 ↗	(On Diff #85103)	Get
1121 ↗	(On Diff #85103)	IS as
1213 ↗	(On Diff #85103)	A MemoryKind::Value ... A MemoryKind
1218 ↗	(On Diff #85103)	A ... A
1230 ↗	(On Diff #85103)	the transformation IS applied to the
1280 ↗	(On Diff #85103)	WITH each other
1343 ↗	(On Diff #85103)	of A
1403 ↗	(On Diff #85103)	HAS.
1524 ↗	(On Diff #85103)	comment flow broken
1536 ↗	(On Diff #85103)	USED?
1661 ↗	(On Diff #85103)	Map A
unittests/DeLICM/DeLICMTest.cpp
22 ↗	(On Diff #85103)	indef -> undef?

Meinersbur marked 39 inline comments as done. Jan 21 2017, 3:20 PM

Meinersbur added inline comments.

lib/Transform/DeLICM.cpp
654 ↗	(On Diff #85103)	I am not aware that this can make sets more complex, it just provides an additional explicit representation of a div that should make follow-up computations faster and was sometimes necessary for coalescing. According to the source comments the issue is rather that compute_divs can be an expensive operation, but so is coalescing. If possible I'd not call `simplify()` at all but that results in other operations taking much longer, e.g. `lexmin`.
unittests/DeLICM/DeLICMTest.cpp
22 ↗	(On Diff #85103)	This refers to the definition of `indef` below

Address Tobias' comments

Hi Michael,

some of the comments from our discussion.

Best,
Tobias

include/polly/DeLICM.h
36 ↗	(On Diff #85252)	Add comment?
lib/Transform/DeLICM.cpp
83 ↗	(On Diff #85252)	Maybe give an overview comment describing the high-level structure of this approach.
477 ↗	(On Diff #85252)	Several of these functions above are not really tied to your DeLICM approach, but are generally useful. I would suggest to add these separately in an ISLTools.cpp file, in the optimal case with separate unit tests. You can probably already upstream these today.
689 ↗	(On Diff #85252)	Pull in getAccessRelation.
923 ↗	(On Diff #85252)	Drop this one as well?
1092 ↗	(On Diff #85252)	The above all really seem to belong in ScopInfo. Let's leave them here for now, as this makes updating this patch easier, but maybe we can keep in mind that we want to unify this at some point.
1845 ↗	(On Diff #85252)	I believe we should not use auto so much. As you correctly pointed out, LLVM suggests using auto only in rare cases. We had a discussion in Barcelona about how much we should use auto, and it seems we chose a rather liberal direction. I now believe that following LLVM more closely here is better. Currently auto makes sense as it saves the long IslPtr<.....> type, so you probably do not need to change anything here immediately. We should just keep this in mind when potentially switching to the C++ bindings.
40 ↗	(On Diff #85103)	Comma after i.e., missing.
359 ↗	(On Diff #85103)	Comma after i.e., missing.
654 ↗	(On Diff #85103)	OK. We probably need to evaluate this in practice. Let's start with what you have.
1080 ↗	(On Diff #85103)	IS BE?? Just 'is as'?
1791 ↗	(On Diff #79595)	OK. I think this functionality should at some point be merged with ScopInfo, but let's leave it here for now to reduce the dependences to ScopInfo.
unittests/DeLICM/DeLICMTest.cpp
423 ↗	(On Diff #85252)	To fix formatting, maybe use: struct { type Known, type undef, type written, type unkown } input; void foo() { EXPECT_FALSE(conflicts( {aasdfsadf, basdfsadf, casdfasf, dsadfsadfsadfdsafdsafsafdfasdfsadf}, {a, basdfsafd, casdfdsaf, dsadfsadf})); }
426 ↗	(On Diff #85252)	It also seems as if these tests still cover the Known part. It would be nice if we could have a set of simpler tests that just take the current simpler isConflicting implementation and do not go through the complexity of checkIsConflicting.

Rebase to r293429
Remove MapReport
Review isConflicting unittests.
Address remaining comments

Hi Michael,

thanks a lot for the very nice cleanup. I am very glad this patch proceeds this way. I may now seem to be very picky, but I really try to understand this one in detail. I added a bunch of smaller comments. I am not yet fully through, but believe I identified a new set of changes we can commit. All of checkConvertZoneToTimepoint, computeReachingDefinition, computeReachingOverwrite, and checkComputeArrayUnused are clearly self-contained and thanks to your nice unit tests are now easy to test independently. I suggest you commit them one-by-one. I did not go through all corner cases, but as they are well tested that seems to be fine.

The only concern I have is that there is some -- possibly unnecessary -- code duplication between computeReachingDefinition and computeReachingOverwrite, which I would prefer to avoid.

I also noted that you provide implementations for all corner cases always including or not including bounds. On the other side, most of the actual code is currently not using any of these (there is commonly only one configuration used). As this all looks very well tested, I don't want to ask you to drop all the special cases, especially as I have the impression that these may be useful for some of the later code or later discussions on how to model certain things best. Hence, I am fine leaving them in. However, if you have any cases in mind that are certainly not needed any more, don't hesitate to drop them.

As a next step, I will go through Knowledge and isConflicting.

Best,
Tobias

include/polly/DeLICM.h
45 ↗	(On Diff #86279)	The set OF timepoints
51 ↗	(On Diff #86279)	IS overwritten
79 ↗	(On Diff #86279)	This is only ever called with InclStart = true and InclEnd=false. Is this intentional and do you keep the other options just for completeness?
146 ↗	(On Diff #86279)	(dover)write -> (over)write
180 ↗	(On Diff #86279)	NOT read in between
lib/Transform/DeLICM.cpp
135 ↗	(On Diff #86279)	one definition PER SCoP
143 ↗	(On Diff #86279)	The definition (write MemoryAccess) of a MemoryKind::Value scalar. definitions -> definition / -> () an ->a
148 ↗	(On Diff #86279)	uses (read MemoryAccesses) of a MemoryKind::Value / -> () an -> a
152 ↗	(On Diff #86279)	The PHI instruction (read Memory Access) of a MemoryKind::PHI or MemoryKind::ExitPHI. / -> () an -> a / -> or (makes clear if this is an 'or' or an 'and')
156 ↗	(On Diff #86279)	List of all incoming values (write MemoryAccesses) for a MemoryKind::PHI or MemoryKind::ExitPHI scalar.
175 ↗	(On Diff #86279)	This fits into two lines
245 ↗	(On Diff #86279)	This is only ever called with computeScalarReachingOverwrite(Schedule, TargetDom, false, true);
295 ↗	(On Diff #86279)	This is only ever called with "computeScalarReachingDefinition(Schedule, Domain, false, true)".
1489 ↗	(On Diff #86279)	The previous four lines can be dropped, but changing !MA->isWrite() to !MA->isMustWrite() a couple of lines earlier.
1709 ↗	(On Diff #86279)	This is only called with: auto ArrayUnused = computeArrayUnused(Schedule, AllMustWrites, AllReads,
unittests/DeLICM/DeLICMTest.cpp
22 ↗	(On Diff #85103)	Right, I still don't understand why you call this 'indef'. This seems to be an abbreviation of "undefined", but then I would expect this word to be called "undef". What does "indef" stand for?

grosser added inline comments. Feb 3 2017, 8:33 AM

lib/Transform/DeLICM.cpp
187 ↗	(On Diff #86279)	This fits into two lines.
235 ↗	(On Diff #86279)	What is an "overwrite timepoints"?
284 ↗	(On Diff #86279)	What does not need to be specified? and therefore THIS ELEMENT does not ...
356 ↗	(On Diff #86279)	of TWO states
370 ↗	(On Diff #86279)	the theses? KnowledgeS need to know
468 ↗	(On Diff #86279)	This -> Existing That -> Proposed ?
560 ↗	(On Diff #86279)	must be the declared .. Grammar?
619 ↗	(On Diff #86279)	before other accessES AND
621 ↗	(On Diff #86279)	Is this true? What about region statements that write to a PHI node?
638 ↗	(On Diff #86279)	Maybe add a test case?
641 ↗	(On Diff #86279)	If you add a continue above, the second MA->isWrite is not needed. This reduces indentation and also makes clear that this is an if-else construct.
647 ↗	(On Diff #86279)	What would be such a case? Is there an easy opportunity for a test case?
653 ↗	(On Diff #86279)	Should we have a test case for this? (including a comment in the test case that states that this is the reason it is not DeLICMed)
655 ↗	(On Diff #86279)	Should we have a test case for this? (including a comment in the test case that states that this is the reason it is not DeLICMed)
685 ↗	(On Diff #86279)	No {}.
689 ↗	(On Diff #86279)	No {}.
712 ↗	(On Diff #86279)	is as narrow Drop 'be'
820 ↗	(On Diff #86279)	Are the last two definitions dead?
1617 ↗	(On Diff #86279)	Should this be ScatterRead instead of ScatterUse? You do not seem to modify the range. In computeReachingOverwrite you seem to call it ScatterRead as well.
1651 ↗	(On Diff #86279)	This assert is missing in computeReachingDefinition
1668 ↗	(On Diff #86279)	This should be called AfterMap, right? Maybe an oversight when renaming and adopting computeReachingDefinition?
1692 ↗	(On Diff #86279)	In computeReachingDefinition SelfUse has been hoisted outside the condition. Not a semantic difference, but a minor change that lets the code look different.
1702 ↗	(On Diff #86279)	Besides the comparisons isl_map_lex_lt/isl_map_lex_le this function seems to be identical to computeReachingDefinition. It feels to me as if this is an unnecessary complication of rather complicated code. Would it make sense to have one common implementation that just takes a comparison function to then deliver the two different implementations?
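
The suggestion in the last comment, one common implementation parameterized by a comparison, can be sketched on plain integers as follows. This is only an illustration under simplifying assumptions: timepoints are modeled as `int` values for a single array element instead of isl maps, and the names `reachingWrite`, `reachingDefinition` and `reachingOverwrite` are hypothetical stand-ins, not the functions in the patch.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Among all writes W with Precedes(W, T), return the one closest to T.
// With std::less this is the last write before T (reaching definition);
// with std::greater it is the first write after T (reaching overwrite).
template <typename Compare>
std::optional<int> reachingWrite(const std::vector<int> &WriteTimes, int T,
                                 Compare Precedes) {
  std::optional<int> Best;
  for (int W : WriteTimes) {
    if (!Precedes(W, T))
      continue; // W is not on the requested side of T.
    if (!Best || Precedes(*Best, W))
      Best = W; // W lies between the current best and T.
  }
  return Best;
}

std::optional<int> reachingDefinition(const std::vector<int> &WriteTimes,
                                      int T) {
  return reachingWrite(WriteTimes, T, std::less<int>());
}

std::optional<int> reachingOverwrite(const std::vector<int> &WriteTimes,
                                     int T) {
  return reachingWrite(WriteTimes, T, std::greater<int>());
}
```

Swapping `std::less` for `std::less_equal` (or `std::greater` for `std::greater_equal`) changes whether a write at exactly `T` counts, which loosely corresponds to the strict versus non-strict lexicographic comparison mentioned in the comment.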

Hi Michael,

I just looked closer into Knowledge and isConflicting.

To me it is not well documented what "implicit" means. As far as I understand implicit means that only one of unknown and unused is actually stored and the other one is "implicitly" the opposite. It would be good to document this and also give a reason why you implemented it this way. I assume because the complement of one of these sets can possibly be very large.

I am not sure if Knowledges must be implicit or if also both sets can be explicit. In isConflicting you state "current implementation requires it to be implicit", but when testing you seem to compute two explicit sets using completeLifetime. This seems inconsistent.

Do you have an example that makes clear why you need zones, and cannot just track the values / life ranges? That seems like an interesting test for isConflicting.

Assuming both implicit and explicit representation are supported. I wonder if one of them would be enough? Or is there a reason to keep both? I have the feeling that having both currently makes the code (unnecessarily) hard to read?

That's all for now. I will later look into the remaining stuff.

lib/Transform/DeLICM.cpp
403 ↗	(On Diff #86279)	a nullptr
435 ↗	(On Diff #86279)	What does it mean to be implicitly unused. Can you somehow print this, e.g. as "universe - Occupied". What happens if the first set does not contain elements from each space? Assuming we never write to a given array? This would mean it does not appear in Occupied and is completely unused. Now, the universe we compute won't contain an element from this space, after the (implicit) subtraction it won't appear, right? You seem to be working around this in the tests, but it is not clear how this is resolved in general or which preconditions need to be satisfied to make this work.
450 ↗	(On Diff #86279)	The previous line does not have any test coverage? Should we just assert that such a condition is not expected to arise? Or is there a test case that would cover this situation? (Same for the condition below).
454 ↗	(On Diff #86279)	The previous line does not have any test coverage.
465 ↗	(On Diff #86279)	WITH each other
477 ↗	(On Diff #86279)	What do you mean by "use case X" here?
477 ↗	(On Diff #86279)	What do you mean with 'use case X'? I am surprised that the current implementation only works with the first parameter being "implicit" unknown. It seems the tests provide an explicit representation of "unknown", no? I tried to assert(!Existing.Occupied), but this fails (due to the tests using explicit constraints).
1761 ↗	(On Diff #86279)	Why do we move this stuff here? This does not seem to be performance sensitive.
unittests/DeLICM/DeLICMTest.cpp
379 ↗	(On Diff #86279)	Why do you compute the universe here? I was under the impression that Knowledges can also be created with implicit Unknown, Unused sets and should work well with them. In fact, there is a comment in isConflicting that claims that it _only_ works with implicit sets? Would the tests still work if you do not create explicit representations of Unknown and Unused?
397 ↗	(On Diff #86279)	vs. Also end this and the sentences below with "."
419 ↗	(On Diff #86279)	Maybe add a comment that states why these two conflict. Dom[1] is a zone that covers [0,1]. This took me a little while to understand.

Thanks for the review

In D24716#665836, @grosser wrote:

I also noted that you provide implementations for all corner cases always including or not including bounds. On the other side, most of the actual code is currently not using any of these (there is commonly only one configuration used). As this all looks very well tested, I don't want to ask you to drop all the special cases, especially as I have the impression that these may be useful for some of the later code or later discussions on how to model certain things best. Hence, I am fine leaving them in. However, if you have any cases in mind that are certainly not needed any more, don't hesitate to drop them.

I feel relatively strongly about keeping them. They were certainly helpful in development when I did not yet have a singular definition of what a "zone" is. Currently a zone 0 < 1 < 1 is consistently represented by the integer set { [1] }. There is no strict necessity to do this everywhere. E.g. it might be more intuitive to use the start of the range instead. These computations are currently agnostic to what a zone is (could be moved to ISLTools.cpp as well maybe?) and I would perceive it as a loss if they suddenly were not.
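
As a rough illustration of the include/exclude-bounds variants under discussion, the sketch below models zones and timepoints as plain integers. It assumes, in line with the reviewer's remark that "Dom[1] is a zone that covers [0,1]", that zone `Z` lies between timepoints `Z - 1` and `Z` and is named by its end. The function name `zoneToTimepoints` is hypothetical, and the patch itself operates on isl sets rather than `std::set<int>`.

```cpp
#include <cassert>
#include <set>

// Convert a set of zones to the timepoints they touch.
// Assumption: zone Z spans the gap between timepoints Z-1 and Z.
// InclStart adds each zone's start, InclEnd adds each zone's end.
// With neither flag set, only timepoints strictly inside the union of
// the zones remain, i.e. points enclosed by two adjacent zones.
std::set<int> zoneToTimepoints(const std::set<int> &Zones, bool InclStart,
                               bool InclEnd) {
  std::set<int> Result;
  for (int Z : Zones) {
    assert(Z > 0 && "sketch assumes zones are numbered from 1");
    if (InclStart)
      Result.insert(Z - 1);
    if (InclEnd)
      Result.insert(Z);
    // Timepoint Z is interior if zone Z ends there and zone Z+1 starts there.
    if (!InclStart && !InclEnd && Zones.count(Z + 1))
      Result.insert(Z);
  }
  return Result;
}
```

For the zones { 1, 2 }, including only the start yields { 0, 1 }, including only the end yields { 1, 2 }, including both yields { 0, 1, 2 }, and including neither yields { 1 }.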

In D24716#666155, @grosser wrote:

I am not sure if Knowledges must be implicit or if also both sets can be explicit. In isConflicting you state "current implementation requires it to be implicit", but when testing you seem to compute two explicit sets using completeLifetime. This seems inconsistent.

That comment was outdated. In previous revisions Unused, Known and Unknown were stored in the same map (instead of in one Unused and one Occupied). Two of them had to be defined, the third was implicit, hence "required" to be implicit.

Do you have an example that makes clear why you need zones, and cannot just track the values / life ranges? That seems like an interesting test for isConflicting.

See file comment in DeLICM.cpp

Assuming both implicit and explicit representation are supported. I wonder if one of them would be enough? Or is there a reason to keep both? I have the feeling that having both currently makes the code (unnecessarily) hard to read?

It's just the opposite, it makes it clearer that Unused/Occupied can be nullptr and therefore easier to understand. Previous versions of this patch used a flag indicating which one is implicit, hence also requiring that one of them is implicit. I invite you to look into these versions to see whether they are easier to understand.

The sets can be nullptr only because we want to avoid computing them, not because of something conceptual. If they are still given, they are ignored. This is fine for the unit tests which are small, but the complement operation blew up easily in my experiments with even small programs.

Michael
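
The implicit/explicit distinction described in this reply might be pictured with the following simplified sketch. It is an assumption-laden model, not the patch's code: zones are plain integers, `std::optional` stands in for a possibly-null isl set, and `lifetimeConflicts` is a hypothetical name covering only the lifetime part of the check (whether the zones the proposal needs are unused in the existing knowledge).

```cpp
#include <cassert>
#include <optional>
#include <set>

// Simplified stand-in for the Knowledge class discussed in the review.
// An empty optional means "implicit": the set is taken to be the
// complement of the other one and is never computed.
struct Knowledge {
  std::optional<std::set<int>> Occupied;
  std::optional<std::set<int>> Unused;
};

// Returns true if Proposed needs a zone that Existing does not list as
// unused. Only Existing.Unused and Proposed.Occupied are read; if
// Existing.Occupied or Proposed.Unused are given as well, they are
// ignored, so the potentially expensive complement is never needed.
bool lifetimeConflicts(const Knowledge &Existing, const Knowledge &Proposed) {
  assert(Existing.Unused && "Existing must have an explicit Unused set");
  assert(Proposed.Occupied && "Proposed must have an explicit Occupied set");
  for (int Zone : *Proposed.Occupied)
    if (!Existing.Unused->count(Zone))
      return true;
  return false;
}
```

In this model a unit test can still pass both sets explicitly; the extra set simply has no influence on the result.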

lib/Transform/DeLICM.cpp
435 ↗	(On Diff #86279)	> What does it mean to be implicitly unused.
The unused set has not been computed explicitly, but is assumed to be the complement of `Occupied`.
> Can you somehow print this, e.g. as "universe - Occupied".
I don't understand this suggestion.
> What happens if the first set does not contain elements from each space?
Does not happen. In ZoneAlgorithm one of the two sets will always be implicit. In DeLICMTests `unionSpace()` is used.
> Assuming we never write to a given array? This would mean it does not appear in Occupied and is completely unused. Now, the universe we compute won't contain an element from this space, after the (implicit) subtraction it won't appear, right?
As there is no write to the array, `greedyCollapse()` will not try to map anything to it.
621 ↗	(On Diff #86279)	True since commit r258809: Unique phi write accesses
638 ↗	(On Diff #86279)	You previously asked me to remove the non-"After Accesses" parts from the test cases. This currently cannot be tested effectively without more diagnostics. There would be the possibility to copy reduction_preheader.ll a few times and add variations. In most cases the variations would already cause it not to be mapped (longer lifetimes), not because the SCoP is incompatible (the test would succeed even if this check was removed). So I could add load/store combinations to array elements other than the ones necessary for at least one mapping. But in the mid term I'd only want to mark the elements as incompatible, not the entire SCoP. That would then require rewriting all tests. Originally I intended to test all combinations of load and store orders in one ScopStmt more systematically once more diagnostics are available. I'd add that separately sometime later.
647 ↗	(On Diff #86279)	memcpy, memmove, memset. About testing, see comment above.
653 ↗	(On Diff #86279)	About testing, see comment above
655 ↗	(On Diff #86279)	About testing, see comment above
1489 ↗	(On Diff #86279)	That wouldn't print a debug message.
1761 ↗	(On Diff #86279)	This is to avoid exposing `Knowledge` only for testing `isConflicting`.
unittests/DeLICM/DeLICMTest.cpp
379 ↗	(On Diff #86279)	> Why do you compute the universe here?
To compute an argument that got only a `nullptr`.
> I was under the impression that Knowledges can also be created with implicit Unknown, Unused sets and should work well with them. In fact, there is a comment in isConflicting that claims that it _only_ works with implicit sets?
See main comment.
> Would the tests still work if you do not create explicit representations of Unknown and Unused?
Yes, but the explicit representation is ignored.

Rebase to r293890
Address Tobias' comments
Merge reachingDefinition and reachingOverwrite to reachingWrite
Revise unittests

In D24716#666384, @Meinersbur wrote:

Thanks for the review

In D24716#665836, @grosser wrote:

I also noted that you provide implementations for all corner cases always including or not including bounds. On the other side, most of the actual code is currently not using any of these (there is commonly only one configuration used). As this all looks very well tested, I don't want to ask you to drop all the special cases, especially as I have the impression that these may be useful for some of the later code or later discussions on how to model certain things best. Hence, I am fine leaving them in. However, if you have any cases in mind that are certainly not needed any more, don't hesitate to drop them.

I feel relatively strongly about keeping them. They were certainly helpful in development when I did not yet have a singular definition of what a "zone" is. Currently a zone 0 < 1 < 1 is consistently represented by the integer set { [1] }. There is no strict necessity to do this everywhere. E.g. it might be more intuitive to use the start of the range instead. These computations are currently agnostic to what a zone is (could be moved to ISLTools.cpp as well maybe?) and I would perceive it as a loss if they suddenly were not.

Sure, that's why I said I am fine with these to be kept in the current generality, as long as they are well

In D24716#666155, @grosser wrote:

I am not sure if Knowledges must be implicit or if also both sets can be explicit. In isConflicting you state "current implementation requires it to be implicit", but when testing you seem to compute two explicit sets using completeLifetime. This seems inconsistent.

That comment was outdated. In previous revisions Unused, Known and Unknown were stored in the same map (instead of in one Unused and one Occupied). Two of them had to be defined, the third was implicit, hence "required" to be implicit.

Do you have an example that makes clear why you need zones, and cannot just track the values / life ranges? That seems like an interesting test for isConflicting.

See file comment in DeLICM.cpp

Assuming both implicit and explicit representation are supported. I wonder if one of them would be enough? Or is there a reason to keep both? I have the feeling that having both currently makes the code (unnecessarily) hard to read?

It's just the opposite, it makes it clearer that Unused/Occupied can be nullptr and therefore easier to understand. Previous versions of this patch used a flag indicating which one is implicit, hence also requiring that one of them is implicit. I invite you to look into these versions to see whether they are easier to understand.

The sets can be nullptr only because we want to avoid computing them, not because of something conceptual. If they are still given, they are ignored. This is fine for the unit tests which are small, but the complement operation blew up easily in my experiments with even small programs.

Michael

lib/Transform/DeLICM.cpp
1669 ↗	(On Diff #87032)	Nice, this nicely removes the code duplication.
unittests/DeLICM/DeLICMTest.cpp
210 ↗	(On Diff #87032)	reduction_embedded.ll

Hi Michael,

please ignore the previous email. It got sent incomplete.

First, I like the changes to computeReachingWrite. The code duplication is nicely removed. Very cool!

I feel relatively strongly about keeping them. They were certainly helpful in development when I did not yet have a singular definition of what a "zone" is. Currently a zone 0 < 1 < 1 is consistently represented by the integer set { [1] }. There is no strict necessity to do this everywhere. E.g. it might be more intuitive to use the start of the range instead. These computations are currently agnostic to what a zone is (could be moved to ISLTools.cpp as well maybe?) and I would perceive it as a loss if they suddenly were not.

I also agree with you that keeping the additional cases for now makes sense -- especially as we have good test coverage. In fact, I like the idea of adding this functionality to ISLTools.cpp. It clearly is self-contained, well documented, and well tested. Please feel free to push these out already.

In D24716#666155, @grosser wrote:

I am not sure if Knowledges must be implicit or if also both sets can be explicit. In isConflicting you state "current implementation requires it to be implicit", but when testing you seem to compute two explicit sets using completeLifetime. This seems inconsistent.

That comment was outdated. In previous revisions Unused, Known and Unknown were stored in the same map (instead of in one Unused and one Occupied). Two of them had to be defined, the third was implicit, hence "required" to be implicit.

Do you have an example that makes clear why you need zones, and cannot just track the values / life ranges? That seems like an interesting test for isConflicting.

See file comment in DeLICM.cpp

Could such an example be added to the tests of isConflicting that illustrates why such a merge would result in an incorrect / imprecise answer?

Assuming both implicit and explicit representation are supported. I wonder if one of them would be enough? Or is there a reason to keep both? I have the feeling that having both currently makes the code (unnecessarily) hard to read?

It's just the opposite: it makes it clearer that Unused/Occupied can be nullptr, and the code is therefore easier to understand. Previous versions of this patch used a flag to indicate which one is implicit, hence also requiring that one of them is implicit. I invite you to look into these versions to see whether they are...

I did not mean that we should merge Unused/Occupied. I agree that having both makes a lot of sense. What I was suggesting above was whether we should prohibit both from being set simultaneously. The reason for this is that we now need to always support an explicit and an implicit representation of these sets. As such I always need to convince myself that both code paths are correct (or that ignoring one is correct). In the actual code that we run, we always seem to use implicit sets. So I wonder why we do not just limit our implementation to implicit sets and also test it with implicit sets.

In D24716#666573, @grosser wrote:

Do you have an example that makes clear why you need zones, and cannot just track the values / life ranges? That seems like an interesting test for isConflicting.

See file comment in DeLICM.cpp

Could such an example be added to the tests of isConflicting that illustrates why such a merge would result in an incorrect / imprecise answer?

Could you suggest an EXPECT(...) line? I have no clue what kind of example you have in mind.

Rebase to r294094
Use implicit sets in unittests

Hi Michael,

here is a first update. I would like to think a little bit more about this to get some illustrative examples (am busy on CC at the moment), but here are already some comments on what I am thinking. (No need to change anything yet. I am currently trying to complete my understanding of this code.)

In D24716#666792, @Meinersbur wrote:

In D24716#666573, @grosser wrote:

Do you have an example that makes clear why you need zones, and cannot just track the values / life ranges? That seems like an interesting test for isConflicting.

See file comment in DeLICM.cpp

Could such an example be added to the tests of isConflicting that illustrates why such a merge would result in an incorrect / imprecise answer?

Could you suggest an EXPECT(...) line? I have no clue what kind of example you have in mind.

I am not sure how such a test case can be constructed. You use "zones" to model occupied and undef sets in a knowledge. In isConflicting you always convert the zones to a set of timepoints (when comparing to writes). To me it looks as if we could as well store the zones as a set of timepoints. I am looking for an argument that explains why we need to store occupied and undef as zones.

lib/Transform/DeLICM.cpp
411 ↗	(On Diff #87095)	In your documentation and the code, you use "Unused" and "Undef" for the same thing. The set here is e.g., called Unused, but in the class documentation and the tests you talk about "Undef". Would it not be better to just use one of the two?
442 ↗	(On Diff #87095)	I wonder if it makes sense to document the implicit Occupied and Unused sets at this position and to explain precisely that the complement here means the complement for all spaces that are mentioned in the explicit one of the two sets. In terms of interface, I could see us enforcing and documenting that one of Occupied and Unused must be nullptr.
477 ↗	(On Diff #87095)	This "if" is not needed. It is always implied by the asserts.
509 ↗	(On Diff #87095)	I tried to replace these lines by: + assert(Existing.Unused && !Existing.Occupied); + assert(!Proposed.Unused && Proposed.Occupied); This seems to work. Would this be correct and could we make Knowledge consistently limited to this pattern? Would this make sense at all? Are there cases where isConflicting would return wrong results for certain values of Existing.Occupied or Proposed.Unused (which are not useful).
435 ↗	(On Diff #86279)	What does it mean to be implicitly unused?
The unused set has not been computed explicitly, but is assumed to be the complement of Occupied.
OK. The comments now make this very clear. Thank you!
Can you somehow print this, e.g. as "universe - Occupied"?
I don't understand this suggestion.
In "print(llvm::raw_ostream &OS, unsigned Indent = 0)" you just print "<implicit>" in case the corresponding set is a nullptr. Without reading the source code comments, I do not know what <implicit> means. Hence, this is difficult to understand. I wonder if we could print this in a way that one can easily understand what this means. E.g. by printing a string that shows how "implicit" is computed.
What happens if the first set does not contain elements from each space?
Does not happen. In ZoneAlgorithm one of the two sets will always be implicit. In DeLICMTests unionSpace() is used.
Assuming we never write to a given array? This would mean it does not appear in Occupied and is completely unused. Now, the universe we compute won't contain an element from this space, after the (implicit) subtraction it won't appear, right?
As there is no write to the array, greedyCollapse() will not try to map anything to it.
I wonder if there is an implicit assumption hidden. I need to think a little bit more about this to see if I can construct a test case that would validate this assumption and consequently result in incorrect answers being given by isConflicting. Overall, the correctness and the behavior of Knowledge should not depend on how Knowledge is used, but Knowledge should have defined behavior for any input. (In some way, we should make very clear over which spaces a complement is actually formed, and how isConflicting works in case the sets are inconsistent, if this can happen at all.)
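The implicit/explicit discussion in the comments above can be illustrated with a small toy model. This is only a sketch under stated assumptions: plain std::set<int> stands in for isl_union_set, the universe is passed in explicitly, and the names ToyKnowledge, effectiveUnused, effectiveOccupied and isConflictingSketch are hypothetical, not part of the patch. The conflict rule used here (the proposed occupied zones must be a subset of the existing unused zones) is one plausible reading of the thread, not a copy of Knowledge::isConflicting.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <set>

// Hypothetical stand-in for isl_union_set: a finite set of zone indices.
using ZoneSet = std::set<int>;

// Toy model of a Knowledge: a set that is std::nullopt is "implicit", i.e.
// defined as the complement of the other set within the universe.
struct ToyKnowledge {
  std::optional<ZoneSet> Occupied;
  std::optional<ZoneSet> Unused;
};

// Return Universe minus S.
ZoneSet complementOf(const ZoneSet &S, const ZoneSet &Universe) {
  ZoneSet Result;
  for (int Z : Universe)
    if (!S.count(Z))
      Result.insert(Z);
  return Result;
}

// Explicit Unused set, derived from Occupied when Unused is implicit.
ZoneSet effectiveUnused(const ToyKnowledge &K, const ZoneSet &Universe) {
  assert((K.Occupied || K.Unused) && "at least one set must be explicit");
  return K.Unused ? *K.Unused : complementOf(*K.Occupied, Universe);
}

// Explicit Occupied set, derived from Unused when Occupied is implicit.
ZoneSet effectiveOccupied(const ToyKnowledge &K, const ZoneSet &Universe) {
  assert((K.Occupied || K.Unused) && "at least one set must be explicit");
  return K.Occupied ? *K.Occupied : complementOf(*K.Unused, Universe);
}

// Assumed rule: a proposal conflicts if it occupies any zone that is not
// unused in the existing knowledge.
bool isConflictingSketch(const ToyKnowledge &Existing,
                         const ToyKnowledge &Proposed,
                         const ZoneSet &Universe) {
  ZoneSet Free = effectiveUnused(Existing, Universe);
  ZoneSet Needed = effectiveOccupied(Proposed, Universe);
  // std::includes(A, B) is true when the sorted range B is a subset of A.
  return !std::includes(Free.begin(), Free.end(), Needed.begin(), Needed.end());
}
```

In this model the result depends on the chosen universe whenever a set is implicit, which is the concern raised above about the spaces over which the complement is formed.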

Hi Michael,

thanks for explaining this patch to me again on the phone in great detail. I think I now got the information I was lacking and added a couple of comments which I think might help others to get this information as well.

Would be great if you could cross-check this information and integrate it where appropriate. After this, we should be able to commit the Knowledge part and move on to the next pieces.

Best,
Tobias

lib/Transform/DeLICM.cpp
404 ↗	(On Diff #87095)	Could we possibly explain why zones are used to represent lifetimes? Something like: The set of alive array elements is represented as a zone, as the set of live values can differ depending on how the elements are interpreted. Assuming a value X is written at timestep [0] and read at timestep [1] without being used at any later point, then the value is alive in the interval ]0,1[. This interval cannot be represented by an integer set, as it does not contain any integer point. Zones allow us to represent this interval and can be converted to sets of timepoints when needed (e.g., in isConflicting when comparing to the write sets). @see convertZoneToTimepoints for more details. This might be a little verbose, but might help the reader.
460 ↗	(On Diff #87095)	What about "everything not in 'Occupied'"? What about "everything not in 'Unused'"?
503 ↗	(On Diff #87095)	Can you add a comment stating that the universes of both Existing and Proposed need to be identical? Also, if both are fully defined, can we add asserts to validate this?
526 ↗	(On Diff #87095)	Maybe an additional comment: " We convert here the set of lifetimes to actual timesteps. A lifetime is in conflict with a set of write timepoints, if either a write timepoint is clearly within the lifetime or if a write happens at the beginning of the lifetime (where it would conflict with the write that actually makes the value alive). There is no conflict at the end of a lifetime, as the alive value will always be read before it is overwritten again. The last property holds in Polly for all scalar values and we expect all users of Knowledge to check this property also for accesses to MemoryKind::Array. "
unittests/DeLICM/DeLICMTest.cpp
108 ↗	(On Diff #87095)	Maybe test four cases?
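The zone and conflict rules proposed in the comments above can be sketched as follows. This is a toy model, not the patch's code: it assumes a one-dimensional schedule, represents a zone by its end timepoint (so zone i stands for the open interval ]i-1, i[, matching the { [1] } convention mentioned earlier in this thread), and uses std::set<int> instead of isl sets. The names zonesToTimepointsSketch and writesConflictSketch are hypothetical.

```cpp
#include <cassert>
#include <set>

// Zone i stands for the open interval ]i-1, i[ between two timepoints.
using Zones = std::set<int>;
using Timepoints = std::set<int>;

// Convert zones to integer timepoints, optionally including the timepoint
// that starts each zone (i-1) and/or the one that ends it (i).
Timepoints zonesToTimepointsSketch(const Zones &Z, bool InclStart,
                                   bool InclEnd) {
  Timepoints Result;
  for (int I : Z) {
    if (InclStart)
      Result.insert(I - 1);
    if (InclEnd)
      Result.insert(I);
  }
  return Result;
}

// A write conflicts with a lifetime if it happens at the start of any of its
// zones. For a lifetime spanning adjacent zones this also covers timepoints
// strictly inside it. A write at the very end of the lifetime is allowed.
bool writesConflictSketch(const Zones &Lifetime, const Timepoints &Writes) {
  Timepoints Blocked = zonesToTimepointsSketch(Lifetime, /*InclStart=*/true,
                                               /*InclEnd=*/false);
  for (int W : Writes)
    if (Blocked.count(W))
      return true;
  return false;
}
```

With both flags false, the single zone ]0,1[ converts to the empty set, which is why the interval cannot be stored as a set of integer timepoints without first fixing an inclusion convention.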

Refine documentation about zones
Testing and assertions for Knowledges with Occupied and Unused defined
Rebase to r294894
Address Tobias' other comments

Hi Michael,

thanks for the very fast update. This was exactly what I was looking for. I have a couple of minor typos and final questions on the Knowledge and Zone description. Otherwise, I think the knowledge stuff is good to go.

(I believe this was the most difficult part of the patch. The remaining part is still a lot larger, but I believe most of it is well explained and also not technically too complicated).

lib/Transform/DeLICM.cpp
47 ↗	(On Diff #88302)	AT the at THE end
50 ↗	(On Diff #88302)	e.g., a LOAD
51 ↗	(On Diff #88302)	WE exclude ... starting the zone FROM THE LIVE-RANGE.
57 ↗	(On Diff #88302)	It is unclear to me why a write may overwrite a variable's value. Is it because it is a may write, due to the order of the writes in the statement, or because it may write an identical value (and consequently the write has no effect)? Also, I am not sure why you mention "undefined behavior" here. In the context of C this term has a very strong meaning. Is this really what you mean here? Can you explain what exactly is undefined and why?
60 ↗	(On Diff #88302)	Hence, WE include
62 ↗	(On Diff #88302)	What is a contradiction of a live-range? You can contradict statements that state something, but what does contradicting a live-range mean?
64 ↗	(On Diff #88302)	starts
65 ↗	(On Diff #88302)	IN the live-range
68 ↗	(On Diff #88302)	e.g.,
69 ↗	(On Diff #88302)	e.g.,
519 ↗	(On Diff #88302)	I don't get the grammar of the sentence in the assert. Could you look at it again?

Clarify considerations in comment about zones.

Hi Michael,

thank you for the quick update. The knowledge stuff is now really well polished. Thanks! Feel free to commit it.

I will try to have a look at the remaining parts later tonight!

lib/Transform/DeLICM.cpp
50 ↗	(On Diff #88365)	IN the following

Rebase to r295204

Hi Michael,

this patch looks good to me. Just two minor typos. Also, I would really appreciate it if you could add some negative tests.

Best,
Tobias

lib/Transform/DeLICM.cpp
1364 ↗	(On Diff #88563)	NOT yet processed
1537 ↗	(On Diff #88563)	processed ELEMENT

Also mark officially as accepted.

This revision is now accepted and ready to land. Feb 16 2017, 8:17 AM

Closed by commit rL295713: [DeLICM] Map values hoisted by LICM back to the array. (authored by Meinersbur). Feb 21 2017, 2:32 AM

This revision was automatically updated to reflect the committed changes.

Meinersbur marked an inline comment as done.

Revision Contents

Path	Size
polly/trunk/lib/Transform/DeLICM.cpp	1254 lines
polly/trunk/test/DeLICM/reduction_preheader.ll	130 lines

Diff 89179

polly/trunk/lib/Transform/DeLICM.cpp

(101 lines not shown)
// * Zone[] - Range between timepoints as described above		// * Zone[] - Range between timepoints as described above
// Has no tuple id		// Has no tuple id
//		//
// An annotation "{ Domain[] -> Scatter[] }" therefore means: A map from a		// An annotation "{ Domain[] -> Scatter[] }" therefore means: A map from a
// statement instance to a timepoint, aka a schedule. There is only one scatter		// statement instance to a timepoint, aka a schedule. There is only one scatter
// space, but most of the time multiple statements are processed in one set.		// space, but most of the time multiple statements are processed in one set.
// This is why most of the time isl_union_map has to be used.		// This is why most of the time isl_union_map has to be used.
//		//
		// The basic algorithm works as follows:
		// At first we verify that the SCoP is compatible with this technique. For
		// instance, two writes cannot write to the same location at the same statement
		// instance because we cannot determine within the polyhedral model which one
		// comes first. Once this was verified, we compute zones at which an array
		// element is unused. This computation can fail if it takes too long. Then the
		// main algorithm is executed. Because every store potentially trails an unused
		// zone, we start at stores. We search for a scalar (MemoryKind::Value or
		// MemoryKind::PHI) that we can map to the array element overwritten by the
		// store, preferably one that is used by the store or at least the ScopStmt.
		// When it does not conflict with the lifetime of the values in the array
		// element, the map is applied and the unused zone updated as it is now used. We
		// continue to try to map scalars to the array element until there are no more
		// candidates to map. The algorithm is greedy in the sense that the first scalar
		// not conflicting will be mapped. Other scalars processed later that could have
		// fit the same unused zone will be rejected. As such the result depends on the
		// processing order.
		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "polly/DeLICM.h"		#include "polly/DeLICM.h"
		#include "polly/Options.h"
#include "polly/ScopInfo.h"		#include "polly/ScopInfo.h"
#include "polly/ScopPass.h"		#include "polly/ScopPass.h"
#include "polly/Support/ISLTools.h"		#include "polly/Support/ISLTools.h"
		#include "llvm/ADT/Statistic.h"
#define DEBUG_TYPE "polly-delicm"		#define DEBUG_TYPE "polly-delicm"

using namespace polly;		using namespace polly;
using namespace llvm;		using namespace llvm;

namespace {		namespace {

		cl::opt<unsigned long>
		DelicmMaxOps("polly-delicm-max-ops",
		cl::desc("Maximum number of isl operations to invest for "
		"lifetime analysis; 0=no limit"),
		cl::init(1000000), cl::cat(PollyCategory));

		STATISTIC(DeLICMAnalyzed, "Number of successfully analyzed SCoPs");
		STATISTIC(DeLICMOutOfQuota,
		"Analyses aborted because max_operations was reached");
		STATISTIC(DeLICMIncompatible, "Number of SCoPs incompatible for analysis");
		STATISTIC(MappedValueScalars, "Number of mapped Value scalars");
		STATISTIC(MappedPHIScalars, "Number of mapped PHI scalars");
		STATISTIC(TargetsMapped, "Number of stores used for at least one mapping");
		STATISTIC(DeLICMScopsModified, "Number of SCoPs optimized");

		/// Class for keeping track of scalar def-use chains in the polyhedral
		/// representation.
		///
		/// MemoryKind::Value:
		/// There is one definition per llvm::Value or zero (read-only values defined
		/// before the SCoP) and an arbitrary number of reads.
		///
		/// MemoryKind::PHI, MemoryKind::ExitPHI:
		/// There is at least one write (the incoming blocks/stmts) and one
		/// (MemoryKind::PHI) or zero (MemoryKind::ExitPHI) reads per llvm::PHINode.
		class ScalarDefUseChains {
		private:
		/// The definitions (i.e. write MemoryAccess) of a MemoryKind::Value scalar.
		DenseMap<const ScopArrayInfo *, MemoryAccess *> ValueDefAccs;

		/// List of all uses (i.e. read MemoryAccesses) for a MemoryKind::Value
		/// scalar.
		DenseMap<const ScopArrayInfo *, SmallVector<MemoryAccess *, 4>> ValueUseAccs;

		/// The receiving part (i.e. read MemoryAccess) of a MemoryKind::PHI scalar.
		DenseMap<const ScopArrayInfo *, MemoryAccess *> PHIReadAccs;

		/// List of all incoming values (write MemoryAccess) of a MemoryKind::PHI or
		/// MemoryKind::ExitPHI scalar.
		DenseMap<const ScopArrayInfo *, SmallVector<MemoryAccess *, 4>>
		PHIIncomingAccs;

		public:
		/// Find the MemoryAccesses that access the ScopArrayInfo-represented memory.
		///
		/// @param S The SCoP to analyze.
		void compute(Scop *S) {
		// Purge any previous result.
		reset();

		for (auto &Stmt : *S) {
		for (auto *MA : Stmt) {
		if (MA->isOriginalValueKind() && MA->isWrite()) {
		auto *SAI = MA->getScopArrayInfo();
		assert(!ValueDefAccs.count(SAI) &&
		"There can be at most one "
		"definition per MemoryKind::Value scalar");
		ValueDefAccs[SAI] = MA;
		}

		if (MA->isOriginalValueKind() && MA->isRead())
		ValueUseAccs[MA->getScopArrayInfo()].push_back(MA);

		if (MA->isOriginalAnyPHIKind() && MA->isRead()) {
		auto *SAI = MA->getScopArrayInfo();
		assert(!PHIReadAccs.count(SAI) &&
		"There must be exactly one read "
		"per PHI (that's where the PHINode is)");
		PHIReadAccs[SAI] = MA;
		}

		if (MA->isOriginalAnyPHIKind() && MA->isWrite())
		PHIIncomingAccs[MA->getScopArrayInfo()].push_back(MA);
		}
		}
		}

		/// Free all memory used by the analysis.
		void reset() {
		ValueDefAccs.clear();
		ValueUseAccs.clear();
		PHIReadAccs.clear();
		PHIIncomingAccs.clear();
		}

		MemoryAccess *getValueDef(const ScopArrayInfo *SAI) const {
		return ValueDefAccs.lookup(SAI);
		}

		ArrayRef<MemoryAccess *> getValueUses(const ScopArrayInfo *SAI) const {
		auto It = ValueUseAccs.find(SAI);
		if (It == ValueUseAccs.end())
		return {};
		return It->second;
		}

		MemoryAccess *getPHIRead(const ScopArrayInfo *SAI) const {
		return PHIReadAccs.lookup(SAI);
		}

		ArrayRef<MemoryAccess *> getPHIIncomings(const ScopArrayInfo *SAI) const {
		auto It = PHIIncomingAccs.find(SAI);
		if (It == PHIIncomingAccs.end())
		return {};
		return It->second;
		}
		};

		IslPtr<isl_union_map> computeReachingDefinition(IslPtr<isl_union_map> Schedule,
		IslPtr<isl_union_map> Writes,
		bool InclDef, bool InclRedef) {
		return computeReachingWrite(Schedule, Writes, false, InclDef, InclRedef);
		}

		IslPtr<isl_union_map> computeReachingOverwrite(IslPtr<isl_union_map> Schedule,
		IslPtr<isl_union_map> Writes,
		bool InclPrevWrite,
		bool InclOverwrite) {
		return computeReachingWrite(Schedule, Writes, true, InclPrevWrite,
		InclOverwrite);
		}

		/// Compute the next overwrite for a scalar.
		///
		/// @param Schedule { DomainWrite[] -> Scatter[] }
		/// Schedule of (at least) all writes. Instances not in @p
		/// Writes are ignored.
		/// @param Writes { DomainWrite[] }
		/// The element instances that write to the scalar.
		/// @param InclPrevWrite Whether to extend the timepoints to include
		/// the timepoint where the previous write happens.
		/// @param InclOverwrite Whether the reaching overwrite includes the timepoint
		/// of the overwrite itself.
		///
		/// @return { Scatter[] -> DomainDef[] }
		IslPtr<isl_union_map>
		computeScalarReachingOverwrite(IslPtr<isl_union_map> Schedule,
		IslPtr<isl_union_set> Writes, bool InclPrevWrite,
		bool InclOverwrite) {

		// { DomainWrite[] -> Element[] }
		auto WritesMap = give(isl_union_map_from_domain(Writes.take()));

		// { [Element[] -> Scatter[]] -> DomainWrite[] }
		auto Result = computeReachingOverwrite(
		std::move(Schedule), std::move(WritesMap), InclPrevWrite, InclOverwrite);

		return give(isl_union_map_domain_factor_range(Result.take()));
		}

		/// Overload of computeScalarReachingOverwrite, with only one writing statement.
		/// Consequently, the result consists of only one map space.
		///
		/// @param Schedule { DomainWrite[] -> Scatter[] }
		/// @param Writes { DomainWrite[] }
		/// @param InclPrevWrite Include the previous write to result.
		/// @param InclOverwrite Include the overwrite to the result.
		///
		/// @return { Scatter[] -> DomainWrite[] }
		IslPtr<isl_map> computeScalarReachingOverwrite(IslPtr<isl_union_map> Schedule,
		IslPtr<isl_set> Writes,
		bool InclPrevWrite,
		bool InclOverwrite) {
		auto ScatterSpace = getScatterSpace(Schedule);
		auto DomSpace = give(isl_set_get_space(Writes.keep()));

		auto ReachOverwrite = computeScalarReachingOverwrite(
		Schedule, give(isl_union_set_from_set(Writes.take())), InclPrevWrite,
		InclOverwrite);

		auto ResultSpace = give(isl_space_map_from_domain_and_range(
		ScatterSpace.take(), DomSpace.take()));
		return singleton(std::move(ReachOverwrite), ResultSpace);
		}

		/// Compute the reaching definition of a scalar.
		///
		/// Compared to computeReachingDefinition, there is just one element which is
		/// accessed and therefore only a set of instances that access that element is
		/// required.
		///
		/// @param Schedule { DomainWrite[] -> Scatter[] }
		/// @param Writes { DomainWrite[] }
		/// @param InclDef Include the timepoint of the definition to the result.
		/// @param InclRedef Include the timepoint of the overwrite into the result.
		///
		/// @return { Scatter[] -> DomainWrite[] }
		IslPtr<isl_union_map>
		computeScalarReachingDefinition(IslPtr<isl_union_map> Schedule,
		IslPtr<isl_union_set> Writes, bool InclDef,
		bool InclRedef) {

		// { DomainWrite[] -> Element[] }
		auto Defs = give(isl_union_map_from_domain(Writes.take()));

		// { [Element[] -> Scatter[]] -> DomainWrite[] }
		auto ReachDefs =
		computeReachingDefinition(Schedule, Defs, InclDef, InclRedef);

		// { Scatter[] -> DomainWrite[] }
		return give(isl_union_set_unwrap(
		isl_union_map_range(isl_union_map_curry(ReachDefs.take()))));
		}

		/// Compute the reaching definition of a scalar.
		///
		/// This overload accepts only a single writing statement as an isl_set,
		/// consequently the result also is only a single isl_map.
		///
		/// @param Schedule { DomainWrite[] -> Scatter[] }
		/// @param Writes { DomainWrite[] }
		/// @param InclDef Include the timepoint of the definition to the result.
		/// @param InclRedef Include the timepoint of the overwrite into the result.
		///
		/// @return { Scatter[] -> DomainWrite[] }
		IslPtr<isl_map> computeScalarReachingDefinition( // { Domain[] -> Zone[] }
		IslPtr<isl_union_map> Schedule, IslPtr<isl_set> Writes, bool InclDef,
		bool InclRedef) {
		auto DomainSpace = give(isl_set_get_space(Writes.keep()));
		auto ScatterSpace = getScatterSpace(Schedule);

		// { Scatter[] -> DomainWrite[] }
		auto UMap = computeScalarReachingDefinition(
		Schedule, give(isl_union_set_from_set(Writes.take())), InclDef,
		InclRedef);

		auto ResultSpace = give(isl_space_map_from_domain_and_range(
		ScatterSpace.take(), DomainSpace.take()));
		return singleton(UMap, ResultSpace);
		}

		/// If InputVal is not defined in the stmt itself, return the MemoryAccess that
		/// reads the scalar. Return nullptr otherwise (if the value is defined in the
		/// scop, or is synthesizable).
		MemoryAccess *getInputAccessOf(Value *InputVal, ScopStmt *Stmt) {
		for (auto *MA : *Stmt) {
		if (!MA->isRead())
		continue;
		if (!MA->isLatestScalarKind())
		continue;

		assert(MA->getAccessValue() == MA->getBaseAddr());
		if (MA->getAccessValue() == InputVal)
		return MA;
		}
		return nullptr;
		}

/// Represent the knowledge of the contents of any array elements in any zone or		/// Represent the knowledge of the contents of any array elements in any zone or
/// the knowledge we would add when mapping a scalar to an array element.		/// the knowledge we would add when mapping a scalar to an array element.
///		///
/// Every array element at every zone unit has one of two states:		/// Every array element at every zone unit has one of two states:
///		///
/// - Unused: Not occupied by any value so a transformation can change it to		/// - Unused: Not occupied by any value so a transformation can change it to
/// other values.		/// other values.
///		///
(237 lines not shown)	if (isl_union_set_is_disjoint(Existing.Written.keep(),
}		}
return true;		return true;
}		}

return false;		return false;
}		}
};		};

		/// Base class for algorithms based on zones, like DeLICM.
		class ZoneAlgorithm {
		protected:
		/// Hold a reference to the isl_ctx to avoid it being freed before we released
		/// all of the isl objects.
		///
		/// This must be declared before any other member that holds an isl object.
		/// This guarantees that the shared_ptr and its isl_ctx is destructed last,
		/// after all other members free'd the isl objects they were holding.
		std::shared_ptr<isl_ctx> IslCtx;

		/// Cached reaching definitions for each ScopStmt.
		///
		/// Use getScalarReachingDefinition() to get its contents.
		DenseMap<ScopStmt *, IslPtr<isl_map>> ScalarReachDefZone;

		/// The analyzed Scop.
		Scop *S;

		/// Parameter space that does not need realignment.
		IslPtr<isl_space> ParamSpace;

		/// Space the schedule maps to.
		IslPtr<isl_space> ScatterSpace;

		/// Cached version of the schedule and domains.
		IslPtr<isl_union_map> Schedule;

		/// Set of all referenced elements.
		/// { Element[] }
		IslPtr<isl_union_set> AllElements;

		/// Combined access relations of all MemoryKind::Array READ accesses.
		/// { DomainRead[] -> Element[] }
		IslPtr<isl_union_map> AllReads;

		/// Combined access relations of all MemoryKind::Array, MAY_WRITE accesses.
		/// { DomainMayWrite[] -> Element[] }
		IslPtr<isl_union_map> AllMayWrites;

		/// Combined access relations of all MemoryKind::Array, MUST_WRITE accesses.
		/// { DomainMustWrite[] -> Element[] }
		IslPtr<isl_union_map> AllMustWrites;

		/// Prepare the object before computing the zones of @p S.
		ZoneAlgorithm(Scop *S)
		: IslCtx(S->getSharedIslCtx()), S(S), Schedule(give(S->getSchedule())) {

		auto Domains = give(S->getDomains());

		Schedule =
		give(isl_union_map_intersect_domain(Schedule.take(), Domains.take()));
		ParamSpace = give(isl_union_map_get_space(Schedule.keep()));
		ScatterSpace = getScatterSpace(Schedule);
		}

		private:
		/// Check whether @p Stmt can be accurately analyzed by zones.
		///
		/// What violates our assumptions:
		/// - A load after a write of the same location; we assume that all reads
		/// occur before the writes.
		/// - Two writes to the same location; we cannot model the order in which
		/// these occur.
		///
		/// Scalar reads implicitly always occur before other accesses therefore never
		/// violate the first condition. There is also at most one write to a scalar,
		/// satisfying the second condition.
		bool isCompatibleStmt(ScopStmt *Stmt) {
		auto Stores = makeEmptyUnionMap();
		auto Loads = makeEmptyUnionMap();

		// This assumes that the MemoryKind::Array MemoryAccesses are iterated in
		// order.
		for (auto *MA : *Stmt) {
		if (!MA->isLatestArrayKind())
		continue;

		auto AccRel =
		give(isl_union_map_from_map(getAccessRelationFor(MA).take()));

		if (MA->isRead()) {
		// Reject load after store to same location.
		if (!isl_union_map_is_disjoint(Stores.keep(), AccRel.keep()))
		return false;

		Loads = give(isl_union_map_union(Loads.take(), AccRel.take()));

		continue;
		}

		if (!isa<StoreInst>(MA->getAccessInstruction())) {
		DEBUG(dbgs() << "WRITE that is not a StoreInst not supported\n");
		return false;
		}

		// In region statements the order is less clear, e.g. the load and store
		// might be in a boxed loop.
		if (Stmt->isRegionStmt() &&
		!isl_union_map_is_disjoint(Loads.keep(), AccRel.keep()))
		return false;

		// Do not allow more than one store to the same location.
		if (!isl_union_map_is_disjoint(Stores.keep(), AccRel.keep()))
		return false;

		Stores = give(isl_union_map_union(Stores.take(), AccRel.take()));
		}

		return true;
		}

		void addArrayReadAccess(MemoryAccess *MA) {
		assert(MA->isLatestArrayKind());
		assert(MA->isRead());

		// { DomainRead[] -> Element[] }
		auto AccRel = getAccessRelationFor(MA);
		AllReads = give(isl_union_map_add_map(AllReads.take(), AccRel.copy()));
		}

		void addArrayWriteAccess(MemoryAccess *MA) {
		assert(MA->isLatestArrayKind());
		assert(MA->isWrite());

		// { Domain[] -> Element[] }
		auto AccRel = getAccessRelationFor(MA);

		if (MA->isMustWrite())
		AllMustWrites =
		give(isl_union_map_add_map(AllMustWrites.take(), AccRel.copy()));

		if (MA->isMayWrite())
		AllMayWrites =
		give(isl_union_map_add_map(AllMayWrites.take(), AccRel.copy()));
		}

		protected:
		IslPtr<isl_union_set> makeEmptyUnionSet() {
		return give(isl_union_set_empty(ParamSpace.copy()));
		}

		IslPtr<isl_union_map> makeEmptyUnionMap() {
		return give(isl_union_map_empty(ParamSpace.copy()));
		}

		/// Check whether @p S can be accurately analyzed by zones.
		bool isCompatibleScop() {
		for (auto &Stmt : *S) {
		if (!isCompatibleStmt(&Stmt))
		return false;
		}
		return true;
		}

		/// Get the schedule for @p Stmt.
		///
		/// The domain of the result is as narrow as possible.
		IslPtr<isl_map> getScatterFor(ScopStmt *Stmt) const {
		auto ResultSpace = give(isl_space_map_from_domain_and_range(
		Stmt->getDomainSpace(), ScatterSpace.copy()));
		return give(isl_union_map_extract_map(Schedule.keep(), ResultSpace.take()));
		}

		/// Get the schedule of @p MA's parent statement.
		IslPtr<isl_map> getScatterFor(MemoryAccess *MA) const {
		return getScatterFor(MA->getStatement());
		}

		/// Get the schedule for the statement instances of @p Domain.
		IslPtr<isl_union_map> getScatterFor(IslPtr<isl_union_set> Domain) const {
		return give(isl_union_map_intersect_domain(Schedule.copy(), Domain.take()));
		}

		/// Get the schedule for the statement instances of @p Domain.
		IslPtr<isl_map> getScatterFor(IslPtr<isl_set> Domain) const {
		auto ResultSpace = give(isl_space_map_from_domain_and_range(
		isl_set_get_space(Domain.keep()), ScatterSpace.copy()));
		auto UDomain = give(isl_union_set_from_set(Domain.copy()));
		auto UResult = getScatterFor(std::move(UDomain));
		auto Result = singleton(std::move(UResult), std::move(ResultSpace));
		assert(isl_set_is_equal(give(isl_map_domain(Result.copy())).keep(),
		Domain.keep()) == isl_bool_true);
		return Result;
		}

		/// Get the domain of @p Stmt.
		IslPtr<isl_set> getDomainFor(ScopStmt *Stmt) const {
		return give(Stmt->getDomain());
		}

		/// Get the domain of @p MA's parent statement.
		IslPtr<isl_set> getDomainFor(MemoryAccess *MA) const {
		return getDomainFor(MA->getStatement());
		}

		/// Get the access relation of @p MA.
		///
		/// The domain of the result is as narrow as possible.
		IslPtr<isl_map> getAccessRelationFor(MemoryAccess *MA) const {
		auto Domain = getDomainFor(MA);
		auto AccRel = give(MA->getLatestAccessRelation());
		return give(isl_map_intersect_domain(AccRel.take(), Domain.take()));
		}

		/// Get the reaching definition of a scalar defined in @p Stmt.
		///
		/// Note that this does not depend on the llvm::Instruction, only on the
		/// statement it is defined in. Therefore the same computation can be reused.
		///
		/// @param Stmt The statement in which a scalar is defined.
		///
		/// @return { Scatter[] -> DomainDef[] }
		IslPtr<isl_map> getScalarReachingDefinition(ScopStmt *Stmt) {
		auto &Result = ScalarReachDefZone[Stmt];
		if (Result)
		return Result;

		auto Domain = getDomainFor(Stmt);
		Result = computeScalarReachingDefinition(Schedule, Domain, false, true);
		simplify(Result);

		assert(Result);
		return Result;
		}

		/// Compute the different zones.
		void computeCommon() {
		AllReads = makeEmptyUnionMap();
		AllMayWrites = makeEmptyUnionMap();
		AllMustWrites = makeEmptyUnionMap();

		for (auto &Stmt : *S) {
		for (auto *MA : Stmt) {
		if (!MA->isLatestArrayKind())
		continue;

		if (MA->isRead())
		addArrayReadAccess(MA);

		if (MA->isWrite())
		addArrayWriteAccess(MA);
		}
		}

		// { DomainWrite[] -> Element[] }
		auto AllWrites =
		give(isl_union_map_union(AllMustWrites.copy(), AllMayWrites.copy()));

		// { Element[] }
		AllElements = makeEmptyUnionSet();
		foreachElt(AllWrites, [this](IslPtr<isl_map> Write) {
		auto Space = give(isl_map_get_space(Write.keep()));
		auto EltSpace = give(isl_space_range(Space.take()));
		auto EltUniv = give(isl_set_universe(EltSpace.take()));
		AllElements =
		give(isl_union_set_add_set(AllElements.take(), EltUniv.take()));
		});
		}

		/// Print the current state of all MemoryAccesses to @p OS.
		void printAccesses(llvm::raw_ostream &OS, int Indent = 0) const {
		OS.indent(Indent) << "After accesses {\n";
		for (auto &Stmt : *S) {
		OS.indent(Indent + 4) << Stmt.getBaseName() << "\n";
		for (auto *MA : Stmt)
		MA->print(OS);
		}
		OS.indent(Indent) << "}\n";
		}

		public:
		/// Return the SCoP this object is analyzing.
		Scop *getScop() const { return S; }
		};

		/// Implementation of the DeLICM/DePRE transformation.
		class DeLICMImpl : public ZoneAlgorithm {
		private:
		/// Knowledge before any transformation took place.
		Knowledge OriginalZone;

		/// Current knowledge of the SCoP including all already applied
		/// transformations.
		Knowledge Zone;

		ScalarDefUseChains DefUse;

		/// Determine whether two knowledges are conflicting with each other.
		///
		/// @see Knowledge::isConflicting
		bool isConflicting(const Knowledge &Proposed) {
		raw_ostream *OS = nullptr;
		DEBUG(OS = &llvm::dbgs());
		return Knowledge::isConflicting(Zone, Proposed, OS, 4);
		}

		/// Determine whether @p SAI is a scalar that can be mapped to an array
		/// element.
		bool isMappable(const ScopArrayInfo *SAI) {
		assert(SAI);

		if (SAI->isValueKind()) {
		auto *MA = DefUse.getValueDef(SAI);
		if (!MA) {
		DEBUG(dbgs()
		<< " Reject because value is read-only within the scop\n");
		return false;
		}

		// Mapping a value that is used after the scop is not supported. The
		// code generator would need to reload the scalar after the scop, but it
		// does not know which element the scalar was mapped to. Only the
		// MemoryAccesses have that information, not the ScopArrayInfo.
		auto Inst = MA->getAccessInstruction();
		for (auto User : Inst->users()) {
		if (!isa<Instruction>(User))
		return false;
		auto UserInst = cast<Instruction>(User);

		if (!S->contains(UserInst)) {
		DEBUG(dbgs() << " Reject because value is escaping\n");
		return false;
		}
		}

		return true;
		}

		if (SAI->isPHIKind()) {
		auto *MA = DefUse.getPHIRead(SAI);
		assert(MA);

		// Mapping of an incoming block from before the SCoP is not supported by
		// the code generator.
		auto PHI = cast<PHINode>(MA->getAccessInstruction());
		for (auto Incoming : PHI->blocks()) {
		if (!S->contains(Incoming)) {
		DEBUG(dbgs() << " Reject because at least one incoming block is "
		"not in the scop region\n");
		return false;
		}
		}

		return true;
		}

		DEBUG(dbgs() << " Reject ExitPHI or other non-value\n");
		return false;
		}

		/// Compute the uses of a MemoryKind::Value and its lifetime (from its
		/// definition to the last use).
		///
		/// @param SAI The ScopArrayInfo representing the value's storage.
		///
		/// @return { DomainDef[] -> DomainUse[] }, { DomainDef[] -> Zone[] }
		/// First element is the set of uses for each definition.
		/// The second is the lifetime of each definition.
		std::tuple<IslPtr<isl_union_map>, IslPtr<isl_map>>
		computeValueUses(const ScopArrayInfo *SAI) {
		assert(SAI->isValueKind());

		// { DomainRead[] }
		auto Reads = makeEmptyUnionSet();

		// Find all uses.
		for (auto *MA : DefUse.getValueUses(SAI))
		Reads =
		give(isl_union_set_add_set(Reads.take(), getDomainFor(MA).take()));

		// { DomainRead[] -> Scatter[] }
		auto ReadSchedule = getScatterFor(Reads);

		auto *DefMA = DefUse.getValueDef(SAI);
		assert(DefMA);

		// { DomainDef[] }
		auto Writes = getDomainFor(DefMA);

		// { DomainDef[] -> Scatter[] }
		auto WriteScatter = getScatterFor(Writes);

		// { Scatter[] -> DomainDef[] }
		auto ReachDef = getScalarReachingDefinition(DefMA->getStatement());

		// { [DomainDef[] -> Scatter[]] -> DomainUse[] }
		auto Uses = give(
		isl_union_map_apply_range(isl_union_map_from_map(isl_map_range_map(
		isl_map_reverse(ReachDef.take()))),
		isl_union_map_reverse(ReadSchedule.take())));

		// { DomainDef[] -> Scatter[] }
		auto UseScatter =
		singleton(give(isl_union_set_unwrap(isl_union_map_domain(Uses.copy()))),
		give(isl_space_map_from_domain_and_range(
		isl_set_get_space(Writes.keep()), ScatterSpace.copy())));

		// { DomainDef[] -> Zone[] }
		auto Lifetime = betweenScatter(WriteScatter, UseScatter, false, true);

		// { DomainDef[] -> DomainRead[] }
		auto DefUses = give(isl_union_map_domain_factor_domain(Uses.take()));

		return std::make_pair(DefUses, Lifetime);
		}

		/// For each 'execution' of a PHINode, get the incoming block that was
		/// executed before.
		///
		/// For each PHI instance we can directly determine which was the incoming
		/// block, and hence derive which value the PHI has.
		///
		/// @param SAI The ScopArrayInfo representing the PHI's storage.
		///
		/// @return { DomainPHIRead[] -> DomainPHIWrite[] }
		IslPtr<isl_union_map> computePerPHI(const ScopArrayInfo *SAI) {
		assert(SAI->isPHIKind());

		// { DomainPHIWrite[] -> Scatter[] }
		auto PHIWriteScatter = makeEmptyUnionMap();

		// Collect the timepoints of all incoming blocks.
		for (auto *MA : DefUse.getPHIIncomings(SAI)) {
		auto Scatter = getScatterFor(MA);
		PHIWriteScatter =
		give(isl_union_map_add_map(PHIWriteScatter.take(), Scatter.take()));
		}

		// { DomainPHIRead[] -> Scatter[] }
		auto PHIReadScatter = getScatterFor(DefUse.getPHIRead(SAI));

		// { DomainPHIRead[] -> Scatter[] }
		auto BeforeRead = beforeScatter(PHIReadScatter, true);

		// { Scatter[] }
		auto WriteTimes = singleton(
		give(isl_union_map_range(PHIWriteScatter.copy())), ScatterSpace);

		// { DomainPHIRead[] -> Scatter[] }
		auto PHIWriteTimes =
		give(isl_map_intersect_range(BeforeRead.take(), WriteTimes.take()));
		auto LastPerPHIWrites = give(isl_map_lexmax(PHIWriteTimes.take()));

		// { DomainPHIRead[] -> DomainPHIWrite[] }
		auto Result = give(isl_union_map_apply_range(
		isl_union_map_from_map(LastPerPHIWrites.take()),
		isl_union_map_reverse(PHIWriteScatter.take())));
		assert(isl_union_map_is_single_valued(Result.keep()) == isl_bool_true);
		assert(isl_union_map_is_injective(Result.keep()) == isl_bool_true);
		return Result;
		}

		/// Try to map a MemoryKind::Value to a given array element.
		///
		/// @param SAI Representation of the scalar's memory to map.
		/// @param TargetElt { Scatter[] -> Element[] }
		/// Suggestion of where to map the scalar at each timepoint.
		///
		/// @return true if the scalar was successfully mapped.
		bool tryMapValue(const ScopArrayInfo *SAI, IslPtr<isl_map> TargetElt) {
		assert(SAI->isValueKind());

		auto *DefMA = DefUse.getValueDef(SAI);
		assert(DefMA->isValueKind());
		assert(DefMA->isMustWrite());

		// Stop if the scalar has already been mapped.
		if (!DefMA->getLatestScopArrayInfo()->isValueKind())
		return false;

		// { DomainDef[] -> Scatter[] }
		auto DefSched = getScatterFor(DefMA);

		// Where each write is mapped to, according to the suggestion.
		// { DomainDef[] -> Element[] }
		auto DefTarget = give(isl_map_apply_domain(
		TargetElt.copy(), isl_map_reverse(DefSched.copy())));
		simplify(DefTarget);
		DEBUG(dbgs() << " Def Mapping: " << DefTarget << '\n');

		auto OrigDomain = getDomainFor(DefMA);
		auto MappedDomain = give(isl_map_domain(DefTarget.copy()));
		if (!isl_set_is_subset(OrigDomain.keep(), MappedDomain.keep())) {
		DEBUG(dbgs()
		<< " Reject because mapping does not encompass all instances\n");
		return false;
		}

		// { DomainDef[] -> Zone[] }
		IslPtr<isl_map> Lifetime;

		// { DomainDef[] -> DomainUse[] }
		IslPtr<isl_union_map> DefUses;

		std::tie(DefUses, Lifetime) = computeValueUses(SAI);
		DEBUG(dbgs() << " Lifetime: " << Lifetime << '\n');

		// { [Element[] -> Zone[]] }
		auto EltZone = give(
		isl_map_wrap(isl_map_apply_domain(Lifetime.copy(), DefTarget.copy())));
		simplify(EltZone);

		// { [Element[] -> Scatter[]] }
		auto DefEltSched = give(isl_map_wrap(isl_map_reverse(
		isl_map_apply_domain(DefTarget.copy(), DefSched.copy()))));
		simplify(DefEltSched);

		Knowledge Proposed(EltZone, nullptr, DefEltSched);
		if (isConflicting(Proposed))
		return false;

		// { DomainUse[] -> Element[] }
		auto UseTarget = give(
		isl_union_map_apply_range(isl_union_map_reverse(DefUses.take()),
		isl_union_map_from_map(DefTarget.copy())));

		mapValue(SAI, std::move(DefTarget), std::move(UseTarget),
		std::move(Lifetime), std::move(Proposed));
		return true;
		}

		/// After a scalar has been mapped, update the global knowledge.
		void applyLifetime(Knowledge Proposed) {
		Zone.learnFrom(std::move(Proposed));
		}

		/// Map a MemoryKind::Value scalar to an array element.
		///
		/// Callers must have ensured that the mapping is valid and not conflicting.
		///
		/// @param SAI The ScopArrayInfo representing the scalar's memory to
		/// map.
		/// @param DefTarget { DomainDef[] -> Element[] }
		/// The array element to map the scalar to.
		/// @param UseTarget { DomainUse[] -> Element[] }
		/// The array elements the uses are mapped to.
		/// @param Lifetime { DomainDef[] -> Zone[] }
		/// The lifetime of each llvm::Value definition for
		/// reporting.
		/// @param Proposed Mapping constraints for reporting.
		void mapValue(const ScopArrayInfo *SAI, IslPtr<isl_map> DefTarget,
		IslPtr<isl_union_map> UseTarget, IslPtr<isl_map> Lifetime,
		Knowledge Proposed) {
		// Redirect the read accesses.
		for (auto *MA : DefUse.getValueUses(SAI)) {
		// { DomainUse[] }
		auto Domain = getDomainFor(MA);

		// { DomainUse[] -> Element[] }
		auto NewAccRel = give(isl_union_map_intersect_domain(
		UseTarget.copy(), isl_union_set_from_set(Domain.take())));
		simplify(NewAccRel);

		assert(isl_union_map_n_map(NewAccRel.keep()) == 1);
		MA->setNewAccessRelation(isl_map_from_union_map(NewAccRel.take()));
		}

		auto *WA = DefUse.getValueDef(SAI);
		WA->setNewAccessRelation(DefTarget.copy());
		applyLifetime(Proposed);

		MappedValueScalars++;
		}

		/// Try to map a MemoryKind::PHI scalar to a given array element.
		///
		/// @param SAI Representation of the scalar's memory to map.
		/// @param TargetElt { Scatter[] -> Element[] }
		/// Suggestion of where to map the scalar at each
		/// timepoint.
		///
		/// @return true if the PHI scalar has been mapped.
		bool tryMapPHI(const ScopArrayInfo *SAI, IslPtr<isl_map> TargetElt) {
		auto *PHIRead = DefUse.getPHIRead(SAI);
		assert(PHIRead->isPHIKind());
		assert(PHIRead->isRead());

		// Skip if the PHI has already been mapped.
		if (!PHIRead->getLatestScopArrayInfo()->isPHIKind())
		return false;

		// { DomainRead[] -> Scatter[] }
		auto PHISched = getScatterFor(PHIRead);

		// { DomainRead[] -> Element[] }
		auto PHITarget =
		give(isl_map_apply_range(PHISched.copy(), TargetElt.copy()));
		simplify(PHITarget);
		DEBUG(dbgs() << " Mapping: " << PHITarget << '\n');

		auto OrigDomain = getDomainFor(PHIRead);
		auto MappedDomain = give(isl_map_domain(PHITarget.copy()));
		if (!isl_set_is_subset(OrigDomain.keep(), MappedDomain.keep())) {
		DEBUG(dbgs()
		<< " Reject because mapping does not encompass all instances\n");
		return false;
		}

		// { DomainRead[] -> DomainWrite[] }
		auto PerPHIWrites = computePerPHI(SAI);

		// { DomainWrite[] -> Element[] }
		auto WritesTarget = give(isl_union_map_reverse(isl_union_map_apply_domain(
		PerPHIWrites.copy(), isl_union_map_from_map(PHITarget.copy()))));
		simplify(WritesTarget);

		// { DomainWrite[] }
		auto ExpandedWritesDom = give(isl_union_map_domain(WritesTarget.copy()));
		auto UniverseWritesDom = give(isl_union_set_empty(ParamSpace.copy()));

		for (auto *MA : DefUse.getPHIIncomings(SAI))
		UniverseWritesDom = give(isl_union_set_add_set(UniverseWritesDom.take(),
		getDomainFor(MA).take()));

		if (!isl_union_set_is_subset(UniverseWritesDom.keep(),
		ExpandedWritesDom.keep())) {
		DEBUG(dbgs() << " Reject because did not find PHI write mapping for "
		"all instances\n");
		DEBUG(dbgs() << " Deduced Mapping: " << WritesTarget << '\n');
		DEBUG(dbgs() << " Missing instances: "
		<< give(isl_union_set_subtract(UniverseWritesDom.copy(),
		ExpandedWritesDom.copy()))
		<< '\n');
		return false;
		}

		// { DomainRead[] -> Scatter[] }
		auto PerPHIWriteScatter = give(isl_map_from_union_map(
		isl_union_map_apply_range(PerPHIWrites.copy(), Schedule.copy())));

		// { DomainRead[] -> Zone[] }
		auto Lifetime = betweenScatter(PerPHIWriteScatter, PHISched, false, true);
		simplify(Lifetime);
		DEBUG(dbgs() << " Lifetime: " << Lifetime << "\n");

		// { DomainWrite[] -> Zone[] }
		auto WriteLifetime = give(isl_union_map_apply_domain(
		isl_union_map_from_map(Lifetime.copy()), PerPHIWrites.copy()));

		// { DomainWrite[] -> [Element[] -> Scatter[]] }
		auto WrittenTranslator =
		give(isl_union_map_range_product(WritesTarget.copy(), Schedule.copy()));

		// { [Element[] -> Scatter[]] }
		auto Written = give(isl_union_map_range(WrittenTranslator.copy()));
		simplify(Written);

		// { DomainWrite[] -> [Element[] -> Zone[]] }
		auto LifetimeTranslator = give(
		isl_union_map_range_product(WritesTarget.copy(), WriteLifetime.take()));

		// { [Element[] -> Zone[]] }
		auto Occupied = give(isl_union_map_range(LifetimeTranslator.copy()));
		simplify(Occupied);

		Knowledge Proposed(Occupied, nullptr, Written);
		if (isConflicting(Proposed))
		return false;

		mapPHI(SAI, std::move(PHITarget), std::move(WritesTarget),
		std::move(Lifetime), std::move(Proposed));
		return true;
		}

		/// Map a MemoryKind::PHI scalar to an array element.
		///
		/// Callers must have ensured that the mapping is valid and not conflicting
		/// with the common knowledge.
		///
		/// @param SAI The ScopArrayInfo representing the scalar's memory to
		/// map.
		/// @param ReadTarget { DomainRead[] -> Element[] }
		/// The array element to map the scalar to.
		/// @param WriteTarget { DomainWrite[] -> Element[] }
		/// New access target for each PHI incoming write.
		/// @param Lifetime { DomainRead[] -> Zone[] }
		/// The lifetime of each PHI for reporting.
		/// @param Proposed Mapping constraints for reporting.
		void mapPHI(const ScopArrayInfo *SAI, IslPtr<isl_map> ReadTarget,
		IslPtr<isl_union_map> WriteTarget, IslPtr<isl_map> Lifetime,
		Knowledge Proposed) {
		// Redirect the PHI incoming writes.
		for (auto *MA : DefUse.getPHIIncomings(SAI)) {
		// { DomainWrite[] }
		auto Domain = getDomainFor(MA);

		// { DomainWrite[] -> Element[] }
		auto NewAccRel = give(isl_union_map_intersect_domain(
		WriteTarget.copy(), isl_union_set_from_set(Domain.take())));
		simplify(NewAccRel);

		assert(isl_union_map_n_map(NewAccRel.keep()) == 1);
		MA->setNewAccessRelation(isl_map_from_union_map(NewAccRel.take()));
		}

		// Redirect the PHI read.
		auto *PHIRead = DefUse.getPHIRead(SAI);
		PHIRead->setNewAccessRelation(ReadTarget.copy());
		applyLifetime(Proposed);

		MappedPHIScalars++;
		}

		/// Search and map scalars to memory overwritten by @p TargetStoreMA.
		///
		/// Start trying to map scalars that are used in the same statement as the
		/// store. For every successful mapping, try to also map scalars of the
		/// statements where those are written. Repeat, until no more mapping
		/// opportunity is found.
		///
		/// There is currently no preference in which order scalars are tried.
		/// Ideally, we would direct it towards a load instruction of the same array
		/// element.
		bool collapseScalarsToStore(MemoryAccess *TargetStoreMA) {
		assert(TargetStoreMA->isLatestArrayKind());
		assert(TargetStoreMA->isMustWrite());

		auto TargetStmt = TargetStoreMA->getStatement();

		// { DomTarget[] }
		auto TargetDom = getDomainFor(TargetStmt);

		// { DomTarget[] -> Element[] }
		auto TargetAccRel = getAccessRelationFor(TargetStoreMA);

		// { Zone[] -> DomTarget[] }
		// For each point in time, find the next target store instance.
		auto Target =
		computeScalarReachingOverwrite(Schedule, TargetDom, false, true);

		// { Zone[] -> Element[] }
		// Use the target store's write location as a suggestion to map scalars to.
		auto EltTarget =
		give(isl_map_apply_range(Target.take(), TargetAccRel.take()));
		simplify(EltTarget);
		DEBUG(dbgs() << " Target mapping is " << EltTarget << '\n');

		// Stack of elements not yet processed.
		SmallVector<MemoryAccess *, 16> Worklist;

		// Set of scalars already tested.
		SmallPtrSet<const ScopArrayInfo *, 16> Closed;

		// Lambda to add all scalar reads to the work list.
		auto ProcessAllIncoming = [&](ScopStmt *Stmt) {
		for (auto *MA : *Stmt) {
		if (!MA->isLatestScalarKind())
		continue;
		if (!MA->isRead())
		continue;

		Worklist.push_back(MA);
		}
		};

		// Add initial scalar. Either the value written by the store, or all inputs
		// of its statement.
		auto WrittenVal = TargetStoreMA->getAccessValue();
		if (auto InputAcc = getInputAccessOf(WrittenVal, TargetStmt))
		Worklist.push_back(InputAcc);
		else
		ProcessAllIncoming(TargetStmt);

		auto AnyMapped = false;
		auto &DL =
		S->getRegion().getEntry()->getParent()->getParent()->getDataLayout();
		auto StoreSize =
		DL.getTypeAllocSize(TargetStoreMA->getAccessValue()->getType());

		while (!Worklist.empty()) {
		auto *MA = Worklist.pop_back_val();

		auto *SAI = MA->getScopArrayInfo();
		if (Closed.count(SAI))
		continue;
		Closed.insert(SAI);
		DEBUG(dbgs() << "\n Trying to map " << MA << " (SAI: " << SAI
		<< ")\n");

		// Skip non-mappable scalars.
		if (!isMappable(SAI))
		continue;

		auto MASize = DL.getTypeAllocSize(MA->getAccessValue()->getType());
		if (MASize > StoreSize) {
		DEBUG(dbgs() << " Reject because storage size is insufficient\n");
		continue;
		}

		// Try to map MemoryKind::Value scalars.
		if (SAI->isValueKind()) {
		if (!tryMapValue(SAI, EltTarget))
		continue;

		auto *DefAcc = DefUse.getValueDef(SAI);
		ProcessAllIncoming(DefAcc->getStatement());

		AnyMapped = true;
		continue;
		}

		// Try to map MemoryKind::PHI scalars.
		if (SAI->isPHIKind()) {
		if (!tryMapPHI(SAI, EltTarget))
		continue;
		// Add inputs of all incoming statements to the worklist.
		for (auto *PHIWrite : DefUse.getPHIIncomings(SAI))
		ProcessAllIncoming(PHIWrite->getStatement());

		AnyMapped = true;
		continue;
		}
		}

		if (AnyMapped)
		TargetsMapped++;
		return AnyMapped;
		}

		/// Compute when an array element is unused.
		///
		/// @return { [Element[] -> Zone[]] }
		IslPtr<isl_union_set> computeLifetime() const {
		// { Element[] -> Zone[] }
		auto ArrayUnused = computeArrayUnused(Schedule, AllMustWrites, AllReads,
		false, false, true);

		auto Result = give(isl_union_map_wrap(ArrayUnused.copy()));

		simplify(Result);
		return Result;
		}

		/// Determine when an array element is written to.
		///
		/// @return { [Element[] -> Scatter[]] }
		IslPtr<isl_union_set> computeWritten() const {
		// { WriteDomain[] -> Element[] }
		auto AllWrites =
		give(isl_union_map_union(AllMustWrites.copy(), AllMayWrites.copy()));

		// { Scatter[] -> Element[] }
		auto WriteTimepoints =
		give(isl_union_map_apply_domain(AllWrites.copy(), Schedule.copy()));

		auto Result =
		give(isl_union_map_wrap(isl_union_map_reverse(WriteTimepoints.copy())));

		simplify(Result);
		return Result;
		}

		/// Determine whether an access touches at most one element.
		///
		/// The access could be to a scalar or to an array with a constant
		/// subscript, such that all instances access only that element.
		///
		/// @param MA The access to test.
		///
		/// @return True, if zero or one elements are accessed; False if at least two
		/// different elements are accessed.
		bool isScalarAccess(MemoryAccess *MA) {
		auto Map = getAccessRelationFor(MA);
		auto Set = give(isl_map_range(Map.take()));
		return isl_set_is_singleton(Set.keep()) == isl_bool_true;
		}

		public:
		DeLICMImpl(Scop *S) : ZoneAlgorithm(S) {}

		/// Calculate the lifetime (definition to last use) of every array element.
		///
		/// @return True if the computed lifetimes (#Zone) are usable.
		bool computeZone() {
		// Check that nothing strange occurs.
		if (!isCompatibleScop()) {
		DeLICMIncompatible++;
		return false;
		}

		DefUse.compute(S);
		IslPtr<isl_union_set> EltUnused, EltWritten;

		{
		IslMaxOperationsGuard MaxOpGuard(IslCtx.get(), DelicmMaxOps);

		computeCommon();

		EltUnused = computeLifetime();
		EltWritten = computeWritten();
		}

		if (isl_ctx_last_error(IslCtx.get()) == isl_error_quota) {
		DeLICMOutOfQuota++;
		DEBUG(dbgs() << "DeLICM analysis exceeded max_operations\n");
		}

		DeLICMAnalyzed++;
		OriginalZone = Knowledge(nullptr, EltUnused, EltWritten);
		DEBUG(dbgs() << "Computed Zone:\n"; OriginalZone.print(dbgs(), 4));

		Zone = OriginalZone;

		return DelicmMaxOps == 0 || Zone.isUsable();
		}

		/// Try to map as many scalars to unused array elements as possible.
		///
		/// Multiple scalars might be mappable to intersecting unused array element
		/// zones, but we can only choose one. This is a greedy algorithm, therefore
		/// the first processed element claims it.
		void greedyCollapse() {
		bool Modified = false;

		for (auto &Stmt : *S) {
		for (auto *MA : Stmt) {
		if (!MA->isLatestArrayKind())
		continue;
		if (!MA->isWrite())
		continue;

		if (MA->isMayWrite()) {
		DEBUG(dbgs() << "Access " << MA
		<< " pruned because it is a MAY_WRITE\n");
		continue;
		}

		if (Stmt.getNumIterators() == 0) {
		DEBUG(dbgs() << "Access " << MA
		<< " pruned because it is not in a loop\n");
		continue;
		}

		if (isScalarAccess(MA)) {
		DEBUG(dbgs() << "Access " << MA
		<< " pruned because it writes only a single element\n");
		continue;
		}

		DEBUG(dbgs() << "Analyzing target access " << MA << "\n");
		if (collapseScalarsToStore(MA))
		Modified = true;
		}
		}

		if (Modified)
		DeLICMScopsModified++;
		}

		/// Dump the internal information about a performed DeLICM to @p OS.
		void print(llvm::raw_ostream &OS, int indent = 0) {
		printAccesses(OS, indent);
		}
		};

		class DeLICM : public ScopPass {
		private:
		DeLICM(const DeLICM &) = delete;
		const DeLICM &operator=(const DeLICM &) = delete;

		/// The pass implementation, also holding per-scop data.
		std::unique_ptr<DeLICMImpl> Impl;

		void collapseToUnused(Scop &S) {
		Impl = make_unique<DeLICMImpl>(&S);

		if (!Impl->computeZone()) {
		DEBUG(dbgs() << "Abort because cannot reliably compute lifetimes\n");
		return;
		}

		DEBUG(dbgs() << "Collapsing scalars to unused array elements...\n");
		Impl->greedyCollapse();

		DEBUG(dbgs() << "\nFinal Scop:\n");
		DEBUG(S.print(dbgs()));
		}

		public:
		static char ID;
		explicit DeLICM() : ScopPass(ID) {}

		virtual void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.addRequiredTransitive<ScopInfoRegionPass>();
		AU.setPreservesAll();
		}

		virtual bool runOnScop(Scop &S) override {
		// Free resources for previous scop's computation, if not yet done.
		releaseMemory();

		collapseToUnused(S);

		return false;
		}

		virtual void printScop(raw_ostream &OS, Scop &S) const override {
		if (!Impl)
		return;
		assert(Impl->getScop() == &S);

		OS << "DeLICM result:\n";
		Impl->print(OS);
		}

		virtual void releaseMemory() override { Impl.reset(); }
		};

		char DeLICM::ID;
		} // anonymous namespace

		Pass *polly::createDeLICMPass() { return new DeLICM(); }

		INITIALIZE_PASS_BEGIN(DeLICM, "polly-delicm", "Polly - DeLICM/DePRE", false,
Show All 19 Lines
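
Note on the greedy strategy in greedyCollapse() and collapseScalarsToStore(): the following is only a toy model, not the patch's implementation. It assumes zones are half-open integer intervals of timepoints instead of isl sets, and it ignores the written-timepoint part of Knowledge. The names ToyKnowledge, Proposal, and tryClaim are hypothetical and exist only for this sketch. It illustrates why processing order matters: the first scalar whose lifetime fits into an unused zone claims it, and later scalars that overlap are rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a zone: the half-open timepoint interval
// [Begin, End).
struct Zone {
  int Begin;
  int End;
};

// Hypothetical proposal: map Scalar to array element Element for Lifetime.
struct Proposal {
  std::string Scalar;
  int Element;
  Zone Lifetime;
};

// Toy counterpart of the "unused" part of the patch's Knowledge class.
class ToyKnowledge {
  // For each element, the zones in which it is still unused.
  std::map<int, std::vector<Zone>> Unused;

public:
  void addUnused(int Element, Zone Z) { Unused[Element].push_back(Z); }

  // Accept the lifetime only if it lies entirely inside one unused zone.
  // On success, the claimed part is removed so later proposals conflict.
  bool tryClaim(int Element, Zone Life) {
    auto It = Unused.find(Element);
    if (It == Unused.end())
      return false;
    auto &Zones = It->second;
    for (std::size_t I = 0; I < Zones.size(); ++I) {
      const Zone Z = Zones[I];
      if (Z.Begin <= Life.Begin && Life.End <= Z.End) {
        Zones.erase(Zones.begin() + static_cast<std::ptrdiff_t>(I));
        if (Z.Begin < Life.Begin)
          Zones.push_back({Z.Begin, Life.Begin});
        if (Life.End < Z.End)
          Zones.push_back({Life.End, Z.End});
        return true;
      }
    }
    return false;
  }
};

// Greedy pass: proposals are tried in order; the first fitting one wins.
std::vector<std::string> greedyCollapseToy(ToyKnowledge &K,
                                           const std::vector<Proposal> &Ps) {
  std::vector<std::string> Mapped;
  for (const auto &P : Ps) {
    assert(P.Lifetime.Begin < P.Lifetime.End && "lifetime must be non-empty");
    if (K.tryClaim(P.Element, P.Lifetime))
      Mapped.push_back(P.Scalar);
  }
  return Mapped;
}
```

With a single unused zone [0, 14) on element 0, a proposal for [1, 14) is accepted, a later proposal for [5, 9) is rejected because that range is already claimed, and a proposal for [0, 1) still fits into the remainder.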

polly/trunk/test/DeLICM/reduction_preheader.ll

				; RUN: opt %loadPolly -polly-flatten-schedule -polly-delicm -analyze < %s | FileCheck %s
				;
				; void func(double *A) {
				; for (int j = 0; j < 2; j += 1) { /* outer */
				; double phi = 0.0;
				; for (int i = 0; i < 4; i += 1) /* reduction */
				; phi += 4.2;
				; A[j] = phi;
				; }
				; }
				;
				define void @func(double* noalias nonnull %A) {
				entry:
				br label %outer.preheader

				outer.preheader:
				br label %outer.for

				outer.for:
				%j = phi i32 [0, %outer.preheader], [%j.inc, %outer.inc]
				%j.cmp = icmp slt i32 %j, 2
				br i1 %j.cmp, label %reduction.preheader, label %outer.exit


				reduction.preheader:
				br label %reduction.for

				reduction.for:
				%i = phi i32 [0, %reduction.preheader], [%i.inc, %reduction.inc]
				%phi = phi double [0.0, %reduction.preheader], [%add, %reduction.inc]
				%i.cmp = icmp slt i32 %i, 4
				br i1 %i.cmp, label %body, label %reduction.exit



				body:
				%add = fadd double %phi, 4.2
				br label %reduction.inc



				reduction.inc:
				%i.inc = add nuw nsw i32 %i, 1
				br label %reduction.for

				reduction.exit:
				%A_idx = getelementptr inbounds double, double* %A, i32 %j
				store double %phi, double* %A_idx
				br label %outer.inc



				outer.inc:
				%j.inc = add nuw nsw i32 %j, 1
				br label %outer.for

				outer.exit:
				br label %return

				return:
				ret void
				}


				; Unrolled flattened schedule:
				; [0] Stmt_reduction_preheader[0]
				; [1] Stmt_reduction_for[0, 0]
				; [2] Stmt_body[0, 0]
				; [3] Stmt_reduction_inc[0, 0]
				; [4] Stmt_reduction_for[0, 1]
				; [5] Stmt_body[0, 1]
				; [6] Stmt_reduction_inc[0, 1]
				; [7] Stmt_reduction_for[0, 2]
				; [8] Stmt_body[0, 2]
				; [9] Stmt_reduction_inc[0, 2]
				; [10] Stmt_reduction_for[0, 3]
				; [11] Stmt_body[0, 3]
				; [12] Stmt_reduction_inc[0, 3]
				; [13] Stmt_reduction_for[0, 4]
				; [14] Stmt_reduction_exit[0]
				; [15] Stmt_reduction_preheader[1]
				; [16] Stmt_reduction_for[1, 0]
				; [17] Stmt_body[1, 0]
				; [18] Stmt_reduction_inc[1, 0]
				; [19] Stmt_reduction_for[1, 1]
				; [20] Stmt_body[1, 1]
				; [21] Stmt_reduction_inc[1, 1]
				; [22] Stmt_reduction_for[1, 2]
				; [23] Stmt_body[1, 2]
				; [24] Stmt_reduction_inc[1, 2]
				; [25] Stmt_reduction_for[1, 3]
				; [26] Stmt_body[1, 3]
				; [27] Stmt_reduction_inc[1, 3]
				; [28] Stmt_reduction_for[1, 4]
				; [29] Stmt_reduction_exit[1]

				; CHECK: After accesses {
				; CHECK-NEXT: Stmt_reduction_preheader
				; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK-NEXT: { Stmt_reduction_preheader[i0] -> MemRef_phi__phi[] };
				; CHECK-NEXT: new: { Stmt_reduction_preheader[i0] -> MemRef_A[i0] : 0 <= i0 <= 1 };
				; CHECK-NEXT: Stmt_reduction_for
				; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK-NEXT: { Stmt_reduction_for[i0, i1] -> MemRef_phi__phi[] };
				; CHECK-NEXT: new: { Stmt_reduction_for[i0, i1] -> MemRef_A[i0] : 0 <= i0 <= 1 and 0 <= i1 <= 4 };
				; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK-NEXT: { Stmt_reduction_for[i0, i1] -> MemRef_phi[] };
				; CHECK-NEXT: new: { Stmt_reduction_for[i0, i1] -> MemRef_A[i0] : 0 <= i0 <= 1 and 0 <= i1 <= 4 };
				; CHECK-NEXT: Stmt_body
				; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK-NEXT: { Stmt_body[i0, i1] -> MemRef_add[] };
				; CHECK-NEXT: new: { Stmt_body[i0, i1] -> MemRef_A[i0] : 0 <= i0 <= 1 and 0 <= i1 <= 3 };
				; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK-NEXT: { Stmt_body[i0, i1] -> MemRef_phi[] };
				; CHECK-NEXT: new: { Stmt_body[i0, i1] -> MemRef_A[i0] : 0 <= i0 <= 1 and i1 >= 0 and -5i0 <= i1 <= 8 - 5i0 and i1 <= 3 };
				; CHECK-NEXT: Stmt_reduction_inc
				; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK-NEXT: { Stmt_reduction_inc[i0, i1] -> MemRef_add[] };
				; CHECK-NEXT: new: { Stmt_reduction_inc[i0, i1] -> MemRef_A[i0] : i1 >= 0 and -5i0 <= i1 <= 7 - 5i0 and i1 <= 3; Stmt_reduction_inc[1, 3] -> MemRef_A[1] };
				; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK-NEXT: { Stmt_reduction_inc[i0, i1] -> MemRef_phi__phi[] };
				; CHECK-NEXT: new: { Stmt_reduction_inc[i0, i1] -> MemRef_A[i0] : 0 <= i0 <= 1 and i1 >= 0 and -5i0 <= i1 <= 3 };
				; CHECK-NEXT: Stmt_reduction_exit
				; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
				; CHECK-NEXT: { Stmt_reduction_exit[i0] -> MemRef_A[i0] };
				; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK-NEXT: { Stmt_reduction_exit[i0] -> MemRef_phi[] };
				; CHECK-NEXT: new: { Stmt_reduction_exit[i0] -> MemRef_A[i0] : 0 <= i0 <= 1 };
				; CHECK-NEXT: }
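
Note on the "Unrolled flattened schedule" comment in the test: the numbering can be reproduced by walking the loop nest in execution order and assigning consecutive timepoints. The sketch below is an illustration only and does not use Polly or isl. The function name unrollFlattenedSchedule and its parameters are hypothetical; the statement names are copied from the test, and each instance is labeled with its own outer iteration index.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the statement instance executed at each flattened timepoint;
// the vector index is the timepoint. OuterTrip and InnerTrip are the trip
// counts of the outer and reduction loops (2 and 4 in the test).
std::vector<std::string> unrollFlattenedSchedule(int OuterTrip, int InnerTrip) {
  assert(OuterTrip >= 0 && InnerTrip >= 0);
  std::vector<std::string> Timeline;
  for (int J = 0; J < OuterTrip; ++J) {
    const std::string Js = std::to_string(J);
    Timeline.push_back("Stmt_reduction_preheader[" + Js + "]");
    // The loop header runs once more than the body: the final execution
    // evaluates the exit condition and leaves the loop.
    for (int I = 0; I <= InnerTrip; ++I) {
      const std::string Idx = "[" + Js + ", " + std::to_string(I) + "]";
      Timeline.push_back("Stmt_reduction_for" + Idx);
      if (I == InnerTrip)
        break;
      Timeline.push_back("Stmt_body" + Idx);
      Timeline.push_back("Stmt_reduction_inc" + Idx);
    }
    Timeline.push_back("Stmt_reduction_exit[" + Js + "]");
  }
  return Timeline;
}
```

Calling unrollFlattenedSchedule(2, 4) yields 30 entries, 15 per outer iteration, so the second outer iteration begins at timepoint 15.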


Revision Contents

Diff 89179
