This is an archive of the discontinued LLVM Phabricator instance.

[Polly] Create virtual independent blocks
AbandonedPublic

Authored by jdoerfert on Oct 9 2015, 5:28 PM.


Details

Reviewers

etherzhhb
Meinersbur
grosser
bollu

Summary

Even with the scalar/PHI modeling and code generation we still
utilized the independent blocks pass to move scalars around in order to
decrease the dependences caused by them. However, this has two major
drawbacks:
  o Polly could not be used as an analysis-only pass by default, and
    without the independent blocks pass the results suffered.
  o The independent blocks pass had limited knowledge about domains
    as well as accesses. Both limited the pass to simple
    movement/re-computation of scalars that can be trivially moved.

This patch is a drop-in replacement for the independent blocks pass
that first virtually moves/re-computes scalars in the SCoP description.
The actual re-computation takes place only during code generation.

The logic is split in three traversals of the scalars in the SCoP:
  1) Collect interesting operands for each scalar definition that
     escapes the block it resides in.
  2) For each scalar use, check if the definition can be recomputed
     in its statement based on the interesting operands of the
     definition. If so, remove the access and remember that the
     definition has to be recomputed later.
  3) For each scalar definition, check if all uses have been
     eliminated through virtual re-computation. If so, remove the
     access as it does not need to be communicated anymore.
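
The three traversals above can be sketched on a toy model. This is only an illustration, not the code in this patch: `ToyInst`, `ToyProgram`, `collectInteresting` and `removableDefinitions` are hypothetical names, the operand graph is assumed to be acyclic (no PHIs), and "interesting" is simplified to "has side effects".

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction; not an LLVM or Polly type.
struct ToyInst {
  int Block;                         // block/statement the instruction lives in
  bool HasSideEffects;               // e.g. a load, store or call
  std::vector<std::string> Operands; // names of the operand instructions
};
using ToyProgram = std::map<std::string, ToyInst>;

// Traversal 1: collect the operands of Def that cannot be trivially
// recomputed. The walk stops at such operands and descends through the rest.
static void collectInteresting(const ToyProgram &P, const std::string &Def,
                               std::set<std::string> &Interesting) {
  for (const std::string &Op : P.at(Def).Operands) {
    if (P.at(Op).HasSideEffects)
      Interesting.insert(Op);
    else
      collectInteresting(P, Op, Interesting);
  }
}

// Returns the definitions whose scalar write could be dropped because every
// use in another block can recompute the value locally.
std::set<std::string> removableDefinitions(const ToyProgram &P) {
  std::map<std::string, int> Uses, Resolved;
  for (const auto &KV : P) {
    const ToyInst &User = KV.second;
    for (const std::string &Op : User.Operands) {
      const ToyInst &Def = P.at(Op);
      if (Def.Block == User.Block)
        continue; // same block: no scalar access is needed at all
      ++Uses[Op];
      // Traversal 2: the use is resolved if the definition has no side
      // effects and no interesting operands.
      std::set<std::string> Interesting;
      collectInteresting(P, Op, Interesting);
      if (!Def.HasSideEffects && Interesting.empty())
        ++Resolved[Op];
    }
  }
  // Traversal 3: a definition is removable once all of its uses are resolved.
  std::set<std::string> Removable;
  for (const auto &KV : Uses)
    if (Resolved[KV.first] == KV.second)
      Removable.insert(KV.first);
  return Removable;
}
```

In this toy, a value computed only from an induction-variable-like instruction is removable, while a value that depends on a load is not, which mirrors the "trivially recomputable" restriction described in the summary.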

Subsequent patches can increase the capabilities easily as the domain
and access information is already available.

  --------------------------  REVIEW NOTES  --------------------------

    - lnt & unit tests are green
    - We had 245 scalar accesses in the test suite before, now 236. In
      total this logic eliminated 21 trivially recomputable scalars
      and allowed us to eliminate 54 instead of 49 statements.
    - The buildbots show the following compile/execution time changes for
      a similar but prior version of this patch:

      Compile time:
        Improvements           : Δ       : Previous : Current : σ
        .../bmm                : -71.43% : 0.6440   : 0.1840  : 0.0054
        .../paq8p              : -12.48% : 5.8644   : 5.1323  : 0.0181
        .../consumer-lame      : -10.13% : 10.3444  : 9.2961  : 0.0376
        .../sieve              : -5.61%  : 0.4280   : 0.4040  : 0.0093
        .../espresso           : -1.35%  : 10.3841  : 10.2441 : 0.0395
        ..//Recurrences-dbl    : -1.24%  : 4.1923   : 4.1403  : 0.0150
        .../471_omnetpp        : -1.08%  : 42.1606  : 41.7046 : 0.1013

      Execution time:
        Regressions            : Δ       : Previous : Current : σ
        .../salsa20            : 4.32%   : 6.6684   : 6.9564  : 0.0059
        .../GlobalDataFlow-dbl : 1.53%   : 4.6963   : 4.7683  : 0.0047

        Improvements           : Δ       : Previous : Current : σ
        ..//viterbi            : -1.61%  : 2.7402   : 2.6962  : 0.0072

Diff Detail

Event Timeline

jdoerfert updated this revision to Diff 37007.Oct 9 2015, 5:28 PM

jdoerfert retitled this revision from to [Polly] Create virtual independent blocks.

jdoerfert added reviewers: grosser, Meinersbur.

jdoerfert updated this object.

jdoerfert added a subscriber: Restricted Project.

Herald added a subscriber: sanjoy.Oct 9 2015, 5:28 PM

jdoerfert updated this object.Oct 9 2015, 7:32 PM

jdoerfert added a child revision: D13616: [Polly] Allow to re-load values to create independent blocks.Oct 9 2015, 7:37 PM

jdoerfert mentioned this in D13676: [Polly] Do not store scalar accesses in InstructionToAccess.Oct 13 2015, 2:01 AM

Ping

Hi Johannes,

great to see we will finally get rid of the IndependentBlocks pass. I just had a very first skim over this patch, but did not manage to do a full review yet.

The two main items I would still like to look into:

This seems to conflict with http://reviews.llvm.org/D12975 (Michael's DE-LICM)

I need to get a better understanding in which ways these patches possibly conflict (as well as the overall design).

Post-processing statements vs. instruction-list ScopStmts

Your current approach of first building all ScopStmts and then post-process them to eliminate scalar dependences is interesting. However, it requires a special post-processing phase in ScopInfo and additional special-purpose code in ScopStmt and BlockGenerator. I wonder if it may make sense to build ScopStmts from the beginning just as a list of instructions (instead of a basic block) which could then be uniformly processed in the BlockGenerator.

Best,
Tobias

include/polly/ScopInfo.h
1194	that needs to be recomputed
lib/Analysis/ScopInfo.cpp
1604–1605	use is / uses are
3175	"The latter to add" does not seem to make sense grammatically.
3300	grammar
3305	I am not yet convinced an exit ScopStmt is something we would like to have. Maybe leave this for another patch review.

In D13611#268449, @grosser wrote:

Hi Johannes,

great to see we will finally get rid of the IndependentBlocks pass. I just had a very first skim over this patch, but did not manage to do a full review yet.

The two main items I would still like to look into:

This seems to conflict with http://reviews.llvm.org/D12975 (Michael's DE-LICM)

I need to get a better understanding in which ways these patches possibly conflict (as well as the overall design).

To add some details:

D12975 tracks which values flow between ScopStmts which this patch modifies at a very late phase. There should be fewer such implicit flows, but some of those flows might have already been created with origin MAPPED. The array element occupied by this MAPPED flow might be used for some other implicit flow, meaning De-LICM before virtual blocks is ineffective. It is also more difficult to modify MAPPEDs because lifetime issues have to be considered and there is no distinction between SCALAR and PHI origins (not sure how much of a problem this is though).

One of my rationales for doing De-LICM before even creating MemoryAccesses is that it is easier to implement (no moving, copying, deleting of MemoryAccesses) and has less overhead because fewer MemoryAccesses are created in the first place.

So we have the possibilities to either do "simplifyScalarAccesses" before creating MemoryAccesses (e.g. by marking Instructions to where they should be copied) or "De-LICM" after simplifyScalarAccesses.

lib/Analysis/ScopInfo.cpp
1615	I do not understand this code. Why did you even remove any comment about what it is doing?
3163	God function antipattern

In D13611#268449, @grosser wrote:

Hi Johannes,

great to see we will finally get rid of the IndependentBlocks pass. I just had a very first skim over this patch, but did not manage to do a full review yet.

The two main items I would still like to look into:

This seems to conflict with http://reviews.llvm.org/D12975 (Michael's DE-LICM)

I need to get a better understanding in which ways these patches possibly conflict (as well as the overall design).

Yeah, at the time this was written it was fine but as far as I understand it the MAPPED accesses (and possibly more) implicitly change the semantics of the SCoP description without it being actually changed (except the "MAPPED" flag). One could argue that this should look for MAPPED accesses but I don't; instead I will argue that "MAPPED" should not be an access property that is determined on the CFG and only a flag in the access description that turns on some alternate path in the code generation. An access that is not actually happening (no code is generated for) should not be in the SCoP description and the ones in there should happen where they are and read/write the location they define. If that is the case this patch should work properly.

Post-processing statements vs. instruction-list ScopStmts

Your current approach of first building all ScopStmts and then post-process them to eliminate scalar dependences is interesting. However, it requires a special post-processing phase in ScopInfo and additional special-purpose code in ScopStmt and BlockGenerator. I wonder if it may make sense to build ScopStmts from the beginning just as a list of instructions (instead of a basic block) which could then be uniformly processed in the BlockGenerator.

Processing the SCoP allows general solutions which are not (easily) reproducible on the CFG, e.g., determining if a load is overwritten is easy and precise on the SCoP but hard on the CFG. The instruction list is a different issue and it would not remove that much code/logic anyway. With the pending patches by Michael we will reload scalars at the beginning of a statement and, as I pointed out on that review, almost the same is added here. While I did not get a valuable response on the idea (and implementation sketch) to merge the recomputation and the scalar reload at the beginning of a statement, I think it is what we ultimately want.

In D13611#268497, @Meinersbur wrote:

In D13611#268449, @grosser wrote:

Hi Johannes,

great to see we will finally get rid of the IndependentBlocks pass. I just had a very first skim over this patch, but did not manage to do a full review yet.

The two main items I would still like to look into:

This seems to conflict with http://reviews.llvm.org/D12975 (Michael's DE-LICM)

I need to get a better understanding in which ways these patches possibly conflict (as well as the overall design).

To add some details:

D12975 tracks which values flow between ScopStmts which this patch modifies at a very late phase. There should be fewer such implicit flows, but some of those flows might have already been created with origin MAPPED. The array element occupied by this MAPPED flow might be used for some other implicit flow, meaning De-LICM before virtual blocks is ineffective. It is also more difficult to modify MAPPEDs because lifetime issues have to be considered and there is no distinction between SCALAR and PHI origins (not sure how much of a problem this is though).

See the comment on

One of my rationales for doing De-LICM before even creating MemoryAccesses is that it is easier to implement (no moving, copying, deleting of MemoryAccesses) and has less overhead because fewer MemoryAccesses are created in the first place.

While moving, copying and deleting MemoryAccesses sound like a lot, one has to remember the number of times this can happen and the cost of each operation.
I think the fact that we get generally better results with this patch than without (see commit message) shows that the "overhead" is low enough.

The implementation difficulty is another issue. While this patch currently only forwards/copies scalars, its general structure allows more, e.g., reloading values instead of communicating them (see subsequent patch) or removing conditional PHI nodes and replacing them by selects. While, e.g., reloading is generally hard and costly on the CFG level (one has to determine if the value has been overwritten somewhere in between), the SCoP abstraction allows us to reason about it in a simple and concise way.

So we have the possibilities to either do "simplifyScalarAccesses" before creating MemoryAccesses (e.g. by marking Instructions to where they should be copied) or "De-LICM" after simplifyScalarAccesses.

I already proposed the latter in a review last week but I don't recall a positive answer to that.

lib/Analysis/ScopInfo.cpp
1615	This code was basically moved and the original comment was left behind, hence the answer to your question is simple: The comment has to be moved too.
3163	I take it this comment should be read as: Please split the function in three parts. Sure, can do.

In D13611#268545, @jdoerfert wrote:

In D13611#268449, @grosser wrote:

Hi Johannes,

great to see we will finally get rid of the IndependentBlocks pass. I just had a very first skim over this patch, but did not manage to do a full review yet.

The two main items I would still like to look into:

This seems to conflict with http://reviews.llvm.org/D12975 (Michael's DE-LICM)

I need to get a better understanding in which ways these patches possibly conflict (as well as the overall design).

Yeah, at the time this was written it was fine but as far as I understand it the MAPPED accesses (and possibly more) implicitly change the semantics of the SCoP description without it being actually changed (except the "MAPPED" flag). One could argue that this should look for MAPPED accesses but I don't; instead I will argue that "MAPPED" should not be an access property that is determined on the CFG and only a flag in the access description that turns on some alternate path in the code generation. An access that is not actually happening (no code is generated for) should not be in the SCoP description and the ones in there should happen where they are and read/write the location they define. If that is the case this patch should work properly.

The semantics of a Scop does not change. What does change is where implicits are stored. Like AccessInstruction and AccessValue and non-affine subregions, it is not information required by the polyhedral model itself, but required to associate polyhedral statements to the IR behind it.

To avoid modifying IR directly, we "virtualize" the relevant metainformation such that the Scop description and IR code do not map directly. Having to redirect loads/stores becomes unavoidable.

Post-processing statements vs. instruction-list ScopStmts

Your current approach of first building all ScopStmts and then post-process them to eliminate scalar dependences is interesting. However, it requires a special post-processing phase in ScopInfo and additional special-purpose code in ScopStmt and BlockGenerator. I wonder if it may make sense to build ScopStmts from the beginning just as a list of instructions (instead of a basic block) which could then be uniformly processed in the BlockGenerator.

Processing the SCoP allows general solutions which are not (easily) reproducible on the CFG, e.g., determining if a load is overwritten is easy and precise on the SCoP but hard on the CFG. The instruction list is a different issue and it would not remove that much code/logic anyway. With the pending patches by Michael we will reload scalars at the beginning of a statement and, as I pointed out on that review, almost the same is added here.

I can't find such code in this patch. Where is it?

While I did not get a valuable response on the idea (and implementation sketch) to merge the recomputation and the scalar reload at the beginning of a statement, I think it is what we ultimately want.

What proposal are you referring to? What scalar reload?

In D13611#268560, @jdoerfert wrote:

In D13611#268497, @Meinersbur wrote:

In D13611#268449, @grosser wrote:

Hi Johannes,

great to see we will finally get rid of the IndependentBlocks pass. I just had a very first skim over this patch, but did not manage to do a full review yet.

The two main items I would still like to look into:

This seems to conflict with http://reviews.llvm.org/D12975 (Michael's DE-LICM)

I need to get a better understanding in which ways these patches possibly conflict (as well as the overall design).

To add some details:

D12975 tracks which values flow between ScopStmts which this patch modifies at a very late phase. There should be fewer such implicit flows, but some of those flows might have already been created with origin MAPPED. The array element occupied by this MAPPED flow might be used for some other implicit flow, meaning De-LICM before virtual blocks is ineffective. It is also more difficult to modify MAPPEDs because lifetime issues have to be considered and there is no distinction between SCALAR and PHI origins (not sure how much of a problem this is though).

See the comment on

Which comment?

One of my rationales for doing De-LICM before even creating MemoryAccesses is that it is easier to implement (no moving, copying, deleting of MemoryAccesses) and has less overhead because fewer MemoryAccesses are created in the first place.

While moving, copying and deleting MemoryAccesses sound like a lot, one has to remember the number of times this can happen and the cost of each operation.
I think the fact that we get generally better results with this patch than without (see commit message) shows that the "overhead" is low enough.

I have no general objection to it. However, you occasionally raise such concerns yourself (e.g. D13341) so I try to take them into account.

The implementation difficulty is another issue. While this patch currently only forwards/copies scalars, its general structure allows more, e.g., reloading values instead of communicating them (see subsequent patch) or removing conditional PHI nodes and replacing them by selects. While, e.g., reloading is generally hard and costly on the CFG level (one has to determine if the value has been overwritten somewhere in between), the SCoP abstraction allows us to reason about it in a simple and concise way.

Such analysis on the polyhedral model requires expensive ISL operations to compute the dependences. One of our intents in simplifying the scop description is to make its analysis run faster. We would still do the complicated analysis on the large scop description to get a simpler one and then redo the same analysis. I am not saying that this cannot make sense, but the simplifications we want to perform (IndependentBlocks, DeLICM) are both on the IR level, on which reasoning is much cheaper.

So we have the possibilities to either do "simplifyScalarAccesses" before creating MemoryAccesses (e.g. by marking Instructions to where they should be copied) or "De-LICM" after simplifyScalarAccesses.

I already proposed the latter in a review last week but I don't recall a positive answer to that.

AFAIU we are about to discuss this.

Rebased + Changed according to first comments

Fix minor errors

I have the impression that this relies very much on accesses (mostly SCALAR READ, but also PHI writes in non-affine subregions) being computed per instruction, whereas in D13762 I tried to make them per-ScopStmt. The difference is that, e.g., if there are two reads in the same BB, two loads would be generated. In D13762 I tried to ensure that there is only one load.

This patch seems to skip PHI accesses completely. In San Jose you mentioned that only loop-carried phis would be skipped. Where is that part? Where would DeLICM tell what memory can be reused?

I got the impression we could generalize this within the SCEV expander. A SCEV-like object (we already support SDiv/SRem) could store an entire operand tree of movable instructions and CodeGen would synthesize/materialize it on request. This looks more streamlined with current mechanisms instead of introducing a new one. What do you think?

include/polly/CodeGen/BlockGenerators.h
422	"all" does not match the description below
426	... where they are used.
428	... thus ensures that all ...
483	llvm_unreachable is not defined to trigger a trap.
488	Missing documentation?
541	Flag to indicate that the instruction ...
include/polly/LinkAllPasses.h
36 ↗	(On Diff #37007)	Separate patch to remove IndependentBlocks?
include/polly/ScopInfo.h
647	We can get the AccList from getStatement()->getParent()->AccFuncMap. Not necessary to pass it as parameter. Rename to "copyTo"?
691	Unrelated?
1163	I think we should not depend on the order of accesses. ISL does not understand such an order and depending on it will likely cause some bugs (Think of a load after storing to the same location; do we even handle that case yet?). AFAIK ISL assumes that all loads happen before the stores.
1171	OnlyMA=false removes multiple accesses, so move to a "removeAccessesCausedBy" function?
1198	It would be better if the class would not expose its implementation as an API. It could return iterators or an ArrayRef.
1565	It would be more idiomatic if each call would check only a single Instruction instead of passing a set implementation.
1581	I think you should clarify that you actually mean a flattened operand _tree_. I think LLVM uses the term operands as the direct arguments/parameters of an instruction.
1585	AFAICS it is only used during the execution of simplifyScalarAccesses. You could make it local and pass it to the functions that need it.
1620	The name "simplifyScalarAccesses" is very general. How about "recomputableDependentScalars"?
lib/Analysis/ScopInfo.cpp
829	What if it has fewer dimensions? If it is impossible, could you add an assertion to make it clear?
835	Doesn't this update InstructionToAccesses?
1614	I'd still ask you to refactor this code. It is hard to understand and looks fragile as it depends on order. Maybe you could use some functions from <algorithm>
3159	What are your plans here? When can instructions with side effects be recomputed?
3164	What is there to see?
3169	Why are exit node PHIs processed, but not standard phis?
3181	Why for each access separately? Shouldn't this be per-ScopStmt because it only matters for cross-stmt operands?
3196	auto -> Use ?
3198	This will probably have issues with PHIs in the scop's entry, just as in PR25394, i.e., if InstOpInst is a PHI with an incoming block from outside the scop.
3205	and continue ?
3257	Wouldn't it be possible to recompute some of the dependent values, even if some operands have side effects?
3261	Why only instructions that have operands from outside? For movable instructions (DependentScalar) without such operands, but implicit accesses, don't we need to remove its memory accesses as well?
3266	This is using the list of accesses this function modifies on the fly ("InstructionToAccess"). As such, it seems to me to depend on the order we iterate over all statements. E.g. the previous iteration already removed the read access of "OutsideUser", hence will not be added to AdditionalAccesses anymore, hence we miss some read accesses. Or am I wrong? Why is it this cannot happen?
3271	What does this condition mean? I think we generally are misusing "BaseAddr" for scalar access, because it is actually not an address.
3275	I'd expect this to be computed in collectNonTrivialOperands(), like a partition of instructions in the operand tree into Movable Outside/Sideeffect (operand tree leaves) I understand your algorithm works differently, but to understand, what is the rationale to only have direct operands in DependentScalars, while we want to copy/move an entire operand tree?
3313	If a user has no more accesses, is it because it is intended to be removed?
3316	auto -> MemoryAccess ?
3321	Is this break meant to leave both for-loops?
3328	This seems to be a different kind of "resolved" than in removeRecomputableScalarReads.
3344	collectNonTrivialOperands actually skips PHIs which are also implicit accesses.
3353	hence -> i.e.
lib/CodeGen/BlockGenerators.cpp
67	When does a dependent scalar become a global?
82	Shouldn't these be in the BBMap after recomputeDependentScalars() executed?
141–143	Why does this become necessary? Shouldn't "trySynthesizeNewValue" determine itself whether it is synthesizable?
158–159	This essentially becomes assert(TryOnly) return nullptr;
180	What is the rationale for this condition?
182	Shouldn't this have been handled by recomputeDependentScalars() already? How do we know that the operand is even a DependentScalar? If this is because DependentScalars is unordered, wouldn't it be cleaner to ensure they are in their order of dependence?
235–238	I like this change, but could be committed independently. Maybe also as a member function of ScopStmt?
237	This idiom appears multiple times. Introduce a helper function?
344	Any reason to do this before generateScalarLoads()? Naively, I'd think it belongs after it because some of the dependent instructions might still use scalars (e.g. read-only ones).
test/Isl/CodeGen/srem-in-other-bb.ll
1	Nice, but unrelated

In D13611#284391, @Meinersbur wrote:

I have the impression that this relies very much on accesses (mostly SCALAR READ, but also PHI writes in non-affine subregions) being computed per instruction, whereas in D13762 I tried to make them per-ScopStmt. The difference is that, e.g., if there are two reads in the same BB, two loads would be generated. In D13762 I tried to ensure that there is only one load.

One vs. multiple loads/stores of scalars makes only little difference to us as it will only affect code generation. Everything before, especially dependence analysis, will be the same. Hence, consolidating scalar accesses only to save duplicate loads/stores in the generated IR (that will be taken care of anyway) is no good reason to make any kind of scalar elimination harder.

This patch seems to skip PHI accesses completely. In San Jose you mentioned that only loop-carried phis would be skipped. Where is that part? Where would DeLICM tell what memory can be reused?

There are 5 patches. This is the first and it only deals with trivially recomputable scalars, thus the ones without any "interesting/complicated operands".
The following patches as well as DeLICM come into play in the canRecomputeInStmt function, which is (almost) the only place that needs to be updated to allow more functionality.

I got the impression we could generalize this within the SCEV expander. A SCEV-like object (we already support SDiv/SRem) could store an entire operand tree of movable instructions and CodeGen would synthesize/materialize it on request. This looks more streamlined with current mechanisms instead of introducing a new one. What do you think?

Maybe, if SCEV allowed floating point values. Until then, we would have to implement everything we have here somehow in our SCoPExpander, which is not really helpful. Additionally, during the SCoP generation (e.g., when we check if we can synthesize scalars instead of communicating them) we cannot check for overwritten loads and free memory locations we can reuse.

include/polly/CodeGen/BlockGenerators.h
476–477	When we virtually move instructions we are not really interested in the loop surrounding them but in their new parent block/statement, thus the change.

I tried to address (in words) all your comments. Once we have agreed on how to proceed I will do the code changes.

include/polly/CodeGen/BlockGenerators.h
426	I will check that.
428	I will check that.
541	I will check that.
include/polly/ScopInfo.h
1171	We could do that.
1565	We could do that.
1581	Where do you expect a tree here? It actually contains transitive operands and I could add that word.
1585	We could do that.
lib/Analysis/ScopInfo.cpp
3196	We could do that.
3205	This looks weird without the follow up commits that introduce some code here.
3316	We could do that.
lib/CodeGen/BlockGenerators.cpp
235–238	I can commit it a priori but I don't see the need for a ScopStmt member function at the moment.
237	That's something not conflicting or required for this commit we can do.
344	First, by construction we can place it here as all recomputable scalars do not depend on anything we cannot recompute. Second, we can probably move it after the scalar loads generation but it should be all the same.
test/Isl/CodeGen/srem-in-other-bb.ll
1	I probably forgot that while I prepared this commit and all the test cases. I'm sorry.

jdoerfert added inline comments.Nov 6 2015, 8:31 PM

include/polly/CodeGen/BlockGenerators.h
422	All that are not communicated via memory but need to be recomputed.
483	I can use the original message instead. If you have a better way to describe llvm_unreachable I can put it here alternatively.
488	Yeah, somebody should add documentation to all these undocumented functions at some point.
include/polly/ScopInfo.h
647	I would like to avoid accessing members of a different class.
691	Needed somewhere, but it was quite a while ago. If ppl need to know where I can check.
1163	I'm not sure about your isl assumption but in any case, I'm not assuming anything here. The different input locations are purely for user experience (as the accesses are ordered in a more natural fashion).
1620	We do not recompute here and later we do not always recompute to simplify scalars, thus I think it actually fits pretty well.
lib/Analysis/ScopInfo.cpp
829	Sure we can add an assertion.
835	Mh, there was no need but we can do that too I guess.
1614	This code (already in the current HEAD) is ugly because it tries to be smart. While it heavily depends on the order, it might need a lot more iterations if it did not. I don't argue we have to keep it, only that it works in Polly HEAD for now and can be tackled afterwards, if we find a nice and comparable way to do it.
3159	That was what I was telling you in SJ about. I even pushed the first patch that implements this TODO to reviews and it depends on this one.
3164	I'm not sure how to answer that, as I expected the link to the function that contains the comment for this one to be self-explanatory.
3169	I'm not sure why you ask this but as you can probably see in the code no "PHI access" is processed. Though, if something that is a PHI does not introduce a "PHI access" it might be processed here.
3181	I'm not sure what you mean. Why do we look at each access separately, or why do we look at the non-trivial operands of each access separately?
3198	Mh, if we have a test case I'll check that.
3257	Might be, but it won't help with removing the scalar dependences. Actually, it might make them worse.
3261	Outside operands are the read-only scalars we track to ease OpenMP code generation. The scalar access we recompute is removed later.
3266	No read access is/should be removed if it is still connected to anything else. Hence, your scenario should never happen. Additionally, we actually iterate over accesses here, thus the actual removal happens only at the end (after we traversed all accesses).
3271	It depends on how you see it. I think the (virtual) register is a unique address for a scalar, thus it fits well here.
3275	We copy/move the entire operand tree but for that I do not need to traverse it and put all instructions into a list. I simply store the root of the tree and traverse it during the actual copy/move.
3313	Actually, it has already been removed, hence the statement but no read access.
3321	It could be but doesn't need to.
3328	No it is the same but for the write accesses instead of the reads. Or to be more precise, it is the passive version as writes are (passively) resolved if all their reads have been (actively) resolved.
3344	It does not skip them but stops as they are non-trivial operands.
lib/CodeGen/BlockGenerators.cpp
67	If it is hoisted.
82	After recomputeDependentScalars is executed they are in BBMap. However, our synthesize code sometimes thinks it can synthesize something while it actually refers to old values that we have to recompute first while we are in recomputeDependentScalars.
141–143	Good question. I don't remember anymore. I'll check once the general patch is agreed on.
158–159	Seems fine to me.
180	If we could not generate a new operand but we need to because the old one is an instruction in the region we have to do something.
182	"Shouldn't this have been handled by recomputeDependentScalars() already? How do we know that the operand is even a DependentScalar?" This code is only triggered while we execute recomputeDependentScalars. "If this is because DependentScalars is unordered, wouldn't it be cleaner to ensure they are in their order of dependence?" Which order of dependence? We have a DAG here and have to generate code for it. One way is to find some topological order on the whole DAG and then generate code; the other (which is implemented) is a recursive, demand-driven code generation. As we do the second everywhere else I figured it makes sense to do it here too.
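
A minimal sketch of the recursive, demand-driven strategy described in the reply on line 182, on a toy DAG rather than on LLVM IR. `ToyDag` and `emitOnDemand` are hypothetical names, and the `Emitted` set only stands in for a value map such as BBMap; nothing here is taken from the patch itself.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy DAG: each node lists the nodes it uses as operands.
using ToyDag = std::map<std::string, std::vector<std::string>>;

// Emit Name on demand: first make sure every operand has been emitted, then
// emit Name itself. No global topological order is computed up front.
void emitOnDemand(const ToyDag &G, const std::string &Name,
                  std::set<std::string> &Emitted,
                  std::vector<std::string> &Order) {
  if (Emitted.count(Name))
    return; // already generated, reuse the earlier copy
  for (const std::string &Op : G.at(Name))
    emitOnDemand(G, Op, Emitted, Order);
  Emitted.insert(Name);
  Order.push_back(Name);
}
```

Because of the `Emitted` check, requesting a node a second time is a no-op, so the resulting order is a valid topological order of just the part of the DAG that was actually requested.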

Meinersbur added inline comments.Dec 1 2015, 3:37 AM

include/polly/ScopInfo.h
1581	What is a "transitive operand"?
lib/Analysis/ScopInfo.cpp
3181	This is nested inside a "for (MemoryAccess *MA : Stmt)" loop and NonTrivialOperandsMap is indexed by its AccessInst. If an instruction has multiple write accesses (AFAIK not possible at the moment), the code below would run multiple times over the same AccessInst. Multiple instructions in the same Stmt might use the same value, thus its operand tree appears multiple times in this statement. At code generation, each of them would get its own copy of the operand tree, although only one per stmt is necessary (the same tree can be reused for both users).

etherzhhb added a subscriber: etherzhhb. Feb 16 2016, 1:35 AM

Rebase and adjustment according to comments

Herald added a subscriber: MatzeB. Feb 16 2016, 4:41 AM

jdoerfert added a reviewer: etherzhhb. Feb 16 2016, 4:55 AM

jdoerfert removed a subscriber: etherzhhb.

etherzhhb added inline comments. Feb 16 2016, 7:28 AM

include/polly/ScopInfo.h
1169	Does "Caused by the instruction of MA" mean the user of instruction of MA?
1426	SE is a member of Scop, no need to pass it as a parameter. I think it is reasonable to carry it around because we store SCEVs with Scop, e.g. Parameters.
1565	"It would be more idiomatic if each call would check only a single Instruction instead of passing a set implementation." +1
1567	Why do we need the users inside the SCoP?
1581	Given a = b + c and d = a + e, I guess b, c and hopefully a are the transitive operands of d. They are also the leaves of the operand DAG rooted at d. When we recompute d, the recomputation stops at b, c, e, right? That is, all instructions inside the DAG are recomputed. So NoTrivialInstructionSet and OutsideOperandsSet together are the boundaries/leaves of the recomputation DAG, and NoTrivialInstructionSet holds the instructions inside the Scop while OutsideOperandsSet holds the instructions outside the Scop, right?
1585	I read:
	<NoTrivialInstructions (first of NonTrivialOperandsPairTy)>   <OutsideOperandsSetTy (second of NonTrivialOperandsPairTy)>
	                         \                                   /
	                                  Recomputable DAG
	                                         |
	                 (root) Instruction (the key of NonTrivialOperandsMap)
	Actually, can we check on the fly whether an operand is outside the Scop, so that only one operand set is needed?
lib/Analysis/ScopInfo.cpp
2849	SE is a member of Scop
3171	// Now AccessInst is a write scalar access (and not a PHI).
3186–3211	How about using the following approach, which is all over the LLVM codebase, and we can create a function for it.
	// Depth-first traverse the operand DAG of AccessInst, until we reach:
	// * An instruction that is defined outside the current Scop (OutsideOperands)
	// * An instruction with side-effect (SideEffectOperands)
	std::set Visited;
	SmallVector<std::pair<Instruction *, Instruction::op_iterator> > VisitStack;
	VisitStack.push_back(std::make_pair(AccessInst, AccessInst->begin()));
	while (!VisitStack.empty()) {
	  auto CurNode = VisitStack.back().first;
	  auto &CurChildIt = VisitStack.back().second;
	  if (CurChildIt == CurNode->end()) {
	    VisitStack.pop_back();
	    continue;
	  }
	  Instruction *ChildInst = dyn_cast<Instruction>(*CurChildIt);
	  ++CurChildIt;
	  // Stop at invariances
	  if (ChildInst == nullptr)
	    continue;
	  if (!Visited.insert(Inst).second)
	    continue;
	  // Invariance again.
	  if (!R.contains(ChildInst)) {
	    OutsideOperands.push_back(std::make_pair(InstOpInst, Inst));
	    continue;
	  }
	  if (Inst->mayHaveSideEffects() || Inst->mayReadFromMemory()) {
	    SideEffectOperands.insert(Inst);
	    continue;
	  }
	  if (!isa<PHINode>(Inst))
	    continue;
	  if (R.contains(Inst) && canSynthesize(Inst, &LI, &SE, &R))
	    continue;
	  SideEffectOperands.insert(Inst);
	}
3265	This user is actually inside the Scop, right?
3266	So what we need is only the parent block of the user that is inside the Scop?
3271	I read: if (Stmt.containValueReadOf(OutsideOperand))
lib/CodeGen/BlockGenerators.cpp
183	If copyInstruction returned the newly copied instruction, the code
	  copyInstruction(Stmt, OldOperandInst, BBMap, LTS, nullptr, Recompute);
	  NewOperand = BBMap[OldOperand];
	would become:
	  NewOperand = copyInstruction(Stmt, OldOperandInst, BBMap, LTS, nullptr, Recompute);
	which looks better and is easier to understand. But this can be done in another patch.
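
Editor's sketch of the operand traversal discussed in the comments on lines 1581 and 3186–3211 above, run on the a = b + c, d = a + e example. It is an illustration on a toy IR, not the patch's algorithm: OperandNode, Boundary and collectBoundary are hypothetical names, the two boolean fields stand in for the region-containment and side-effect/memory-read queries, and PHI nodes are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy instruction. InsideScop and TouchesMemory stand in for
// the region check and the side-effect/memory-read check in the review.
struct OperandNode {
  std::string Name;
  std::vector<const OperandNode *> Operands;
  bool InsideScop;
  bool TouchesMemory;
};

// The two kinds of leaves at which the traversal stops.
struct Boundary {
  std::set<std::string> Outside;
  std::set<std::string> NonTrivial;
};

// Depth-first walk over the operands of Root with an explicit stack.
// The walk does not descend below outside or memory-touching operands.
Boundary collectBoundary(const OperandNode &Root) {
  Boundary Result;
  std::set<const OperandNode *> Visited;
  std::vector<std::pair<const OperandNode *, std::size_t>> Stack;
  Stack.push_back({&Root, 0});

  while (!Stack.empty()) {
    const OperandNode *Cur = Stack.back().first;
    std::size_t &NextOp = Stack.back().second;
    if (NextOp == Cur->Operands.size()) {
      Stack.pop_back(); // All operands of Cur have been handled.
      continue;
    }
    const OperandNode *Child = Cur->Operands[NextOp++];
    if (!Visited.insert(Child).second)
      continue; // Shared operand, already classified.
    if (!Child->InsideScop) {
      Result.Outside.insert(Child->Name);
      continue;
    }
    if (Child->TouchesMemory) {
      Result.NonTrivial.insert(Child->Name);
      continue;
    }
    // Trivially recomputable: keep descending. NextOp is not used after
    // this push, so the reallocation of Stack is harmless.
    Stack.push_back({Child, 0});
  }
  return Result;
}

// Usage: b and e are loads inside the SCoP, c is defined outside of it.
void boundaryExample() {
  OperandNode B{"b", {}, true, true};
  OperandNode C{"c", {}, false, false};
  OperandNode E{"e", {}, true, true};
  OperandNode A{"a", {&B, &C}, true, false};
  OperandNode D{"d", {&A, &E}, true, false};
  Boundary Result = collectBoundary(D);
  assert((Result.Outside == std::set<std::string>{"c"}));
  assert((Result.NonTrivial == std::set<std::string>{"b", "e"}));
}
```

In this toy the instruction a is the only node that would be recomputed together with d; b, c and e form the boundary at which recomputation stops.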

See inline comments, otherwise I do not see any problem

Updated according to Hongbin's comments. Unified all non-trivial operands.

I addressed all of the comments except one (the new algorithm to collect non-trivial operands).

@etherzhhb, @Meinersbur, @grosser, @sebpop
Since this patch caused so much discussion I guess it would be nice to get two ppl to tell me it's fine to commit.

include/polly/ScopInfo.h
1169	I created two functions now. One to remove all accesses caused by an instruction and one to remove exactly the accesses passed as arguments.
1567	To find the access in the SCoP (if there is one). Only knowing the outside operand doesn't suffice to identify the MemoryAccess objects that we already created because of this outside operand.
1581	"I guess b, c and hopefully a are the transitive operands of d." In the code above, operands would mean {b,c,e} for the instruction d. "They are also the leaves of the operand DAG rooted on d." True, but with PHI nodes we might not have a DAG and the PHIs are not necessarily leaves. "When we recompute d, the recomputation stop at b, c, e, right? That is, all instructions inside the DAG is recomputed." Yes. And since we do not recompute loop-carried PHIs here, we always have this nice DAG. "So NoTrivialInstructionSet and OutsideOperandsSet together are the boundaries/leaves of the recomputation DAG, and NoTrivialInstructionSet are the instructions inside the Scop while OutsideOperandsSet are the instructions outside the Scop, right?" Yes, except for the PHI case, where we do not have a DAG and, even though PHIs are not leaves, we put them in the NoTrivialInstructionSet.
1585	Good point, I adjusted the code to use only one set of non-trivial operands and then compute the outside operands + inside users on the fly. Thanks!
lib/Analysis/ScopInfo.cpp
3171	Added a comment.
3186–3211	I don't get the problem you try to solve here. Why do we need a different algorithm and what is it that it does better?
3265	Renamed.
3266	True. I changed this code to use only one set of operands and compute everything else on the fly.
3271	Yes. Is that a problem?
lib/CodeGen/BlockGenerators.cpp
183	We could do that later, true.

etherzhhb added inline comments. Feb 22 2016, 6:57 AM

include/polly/ScopInfo.h
1581	Just to confirm, we do not recompute loop-carried PHIs, right?
lib/Analysis/ScopInfo.cpp
3186–3211	It is almost the same algorithm, but in a different 'style'. I think I should not push too hard about the "style", so I think we are done with this. But we should add some comments to describe what the algorithm is doing, like:
	// Traverse the operand DAG of AccessInst, until we reach:
	// * An instruction that is defined outside the current Scop (OutsideOperands)
	// * An instruction with side-effect (SideEffectOperands)
3271	It is not entirely straightforward, but with the comment above, I think it is ok.
lib/CodeGen/BlockGenerators.cpp
154–158	TryOnly is only used here. And this block is equivalent to:
	  assert(TryOnly && "Unexpected scalar dependence in region!");
	  return nullptr;
	Instead of passing TryOnly as a function parameter, can we move the assertion out of this function, like:
	  auto *V = getNewValue(....);
	  assert((V || TryOnly) && "Unexpected scalar dependence in region!");
	Now we can remove this confusing TryOnly.
344	"First, by construction we can place it here as all recomputable scalars do not depend on anything we cannot recompute. Second, we can probably move it after the scalar loads generation but it should be all the same." It would be good to put the above explanations into the code as comments.
1086–1087	Interesting duplication :) We should also add some comments to explain why we recomputeDependentScalar before generateScalarLoads here as well.
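
Editor's sketch of the refactoring suggested in the comment on lines 154–158 above: the lookup reports failure through nullptr, and the assertion moves to the callers that cannot tolerate failure. This is a simplified stand-in, not the BlockGenerator interface. NameMap, tryGetNewValue and getNewValueOrAssert are hypothetical names, and strings replace llvm::Value pointers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the old-value -> new-value mapping.
using NameMap = std::map<std::string, std::string>;

// The lookup never asserts. A missing mapping is reported as nullptr,
// so no TryOnly flag is needed.
const std::string *tryGetNewValue(const NameMap &Map, const std::string &Old) {
  auto It = Map.find(Old);
  return It == Map.end() ? nullptr : &It->second;
}

// Callers that require a value perform the check themselves.
const std::string &getNewValueOrAssert(const NameMap &Map,
                                       const std::string &Old) {
  const std::string *New = tryGetNewValue(Map, Old);
  assert(New && "Unexpected scalar dependence in region!");
  return *New;
}

// Usage: a tolerant caller inspects the pointer, a strict caller asserts.
void tryOnlyExample() {
  NameMap Map;
  Map["a"] = "a.new";
  assert(tryGetNewValue(Map, "b") == nullptr);
  assert(getNewValueOrAssert(Map, "a") == "a.new");
}
```

The design choice is the usual one between a flag that changes a function's failure behaviour and two entry points with a single behaviour each; the second keeps every call site explicit about whether a missing value is acceptable.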

etherzhhb requested changes to this revision. Feb 22 2016, 7:03 AM

etherzhhb edited edge metadata.

This revision now requires changes to proceed. Feb 22 2016, 7:03 AM

Do you want other changes than the missing comments and the TryOnly thing? Both seem to be minor but the status change seems to indicate a bigger problem with this patch.

include/polly/ScopInfo.h
1581	Not in this patch (or the following three that are more or less ready); however, we are working on loop-carried PHIs too.
lib/Analysis/ScopInfo.cpp
3186–3211	I can do that.
lib/CodeGen/BlockGenerators.cpp
344	I can add a comment. sure.
1086–1087	I am confused. Are you hinting at the fact that we call "recomputeDependentScalars" here too, or that we call it again before "generateScalarLoads"? I talked about the order already in the comment above. The duplication of the "recomputeDependentScalars" call has the same reason as the duplication of the "generateScalarLoads" call: this is the starting point of a "copyStmt" function. While above it is for block statements, here it is for region statements. If you want, I can extract both calls into an "initializeScalars" method that is called before a statement is copied.
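
Editor's sketch of the "initializeScalars" extraction offered above. Everything here is hypothetical: the stubs only log which step ran, and copyBlockStmt/copyRegionStmt are toy stand-ins for the two "copyStmt" entry points mentioned in the reply. It shows the shape of the refactoring, not the patch's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which code generation step ran, in order.
using StepLog = std::vector<std::string>;

// Stubs for the two calls whose order is discussed in the review.
void recomputeDependentScalarsStub(StepLog &Log) { Log.push_back("recompute"); }
void generateScalarLoadsStub(StepLog &Log) { Log.push_back("scalar-loads"); }

// Hypothetical shared helper: recomputation first, then the scalar loads.
// The order is fixed (and can be documented) in exactly one place.
void initializeScalars(StepLog &Log) {
  recomputeDependentScalarsStub(Log);
  generateScalarLoadsStub(Log);
}

// Toy entry point for block statements.
StepLog copyBlockStmt() {
  StepLog Log;
  initializeScalars(Log);
  Log.push_back("copy-block");
  return Log;
}

// Toy entry point for region statements.
StepLog copyRegionStmt() {
  StepLog Log;
  initializeScalars(Log);
  Log.push_back("copy-region");
  return Log;
}

// Usage: both entry points see the same initialization order.
void initializeScalarsExample() {
  assert((copyBlockStmt() ==
          StepLog{"recompute", "scalar-loads", "copy-block"}));
  assert((copyRegionStmt() ==
          StepLog{"recompute", "scalar-loads", "copy-region"}));
}
```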

I change the status to make this patch not block on me. There is no major issue.

etherzhhb added inline comments. Feb 22 2016, 2:14 PM

include/polly/ScopInfo.h
1581	ok
lib/CodeGen/BlockGenerators.cpp
1086–1087	I just think the duplicate here is interesting/funny ... My intention is to suggest that we put the same comments as at line 344 here to explain the order; otherwise people may be confused when they see this before they see the comments at line 344. "If you want I can extract both calls into a 'initializeScalars' method that is called before a statement is copied." This is great but can be done in another patch.

Meinersbur mentioned this in D12975: [Polly] De-LICM and De-GVN (WIP). Feb 24 2016, 9:01 AM

I change the status to make this patch not block on me. There is no major issue.

I think this patch is good to go once the comments are fixed.

This revision is now accepted and ready to land. Feb 26 2016, 3:17 AM

I still want to have a look at this, but did not make it. I am a little
delayed on the larger patches, but will first need to have a look at
Johannes' SSA patch.

Best,
Tobias

Ping?

jdoerfert abandoned this revision. Mar 28 2019, 5:02 PM

Herald added a reviewer: bollu. Mar 28 2019, 5:02 PM

Herald added a subscriber: bollu.

Revision Contents

Path	Size
include/
polly/
CodeGen/
BlockGenerators.h	29 lines
ScopInfo.h	107 lines
lib/
Analysis/
ScopInfo.cpp	331 lines
CodeGen/
BlockGenerators.cpp	63 lines
test/
Isl/
CodeGen/
OpenMP/
invariant_base_pointer_preloaded_different_bb.ll	4 lines
eliminate-multiple-scalar-fp-reads.ll	90 lines
eliminate-multiple-scalar-reads.ll	82 lines
eliminate-scalars-with-outside-load.ll	61 lines
srem-in-other-bb.ll	13 lines
ScopInfo/
eliminate-scalar-caused-by-load-reduction-2.ll	56 lines
eliminate-scalar-caused-by-load-reduction.ll	48 lines
inter_bb_scalar_dep.ll	6 lines
intra_and_inter_bb_scalar_dep.ll	2 lines
invariant-loads-leave-read-only-statements.ll	2 lines
multidim_fortran_srem.ll	15 lines
out-of-scop-use-in-region-entry-phi-node.ll	6 lines
scalar_dependence_cond_br.ll	8 lines
schedule-const-post-dominator-walk.ll	1 line
tempscop-printing.ll	5 lines
unneeded_scalar_dependences-1.ll	63 lines
unneeded_scalar_dependences-2.ll	65 lines
unneeded_scalar_dependences-3.ll	66 lines
unneeded_scalar_dependences-4.ll	69 lines
unneeded_scalar_dependences-5.ll	72 lines
unneeded_scalar_dependences-6.ll	49 lines
unneeded_scalar_dependences-7.ll	47 lines
unneeded_scalar_dependences-8.ll	51 lines
unneeded_scalar_dependences-9.ll	60 lines

Diff 48642

include/polly/CodeGen/BlockGenerators.h

[413 lines not shown]	protected:

/// @brief Promote the values of demoted scalars after the SCoP.		/// @brief Promote the values of demoted scalars after the SCoP.
///		///
/// If a scalar value was used outside the SCoP we need to promote the value		/// If a scalar value was used outside the SCoP we need to promote the value
/// stored in the memory cell allocated for that scalar and combine it with		/// stored in the memory cell allocated for that scalar and combine it with
/// the original value in the non-optimized SCoP.		/// the original value in the non-optimized SCoP.
void createScalarFinalization(Region &R);		void createScalarFinalization(Region &R);

		/// @brief Recompute scalars needed in this statement.
		Meinersbur (Done): "all" does not match the description below
		jdoerfert (Author, Done): All that are not communicated via memory but need to be recomputed.
		///
		/// During SCoP creation scalars can be virtually moved to simplify the SCoP
		/// description as well as the dependences. However, they are only moved if
		/// we can recompute them in the statements they are used in. This method will
		Meinersbur (Done): ... where they are used.
		jdoerfert (Author, Done): I will check that.
		/// perform the recomputation before we clone the original statement into the
		/// new, optimized region. Thus it ensures that all scalars are available.
		Meinersbur (Done): ... thus ensures that all ...
		jdoerfert (Author, Done): I will check that.
		void recomputeDependentScalars(ScopStmt &Stmt, ValueMapT &BBMap,
		LoopToScevMapT &LTS,
		isl_id_to_ast_expr *NewAccesses);

/// @brief Try to synthesize a new value		/// @brief Try to synthesize a new value
///		///
/// Given an old value, we try to synthesize it in a new context from its		/// Given an old value, we try to synthesize it in a new context from its
/// original SCEV expression. We start from the original SCEV expression,		/// original SCEV expression. We start from the original SCEV expression,
/// then replace outdated parameter and loop references, and finally		/// then replace outdated parameter and loop references, and finally
/// expand it to code that computes this updated expression.		/// expand it to code that computes this updated expression.
///		///
/// @param Stmt The statement to code generate		/// @param Stmt The statement to code generate
/// @param Old The old Value		/// @param Old The old Value
/// @param BBMap A mapping from old values to their new values		/// @param BBMap A mapping from old values to their new values
/// (for values recalculated within this basic block)		/// (for values recalculated within this basic block)
/// @param LTS A mapping from loops virtual canonical induction		/// @param LTS A mapping from loops virtual canonical induction
/// variable to their new values		/// variable to their new values
/// (for values recalculated in the new ScoP, but not		/// (for values recalculated in the new ScoP, but not
/// within this basic block)		/// within this basic block)
/// @param L The loop that surrounded the instruction that referenced		/// @param L The loop that surrounded the instruction that referenced
/// this value in the original code. This loop is used to		/// this value in the original code. This loop is used to
/// evaluate the scalar evolution at the right scope.		/// evaluate the scalar evolution at the right scope.
///		///
/// @returns o A newly synthesized value.		/// @returns o A newly synthesized value.
/// o NULL, if synthesizing the value failed.		/// o NULL, if synthesizing the value failed.
Value *trySynthesizeNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,		Value *trySynthesizeNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,
LoopToScevMapT &LTS, Loop *L) const;		LoopToScevMapT &LTS, Loop *L);

/// @brief Get the new version of a value.		/// @brief Get the new version of a value.
///		///
/// Given an old value, we first check if a new version of this value is		/// Given an old value, we first check if a new version of this value is
/// available in the BBMap or GlobalMap. In case it is not and the value can		/// available in the BBMap or GlobalMap. In case it is not and the value can
/// be recomputed using SCEV, we do so. If we can not recompute a value		/// be recomputed using SCEV, we do so. If we can not recompute a value
/// using SCEV, but we understand that the value is constant within the scop,		/// using SCEV, but we understand that the value is constant within the scop,
/// we return the old value. If the value can still not be derived, this		/// we return the old value. If the value can still not be derived, this
/// function will assert.		/// function will assert.
///		///
/// @param Stmt The statement to code generate.		/// @param Stmt The statement to code generate.
/// @param Old The old Value.		/// @param Old The old Value.
/// @param BBMap A mapping from old values to their new values		/// @param BBMap A mapping from old values to their new values
/// (for values recalculated within this basic block).		/// (for values recalculated within this basic block).
/// @param LTS A mapping from loops virtual canonical induction		/// @param LTS A mapping from loops virtual canonical induction
/// variable to their new values		/// variable to their new values
/// (for values recalculated in the new ScoP, but not		/// (for values recalculated in the new ScoP, but not
/// within this basic block).		/// within this basic block).
/// @param L The loop that surrounded the instruction that referenced		/// @param L The loop that surrounded the instruction that referenced
/// this value in the original code. This loop is used to		/// this value in the original code. This loop is used to
/// evaluate the scalar evolution at the right scope.		/// evaluate the scalar evolution at the right scope.
		/// @param TryOnly Flag to indicate that nullptr is a valid return value
		/// if no new value was found.
///		///
/// @returns o The old value, if it is still valid.		/// @returns o The old value, if it is still valid.
/// o The new value, if available.		/// o The new value, if available.
/// o NULL, if no value is found.		/// o NULL, if no value is found and TryOnly is set.
		/// o Otherwise a trap is triggered.
		Meinersbur (Done): llvm_unreachable is not defined to trigger a trap.
		jdoerfert (Author, Done): I can use the original message instead. If you have a better way to describe llvm_unreachable I can put it here alternatively.
Value *getNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,		Value *getNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,
LoopToScevMapT &LTS, Loop *L) const;		LoopToScevMapT &LTS, Loop *L, bool TryOnly = false);

void copyInstScalar(ScopStmt &Stmt, Instruction *Inst, ValueMapT &BBMap,		void copyInstScalar(ScopStmt &Stmt, Instruction *Inst, ValueMapT &BBMap,
LoopToScevMapT &LTS);		LoopToScevMapT &LTS, bool Recompute = false);
		Meinersbur (Done): Missing documentation?
		jdoerfert (Author, Done): Yeah, somebody should add documentation to all these undocumented functions at some point.

/// @brief Get the innermost loop that surrounds the statement @p Stmt.		/// @brief Get the innermost loop that surrounds the statement @p Stmt.
Loop *getLoopForStmt(const ScopStmt &Stmt) const;		Loop *getLoopForStmt(const ScopStmt &Stmt) const;
jdoerfert (Author, Done): When we virtually move instructions, we are not really interested in the loop surrounding them but in their new parent block/statement, thus the change.

/// @brief Generate the operand address		/// @brief Generate the operand address
/// @param NewAccesses A map from memory access ids to new ast expressions,		/// @param NewAccesses A map from memory access ids to new ast expressions,
/// which may contain new access expressions for certain		/// which may contain new access expressions for certain
/// memory accesses.		/// memory accesses.
Value *generateLocationAccessed(ScopStmt &Stmt, MemAccInst Inst,		Value *generateLocationAccessed(ScopStmt &Stmt, MemAccInst Inst,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses);		isl_id_to_ast_expr *NewAccesses);
[33 lines not shown]	protected:
/// within this basic block).		/// within this basic block).
/// @param LTS A mapping from loops virtual canonical induction		/// @param LTS A mapping from loops virtual canonical induction
/// variable to their new values		/// variable to their new values
/// (for values recalculated in the new ScoP, but not		/// (for values recalculated in the new ScoP, but not
/// within this basic block).		/// within this basic block).
/// @param NewAccesses A map from memory access ids to new ast expressions,		/// @param NewAccesses A map from memory access ids to new ast expressions,
/// which may contain new access expressions for certain		/// which may contain new access expressions for certain
/// memory accesses.		/// memory accesses.
		/// @param Recompute Flag to indicate that the instruction is a scalar that
		Meinersbur (Done): Flag to indicate that the instruction ...
		jdoerfert (Author, Done): I will check that.
		/// needs to be recomputed in this statement. It basically
		/// forces us to copy not only the instruction but also all
		/// operands if we cannot find a local or global mapping.
void copyInstruction(ScopStmt &Stmt, Instruction *Inst, ValueMapT &BBMap,		void copyInstruction(ScopStmt &Stmt, Instruction *Inst, ValueMapT &BBMap,
LoopToScevMapT &LTS, isl_id_to_ast_expr *NewAccesses);		LoopToScevMapT &LTS, isl_id_to_ast_expr *NewAccesses,
		bool Recompute = false);

/// @brief Helper to determine if @p Inst can be synthezised in @p Stmt.		/// @brief Helper to determine if @p Inst can be synthezised in @p Stmt.
///		///
/// @returns false, iff @p Inst can be synthesized in @p Stmt.		/// @returns false, iff @p Inst can be synthesized in @p Stmt.
bool canSyntheziseInStmt(ScopStmt &Stmt, Instruction *Inst);		bool canSyntheziseInStmt(ScopStmt &Stmt, Instruction *Inst);
};		};

/// @brief Generate a new vector basic block for a polyhedral statement.		/// @brief Generate a new vector basic block for a polyhedral statement.
[292 lines not shown]

include/polly/ScopInfo.h

[321 lines not shown]	public:
/// where the result of the PHI node is stored and later loaded from as well		/// where the result of the PHI node is stored and later loaded from as well
/// as a second one where the incoming values of the PHI nodes are stored		/// as a second one where the incoming values of the PHI nodes are stored
/// into and reloaded when the PHI is executed. As both memories use the		/// into and reloaded when the PHI is executed. As both memories use the
/// original PHI node as virtual base pointer, we have this additional		/// original PHI node as virtual base pointer, we have this additional
/// attribute to distinguish the PHI node specific array modeling from the		/// attribute to distinguish the PHI node specific array modeling from the
/// normal scalar array modeling.		/// normal scalar array modeling.
bool isPHIKind() const { return Kind == MK_PHI; };		bool isPHIKind() const { return Kind == MK_PHI; };

		/// @rbeif Return the memory kind of this array.
		MemoryKind getKind() const { return Kind; };

/// @brief Dump a readable representation to stderr.		/// @brief Dump a readable representation to stderr.
void dump() const;		void dump() const;

/// @brief Print a readable representation to @p OS.		/// @brief Print a readable representation to @p OS.
///		///
/// @param SizeAsPwAff Print the size as isl_pw_aff		/// @param SizeAsPwAff Print the size as isl_pw_aff
void print(raw_ostream &OS, bool SizeAsPwAff = false) const;		void print(raw_ostream &OS, bool SizeAsPwAff = false) const;

[294 lines not shown]	private:
/// @brief Assemble the access relation from all availbale information.		/// @brief Assemble the access relation from all availbale information.
///		///
/// In particular, used the information passes in the constructor and the		/// In particular, used the information passes in the constructor and the
/// parent ScopStmt set by setStatment().		/// parent ScopStmt set by setStatment().
///		///
/// @param SAI Info object for the accessed array.		/// @param SAI Info object for the accessed array.
void buildAccessRelation(const ScopArrayInfo *SAI);		void buildAccessRelation(const ScopArrayInfo *SAI);

		/// @brief Copy this memory access into the given statement @p Stmt.
		///
		/// @param AccList The list that contains all accesses for @p Stmt.
		/// @param NewStmt The statement the copied access should reside in.
		MemoryAccess copyTo(AccFuncSetType &AccList, ScopStmt NewStmt) const;
		Meinersbur (Not Done): We can get the AccList from getStatement()->getParent()->AccFuncMap. Not necessary to pass it as parameter. Rename to "copyTo"?
		jdoerfert (Author, Done): I would like to avoid accessing members of a different class.

public:		public:
/// @brief Create a new MemoryAccess.		/// @brief Create a new MemoryAccess.
///		///
/// @param Stmt The parent statement.		/// @param Stmt The parent statement.
/// @param AccessInst The instruction doing the access.		/// @param AccessInst The instruction doing the access.
/// @param BaseAddr The accessed array's address.		/// @param BaseAddr The accessed array's address.
/// @param ElemType The type of the accessed array elements.		/// @param ElemType The type of the accessed array elements.
/// @param AccType Whether read or write access.		/// @param AccType Whether read or write access.
[26 lines not shown]	public:
/// anymore. For this reason we remember these explicitely for all PHI-kind		/// anymore. For this reason we remember these explicitely for all PHI-kind
/// accesses.		/// accesses.
ArrayRef<std::pair<BasicBlock *, Value *>> getIncoming() const {		ArrayRef<std::pair<BasicBlock *, Value *>> getIncoming() const {
assert(isAnyPHIKind());		assert(isAnyPHIKind());
return Incoming;		return Incoming;
}		}

/// @brief Get the type of a memory access.		/// @brief Get the type of a memory access.
enum AccessType getType() { return AccType; }		enum AccessType getType() const { return AccType; }
		Meinersbur (Done): Unrelated?
		jdoerfert (Author, Done): Needed somewhere, but it was quite a while ago. If ppl need to know where, I can check.

/// @brief Is this a reduction like access?		/// @brief Is this a reduction like access?
bool isReductionLike() const { return RedType != RT_NONE; }		bool isReductionLike() const { return RedType != RT_NONE; }

/// @brief Is this a read memory access?		/// @brief Is this a read memory access?
bool isRead() const { return AccType == MemoryAccess::READ; }		bool isRead() const { return AccType == MemoryAccess::READ; }

/// @brief Is this a must-write memory access?		/// @brief Is this a must-write memory access?
[279 lines not shown]	private:

/// @brief The isl AST build for the new generated AST.		/// @brief The isl AST build for the new generated AST.
isl_ast_build *Build;		isl_ast_build *Build;

SmallVector<Loop *, 4> NestLoops;		SmallVector<Loop *, 4> NestLoops;

std::string BaseName;		std::string BaseName;

		/// @brief Set of scalar values that need to be recomputed in this statement.
		SetVector<Instruction *> DependentScalars;

/// Build the statement.		/// Build the statement.
//@{		//@{
void buildDomain();		void buildDomain();

/// @brief Fill NestLoops with loops surrounding this statement.		/// @brief Fill NestLoops with loops surrounding this statement.
void collectSurroundingLoops();		void collectSurroundingLoops();

/// @brief Build the access relation of all memory accesses.		/// @brief Build the access relation of all memory accesses.
[154 lines not shown]	public:
void setBasicBlock(BasicBlock *Block) {		void setBasicBlock(BasicBlock *Block) {
// TODO: Handle the case where the statement is a region statement, thus		// TODO: Handle the case where the statement is a region statement, thus
// the entry block was split and needs to be changed in the region R.		// the entry block was split and needs to be changed in the region R.
assert(BB && "Cannot set a block for a region statement");		assert(BB && "Cannot set a block for a region statement");
BB = Block;		BB = Block;
}		}

/// @brief Add @p Access to this statement's list of accesses.		/// @brief Add @p Access to this statement's list of accesses.
void addAccess(MemoryAccess *Access);		///
		/// @param Access The access to add.
		/// @param Front Flag to indicate where the access should be added.
		void addAccess(MemoryAccess *Access, bool Front = false);
		Meinersbur (Done): I think we should not depend on the order of accesses. ISL does not understand such an order and depending on it will likely cause some bugs (Think of a load after storing to the same location; do we even handle that case yet?). AFAIK ISL assumes that all loads happen before the stores.
		jdoerfert (Author, Done): I'm not sure about your isl assumption but in any case, I'm not assuming anything here. The different input locations are purely for user experience (as the accesses are ordered in a more natural fashion).

		/// @brief Remove the memory access cause by @p Inst.
		void removeMemoryAccessesCausedBy(Instruction *Inst);

/// @brief Remove the memory access in @p InvMAs.		/// @brief Remove the memory access in @p InvMAs.
///		void removeMemoryAccesses(const MemoryAccessList &InvMAs);
		etherzhhb (Done): Does "Caused by the instruction of MA" mean the user of instruction of MA?
		jdoerfert (Author, Not Done): I created two functions now. One to remove all accesses caused by an instruction and one to remove exactly the accesses passed as arguments.
/// Note that scalar accesses that are caused by any access in @p InvMAs will
/// be eliminated too.
void removeMemoryAccesses(MemoryAccessList &InvMAs);

typedef MemoryAccessVec::iterator iterator;		typedef MemoryAccessVec::iterator iterator;
		Meinersbur (Not Done): OnlyMA=false removes multiple accesses, so move to a "removeAccessesCausedBy" function?
		jdoerfert (Author, Not Done): We could do that.
typedef MemoryAccessVec::const_iterator const_iterator;		typedef MemoryAccessVec::const_iterator const_iterator;

iterator begin() { return MemAccs.begin(); }		iterator begin() { return MemAccs.begin(); }
iterator end() { return MemAccs.end(); }		iterator end() { return MemAccs.end(); }
const_iterator begin() const { return MemAccs.begin(); }		const_iterator begin() const { return MemAccs.begin(); }
const_iterator end() const { return MemAccs.end(); }		const_iterator end() const { return MemAccs.end(); }
size_t size() const { return MemAccs.size(); }		size_t size() const { return MemAccs.size(); }

unsigned getNumParams() const;		unsigned getNumParams() const;
unsigned getNumIterators() const;		unsigned getNumIterators() const;

Scop *getParent() { return &Parent; }		Scop *getParent() { return &Parent; }
const Scop *getParent() const { return &Parent; }		const Scop *getParent() const { return &Parent; }

const char *getBaseName() const;		const char *getBaseName() const;

/// @brief Set the isl AST build.		/// @brief Set the isl AST build.
void setAstBuild(__isl_keep isl_ast_build *B) { Build = B; }		void setAstBuild(__isl_keep isl_ast_build *B) { Build = B; }

/// @brief Get the isl AST build.		/// @brief Get the isl AST build.
__isl_keep isl_ast_build *getAstBuild() const { return Build; }		__isl_keep isl_ast_build *getAstBuild() const { return Build; }

		/// @brief Add a scalar that needs to be recomputed in this statement.
		grosser (Done): that needs to be recomputed
		void addDependentScalar(Instruction *Inst) { DependentScalars.insert(Inst); }

		/// @brief Return the scalars that need to be recomputed in this statement.
		const SetVector<Instruction *> &getDependentScalars() const {
		Meinersbur (Done): It would be better if the class would not expose its implementation as an API. It could return iterators or an ArrayRef.
		return DependentScalars;
		}
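The reviewer's suggestion (return iterators rather than the container) can be sketched as follows. This is a minimal, self-contained illustration and not the patch's code: `Instr`, `ToyStmt`, `dep_begin`, `dep_end`, and `dep_size` are hypothetical names, and a `std::vector` stands in for LLVM's `SetVector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::Instruction; only identity matters here.
struct Instr {
  int Id;
};

// Toy statement that keeps its dependent scalars private and hands out only
// const iterators, so the underlying container can change without touching
// any caller.
class ToyStmt {
  std::vector<const Instr *> DependentScalars; // insertion-ordered, unique

public:
  using const_dep_iterator = std::vector<const Instr *>::const_iterator;

  // Insert a scalar once; returns true if it was newly added.
  bool addDependentScalar(const Instr *I) {
    assert(I && "expected a non-null scalar definition");
    for (const Instr *Known : DependentScalars)
      if (Known == I)
        return false;
    DependentScalars.push_back(I);
    return true;
  }

  const_dep_iterator dep_begin() const { return DependentScalars.begin(); }
  const_dep_iterator dep_end() const { return DependentScalars.end(); }
  std::size_t dep_size() const { return DependentScalars.size(); }
};
```

Callers would then iterate from `dep_begin()` to `dep_end()` and never name the container type.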

/// @brief Restrict the domain of the statement.		/// @brief Restrict the domain of the statement.
///		///
/// @param NewDomain The new statement domain.		/// @param NewDomain The new statement domain.
void restrictDomain(__isl_take isl_set *NewDomain);		void restrictDomain(__isl_take isl_set *NewDomain);

/// @brief Compute the isl representation for the SCEV @p E in this stmt.		/// @brief Compute the isl representation for the SCEV @p E in this stmt.
__isl_give isl_pw_aff *getPwAff(const SCEV *E);		__isl_give isl_pw_aff *getPwAff(const SCEV *E);

▲ Show 20 Lines • Show All 208 Lines • ▼ Show 20 Lines	private:
/// @brief Get or create the access function set in a BasicBlock		/// @brief Get or create the access function set in a BasicBlock
AccFuncSetType &getOrCreateAccessFunctions(const BasicBlock *BB) {		AccFuncSetType &getOrCreateAccessFunctions(const BasicBlock *BB) {
return AccFuncMap[BB];		return AccFuncMap[BB];
}		}
//@}		//@}

/// @brief Initialize this ScopInfo .		/// @brief Initialize this ScopInfo .
void init(AliasAnalysis &AA, AssumptionCache &AC, ScopDetection &SD,		void init(AliasAnalysis &AA, AssumptionCache &AC, ScopDetection &SD,
DominatorTree &DT, LoopInfo &LI);		DominatorTree &DT, LoopInfo &LI);
		etherzhhb (Done): SE is a member of Scop, no need to pass it as a parameter. I think it is reasonable to carry it around because we store SCEVs with Scop, e.g. Parameters.

/// @brief Add loop carried constraints to the header block of the loop @p L.		/// @brief Add loop carried constraints to the header block of the loop @p L.
///		///
/// @param L The loop to process.		/// @param L The loop to process.
/// @param LI The LoopInfo for the current function.		/// @param LI The LoopInfo for the current function.
void addLoopBoundsToHeaderDomain(Loop *L, LoopInfo &LI);		void addLoopBoundsToHeaderDomain(Loop *L, LoopInfo &LI);

/// @brief Compute the branching constraints for each basic block in @p R.		/// @brief Compute the branching constraints for each basic block in @p R.
▲ Show 20 Lines • Show All 116 Lines • ▼ Show 20 Lines	private:
/// Required inv. loads: LB[0], LB[1], (V, if it may alias with A or LB)		/// Required inv. loads: LB[0], LB[1], (V, if it may alias with A or LB)
///		///
/// @param SD The ScopDetection analysis for the current function.		/// @param SD The ScopDetection analysis for the current function.
void hoistInvariantLoads(ScopDetection &SD);		void hoistInvariantLoads(ScopDetection &SD);

/// @brief Add invariant loads listed in @p InvMAs with the domain of @p Stmt.		/// @brief Add invariant loads listed in @p InvMAs with the domain of @p Stmt.
void addInvariantLoads(ScopStmt &Stmt, MemoryAccessList &InvMAs);		void addInvariantLoads(ScopStmt &Stmt, MemoryAccessList &InvMAs);

		/// @brief Type for a set of values.
		using ValueSetTy = SmallPtrSet<Value *, 4>;

		/// @brief Check if we can recompute @p Inst in @p Stmt.
		///
		/// @param Stmt The statement we want to recompute @p Inst in.
		/// @param Inst The instruction we need to recompute.
		Meinersbur (Not Done): It would be more idiomatic if each call would check only a single Instruction instead of passing a set implementation.
		jdoerfert (Author, Not Done): We could do that.
		etherzhhb (Done): "It would be more idiomatic if each call would check only a single Instruction instead of passing a set implementation." +1
		///
		/// @returns True, if @p Inst can be recomputed in @p Stmt.
		etherzhhb (Not Done): Why do we need them in SCoP Users?
		jdoerfert (Author, Not Done): To find the access in the SCoP (if there is one). Only knowing the outside operand doesn't suffice to identify the MemoryAccess objects that we already created because of this outside operand.
		bool canRecomputeInStmt(ScopStmt &Stmt, Instruction *Inst);

		/// @brief Check if we can recompute all non-trivial operands in @p Stmt.
		///
		/// @param Stmt The statement we want to copy the computation to.
		/// @param NonTrivialOperands The non-trivial operands that need to be copied.
		///
		/// @returns True, if all non-trivial operands can be recomputed in @p Stmt.
		bool canRecomputeInStmt(ScopStmt &Stmt, ValueSetTy &NonTrivialOperands);

		/// @brief Map from in-SCoP instructions to their non-trivial operands.
		///
		/// This maps an instruction to all its non-trivial operands. Non-trivial
		/// operands include instructions that are actually not trivially
		Meinersbur (Not Done): I think you should clarify that you actually mean a flattened operand _tree_. I think LLVM uses the term operands as the direct arguments/parameters of an instruction.
		jdoerfert (Author, Not Done): Where do you expect a tree here? It actually contains transitive operands and I could add that word.
		Meinersbur (Not Done): What is a "transitive operand"?
		etherzhhb (Not Done):
		```
		a = b + c
		d = a + e
		```
		I guess b, c and hopefully a are the transitive operands of d. They are also the leaves of the operand DAG rooted on d. When we recompute d, the recomputation stops at b, c, e, right? That is, all instructions inside the DAG are recomputed. So NoTrivialInstructionSet and OutsideOperandsSet together are the boundaries/leaves of the recomputation DAG, and NoTrivialInstructionSet are the instructions inside the Scop while OutsideOperandsSet are the instructions outside the Scop, right?
		jdoerfert (Author, Not Done):
		> I guess b, c and hopefully a are the transitive operands of d.
		In the code above operands would mean {b,c,e} for the instruction d.
		> They are also the leaves of the operand DAG rooted on d.
		True, but with PHI nodes we might not have a DAG and the PHIs are not necessarily leaves.
		> When we recompute d, the recomputation stops at b, c, e, right? That is, all instructions inside the DAG are recomputed.
		Yes. And since we do not recompute loop carried PHIs here we always have this nice DAG.
		> So NoTrivialInstructionSet and OutsideOperandsSet together are the boundaries/leaves of the recomputation DAG, and NoTrivialInstructionSet are the instructions inside the Scop while OutsideOperandsSet are the instructions outside the Scop, right?
		Yes, except the PHI case where we do not have a DAG and even though PHIs are not leaves we put them in the NoTrivialInstructionSet.
		etherzhhb (Not Done): Just to confirm, we do not recompute loop-carried PHIs, right?
		jdoerfert (Author, Not Done): Not in this patch (or the following three that are more or less ready), however we are working on loop-carried PHIs too.
		etherzhhb (Not Done): ok
		/// recompute-able, e.g., PHI nodes and loads, but also all out-of-scop
		/// operands. They are included because we need to model the scalar read
		/// accesses of the outside operands properly.
		DenseMap<Instruction *, ValueSetTy> NonTrivialOperandsMap;
		Meinersbur (Not Done): AFAICS it is only used during the execution of simplifyScalarAccesses. You could make it local and pass it to the functions that need it.
		jdoerfert (Author, Not Done): We could do that.
		etherzhhb (Not Done): I read:
		```
		<NoTrivialInstructions (first of NonTrivialOperandsPairTy)>  <OutsideOperandsSetTy (second of NonTrivialOperandsPairTy)>
		                          \                                  /
		                                  Recomputable DAG
		                                         |
		             (root) Instruction (the key of NonTrivialOperandsMap)
		```
		Actually can we check if an operand is outside Scop on the fly, such that only 1 operand set is needed?
		jdoerfert (Author, Not Done): Good point, I adjusted the code to use only one set of non-trivial operands and then compute the outside operands + inside users on the fly. Thanks!
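The "transitive operands" discussed in this thread can be illustrated with a small worklist traversal over a toy operand graph. This is only a sketch under stated assumptions: `Instr`, `InRegion`, and `TouchesMemory` are hypothetical stand-ins for the LLVM instruction, the region containment check, and the `mayHaveSideEffects()`/`mayReadFromMemory()` queries, and PHI handling is left out entirely.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical miniature of an SSA instruction (not an LLVM type).
struct Instr {
  std::vector<const Instr *> Operands;
  bool InRegion = true;       // false: defined outside the SCoP
  bool TouchesMemory = false; // loads or instructions with side effects
};

// Walk the transitive operands of Root and collect the values that cannot be
// trivially recomputed: operands defined outside the region and visited
// instructions that touch memory.
std::set<const Instr *> collectNonTrivialOperands(const Instr *Root) {
  assert(Root && "expected a root instruction");
  std::set<const Instr *> NonTrivial, Visited;
  std::vector<const Instr *> Worklist{Root};

  while (!Worklist.empty()) {
    const Instr *I = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(I).second)
      continue; // already handled via another path through the graph

    for (const Instr *Op : I->Operands) {
      if (Op->InRegion)
        Worklist.push_back(Op); // keep descending inside the region
      else
        NonTrivial.insert(Op); // boundary: value comes from outside
    }

    if (I->TouchesMemory)
      NonTrivial.insert(I);
  }
  return NonTrivial;
}
```

For the `a = b + c; d = a + e` example from the thread, with `b`, `c`, and `e` defined outside the region, the sketch yields `{b, c, e}` for `d`, which matches the reading given in the author's reply.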

		/// @brief Collect non-trivial operands for all scalar write accesses.
		///
		/// @param LI The LoopInfo for the current function.
		///
		void collectNonTrivialOperands(LoopInfo &LI);

		/// @brief Remove scalar reads that can be recomputed.
		///
		/// This function will check if a scalar read can be recomputed instead of
		/// reloaded/communicated. If so it will remove that access and add the
		/// definition to the ScopStmt::DependentScalars of that statement instead.
		/// During code generation these scalars will be recomputed before the
		/// instructions in the statement are copied, thus can be used in the
		/// statement without the need to communicate them through memory. Additionally,
		/// this function might introduce new read accesses for operands that are
		/// defined outside the SCoP in order to allow e.g., OpenMP code generation
		/// easy access to the values needed in a statement.
		void removeRecomputableScalarReads(ScopStmt &Stmt);

		/// @brief Remove scalar writes that are not read in the SCoP or at its exit.
		///
		/// When Scop::removeRecomputableScalarReads() removes read accesses the corresponding
		/// write accesses might not be needed anymore. To this end, this function
		/// removes scalar writes without scalar reads inside the SCoP or an escaping
		/// use that would cause a reload and merge at the SCoP exit.
		void removeUnreadScalarWrites(ScopStmt &Stmt);

		/// @brief Simplify the scalar accesses in this SCoP.
		///
		/// Scalar accesses are often not needed and only caused by the placement in
		/// the code. Additionally it is sometimes possible to recompute scalars to
		/// avoid communication. As scalars basically sequentialize all loops they are
		/// in, we try to avoid scalar accesses as much as possible. To this end we
		/// will virtually move and later recompute them in the code generation. This
		Meinersbur (Not Done): The name "simplifyScalarAccesses" is very general. How about "recomputableDependentScalars"?
		jdoerfert (Author, Not Done): We do not recompute here and later we do not always recompute to simplify scalars, thus I think it actually fits pretty well.
		/// allows more freedom for the scheduler while we do not need to change the
		/// original code region at all.
		///
		/// @param LI The LoopInfo for the current function.
		///
		void simplifyScalarAccesses(LoopInfo &LI);
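The read-removal and write-removal steps documented above can be sketched on a toy model. Everything here is an assumption made for illustration: scalars are plain strings, statements are integers, the result of the operand check is given as a precomputed `Recomputable` set, and escaping uses at the SCoP exit are not modeled.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>

// Hypothetical toy model of a SCoP's scalar accesses.
struct ToyScop {
  std::map<std::string, int> WriteIn;             // scalar -> defining statement
  std::map<int, std::set<std::string>> Reads;     // statement -> scalars read
  std::set<std::string> Recomputable;             // outcome of the operand check
  std::map<int, std::set<std::string>> Dependent; // scalars to recompute later
};

void simplifyScalarAccesses(ToyScop &S) {
  // Drop every read that can be recomputed and remember, per statement,
  // which definitions code generation has to recompute instead.
  for (auto &StmtReads : S.Reads) {
    std::set<std::string> &ReadSet = StmtReads.second;
    for (auto It = ReadSet.begin(); It != ReadSet.end();) {
      // Toy restriction: every scalar that is read has a definition.
      assert(S.WriteIn.count(*It) && "read of an unknown scalar");
      if (S.Recomputable.count(*It)) {
        S.Dependent[StmtReads.first].insert(*It);
        It = ReadSet.erase(It);
      } else {
        ++It;
      }
    }
  }

  // Drop every write whose value is no longer read by any statement.
  for (auto It = S.WriteIn.begin(); It != S.WriteIn.end();) {
    bool StillRead = false;
    for (const auto &StmtReads : S.Reads)
      StillRead |= StmtReads.second.count(It->first) != 0;
    It = StillRead ? std::next(It) : S.WriteIn.erase(It);
  }
}
```

In this model nothing is moved in the original code; only the access lists shrink, and the `Dependent` map records what a later code generation step would recompute.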

/// @brief Build the Context of the Scop.		/// @brief Build the Context of the Scop.
void buildContext();		void buildContext();

/// @brief Build the BoundaryContext based on the wrapping of expressions.		/// @brief Build the BoundaryContext based on the wrapping of expressions.
void buildBoundaryContext();		void buildBoundaryContext();

/// @brief Add user provided parameter constraints to context (source code).		/// @brief Add user provided parameter constraints to context (source code).
void addUserAssumptions(AssumptionCache &AC, DominatorTree &DT, LoopInfo &LI);		void addUserAssumptions(AssumptionCache &AC, DominatorTree &DT, LoopInfo &LI);
▲ Show 20 Lines • Show All 341 Lines • ▼ Show 20 Lines	public:
typedef StmtSet::const_reverse_iterator const_reverse_iterator;		typedef StmtSet::const_reverse_iterator const_reverse_iterator;

reverse_iterator rbegin() { return Stmts.rbegin(); }		reverse_iterator rbegin() { return Stmts.rbegin(); }
reverse_iterator rend() { return Stmts.rend(); }		reverse_iterator rend() { return Stmts.rend(); }
const_reverse_iterator rbegin() const { return Stmts.rbegin(); }		const_reverse_iterator rbegin() const { return Stmts.rbegin(); }
const_reverse_iterator rend() const { return Stmts.rend(); }		const_reverse_iterator rend() const { return Stmts.rend(); }
//@}		//@}

		/// @brief Remove the ScopArrayInfo object @p SAI.
		void removeScopArrayInfo(const ScopArrayInfo *SAI);

/// @brief Return the (possibly new) ScopArrayInfo object for @p Access.		/// @brief Return the (possibly new) ScopArrayInfo object for @p Access.
///		///
/// @param ElementType The type of the elements stored in this array.		/// @param ElementType The type of the elements stored in this array.
/// @param Kind The kind of the array info object.		/// @param Kind The kind of the array info object.
const ScopArrayInfo *getOrCreateScopArrayInfo(Value *BasePtr,		const ScopArrayInfo *getOrCreateScopArrayInfo(Value *BasePtr,
Type *ElementType,		Type *ElementType,
ArrayRef<const SCEV *> Sizes,		ArrayRef<const SCEV *> Sizes,
ScopArrayInfo::MemoryKind Kind);		ScopArrayInfo::MemoryKind Kind);
▲ Show 20 Lines • Show All 384 Lines • Show Last 20 Lines

lib/Analysis/ScopInfo.cpp

Show First 20 Lines • Show All 809 Lines • ▼ Show 20 Lines	MemoryAccess::MemoryAccess(ScopStmt *Stmt, Instruction *AccessInst,
static const std::string TypeStrings[] = {"", "_Read", "_Write", "_MayWrite"};		static const std::string TypeStrings[] = {"", "_Read", "_Write", "_MayWrite"};
const std::string Access = TypeStrings[AccType] + utostr(Stmt->size()) + "_";		const std::string Access = TypeStrings[AccType] + utostr(Stmt->size()) + "_";

std::string IdName =		std::string IdName =
getIslCompatibleName(Stmt->getBaseName(), Access, BaseName);		getIslCompatibleName(Stmt->getBaseName(), Access, BaseName);
Id = isl_id_alloc(Stmt->getParent()->getIslCtx(), IdName.c_str(), this);		Id = isl_id_alloc(Stmt->getParent()->getIslCtx(), IdName.c_str(), this);
}		}

		MemoryAccess *MemoryAccess::copyTo(AccFuncSetType &AccList,
		ScopStmt *NewStmt) const {
		AccList.emplace_back(NewStmt, getAccessInstruction(), getType(),
		getBaseAddr(), getElementType(), isAffine(), Subscripts,
		Sizes, getAccessValue(), Kind, getBaseName());
		MemoryAccess *CopyMA = &AccList.back();

		unsigned CurStmtDims = getStatement()->getNumIterators();
		unsigned NewStmtDims = NewStmt->getNumIterators();
		assert(NewStmtDims >= CurStmtDims);

		isl_map *AR = getAccessRelation();
		Meinersbur (Done): What if it has fewer dimensions? If it is impossible, could you add an assertion to make it clear?
		jdoerfert (Author, Done): Sure we can add an assertion.
		if (NewStmtDims > CurStmtDims)
		AR = isl_map_add_dims(AR, isl_dim_in, NewStmtDims - CurStmtDims);

		CopyMA->AccessRelation =
		isl_map_set_tuple_id(AR, isl_dim_in, NewStmt->getDomainId());
		return CopyMA;
		Meinersbur (Done): Doesn't this update InstructionToAccesses?
		jdoerfert (Author, Done): Mh, there was no need but we can do that too I guess.
		}

void MemoryAccess::realignParams() {		void MemoryAccess::realignParams() {
isl_space *ParamSpace = Statement->getParent()->getParamSpace();		isl_space *ParamSpace = Statement->getParent()->getParamSpace();
AccessRelation = isl_map_align_params(AccessRelation, ParamSpace);		AccessRelation = isl_map_align_params(AccessRelation, ParamSpace);
}		}

const std::string MemoryAccess::getReductionOperatorStr() const {		const std::string MemoryAccess::getReductionOperatorStr() const {
return MemoryAccess::getReductionOperatorStr(getReductionType());		return MemoryAccess::getReductionOperatorStr(getReductionType());
}		}
▲ Show 20 Lines • Show All 167 Lines • ▼ Show 20 Lines	else
Ty = ScopArrayInfo::MK_Array;		Ty = ScopArrayInfo::MK_Array;

auto *SAI = S.getOrCreateScopArrayInfo(Access->getBaseAddr(), ElementType,		auto *SAI = S.getOrCreateScopArrayInfo(Access->getBaseAddr(), ElementType,
Access->Sizes, Ty);		Access->Sizes, Ty);
Access->buildAccessRelation(SAI);		Access->buildAccessRelation(SAI);
}		}
}		}

void ScopStmt::addAccess(MemoryAccess *Access) {		void ScopStmt::addAccess(MemoryAccess *Access, bool Front) {
Instruction *AccessInst = Access->getAccessInstruction();		Instruction *AccessInst = Access->getAccessInstruction();

if (Access->isArrayKind()) {		if (Access->isArrayKind()) {
MemoryAccessList &MAL = InstructionToAccess[AccessInst];		MemoryAccessList &MAL = InstructionToAccess[AccessInst];
MAL.emplace_front(Access);		MAL.emplace_front(Access);
} else if (Access->isValueKind() && Access->isWrite()) {		} else if (Access->isValueKind() && Access->isWrite()) {
Instruction *AccessVal = cast<Instruction>(Access->getAccessValue());		Instruction *AccessVal = cast<Instruction>(Access->getAccessValue());
assert(Parent.getStmtForBasicBlock(AccessVal->getParent()) == this);		assert(Parent.getStmtForBasicBlock(AccessVal->getParent()) == this);
assert(!ValueWrites.lookup(AccessVal));		assert(!ValueWrites.lookup(AccessVal));

ValueWrites[AccessVal] = Access;		ValueWrites[AccessVal] = Access;
} else if (Access->isValueKind() && Access->isRead()) {		} else if (Access->isValueKind() && Access->isRead()) {
Value *AccessVal = Access->getAccessValue();		Value *AccessVal = Access->getAccessValue();
assert(!ValueReads.lookup(AccessVal));		assert(!ValueReads.lookup(AccessVal));

ValueReads[AccessVal] = Access;		ValueReads[AccessVal] = Access;
} else if (Access->isAnyPHIKind() && Access->isWrite()) {		} else if (Access->isAnyPHIKind() && Access->isWrite()) {
PHINode *PHI = cast<PHINode>(Access->getBaseAddr());		PHINode *PHI = cast<PHINode>(Access->getBaseAddr());
assert(!PHIWrites.lookup(PHI));		assert(!PHIWrites.lookup(PHI));

PHIWrites[PHI] = Access;		PHIWrites[PHI] = Access;
}		}

		if (Front)
		MemAccs.insert(MemAccs.begin(), Access);
		else
MemAccs.push_back(Access);		MemAccs.push_back(Access);
}		}

void ScopStmt::realignParams() {		void ScopStmt::realignParams() {
for (MemoryAccess *MA : *this)		for (MemoryAccess *MA : *this)
MA->realignParams();		MA->realignParams();

Domain = isl_set_align_params(Domain, Parent.getParamSpace());		Domain = isl_set_align_params(Domain, Parent.getParamSpace());
}		}
▲ Show 20 Lines • Show All 539 Lines • ▼ Show 20 Lines	void ScopStmt::print(raw_ostream &OS) const {
} else		} else
OS.indent(16) << "n/a\n";		OS.indent(16) << "n/a\n";

for (MemoryAccess *Access : MemAccs)		for (MemoryAccess *Access : MemAccs)
Access->print(OS);		Access->print(OS);
}		}

void ScopStmt::dump() const { print(dbgs()); }		void ScopStmt::dump() const { print(dbgs()); }

void ScopStmt::removeMemoryAccesses(MemoryAccessList &InvMAs) {		void ScopStmt::removeMemoryAccessesCausedBy(Instruction *Inst) {
		grosser (Not Done): use is / uses are
// Remove all memory accesses in @p InvMAs from this statement		MemoryAccessList MAs;
// together with all scalar accesses that were caused by them.		const auto &MAList = InstructionToAccess.lookup(Inst);
// MK_Value READs have no access instruction, hence would not be removed by		MAs.insert_after(MAs.before_begin(), MAList.begin(), MAList.end());
// this function. However, it is only used for invariant LoadInst accesses,		if (auto *MA = ValueWrites.lookup(Inst))
// its arguments are always affine, hence synthesizable, and therefore there		MAs.push_front(MA);
// are no MK_Value READ accesses to be removed.		if (auto *MA = ValueReads.lookup(Inst))
for (MemoryAccess *MA : InvMAs) {		MAs.push_front(MA);
auto Predicate = [&](MemoryAccess *Acc) {		if (auto *MA = PHIWrites.lookup(dyn_cast<PHINode>(Inst)))
return Acc->getAccessInstruction() == MA->getAccessInstruction();		MAs.push_front(MA);
		Meinersbur (Done): I'd still ask you to refactor this code. It is hard to understand and looks fragile as it depends on order. Maybe you could use some functions from <algorithm>
		jdoerfert (Author, Done): This code (already in the current HEAD) is ugly because it tries to be smart. While it heavily depends on the order it might need a lot more iterations if it would not. I don't argue we have to keep it but only that it works in Polly HEAD for now and can be tackled afterwards, if we find a nice and comparable way to do it.
};		removeMemoryAccesses(MAs);
		Meinersbur (Not Done): I do not understand this code. Why did you even remove any comment about what it is doing?
		jdoerfert (Author, Done): This code was basically moved and the original comment was left behind, hence the answer to your question is simple: The comment has to be moved too.
		}

		void ScopStmt::removeMemoryAccesses(const MemoryAccessList &MAs) {
		for (MemoryAccess *MA : MAs) {
		auto Predicate = [MA](MemoryAccess *Acc) { return Acc == MA; };
MemAccs.erase(std::remove_if(MemAccs.begin(), MemAccs.end(), Predicate),		MemAccs.erase(std::remove_if(MemAccs.begin(), MemAccs.end(), Predicate),
MemAccs.end());		MemAccs.end());
InstructionToAccess.erase(MA->getAccessInstruction());		InstructionToAccess.erase(MA->getAccessInstruction());
		if (MA->isValueKind() && MA->isWrite())
		ValueWrites.erase(MA->getAccessInstruction());
		else if (MA->isValueKind() && MA->isRead())
		ValueReads.erase(MA->getAccessValue());
		else if (MA->isAnyPHIKind() && MA->isWrite())
		PHIWrites.erase(cast<PHINode>(MA->getBaseAddr()));
}		}
}		}
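The removal above relies on the erase-remove idiom plus an update of the lookup tables. A minimal stand-alone sketch of that pattern follows; `Access`, `ToyAccessStmt`, and `InstToAccess` are hypothetical simplifications (one access per instruction, a single lookup table) rather than Polly's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for MemoryAccess; InstId plays the access instruction.
struct Access {
  int InstId;
};

struct ToyAccessStmt {
  std::vector<Access *> MemAccs;        // ordered list of all accesses
  std::map<int, Access *> InstToAccess; // lookup table keyed by instruction

  void add(Access *A) {
    MemAccs.push_back(A);
    InstToAccess[A->InstId] = A;
  }

  // Remove exactly the given accesses and keep the relative order of the
  // remaining ones, updating the lookup table alongside the list.
  void removeAccesses(const std::vector<Access *> &ToRemove) {
    for (Access *MA : ToRemove) {
      assert(MA && "cannot remove a null access");
      MemAccs.erase(std::remove(MemAccs.begin(), MemAccs.end(), MA),
                    MemAccs.end());
      InstToAccess.erase(MA->InstId);
    }
  }
};
```

Matching on the pointer itself, as the new `removeMemoryAccesses` predicate does, removes one specific access; matching on the access instruction, as the old predicate did, removes every access that instruction caused.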

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
/// Scop class implement		/// Scop class implement

void Scop::setContext(__isl_take isl_set *NewContext) {		void Scop::setContext(__isl_take isl_set *NewContext) {
NewContext = isl_set_align_params(NewContext, isl_set_get_space(Context));		NewContext = isl_set_align_params(NewContext, isl_set_get_space(Context));
▲ Show 20 Lines • Show All 1,203 Lines • ▼ Show 20 Lines	void Scop::init(AliasAnalysis &AA, AssumptionCache &AC, ScopDetection &SD,
realignParams();		realignParams();
addParameterBounds();		addParameterBounds();
addUserContext();		addUserContext();
buildBoundaryContext();		buildBoundaryContext();
simplifyContexts();		simplifyContexts();
buildAliasChecks(AA);		buildAliasChecks(AA);

hoistInvariantLoads(SD);		hoistInvariantLoads(SD);
		simplifyScalarAccesses(LI);
		etherzhhb (Done): SE is a member of Scop
simplifySCoP(false, DT, LI);		simplifySCoP(false, DT, LI);
}		}

Scop::~Scop() {		Scop::~Scop() {
isl_set_free(Context);		isl_set_free(Context);
isl_set_free(AssumedContext);		isl_set_free(AssumedContext);
isl_set_free(BoundaryContext);		isl_set_free(BoundaryContext);
isl_schedule_free(Schedule);		isl_schedule_free(Schedule);
▲ Show 20 Lines • Show All 279 Lines • ▼ Show 20 Lines	for (ScopStmt &Stmt : *this) {
// We inserted invariant accesses always in the front but need them to be		// We inserted invariant accesses always in the front but need them to be
// sorted in a "natural order". The statements are already sorted in reverse		// sorted in a "natural order". The statements are already sorted in reverse
// post order and that suffices for the accesses too. The reason we require		// post order and that suffices for the accesses too. The reason we require
// an order in the first place is the dependences between invariant loads		// an order in the first place is the dependences between invariant loads
// that can be caused by indirect loads.		// that can be caused by indirect loads.
InvariantAccesses.reverse();		InvariantAccesses.reverse();

// Transfer the memory access from the statement to the SCoP.		// Transfer the memory access from the statement to the SCoP.
Stmt.removeMemoryAccesses(InvariantAccesses);		for (auto *InvMA : InvariantAccesses)
		Stmt.removeMemoryAccessesCausedBy(InvMA->getAccessInstruction());
addInvariantLoads(Stmt, InvariantAccesses);		addInvariantLoads(Stmt, InvariantAccesses);
}		}
isl_union_map_free(Writes);		isl_union_map_free(Writes);

verifyInvariantLoads(SD);		verifyInvariantLoads(SD);
}		}

		bool Scop::canRecomputeInStmt(ScopStmt &Stmt, Instruction *Inst) {
		// TODO: Check if we can actually move the instructions.
		return false;
		}

		bool Scop::canRecomputeInStmt(ScopStmt &Stmt, ValueSetTy &NonTrivialOperands) {
		Meinersbur (Done): What are your plans here? When can instructions with side effects be recomputed?
		jdoerfert (Author, Done): That was what I was telling you in SJ about. I even pushed the first patch that implements this TODO to reviews and it depends on this one.
		for (auto *Op : NonTrivialOperands) {
		auto *OpInst = dyn_cast<Instruction>(Op);
		if (!OpInst || !R.contains(OpInst))
		continue;
		Meinersbur (Done): God function antipattern
		jdoerfert (Author, Done): I take this comment should read as: Please split the function in three parts. Sure, can do.

		Meinersbur (Done): What is there to see?
		jdoerfert (Author, Done): I'm not sure how to answer that, as I expected the link to the function that contains the comment for this one to be self-explanatory.
		if (!canRecomputeInStmt(Stmt, OpInst))
		return false;
		}

		return true;
		Meinersbur (Done): Why are exit node PHIs processed, but not standard PHIs?
		jdoerfert (Author, Done): I'm not sure why you ask this but as you can probably see in the code no "PHI access" is processed. Though, if something that is a PHI does not introduce a "PHI access" it might be processed here.
		}

		etherzhhb (Done): // Now AccessInst is a write scalar access (and not a PHI).
		jdoerfert (Author, Not Done): Added a comment.
		void Scop::collectNonTrivialOperands(LoopInfo &LI) {
		// See: Scop::simplifyScalarAccesses()

		SmallPtrSet<Instruction *, 32> Visited;
		grosser (Not Done): "The latter to add" does not seem to make sense grammatically.
		for (ScopStmt &Stmt : *this) {
		for (MemoryAccess *MA : Stmt) {
		// Skip everything except scalar write accesses.
		if (MA->isArrayKind() || MA->isRead() || MA->isPHIKind())
		continue;

		Meinersbur (Not Done): Why for each access separately? Shouldn't this be per-ScopStmt because it only matters for cross-stmt operands?
		jdoerfert (Author, Not Done): I'm not sure what you mean. Why do we look at each access separately or why do we look at the non-trivial-operands of each access separately?
		Meinersbur (Not Done):
		* This is nested inside a "for (MemoryAccess *MA : Stmt)" loop and NonTrivialOperandsMap is indexed by its AccessInst. If an instruction has multiple write accesses (AFAIK not possible at the moment), the code below would run multiple times over the same AccessInst.
		* Multiple instructions in the same Stmt might use the same value, thus its operand tree appears multiple times in this statement. At code generation, each of them would get its own copy of the operand tree, although only one per stmt is necessary (the same tree can be reused for both users).
		Instruction *AccessInst = cast<Instruction>(MA->getAccessValue());
		if (NonTrivialOperandsMap.count(AccessInst))
		continue;

		DEBUG(dbgs() << "\nCheck operand tree of '" << *AccessInst << "'\n");

		auto &NonTrivialOperands = NonTrivialOperandsMap[AccessInst];
		SmallPtrSet<Instruction *, 8> Worklist;
		Worklist.insert(AccessInst);
		Visited.clear();

		while (!Worklist.empty()) {
		Instruction *Inst = *Worklist.begin();
		Worklist.erase(Inst);

		Meinersbur (Done): auto -> Use ?
		jdoerfert (Author, Done): We could do that.
		if (!Visited.insert(Inst).second)
		continue;
		Meinersbur (Not Done): This will probably have issues with PHIs in the scop's entry, just as in PR25394, i.e., if InstOpInst is a PHI with an incoming block from outside the scop.
		jdoerfert (Author, Not Done): Mh, if we have a test case I'll check that.

		for (auto &InstOp : Inst->operands())
		if (Instruction *InstOpInst = dyn_cast<Instruction>(InstOp)) {
		if (R.contains(InstOpInst))
		Worklist.insert(InstOpInst);
		else
		NonTrivialOperands.insert(InstOp);
		Meinersbur (Not Done): and continue ?
		jdoerfert (Author, Not Done): This looks weird without the follow-up commits that introduce some code here.
		}

		if (Inst->mayHaveSideEffects() || Inst->mayReadFromMemory())
		NonTrivialOperands.insert(Inst);

		if (!isa<PHINode>(Inst))
		etherzhhb (Not Done): How about using the following approach, which is all over LLVM codebase, and we can create a function for it.
		  // Depth-first traverse the operand DAG of AccessInst, until we reach:
		  // * An instruction that is defined outside the current Scop (OutsideOperands)
		  // * An instruction with side-effect (SideEffectOperands)
		  std::set Visited;
		  SmallVector<std::pair<Instruction, Instruction::op_iterator> > VisitStack;
		  VisitStack.push_back(std::make_pair(AccessInst, AccessInst->begin()));
		  while (VisitStack.empty()) {
		    auto CurNode = VisitStack.back().first;
		    auto &CurChildIt = VisitStack.back().second;
		    if (CurChildIt == CurNode->end()) {
		      VisitStack.pop_back();
		      continue;
		    }
		    Instruction ChildInst = dyn_cast<Instruction>(CurChildIt);
		    ++CurChildIt;
		    // Stop at invariances
		    if (ChildInst == nullptr)
		      continue;
		    if (!Visited.insert(Inst).second)
		      continue;
		    // Invariance again.
		    if (!R.contains(ChildInst)) {
		      OutsideOperands.push_back(std::make_pair(InstOpInst, Inst));
		      continue;
		    }
		    if (Inst->mayHaveSideEffects() || Inst->mayReadFromMemory()) {
		      SideEffectOperands.insert(Inst);
		      continue;
		    }
		    if (!isa<PHINode>(Inst))
		      continue;
		    if (R.contains(Inst) && canSynthesize(Inst, &LI, &SE, &R))
		      continue;
		    SideEffectOperands.insert(Inst);
		  }
		jdoerfert (Author, Not Done): I don't get the problem you try to solve here. Why do we need a different algorithm and what is it that it does better?
		etherzhhb (Not Done): It is almost the same algorithm, but in a different 'style'. I think I should not push too hard about the "style". So I think we are done with this. But we should put some comments to describe what the algorithm is doing, like:
		  // Traverse the operand DAG of AccessInst, until we reach:
		  // * An instruction that is defined outside the current Scop (OutsideOperands)
		  // * An instruction with side-effect (SideEffectOperands)
		jdoerfert (Author, Not Done): I can do that.
		continue;

		if (R.contains(Inst) && canSynthesize(Inst, &LI, SE, &R))
		continue;

		NonTrivialOperands.insert(Inst);
		}

		DEBUG({
		dbgs() << "\nNon-trivial operands: {\n";
		for (auto *Op : NonTrivialOperands)
		dbgs() << "\t\t" << *Op << "\n";
		dbgs() << "\t}\n\n";
		});
		}
		}
		}
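
A minimal, LLVM-free sketch of the worklist traversal in collectNonTrivialOperands() above. The `Node` type and `collectNonTrivial` function are hypothetical stand-ins, not Polly's API, and the PHI/canSynthesize() handling of the patch is deliberately left out. It only illustrates the idea: walk the operand DAG inside the region, stop at operands defined outside it, and record nodes with side effects.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an LLVM instruction (hypothetical, for illustration only).
struct Node {
  std::string Name;
  bool InRegion;       // defined inside the SCoP region?
  bool HasSideEffects; // has side effects or reads memory
  std::vector<const Node *> Operands;
};

// Walk the operand DAG of Root and collect the nodes that prevent a free
// re-computation: operands defined outside the region and nodes with side
// effects or memory reads.
std::set<const Node *> collectNonTrivial(const Node &Root) {
  std::set<const Node *> NonTrivial, Visited;
  std::vector<const Node *> Worklist{&Root};

  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();

    // A DAG node can be reached through several users; handle it once.
    if (!Visited.insert(N).second)
      continue;

    for (const Node *Op : N->Operands) {
      if (Op->InRegion)
        Worklist.push_back(Op); // keep walking inside the region
      else
        NonTrivial.insert(Op);  // outside operand: record it and stop
    }

    if (N->HasSideEffects)
      NonTrivial.insert(N);
  }
  return NonTrivial;
}
```

For an `add` inside the region that uses an in-region `load` twice, where the load's pointer is defined outside the region, the result contains exactly the load and the pointer.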

		void Scop::removeRecomputableScalarReads(ScopStmt &Stmt) {
		// See: Scop::simplifyScalarAccesses()

		BasicBlock *StmtBB =
		Stmt.isBlockStmt() ? Stmt.getBasicBlock() : Stmt.getRegion()->getEntry();
		AccFuncSetType &AccList = AccFuncMap[StmtBB];

		SmallVector<MemoryAccess *, 8> AdditionalAccesses;
		MemoryAccessList ResolvedAccesses;

		for (MemoryAccess *MA : Stmt) {
		if (!(MA->isValueKind() && MA->isRead()))
		continue;

		auto *DefInst = dyn_cast<Instruction>(MA->getAccessValue());

		// Skip read-only scalars.
		if (!DefInst)
		continue;

		// Skip read-only scalars and scalars defined in non-affine regions.
		auto *DefStmt = getStmtForBasicBlock(DefInst->getParent());
		if (!DefStmt || DefStmt->isRegionStmt())
		continue;

		// Check if the scalar can be recomputed in this statement.
		auto &NonTrivialOperands = NonTrivialOperandsMap[DefInst];
		if (!canRecomputeInStmt(Stmt, NonTrivialOperands))
		Meinersbur (Done): Wouldn't it be possible to recompute some of the dependent values, even if some operands have side effects?
		jdoerfert (Author, Done): Might be, but it won't help with removing the scalar dependences. Actually, it might make them worse.
		continue;

		// If the scalar can be recomputed, we copy all accesses of read-only
		// scalars that are part of the operand tree of the recomputed value.
		Meinersbur (Done): Why only instructions that have operands from outside? For movable instructions (DependentScalar) without such operands, but implicit accesses, don't we need to remove its memory accesses as well?
		jdoerfert (Author, Done): Outside operands are the read-only scalars we track to ease OpenMP code generation. The scalar access we recompute is removed later.
		for (auto *Op : NonTrivialOperands) {
		auto *OpInst = dyn_cast<Instruction>(Op);
		if (OpInst && R.contains(OpInst))
		continue;
		etherzhhb (Done): This user is actually inside the Scop, right?
		jdoerfert (Author, Not Done): Renamed.

		Meinersbur (Done): This is using the list of accesses this function modifies on the fly ("InstructionToAccess"). As such, it seems to me to depend on the order we iterate over all statements. E.g. the previous iteration already removed the read access of "OutsideUser", hence it will not be added to AdditionalAccesses anymore, hence we miss some read accesses. Or am I wrong? Why is it that this cannot happen?
		jdoerfert (Author, Done): No read access is/should be removed if it is still connected to anything else. Hence, your scenario should never happen. Additionally, we actually iterate over accesses here, thus the actual removal happens only at the end (after we traversed all accesses).
		etherzhhb (Not Done): So what we need is only the parent block of the user that is inside the Scop?
		jdoerfert (Author, Not Done): True. I changed this code to use only one set of operands and compute everything else on the fly.
		for (auto *User : Op->users()) {
		auto *UserInst = dyn_cast<Instruction>(User);
		if (!UserInst || !R.contains(UserInst))
		continue;

		Meinersbur (Done): What does this condition mean? I think we are generally misusing "BaseAddr" for scalar accesses, because it is actually not an address.
		jdoerfert (Author, Done): It depends on how you see it. I think the (virtual) register is a unique address for a scalar, thus it fits well here.
		etherzhhb (Not Done): I read: if (Stmt.containValueReadOf(OutsideOperand))
		jdoerfert (Author, Not Done): Yes. Is that a problem?
		etherzhhb (Done): It is not very straightforward, but with the comment above, I think it is ok.
		ScopStmt *UserStmt = getStmtForBasicBlock(UserInst->getParent());
		if (!UserStmt)
		continue;

		Meinersbur (Not Done): I'd expect this to be computed in collectNonTrivialOperands(), like a partition of instructions in the operand tree into Movable and Outside/Sideeffect (operand tree leaves). I understand your algorithm works differently, but to understand: what is the rationale to only have direct operands in DependentScalars, while we want to copy/move an entire operand tree?
		jdoerfert (Author, Done): We copy/move the entire operand tree but for that I do not need to traverse it and put all instructions into a list. I simply store the root of the tree and traverse it during the actual copy/move.
		// If an access to the read-only scalar is already present, continue.
		if (Stmt.lookupValueReadOf(Op))
		continue;

		// If an access to the read-only scalar was generated copy it.
		auto *UseMA = UserStmt->lookupValueReadOf(Op);
		if (!UseMA)
		continue;

		AdditionalAccesses.push_back(UseMA->copyTo(AccList, &Stmt));
		break;
		}
		}

		Stmt.addDependentScalar(DefInst);
		ResolvedAccesses.push_front(MA);
		}

		Stmt.removeMemoryAccesses(ResolvedAccesses);
		for (MemoryAccess *MA : AdditionalAccesses)
		Stmt.addAccess(MA, true);
		}

		void Scop::removeUnreadScalarWrites(ScopStmt &Stmt) {
		// See: Scop::simplifyScalarAccesses()
		grosser (Done): grammar

		// Predicate that will evaluate to true if we have to assume that
		// @p AccessValue will be read in @p BB and additionally escape @p BB.
		auto ContainsRead = [&](BasicBlock *BB, Value *AccessValue) {
		auto *UStmt = getStmtForBasicBlock(BB);
		grosser (Done): I am not yet convinced an exit ScopStmt is something we would like to have. Maybe leave this for another patch review.
		if (!UStmt)
		return false;

		// TODO: This is too conservative.
		if (UStmt->isRegionStmt())
		return true;

		// If there is no write in @p BB it will not escape @p BB, thus the
		Meinersbur (Done): If a user has no more accesses, is it because it is intended to be removed?
		jdoerfert (Author, Done): Actually, it has already been removed, hence the statement but no read access.
		// read is a no-op.
		bool ContainsWrite = false;
		for (auto *UStmtMA : *UStmt)
		Meinersbur (Done): auto -> MemoryAccess ?
		jdoerfert (Author, Not Done): We could do that.
		ContainsWrite |= UStmtMA->isWrite();
		if (!ContainsWrite)
		return false;

		return UStmt->lookupValueReadOf(AccessValue) != nullptr;
		Meinersbur (Done): Is this break meant to leave both for-loops?
		jdoerfert (Author, Done): It could be but doesn't need to.
		};

		// Predicate that will evaluate to true if @p UserInst is a user of
		// @p AccessValue in the current polyhedral description of the SCoP.
		auto IsRemainingUser = [&](Instruction *UserInst, Value *AccessValue) {
		auto *UserBB = UserInst->getParent();

		Meinersbur (Done): This seems to be a different kind of "resolved" than in removeRecomputableScalarReads.
		jdoerfert (Author, Done): No, it is the same but for the write accesses instead of the reads. Or to be more precise, it is the passive version, as writes are (passively) resolved if all their reads have been (actively) resolved.
		// If the user is outside the SCoP the @p AccessValue is escaping and
		// we implicitly model the escaping use.
		if (!R.contains(UserBB))
		return true;

		auto *UserPHI = dyn_cast<PHINode>(UserInst);
		if (!UserPHI)
		return ContainsRead(UserBB, AccessValue);

		//
		if (UserBB == R.getEntry() && !R.getEnteringBlock())
		return true;

		// TODO: This is too conservative.
		if (auto *UserPHIStmt = getStmtForBasicBlock(UserBB))
		if (UserPHIStmt->isRegionStmt())
		Meinersbur (Done): collectNonTrivialOperands actually skips PHIs which are also implicit accesses.
		jdoerfert (Author, Done): It does not skip them but stops as they are non-trivial operands.
		return true;

		// If the user is a PHI node we have to check the incoming blocks of that
		// PHI that are associated with the access instruction for possible uses.
		for (unsigned u = 0, e = UserPHI->getNumIncomingValues(); u < e; u++)
		if (UserPHI->getIncomingValue(u) == AccessValue)
		if (ContainsRead(UserPHI->getIncomingBlock(u), AccessValue))
		return true;

		Meinersbur (Done): hence -> i.e.
		return false;
		};

		// Iterate over all accesses in @p Stmt and check if for all value writes
		// if we still have a (possible) user. If not we will remove the access
		// as well as the corresponding ScopArrayInfo object.
		SmallVector<MemoryAccess *, 8> ResolvedAccesses;
		for (MemoryAccess *MA : Stmt) {
		if (!(MA->isValueKind() && MA->isWrite()))
		continue;

		bool AllUsersRemoved = true;
		auto *AccessInst = cast<Instruction>(MA->getAccessValue());
		for (auto *User : AccessInst->users())
		AllUsersRemoved &= !IsRemainingUser(cast<Instruction>(User), AccessInst);

		if (AllUsersRemoved)
		ResolvedAccesses.push_back(MA);
		}

		for (MemoryAccess *MA : ResolvedAccesses) {
		// Remove the ScopArrayInfo object for this scalar as we do neither
		// define nor read it in the SCoP description but only copy the original
		// version in the unoptimized region.
		removeScopArrayInfo(MA->getScopArrayInfo());

		Stmt.removeMemoryAccesses({MA});
		}
		}

		void Scop::simplifyScalarAccesses(LoopInfo &LI) {
		// First iterate over all implicit write accesses, hence scalar definitions
		// and collect all operands that might have side effects or read memory as
		// well as all operands that are outside the SCoP. The former is needed to
		// decide if we can recompute the scalar definition to another statement.
		// The latter to add read-only scalar accesses to the statement in which the
		// scalar is recomputed. That allows us to identify values needed e.g., for
		// parallel code generation.
		collectNonTrivialOperands(LI);

		// In the second step traverse all implicit read accesses, hence scalar uses
		// in statements that do not define the scalar. However, at the moment we
		// exclude PHIs to simplify the logic. For each use we will check if we can
		// recompute the definition to this block (see canRecomputeInStmt()).
		// If so we will:
		// o Add the definition to the dependent scalars set of the use statement.
		// o Add read accesses for all prior to the SCoP defined values if they
		// were present in the definition statement.
		// o Remove the use access from the use statement as it will be recomputed
		// and does not need to be communicated anymore.
		for (ScopStmt &Stmt : *this)
		removeRecomputableScalarReads(Stmt);

		// In the third and final step we iterate over the scalar definitions in
		// the SCoP again. We will check if we removed the accesses for all users
		// of the scalar in the SCoP. If so, we can safely remove the scalar write
		// access as all users will recompute the value. As we currently cannot simply
		// use this logic to recompute values at the exit of the SCoP we will not
		// remove scalars that escape the SCoP.
		for (ScopStmt &Stmt : *this)
		removeUnreadScalarWrites(Stmt);
		}
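
The three steps described in the comments of simplifyScalarAccesses() can be sketched on a toy model as follows. Everything here is an assumption made for illustration: the `Stmt` struct, the `simplifyScalars` function and the precomputed `Recomputable` set (standing in for the result of step 1 plus canRecomputeInStmt()) are hypothetical, and escaping values, PHIs and non-affine regions are not modeled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy statement (hypothetical): scalars are identified by name.
struct Stmt {
  std::set<std::string> Reads;     // scalars read from other statements
  std::set<std::string> Writes;    // scalars made available to other statements
  std::set<std::string> Recompute; // scalars to re-compute locally at codegen
};

// Recomputable stands in for the outcome of step 1: the scalars whose
// operand tree allows re-computation in the using statement.
void simplifyScalars(std::vector<Stmt> &Stmts,
                     const std::set<std::string> &Recomputable) {
  // Step 2: drop every read of a recomputable scalar and remember that the
  // definition has to be re-computed in the using statement.
  for (Stmt &S : Stmts) {
    for (auto It = S.Reads.begin(); It != S.Reads.end();) {
      if (Recomputable.count(*It)) {
        S.Recompute.insert(*It);
        It = S.Reads.erase(It);
      } else {
        ++It;
      }
    }
  }

  // Step 3: a write is only needed while some statement still reads it.
  std::set<std::string> StillRead;
  for (const Stmt &S : Stmts)
    StillRead.insert(S.Reads.begin(), S.Reads.end());

  for (Stmt &S : Stmts) {
    for (auto It = S.Writes.begin(); It != S.Writes.end();) {
      if (StillRead.count(*It))
        ++It;
      else
        It = S.Writes.erase(It);
    }
  }
}
```

In this model a write disappears only once all of its reads have been resolved, which matches the "passive" resolution of writes described in the review discussion.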

		void Scop::removeScopArrayInfo(const ScopArrayInfo *SAI) {
		ScopArrayInfoMap.erase(std::make_pair(SAI->getBasePtr(), SAI->getKind()));
		}

const ScopArrayInfo *		const ScopArrayInfo *
Scop::getOrCreateScopArrayInfo(Value *BasePtr, Type *ElementType,		Scop::getOrCreateScopArrayInfo(Value *BasePtr, Type *ElementType,
ArrayRef<const SCEV *> Sizes,		ArrayRef<const SCEV *> Sizes,
ScopArrayInfo::MemoryKind Kind) {		ScopArrayInfo::MemoryKind Kind) {
auto &SAI = ScopArrayInfoMap[std::make_pair(BasePtr, Kind)];		auto &SAI = ScopArrayInfoMap[std::make_pair(BasePtr, Kind)];
if (!SAI) {		if (!SAI) {
auto &DL = getRegion().getEntry()->getModule()->getDataLayout();		auto &DL = getRegion().getEntry()->getModule()->getDataLayout();
SAI.reset(new ScopArrayInfo(BasePtr, ElementType, getIslCtx(), Sizes, Kind,		SAI.reset(new ScopArrayInfo(BasePtr, ElementType, getIslCtx(), Sizes, Kind,
[... 1,215 lines not shown ...]

lib/CodeGen/BlockGenerators.cpp

[... 53 lines not shown ...]	BlockGenerator::BlockGenerator(PollyIRBuilder &B, LoopInfo &LI,
ScalarAllocaMapTy &PHIOpMap,		ScalarAllocaMapTy &PHIOpMap,
EscapeUsersAllocaMapTy &EscapeMap,		EscapeUsersAllocaMapTy &EscapeMap,
ValueMapT &GlobalMap,		ValueMapT &GlobalMap,
IslExprBuilder *ExprBuilder)		IslExprBuilder *ExprBuilder)
: Builder(B), LI(LI), SE(SE), ExprBuilder(ExprBuilder), DT(DT),		: Builder(B), LI(LI), SE(SE), ExprBuilder(ExprBuilder), DT(DT),
EntryBB(nullptr), PHIOpMap(PHIOpMap), ScalarMap(ScalarMap),		EntryBB(nullptr), PHIOpMap(PHIOpMap), ScalarMap(ScalarMap),
EscapeMap(EscapeMap), GlobalMap(GlobalMap) {}		EscapeMap(EscapeMap), GlobalMap(GlobalMap) {}

		void BlockGenerator::recomputeDependentScalars(
		ScopStmt &Stmt, ValueMapT &BBMap, LoopToScevMapT &LTS,
		isl_id_to_ast_expr *NewAccesses) {

		for (auto *Inst : Stmt.getDependentScalars())
		if (!GlobalMap.count(Inst))
		Meinersbur (Done): When does a dependent scalar become a global?
		jdoerfert (Author, Done): If it is hoisted.
		copyInstruction(Stmt, Inst, BBMap, LTS, NewAccesses, true);
		}

Value *BlockGenerator::trySynthesizeNewValue(ScopStmt &Stmt, Value *Old,		Value *BlockGenerator::trySynthesizeNewValue(ScopStmt &Stmt, Value *Old,
ValueMapT &BBMap,		ValueMapT &BBMap,
LoopToScevMapT &LTS,		LoopToScevMapT &LTS, Loop *L) {
Loop *L) const {
if (!SE.isSCEVable(Old->getType()))		if (!SE.isSCEVable(Old->getType()))
return nullptr;		return nullptr;

const SCEV *Scev = SE.getSCEVAtScope(Old, L);		const SCEV *Scev = SE.getSCEVAtScope(Old, L);
if (!Scev)		if (!Scev)
return nullptr;		return nullptr;

if (isa<SCEVCouldNotCompute>(Scev))		if (isa<SCEVCouldNotCompute>(Scev))
return nullptr;		return nullptr;
		Meinersbur (Done): Shouldn't these be in the BBMap after recomputeDependentScalars() executed?
		jdoerfert (Author, Done): After recomputeDependentScalars is executed they are in BBMap. However, our synthesize code sometimes thinks it can synthesize something while it actually refers to old values that we have to recompute first while we are in recomputeDependentScalars.

const SCEV *NewScev = apply(Scev, LTS, SE);		const SCEV *NewScev = apply(Scev, LTS, SE);

		// Recompute scalars needed for this SCEV
		const Region &R = Stmt.getParent()->getRegion();
		SetVector<Value *> Values;
		findValues(NewScev, Values);
		for (Value *Val : Values) {
		if (Instruction *Inst = dyn_cast<Instruction>(Val))
		if (R.contains(Inst))
		copyInstruction(Stmt, Inst, BBMap, LTS, nullptr, true);
		}

ValueMapT VTV;		ValueMapT VTV;
VTV.insert(BBMap.begin(), BBMap.end());		VTV.insert(BBMap.begin(), BBMap.end());
VTV.insert(GlobalMap.begin(), GlobalMap.end());		VTV.insert(GlobalMap.begin(), GlobalMap.end());

Scop &S = *Stmt.getParent();		Scop &S = *Stmt.getParent();
const DataLayout &DL =		const DataLayout &DL =
S.getRegion().getEntry()->getParent()->getParent()->getDataLayout();		S.getRegion().getEntry()->getParent()->getParent()->getDataLayout();
auto IP = Builder.GetInsertPoint();		auto IP = Builder.GetInsertPoint();

assert(IP != Builder.GetInsertBlock()->end() &&		assert(IP != Builder.GetInsertBlock()->end() &&
"Only instructions can be insert points for SCEVExpander");		"Only instructions can be insert points for SCEVExpander");
Value *Expanded =		Value *Expanded =
expandCodeFor(S, SE, DL, "polly", NewScev, Old->getType(), &*IP, &VTV);		expandCodeFor(S, SE, DL, "polly", NewScev, Old->getType(), &*IP, &VTV);

BBMap[Old] = Expanded;		BBMap[Old] = Expanded;
return Expanded;		return Expanded;
}		}

Value *BlockGenerator::getNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,		Value *BlockGenerator::getNewValue(ScopStmt &Stmt, Value *Old, ValueMapT &BBMap,
LoopToScevMapT &LTS, Loop *L) const {		LoopToScevMapT &LTS, Loop *L, bool TryOnly) {
// Constants that do not reference any named value can always remain		// Constants that do not reference any named value can always remain
// unchanged. Handle them early to avoid expensive map lookups. We do not take		// unchanged. Handle them early to avoid expensive map lookups. We do not take
// the fast-path for external constants which are referenced through globals		// the fast-path for external constants which are referenced through globals
// as these may need to be rewritten when distributing code accross different		// as these may need to be rewritten when distributing code accross different
// LLVM modules.		// LLVM modules.
if (isa<Constant>(Old) && !isa<GlobalValue>(Old))		if (isa<Constant>(Old) && !isa<GlobalValue>(Old))
return Old;		return Old;

[... 9 lines not shown ...]	if (Old->getType()->getScalarSizeInBits() <
New = Builder.CreateTruncOrBitCast(New, Old->getType());		New = Builder.CreateTruncOrBitCast(New, Old->getType());

return New;		return New;
}		}

if (Value *New = BBMap.lookup(Old))		if (Value *New = BBMap.lookup(Old))
return New;		return New;

		if (canSynthesize(Old, &LI, &SE, &Stmt.getParent()->getRegion()))
if (Value *New = trySynthesizeNewValue(Stmt, Old, BBMap, LTS, L))		if (Value *New = trySynthesizeNewValue(Stmt, Old, BBMap, LTS, L))
return New;		return New;
		Meinersbur (Not Done): Why does this become necessary? Shouldn't "trySynthesizeNewValue" determine itself whether it is synthesizable?
		jdoerfert (Author, Not Done): Good question. I don't remember anymore. I'll check once the general patch is agreed on.

// A scop-constant value defined by a global or a function parameter.		// A scop-constant value defined by a global or a function parameter.
if (isa<GlobalValue>(Old) || isa<Argument>(Old))		if (isa<GlobalValue>(Old) || isa<Argument>(Old))
return Old;		return Old;

// A scop-constant value defined by an instruction executed outside the scop.		// A scop-constant value defined by an instruction executed outside the scop.
if (const Instruction *Inst = dyn_cast<Instruction>(Old))		if (const Instruction *Inst = dyn_cast<Instruction>(Old))
if (!Stmt.getParent()->getRegion().contains(Inst->getParent()))		if (!Stmt.getParent()->getRegion().contains(Inst->getParent()))
return Old;		return Old;

		if (TryOnly)
		return nullptr;

// The scalar dependence is neither available nor SCEVCodegenable.		// The scalar dependence is neither available nor SCEVCodegenable.
llvm_unreachable("Unexpected scalar dependence in region!");		llvm_unreachable("Unexpected scalar dependence in region!");
		etherzhhb (Not Done): TryOnly is only used here. And this block is equivalent to:
		  assert(TryOnly && "Unexpected scalar dependence in region!");
		  return nullptr;
		Instead of passing TryOnly as a function parameter, can we move the assertion out of this function, like:
		  auto V = getNewValue(....);
		  assert((V || TryOnly) && "Unexpected scalar dependence in region!");
		Now we can remove this confusing TryOnly.
return nullptr;
}		}
		Meinersbur (Done): This essentially becomes: assert(TryOnly); return nullptr;
		jdoerfert (Author, Done): Seems fine to me.
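
A minimal sketch of the refactoring etherzhhb proposes for TryOnly, on a toy value map instead of Polly's types. The names `ValueMap`, `lookupNewValue` and `mustGetNewValue` are hypothetical and not part of the patch. The lookup simply reports failure with nullptr, and only the caller that cannot tolerate failure asserts.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy replacement for the BBMap (hypothetical): old value name -> new value.
using ValueMap = std::map<std::string, int>;

// Plain lookup without a TryOnly flag: returns nullptr if no new value is
// available and leaves the decision about what that means to the caller.
const int *lookupNewValue(const ValueMap &BBMap, const std::string &Old) {
  auto It = BBMap.find(Old);
  return It == BBMap.end() ? nullptr : &It->second;
}

// A caller that requires a value keeps the assertion at the call site.
int mustGetNewValue(const ValueMap &BBMap, const std::string &Old) {
  const int *V = lookupNewValue(BBMap, Old);
  assert(V && "Unexpected scalar dependence in region!");
  return *V;
}
```

A recompute-style caller would call `lookupNewValue` directly and handle the nullptr case itself, for example by copying the defining instruction first.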

void BlockGenerator::copyInstScalar(ScopStmt &Stmt, Instruction *Inst,		void BlockGenerator::copyInstScalar(ScopStmt &Stmt, Instruction *Inst,
ValueMapT &BBMap, LoopToScevMapT &LTS) {		ValueMapT &BBMap, LoopToScevMapT &LTS,
		bool Recompute) {
// We do not generate debug intrinsics as we did not investigate how to		// We do not generate debug intrinsics as we did not investigate how to
// copy them correctly. At the current state, they just crash the code		// copy them correctly. At the current state, they just crash the code
// generation as the meta-data operands are not correctly copied.		// generation as the meta-data operands are not correctly copied.
if (isa<DbgInfoIntrinsic>(Inst))		if (isa<DbgInfoIntrinsic>(Inst))
return;		return;

		const Region &R = Stmt.getParent()->getRegion();
Instruction *NewInst = Inst->clone();		Instruction *NewInst = Inst->clone();

// Replace old operands with the new ones.		// Replace old operands with the new ones.
for (Value *OldOperand : Inst->operands()) {		for (Value *OldOperand : Inst->operands()) {
Value *NewOperand =		Value *NewOperand = getNewValue(Stmt, OldOperand, BBMap, LTS,
getNewValue(Stmt, OldOperand, BBMap, LTS, getLoopForStmt(Stmt));		getLoopForStmt(Stmt), Recompute);

		if (Recompute) {
		Instruction *NewOperandInst = dyn_cast_or_null<Instruction>(NewOperand);
		if (!NewOperand || (NewOperandInst && R.contains(NewOperandInst))) {
		Meinersbur (Done): What is the rationale for this condition?
		jdoerfert (Author, Done): If we could not generate a new operand but we need to because the old one is an instruction in the region we have to do something.
		if (Instruction *OldOperandInst = dyn_cast<Instruction>(OldOperand)) {
		copyInstruction(Stmt, OldOperandInst, BBMap, LTS, nullptr, Recompute);
		Meinersbur (Done): Shouldn't this have been handled by recomputeDependentScalars() already? How do we know that the operand is even a DependentScalar? If this is because DependentScalars is unordered, wouldn't it be cleaner to ensure they are in their order of dependence?
		jdoerfert (Author, Done):
		> Shouldn't this have been handled by recomputeDependentScalars() already? How do we know that the operand is even a DependentScalar?
		This code is only triggered while we execute recomputeDependentScalars.
		> If this is because DependentScalars is unordered, wouldn't it be cleaner to ensure they are in their order of dependence?
		Which order of dependence? We have a DAG here and have to generate code for it. One way is to find some topological order on the whole DAG and then generate code; the other (which is implemented) is a recursive, demand-driven code generation. As we do the second everywhere else, I figured it makes sense to do it here too.
		NewOperand = BBMap[OldOperand];
		etherzhhb (Not Done): If copyInstruction returns the newly copied instruction, the code
		  copyInstruction(Stmt, OldOperandInst, BBMap, LTS, nullptr, Recompute);
		  NewOperand = BBMap[OldOperand];
		becomes:
		  NewOperand = copyInstruction(Stmt, OldOperandInst, BBMap, LTS, nullptr, Recompute);
		which looks better and is easier to understand. But this can be done in another patch.
		jdoerfert (Author, Not Done): We could do that later, true.
		}
		}
		}

if (!NewOperand) {		if (!NewOperand) {
assert(!isa<StoreInst>(NewInst) &&		assert(!isa<StoreInst>(NewInst) &&
"Store instructions are always needed!");		"Store instructions are always needed!");
		assert(!Recompute && "Recompute copy should never fail");
delete NewInst;		delete NewInst;
return;		return;
}		}

NewInst->replaceUsesOfWith(OldOperand, NewOperand);		NewInst->replaceUsesOfWith(OldOperand, NewOperand);
}		}

Builder.Insert(NewInst);		Builder.Insert(NewInst);
[... 27 lines not shown ...]	if (OldPtrTy != NewPtrTy)
Address = Builder.CreateBitOrPointerCast(Address, OldPtrTy);		Address = Builder.CreateBitOrPointerCast(Address, OldPtrTy);
return Address;		return Address;
}		}

return getNewValue(Stmt, Inst.getPointerOperand(), BBMap, LTS,		return getNewValue(Stmt, Inst.getPointerOperand(), BBMap, LTS,
getLoopForStmt(Stmt));		getLoopForStmt(Stmt));
}		}

Loop *BlockGenerator::getLoopForStmt(const ScopStmt &Stmt) const {		Loop *BlockGenerator::getLoopForStmt(const ScopStmt &Stmt) const {
auto *StmtBB =		auto *StmtBB =
Stmt.isBlockStmt() ? Stmt.getBasicBlock() : Stmt.getRegion()->getEntry();		Stmt.isBlockStmt() ? Stmt.getBasicBlock() : Stmt.getRegion()->getEntry();
		Meinersbur (Done): This idiom appears multiple times. Introduce a helper function?
		jdoerfert (Author, Done): We can do that; it neither conflicts with this commit nor is required for it.
return LI.getLoopFor(StmtBB);		return LI.getLoopFor(StmtBB);
		Meinersbur (Done): I like this change, but it could be committed independently. Maybe also as a member function of ScopStmt?
		jdoerfert (Author, Done): I can commit it beforehand, but I don't see the need for a ScopStmt member function at the moment.
}		}

Value *BlockGenerator::generateScalarLoad(ScopStmt &Stmt, LoadInst *Load,		Value *BlockGenerator::generateScalarLoad(ScopStmt &Stmt, LoadInst *Load,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
if (Value *PreloadLoad = GlobalMap.lookup(Load))		if (Value *PreloadLoad = GlobalMap.lookup(Load))
return PreloadLoad;		return PreloadLoad;

Show All 27 Lines
bool BlockGenerator::canSyntheziseInStmt(ScopStmt &Stmt, Instruction *Inst) {		bool BlockGenerator::canSyntheziseInStmt(ScopStmt &Stmt, Instruction *Inst) {
Loop *L = getLoopForStmt(Stmt);		Loop *L = getLoopForStmt(Stmt);
return (Stmt.isBlockStmt() || !Stmt.getRegion()->contains(L)) &&		return (Stmt.isBlockStmt() || !Stmt.getRegion()->contains(L)) &&
canSynthesize(Inst, &LI, &SE, &Stmt.getParent()->getRegion());		canSynthesize(Inst, &LI, &SE, &Stmt.getParent()->getRegion());
}		}

void BlockGenerator::copyInstruction(ScopStmt &Stmt, Instruction *Inst,		void BlockGenerator::copyInstruction(ScopStmt &Stmt, Instruction *Inst,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses,
		bool Recompute) {

// Terminator instructions control the control flow. They are explicitly		// Terminator instructions control the control flow. They are explicitly
// expressed in the clast and do not need to be copied.		// expressed in the clast and do not need to be copied.
if (Inst->isTerminator())		if (Inst->isTerminator())
return;		return;

// Synthesizable statements will be generated on-demand.		// Synthesizable statements will be generated on-demand.
if (canSyntheziseInStmt(Stmt, Inst))		if (canSyntheziseInStmt(Stmt, Inst))
return;		return;
Show All 16 Lines	if (auto *PHI = dyn_cast<PHINode>(Inst)) {
return;		return;
}		}

// Skip some special intrinsics for which we do not adjust the semantics to		// Skip some special intrinsics for which we do not adjust the semantics to
// the new schedule. All others are handled like every other instruction.		// the new schedule. All others are handled like every other instruction.
if (isIgnoredIntrinsic(Inst))		if (isIgnoredIntrinsic(Inst))
return;		return;

copyInstScalar(Stmt, Inst, BBMap, LTS);		copyInstScalar(Stmt, Inst, BBMap, LTS, Recompute);
}		}

void BlockGenerator::copyStmt(ScopStmt &Stmt, LoopToScevMapT &LTS,		void BlockGenerator::copyStmt(ScopStmt &Stmt, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
assert(Stmt.isBlockStmt() &&		assert(Stmt.isBlockStmt() &&
"Only block statements can be copied by the block generator");		"Only block statements can be copied by the block generator");

ValueMapT BBMap;		ValueMapT BBMap;
Show All 9 Lines	BasicBlock *BlockGenerator::splitBB(BasicBlock *BB) {
return CopyBB;		return CopyBB;
}		}

BasicBlock *BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB,		BasicBlock *BlockGenerator::copyBB(ScopStmt &Stmt, BasicBlock *BB,
ValueMapT &BBMap, LoopToScevMapT &LTS,		ValueMapT &BBMap, LoopToScevMapT &LTS,
isl_id_to_ast_expr *NewAccesses) {		isl_id_to_ast_expr *NewAccesses) {
BasicBlock *CopyBB = splitBB(BB);		BasicBlock *CopyBB = splitBB(BB);
Builder.SetInsertPoint(&CopyBB->front());		Builder.SetInsertPoint(&CopyBB->front());

		recomputeDependentScalars(Stmt, BBMap, LTS, NewAccesses);
		Meinersbur (Done): Any reason to do this before generateScalarLoads()? Naively, I'd think it belongs after it because some of the dependent instructions might still use scalars (e.g. read-only ones).
		jdoerfert (Author, Done): First, by construction we can place it here as all recomputable scalars do not depend on anything we cannot recompute. Second, we can probably move it after the scalar loads generation, but it should be all the same.
		etherzhhb (Not Done):
		> First, by construction we can place it here as all recomputable scalars do not depend on anything we cannot recompute. Second, we can probably move it after the scalar loads generation but it should be all the same.
		It would be good to put the explanation above into the code as a comment.
		jdoerfert (Author, Not Done): I can add a comment, sure.

generateScalarLoads(Stmt, BBMap);		generateScalarLoads(Stmt, BBMap);

copyBB(Stmt, BB, CopyBB, BBMap, LTS, NewAccesses);		copyBB(Stmt, BB, CopyBB, BBMap, LTS, NewAccesses);

// After a basic block was copied store all scalars that escape this block in		// After a basic block was copied store all scalars that escape this block in
// their alloca.		// their alloca.
generateScalarStores(Stmt, LTS, BBMap);		generateScalarStores(Stmt, LTS, BBMap);
return CopyBB;		return CopyBB;
▲ Show 20 Lines • Show All 724 Lines • ▼ Show 20 Lines	void RegionGenerator::copyStmt(ScopStmt &Stmt, LoopToScevMapT &LTS,
// inputs.		// inputs.
BasicBlock *EntryBB = R->getEntry();		BasicBlock *EntryBB = R->getEntry();
BasicBlock *EntryBBCopy = SplitBlock(Builder.GetInsertBlock(),		BasicBlock *EntryBBCopy = SplitBlock(Builder.GetInsertBlock(),
&*Builder.GetInsertPoint(), &DT, &LI);		&*Builder.GetInsertPoint(), &DT, &LI);
EntryBBCopy->setName("polly.stmt." + EntryBB->getName() + ".entry");		EntryBBCopy->setName("polly.stmt." + EntryBB->getName() + ".entry");
Builder.SetInsertPoint(&EntryBBCopy->front());		Builder.SetInsertPoint(&EntryBBCopy->front());

ValueMapT &EntryBBMap = RegionMaps[EntryBBCopy];		ValueMapT &EntryBBMap = RegionMaps[EntryBBCopy];
		recomputeDependentScalars(Stmt, EntryBBMap, LTS, IdToAstExp);
generateScalarLoads(Stmt, EntryBBMap);		generateScalarLoads(Stmt, EntryBBMap);
		etherzhhb (Not Done): Interesting duplication :) We should also add some comments to explain why we call recomputeDependentScalars before generateScalarLoads here as well.
		jdoerfert (Author, Not Done): I am confused. Are you hinting at the fact that we call "recomputeDependentScalars" here too, or that we call it again before "generateScalarLoads"? I talked about the order already in the comment above. The duplication of the "recomputeDependentScalars" call has the same reason as the duplication of the "generateScalarLoads" call: this is the starting point of a "copyStmt" function. While above it is for block statements, here it is for region statements. If you want, I can extract both calls into an "initializeScalars" method that is called before a statement is copied.
		etherzhhb (Not Done): I just think the duplication here is interesting/funny ... My intention is to suggest that we put the same comments as at line 344 here to explain the order; otherwise people may be confused when they see this before they see the comments at line 344.
		> If you want I can extract both calls into a "initializeScalars" method that is called before a statement is copied.
		This is great but can be done in another patch.

for (auto PI = pred_begin(EntryBB), PE = pred_end(EntryBB); PI != PE; ++PI)		for (auto PI = pred_begin(EntryBB), PE = pred_end(EntryBB); PI != PE; ++PI)
if (!R->contains(*PI))		if (!R->contains(*PI))
BlockMap[*PI] = EntryBBCopy;		BlockMap[*PI] = EntryBBCopy;

// Determine the original exit block of this subregion. If it the exit block		// Determine the original exit block of this subregion. If it the exit block
// is also the scop's exit, it it has been changed to polly.merge_new_and_old.		// is also the scop's exit, it it has been changed to polly.merge_new_and_old.
// We move one block back to find the original block. This only happens if the		// We move one block back to find the original block. This only happens if the
▲ Show 20 Lines • Show All 273 Lines • Show Last 20 Lines
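
The inline discussion above contrasts two ways of generating code for a DAG of recomputable scalars: computing a topological order up front, or recursing into operands on demand. The following is a minimal, standalone sketch of the demand-driven variant and is not code from the patch. `Node`, `Dag`, and `emitOnDemand` are hypothetical names introduced for illustration, and the `Emitted` set only stands in for the role that `BBMap` plays in the diff (remembering what has already been generated).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction: its name plus its operand names.
struct Node {
  std::string Name;
  std::vector<std::string> Operands;
};

using Dag = std::map<std::string, Node>;

// Emit `Name` after all of its operands have been emitted. Operands are
// generated recursively and only when something actually needs them, so no
// global topological order has to be computed beforehand.
void emitOnDemand(const Dag &G, const std::string &Name,
                  std::set<std::string> &Emitted,
                  std::vector<std::string> &Order) {
  if (Emitted.count(Name))
    return; // Already generated, reuse it.

  auto It = G.find(Name);
  assert(It != G.end() && "operand must be a known node");

  for (const std::string &Op : It->second.Operands)
    emitOnDemand(G, Op, Emitted, Order);

  Emitted.insert(Name);
  Order.push_back(Name);
}
```

Calling `emitOnDemand` on a node whose operands form a chain yields the operands first and the requested node last; nodes that nobody requests are never emitted. The sketch assumes the graph is acyclic, as the DAG mentioned in the discussion is.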

test/Isl/CodeGen/OpenMP/invariant_base_pointer_preloaded_different_bb.ll

	; RUN: opt %loadPolly -polly-codegen -polly-parallel \			; RUN: opt %loadPolly -polly-codegen -polly-parallel \
	; RUN: -polly-parallel-force -S < %s | FileCheck %s			; RUN: -polly-parallel-force -S < %s | FileCheck %s
	;			;
	; Test to verify that we hand down the preloaded A[0] to the OpenMP subfunction.			; Test to verify that we hand down the preloaded A[0] to the OpenMP subfunction.
	;			;
	; void f(float *A) {			; void f(float *A) {
	; for (int i = 1; i < 1000; i++)			; for (int i = 1; i < 1000; i++)
	; A[i] += /* split bb */ A[0];			; A[i] += /* split bb */ A[0];
	; }			; }
	; A[0] tmp (unused) A			; A[0] A
	; CHECK: %polly.par.userContext = alloca { float, float, float }			; CHECK: %polly.par.userContext = alloca { float, float* }
	;			;
	; CHECK: %polly.subfn.storeaddr.polly.access.A.load = getelementptr inbounds			; CHECK: %polly.subfn.storeaddr.polly.access.A.load = getelementptr inbounds
	; CHECK: store float %polly.access.A.load, float* %polly.subfn.storeaddr.polly.access.A.load			; CHECK: store float %polly.access.A.load, float* %polly.subfn.storeaddr.polly.access.A.load
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(float* nocapture %A) {			define void @f(float* nocapture %A) {
	entry:			entry:
	Show All 19 Lines

test/Isl/CodeGen/eliminate-multiple-scalar-fp-reads.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s --check-prefix=SCOP
				; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
				;
				; SCOP-NOT: Scalar: 1
				; SCOP-NOT: ReadAccess
				;
				; Verify the original region is untouched but all computation is moved to the
				; only place it is needed in the generated region.
				;
				; CHECK: for.body.f:
				; CHECK-NEXT: %idxprom = sext i32 %i.0 to i64
				; CHECK-NEXT: %arrayidx = getelementptr inbounds float, float* %A, i64 %idxprom
				; CHECK-NEXT: store float %add5, float* %arrayidx
				;
				; CHECK: polly.stmt.for.body.f:
				; CHECK: %0 = trunc i64 %polly.indvar to i32
				; CHECK: %1 = shl i32 %0, 1
				; CHECK: %p_conv = sitofp i32 %1 to float
				; CHECK: %p_add = fadd float %p_conv, %p_conv
				; CHECK: %p_add3 = fadd float %p_conv, %p_add
				; CHECK: %p_add1 = fadd float %p_add, %p_conv
				; CHECK: %p_add4 = fadd float %p_add3, %p_add1
				; CHECK: %p_add2 = fadd float %p_conv, %p_conv
				; CHECK: %p_add5 = fadd float %p_add4, %p_add2
				; CHECK: %scevgep = getelementptr float, float* %A, i64 %polly.indvar
				; CHECK: store float %p_add5, float* %scevgep
				;
				; void f(float *A) {
				; for (int i = 0; i < 1000; i++) {
				; float a = i * 2;
				; /* split BB */
				; float b = a + a;
				; /* split BB */
				; float c = b + a;
				; /* split BB */
				; float d = a + a;
				; /* split BB */
				; float e = a + b + c + d;
				; /* split BB */
				; A[i] = e;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(float* %A) {
				entry:
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%cmp = icmp slt i32 %i.0, 1000
				br i1 %cmp, label %for.body.a, label %for.end

				for.body.a:
				%mul = mul nsw i32 %i.0, 2
				%conv = sitofp i32 %mul to float
				br label %for.body.b

				for.body.b:
				%add = fadd float %conv, %conv
				br label %for.body.c

				for.body.c:
				%add1 = fadd float %add, %conv
				br label %for.body.d

				for.body.d:
				%add2 = fadd float %conv, %conv
				br label %for.body.e

				for.body.e:
				%add3 = fadd float %conv, %add
				%add4 = fadd float %add3, %add1
				%add5 = fadd float %add4, %add2
				br label %for.body.f

				for.body.f:
				%idxprom = sext i32 %i.0 to i64
				%arrayidx = getelementptr inbounds float, float* %A, i64 %idxprom
				store float %add5, float* %arrayidx, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret void
				}

test/Isl/CodeGen/eliminate-multiple-scalar-reads.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s --check-prefix=SCOP
				; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
				;
				; SCOP-NOT: Scalar: 1
				; SCOP-NOT: ReadAccess
				;
				; Verify the original region is untouched but all computation is moved to the
				; only place it is needed in the generated region.
				;
				; CHECK: for.body.f:
				; CHECK-NEXT: %idxprom6 = sext i32 %i.0 to i64
				; CHECK-NEXT: %arrayidx7 = getelementptr inbounds i32, i32* %A, i64 %idxprom6
				; CHECK-NEXT: store i32 %add5, i32* %arrayidx7, align 4
				;
				; CHECK: polly.stmt.for.body.f:
				; CHECK: %scevgep = getelementptr i32, i32* %A, i64 %polly.indvar
				; CHECK: %0 = trunc i64 %polly.indvar to i32
				; CHECK: %1 = shl i32 %0, 4
				; CHECK: store i32 %1, i32* %scevgep
				;
				; void f(int *A) {
				; for (int i = 0; i < 1000; i++) {
				; int a = i * 2;
				; /* split BB */
				; int b = a + a;
				; /* split BB */
				; int c = b + a;
				; /* split BB */
				; int d = a + a;
				; /* split BB */
				; int e = a + b + c + d;
				; /* split BB */
				; A[i] = e;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A) {
				entry:
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				%cmp = icmp slt i32 %i.0, 1000
				br i1 %cmp, label %for.body.a, label %for.end

				for.body.a: ; preds = %for.cond
				%tmp = mul nsw i32 %i.0, 2
				br label %for.body.b

				for.body.b:
				%add = add nsw i32 %tmp, %tmp
				br label %for.body.c

				for.body.c:
				%add1 = add nsw i32 %add, %tmp
				br label %for.body.d

				for.body.d:
				%add2 = add nsw i32 %tmp, %tmp
				br label %for.body.e

				for.body.e:
				%add3 = add nsw i32 %tmp, %add
				%add4 = add nsw i32 %add3, %add1
				%add5 = add nsw i32 %add4, %add2
				br label %for.body.f

				for.body.f:
				%idxprom6 = sext i32 %i.0 to i64
				%arrayidx7 = getelementptr inbounds i32, i32* %A, i64 %idxprom6
				store i32 %add5, i32* %arrayidx7, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret void
				}
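
The CHECK lines of this test expect the whole scalar chain to collapse into a single `shl i32 %0, 4`. The arithmetic behind that expectation: with a = 2i we get b = 4i, c = 6i, d = 4i, so e = a + b + c + d = 16i, which is i shifted left by 4. The standalone sketch below is not part of the patch and only checks this identity over the loop range from the test comment; `originalChain`, `foldedForm`, and `checkLoopRange` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the C source in the test comment: e = a + b + c + d with a = i * 2.
int32_t originalChain(int32_t i) {
  int32_t a = i * 2;
  int32_t b = a + a;
  int32_t c = b + a;
  int32_t d = a + a;
  return a + b + c + d;
}

// Mirrors the expected CHECK line `%1 = shl i32 %0, 4`, i.e. 16 * i.
int32_t foldedForm(int32_t i) { return i << 4; }

// Compare both forms over the iteration space of the test loop (0 <= i < 1000).
void checkLoopRange() {
  for (int32_t i = 0; i < 1000; ++i)
    assert(originalChain(i) == foldedForm(i));
}
```

The identity holds for the integer test only. In the floating-point variant of this test the CHECK lines expect every `fadd` to be recomputed individually.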

test/Isl/CodeGen/eliminate-scalars-with-outside-load.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s --check-prefix=CODEGEN
				;
				; Verify that we will virtually move %mul but also the read of %tmp to the
				; for.body.split block.
				;
				; CHECK: Stmt_for_body_split
				; CHECK-NOT: MemRef_mul
				; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK: { Stmt_for_body_split[i0] -> MemRef_tmp[] };
				; CHECK-NOT: MemRef_mul
				; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
				; CHECK: { Stmt_for_body_split[i0] -> MemRef_A[i0] };
				; CHECK-NOT: MemRef_mul
				;
				; CODEGEN: polly.stmt.for.body.split:
				; CODEGEN-NEXT: %p_mul = fmul float %tmp, 2.000000e+00
				; // Unused:
				; CODEGEN-NEXT: %tmp.s2a.reload = load float, float* %tmp.s2a
				;
				; CODEGEN-NEXT: %scevgep = getelementptr float, float* %A, i64 %polly.indvar
				; CODEGEN-NEXT: store float %p_mul,
				;
				; void f(float *A) {
				; float x = A[-1];
				; for (int i = 0; i < 1000; i++) {
				; float a = x * 2;
				; /* split BB */
				; A[i] = a;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(float* %A) {
				entry:
				%arrayidx = getelementptr inbounds float, float* %A, i64 -1
				%tmp = load float, float* %arrayidx, align 4
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 0, %entry ]
				%exitcond = icmp ne i64 %indvars.iv, 1000
				br i1 %exitcond, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%mul = fmul float %tmp, 2.000000e+00
				br label %for.body.split

				for.body.split: ; preds = %for.cond
				%arrayidx1 = getelementptr inbounds float, float* %A, i64 %indvars.iv
				store float %mul, float* %arrayidx1, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret void
				}

test/Isl/CodeGen/srem-in-other-bb.ll

	; RUN: opt %loadPolly -polly-codegen -S \			; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s
				Meinersbur (Done): Nice, but unrelated.
				jdoerfert (Author, Not Done): I probably forgot that while I prepared this commit and all the test cases. I'm sorry.
	; RUN: < %s | FileCheck %s
	;			;
	; void pos(float *A, long n) {			; void pos(float *A, long n) {
	; for (long i = 0; i < 100; i++)			; for (long i = 0; i < 100; i++)
	; A[n % 42] += 1;			; A[n % 42] += 1;
	; }			; }
	;			;
	; CHECK: polly.stmt.bb2:
	; CHECK-NEXT: %p_tmp = srem i64 %n, 42
	; CHECK-NEXT: store i64 %p_tmp, i64* %tmp.s2a
	;
	; CHECK: polly.stmt.bb3:			; CHECK: polly.stmt.bb3:
	; CHECK: %tmp.s2a.reload = load i64, i64* %tmp.s2a			; CHECK: %[[rem:[._a-zA-Z0-9]*]] = srem i64 %n, 42
	; CHECK: %p_tmp3 = getelementptr inbounds float, float* %A, i64 %tmp.s2a.reload			; CHECK: getelementptr inbounds float, float* %A, i64 %[[rem]]

	define void @pos(float* %A, i64 %n) {			define void @pos(float* %A, i64 %n) {
	bb:			bb:
	br label %bb1			br label %bb1

	bb1: ; preds = %bb6, %bb			bb1: ; preds = %bb6, %bb
	%i.0 = phi i64 [ 0, %bb ], [ %tmp7, %bb6 ]			%i.0 = phi i64 [ 0, %bb ], [ %tmp7, %bb6 ]
	%exitcond = icmp ne i64 %i.0, 100			%exitcond = icmp ne i64 %i.0, 100
	Show All 20 Lines

test/ScopInfo/eliminate-scalar-caused-by-load-reduction-2.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -S < %s | FileCheck %s --check-prefix=CODEGEN
				;
				; This is a negative test. We should move the load to the split block
				; and remove all scalar accesses, however at the moment we only move
				; instructions that are trivially safe to move. All three checks should
				; be negated at some point. This also checks that we currently not try to
				; move a part of the scalar operand chain, i.e., the %add instruction.
				;
				; CHECK: Scalar: 1
				; CHECK: Scalar: 1
				; CHECK-NOT: Reduction: +
				;
				; These checks should stay as they verify we did not modify the original region:
				;
				; CODEGEN: for.body.split:
				; CODEGEN-NEXT: %arrayidx2 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				; CODEGEN-NEXT: store i32 %add, i32* %arrayidx2, align 4
				;
				; void f(int *A) {
				; for (int i = 0; i < 1000; i++) {
				; int x = A[i] + 3;
				; /* split BB */
				; A[i] = x;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A) {
				entry:
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 0, %entry ]
				%exitcond = icmp ne i64 %indvars.iv, 1000
				br i1 %exitcond, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				%tmp = load i32, i32* %arrayidx, align 4
				%add = add nsw i32 %tmp, 3
				br label %for.body.split

				for.body.split:
				%arrayidx2 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				store i32 %add, i32* %arrayidx2, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret void
				}

test/ScopInfo/eliminate-scalar-caused-by-load-reduction.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				;
				; This is a negative test. We should move the load to the split block
				; and remove all scalar accesses, however at the moment we only move
				; instructions that are trivially safe to move. All three checks should
				; be negated at some point.
				;
				; CHECK: Scalar: 1
				; CHECK: Scalar: 1
				; CHECK-NOT: Reduction: +
				;
				; void f(int *A) {
				; for (int i = 0; i < 1000; i++) {
				; int x = A[i];
				; /* split BB */
				; A[i] = x + 3;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A) {
				entry:
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 0, %entry ]
				%exitcond = icmp ne i64 %indvars.iv, 1000
				br i1 %exitcond, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				%tmp = load i32, i32* %arrayidx, align 4
				br label %for.body.split

				for.body.split:
				%add = add nsw i32 %tmp, 3
				%arrayidx2 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				store i32 %add, i32* %arrayidx2, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret void
				}

test/ScopInfo/inter_bb_scalar_dep.ll

; RUN: opt %loadPolly -basicaa -polly-scops -analyze < %s | FileCheck %s		; RUN: opt %loadPolly -basicaa -polly-scops -analyze < %s | FileCheck %s
		; RUN: opt %loadPolly -basicaa -polly-codegen -analyze < %s

; void f(long A[], int N, int *init_ptr) {		; void f(long A[], int N, int *init_ptr) {
; long i, j;		; long i, j;
;		;
; for (i = 0; i < N; ++i) {		; for (i = 0; i < N; ++i) {
; init = *init_ptr;		; init = *init_ptr;
; for (i = 0; i < N; ++i) {		; for (i = 0; i < N; ++i) {
; A[i] = init + 2;		; A[i] = init + 2;
Show All 21 Lines	entry.next: ; preds = %for.i
%init = load i64, i64* %init_ptr		%init = load i64, i64* %init_ptr
; CHECK-NOT: Stmt_entry_next		; CHECK-NOT: Stmt_entry_next
br label %for.j		br label %for.j

for.j: ; preds = %for.j, %entry.next		for.j: ; preds = %for.j, %entry.next
%indvar.j = phi i64 [ 0, %entry.next ], [ %indvar.j.next, %for.j ]		%indvar.j = phi i64 [ 0, %entry.next ], [ %indvar.j.next, %for.j ]
%init_plus_two = add i64 %init, 2		%init_plus_two = add i64 %init, 2
; CHECK-LABEL: Stmt_for_j		; CHECK-LABEL: Stmt_for_j
; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 1]		; CHECK-NOT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_init[] };		; CHECK-NOT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_init[] };
; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]		; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_A[i1] };		; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_A[i1] };
		; CHECK-NOT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
%scevgep = getelementptr i64, i64* %A, i64 %indvar.j		%scevgep = getelementptr i64, i64* %A, i64 %indvar.j
store i64 %init_plus_two, i64* %scevgep		store i64 %init_plus_two, i64* %scevgep
%indvar.j.next = add nsw i64 %indvar.j, 1		%indvar.j.next = add nsw i64 %indvar.j, 1
%exitcond.j = icmp eq i64 %indvar.j.next, %N		%exitcond.j = icmp eq i64 %indvar.j.next, %N
br i1 %exitcond.j, label %for.i.end, label %for.j		br i1 %exitcond.j, label %for.i.end, label %for.j

for.i.end: ; preds = %for.j		for.i.end: ; preds = %for.j
%exitcond.i = icmp eq i64 %indvar.i.next, %N		%exitcond.i = icmp eq i64 %indvar.i.next, %N
br i1 %exitcond.i, label %return, label %for.i		br i1 %exitcond.i, label %return, label %for.i

return: ; preds = %for.i.end		return: ; preds = %for.i.end
ret void		ret void
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }

test/ScopInfo/intra_and_inter_bb_scalar_dep.ll

	Show All 18 Lines
	; CHECK-NEXT: }			; CHECK-NEXT: }
	;			;
	; CHECK: Statements {			; CHECK: Statements {
	; CHECK-NEXT: Stmt_for_j			; CHECK-NEXT: Stmt_for_j
	; CHECK-NEXT: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] : 0 <= i0 < N and 0 <= i1 < N };			; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] : 0 <= i0 < N and 0 <= i1 < N };
	; CHECK-NEXT: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> [i0, i1] };			; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> [i0, i1] };
	; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_init[] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_A[i1] };			; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_A[i1] };
	; CHECK-NEXT: }			; CHECK-NEXT: }

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"

	define void @f(i64* noalias %A, i64 %N, i64* noalias %init_ptr) #0 {			define void @f(i64* noalias %A, i64 %N, i64* noalias %init_ptr) #0 {
	entry:			entry:
	Show All 30 Lines

test/ScopInfo/invariant-loads-leave-read-only-statements.ll

	; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s			; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
	; RUN: opt %loadPolly -polly-codegen -analyze < %s			; RUN: opt %loadPolly -polly-codegen -analyze < %s

	; CHECK: Statements {			; CHECK: Statements {
	; CHECK-NEXT: Stmt_top_split			; CHECK-NEXT: Stmt_top_split
	; CHECK-NEXT: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_top_split[] };			; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_top_split[] };
	; CHECK-NEXT: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_top_split[] -> [0, 0, 0, 0] };			; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_top_split[] -> [0, 0, 0, 0] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_top_split[] -> MemRef_26[] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_top_split[] -> MemRef_25[] };			; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_top_split[] -> MemRef_25[] };
	; CHECK-NEXT: Stmt_L_4			; CHECK-NEXT: Stmt_L_4
	; CHECK-NEXT: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_L_4[i0, i1, i2] : 0 <= i0 < p_0 and 0 <= i1 < p_0 and 0 <= i2 < p_0 };			; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_L_4[i0, i1, i2] : 0 <= i0 < p_0 and 0 <= i1 < p_0 and 0 <= i2 < p_0 };
	; CHECK-NEXT: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_L_4[i0, i1, i2] -> [1, i0, i1, i2] };			; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_L_4[i0, i1, i2] -> [1, i0, i1, i2] };
	; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_L_4[i0, i1, i2] -> MemRef_19[i1, i0] };			; CHECK-NEXT: [p_0, p_1, p_2] -> { Stmt_L_4[i0, i1, i2] -> MemRef_19[i1, i0] };
	▲ Show 20 Lines • Show All 108 Lines • Show Last 20 Lines

test/ScopInfo/multidim_fortran_srem.ll

	; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s			; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
	target datalayout = "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"

	; CHECK: Statements {			; CHECK: Statements {
	; CHECK-NEXT: Stmt_bb188
	; CHECK-NEXT: Domain :=
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb188[i0] : 0 <= i0 <= -3 + tmp183 };
	; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb188[i0] -> [i0, 0, 0, 0] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb188[i0] -> MemRef_tmp192[] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb188[i0] -> MemRef_tmp194[] };
	; CHECK-NEXT: Stmt_bb203			; CHECK-NEXT: Stmt_bb203
	; CHECK-NEXT: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] : 0 <= i0 <= -3 + tmp183 and 0 <= i1 <= -3 + tmp180 and 0 <= i2 <= -3 + tmp177 };			; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] : 0 <= i0 <= -3 + tmp183 and 0 <= i1 <= -3 + tmp180 and 0 <= i2 <= -3 + tmp177 };
	; CHECK-NEXT: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> [i0, 1, i1, i2] };			; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> [i0, i1, i2] };
	; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_tmp192[] };
	; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_tmp173[o0, 1 + i1, 1 + i2] : 3*floor((-i0 + o0)/3) = -i0 + o0 and 0 <= o0 <= 2 };			; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_tmp173[o0, 1 + i1, 1 + i2] : 3*floor((-i0 + o0)/3) = -i0 + o0 and 0 <= o0 <= 2 };
	; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_tmp194[] };
	; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_tmp173[o0, 1 + i1, 1 + i2] : 3*floor((-2 - i0 + o0)/3) = -2 - i0 + o0 and 0 <= o0 <= 2 };			; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_tmp173[o0, 1 + i1, 1 + i2] : 3*floor((-2 - i0 + o0)/3) = -2 - i0 + o0 and 0 <= o0 <= 2 };
	; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_arg56[1 + i0, 1 + i1, 1 + i2] };			; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_arg56[1 + i0, 1 + i1, 1 + i2] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_arg55[1 + i0, 1 + i1, 1 + i2] };			; CHECK-NEXT: [tmp180, tmp177, tmp183, tmp162, tmp157, tmp150, tmp146, tmp140, tmp] -> { Stmt_bb203[i0, i1, i2] -> MemRef_arg55[1 + i0, 1 + i1, 1 + i2] };
	; CHECK-NEXT: }			; CHECK-NEXT: }

	[160 lines not shown]

test/ScopInfo/out-of-scop-use-in-region-entry-phi-node.ll

	; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s			; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s

	; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]			; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [p_0] -> { Stmt_bb3[] -> MemRef_tmp5[] };			; CHECK-NEXT: [p_0] -> { Stmt_bb3[] -> MemRef_tmp5[] };
				; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
				; CHECK-NEXT: [p_0] -> { Stmt_bb3[] -> MemRef_tmp7[] };

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @hoge() {			define void @hoge() {
	bb:			bb:
	br label %bb2			br label %bb2

	bb2: ; preds = %bb			bb2: ; preds = %bb
	[23 lines not shown]

test/ScopInfo/scalar_dependence_cond_br.ll

	; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s			; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
	;			;
	; void f(int *A, int c, int d) {			; void f(int *A, int c, int d) {
	; for (int i = 0; i < 1024; i++)			; for (int i = 0; i < 1024; i++)
	; if (c < i)			; if (c < i)
	; A[i]++;			; A[i]++;
	; }			; }
	;			;
	; FIXME: This test is a negative test until we have an independent blocks alternative.
	;
	; We should move operands as close to their use as possible, hence in this case			; We should move operands as close to their use as possible, hence in this case
	; there should not be any scalar dependence anymore after %cmp1 is moved to			; there should not be any scalar dependence anymore after %cmp1 is virtually
	; %for.body (%c and %indvar.iv are synthesis able).			; moved to %for.body (%c and %indvar.iv are synthesis able).
	;			;
	; CHECK: [Scalar: 1]			; CHECK-NOT: [Scalar: 1]
	;			;
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @f(i32* %A, i64 %c) {			define void @f(i32* %A, i64 %c) {
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond: ; preds = %for.inc, %entry			for.cond: ; preds = %for.inc, %entry
	[25 lines not shown]

test/ScopInfo/schedule-const-post-dominator-walk.ll

	; RUN: opt %loadPolly -analyze -polly-scops < %s | FileCheck %s			; RUN: opt %loadPolly -analyze -polly-scops < %s | FileCheck %s
				; RUN: opt %loadPolly -disable-output -polly-codegen < %s

	; CHECK: { Stmt_bb3[i0] -> [0, 0] };			; CHECK: { Stmt_bb3[i0] -> [0, 0] };
	; CHECK: { Stmt_bb2[] -> [1, 0] };			; CHECK: { Stmt_bb2[] -> [1, 0] };

	; Verify that we generate the correct schedule. In older versions of Polly,			; Verify that we generate the correct schedule. In older versions of Polly,
	; we generated an incorrect schedule:			; we generated an incorrect schedule:
	;			;
	; { Stmt_bb3[i0] -> [1, 0]; Stmt_bb2[] -> [0, 0] }			; { Stmt_bb3[i0] -> [1, 0]; Stmt_bb2[] -> [0, 0] }
	[25 lines not shown]

test/ScopInfo/tempscop-printing.ll

	; RUN: opt %loadPolly -basicaa -polly-scops -analyze < %s | FileCheck %s			; RUN: opt %loadPolly -basicaa -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -basicaa -polly-codegen -analyze < %s

	; void f(long A[], int N, int *init_ptr) {			; void f(long A[], int N, int *init_ptr) {
	; long i, j;			; long i, j;
	;			;
	; for (i = 0; i < N; ++i) {			; for (i = 0; i < N; ++i) {
	; init = *init_ptr;			; init = *init_ptr;
	; for (i = 0; i < N; ++i) {			; for (i = 0; i < N; ++i) {
	; A[i] = init + 2;			; A[i] = init + 2;
	; }			; }
	; }			; }
	; }			; }

	; CHECK-LABEL: Function: f			; CHECK-LABEL: Function: f
	;			;
	; CHECK: Statements {			; CHECK: Statements {
	; CHECK-NEXT: Stmt_for_j			; CHECK-NEXT: Stmt_for_j
	; CHECK-NEXT: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] : 0 <= i0 < N and 0 <= i1 < N };			; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] : 0 <= i0 < N and 0 <= i1 < N };
	; CHECK-NEXT: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> [i0, i1] };			; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> [i0, i1] };
	; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_init[] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_A[i1] };			; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_A[i1] };
	; CHECK-NEXT: }			; CHECK-NEXT: }
	;			;
	; CHECK-LABEL: Function: g			; CHECK-LABEL: Function: g
	;			;
	; CHECK: Statements {			; CHECK: Statements {
	; CHECK-NEXT: Stmt_for_j			; CHECK-NEXT: Stmt_for_j
	; CHECK-NEXT: Domain :=			; CHECK-NEXT: Domain :=
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] : 0 <= i0 < N and 0 <= i1 < N };			; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] : 0 <= i0 < N and 0 <= i1 < N };
	; CHECK-NEXT: Schedule :=			; CHECK-NEXT: Schedule :=
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> [i0, i1] };			; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> [i0, i1] };
	; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]			; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_A[i1] };			; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_A[i1] };
	; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 1]
	; CHECK-NEXT: [N] -> { Stmt_for_j[i0, i1] -> MemRef_init[] };
	; CHECK-NEXT: }			; CHECK-NEXT: }

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"


	define void @f(i64* noalias %A, i64 %N, i64* noalias %init_ptr) nounwind {			define void @f(i64* noalias %A, i64 %N, i64* noalias %init_ptr) nounwind {
	entry:			entry:
	br label %for.i			br label %for.i
	[55 lines not shown]

test/ScopInfo/unneeded_scalar_dependences-1.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -analyze < %s
				;
				; CHECK-NOT: Scalar: 1
				;
				; void f(int *A, int N) {
				; for (int i = 0; i < N; i++) {
				; int x = i + 1;
				; for (int j = 0; j < N; j++)
				; A[i] += A[j];
				; A[i] += x;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A, i32 %N) {
				entry:
				%tmp = sext i32 %N to i64
				br label %for.cond

				for.cond: ; preds = %for.inc.10, %entry
				%indvars.iv1 = phi i64 [ %indvars.iv.next2, %for.inc.10 ], [ 0, %entry ]
				%indvars.iv.next2 = add nuw nsw i64 %indvars.iv1, 1
				%tmp6 = trunc i64 %indvars.iv.next2 to i32
				%cmp = icmp slt i64 %indvars.iv1, %tmp
				br i1 %cmp, label %for.body, label %for.end.12

				for.body: ; preds = %for.cond
				br label %for.cond.1

				for.cond.1: ; preds = %for.inc, %for.body
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 0, %for.body ]
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp ne i32 %lftr.wideiv, %N
				br i1 %exitcond, label %for.body.3, label %for.end

				for.body.3: ; preds = %for.cond.1
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				%tmp3 = load i32, i32* %arrayidx, align 4
				%arrayidx5 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv1
				%tmp4 = load i32, i32* %arrayidx5, align 4
				%add6 = add nsw i32 %tmp4, %tmp3
				store i32 %add6, i32* %arrayidx5, align 4
				br label %for.inc

				for.inc: ; preds = %for.body.3
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond.1

				for.end: ; preds = %for.cond.1
				%arrayidx8 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv1
				%tmp5 = load i32, i32* %arrayidx8, align 4
				%add9 = add nsw i32 %tmp5, %tmp6
				store i32 %add9, i32* %arrayidx8, align 4
				br label %for.inc.10

				for.inc.10: ; preds = %for.end
				br label %for.cond

				for.end.12: ; preds = %for.cond
				ret void
				}

test/ScopInfo/unneeded_scalar_dependences-2.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -analyze < %s
				;
				; CHECK-NOT: Scalar: 1
				;
				; void f(int *A, int N) {
				; for (int i = 0; i < N; i++) {
				; int x = i + 3;
				; for (int j = 0; j < N; j++)
				; A[i] += A[x + j];
				; A[x] += x;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A, i32 %N) {
				entry:
				%tmp = sext i32 %N to i64
				br label %for.cond

				for.cond: ; preds = %for.inc.11, %entry
				%indvars.iv2 = phi i64 [ %indvars.iv.next3, %for.inc.11 ], [ 0, %entry ]
				%cmp = icmp slt i64 %indvars.iv2, %tmp
				br i1 %cmp, label %for.body, label %for.end.13

				for.body: ; preds = %for.cond
				%tmp5 = add nuw nsw i64 %indvars.iv2, 3
				br label %for.cond.1

				for.cond.1: ; preds = %for.inc, %for.body
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 0, %for.body ]
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp ne i32 %lftr.wideiv, %N
				br i1 %exitcond, label %for.body.3, label %for.end

				for.body.3: ; preds = %for.cond.1
				%tmp6 = add nuw nsw i64 %tmp5, %indvars.iv
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %tmp6
				%tmp7 = load i32, i32* %arrayidx, align 4
				%arrayidx6 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv2
				%tmp8 = load i32, i32* %arrayidx6, align 4
				%add7 = add nsw i32 %tmp8, %tmp7
				store i32 %add7, i32* %arrayidx6, align 4
				br label %for.inc

				for.inc: ; preds = %for.body.3
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond.1

				for.end: ; preds = %for.cond.1
				%arrayidx9 = getelementptr inbounds i32, i32* %A, i64 %tmp5
				%tmp9 = load i32, i32* %arrayidx9, align 4
				%tmp10 = trunc i64 %tmp5 to i32
				%add10 = add nsw i32 %tmp9, %tmp10
				store i32 %add10, i32* %arrayidx9, align 4
				br label %for.inc.11

				for.inc.11: ; preds = %for.end
				%indvars.iv.next3 = add nuw nsw i64 %indvars.iv2, 1
				br label %for.cond

				for.end.13: ; preds = %for.cond
				ret void
				}

test/ScopInfo/unneeded_scalar_dependences-3.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -analyze < %s
				;
				; CHECK-NOT: Scalar: 1
				;
				; void f(int *A, int N) {
				; for (int i = 0; i < N; i++) {
				; int x = i + 3;
				; for (int j = 0; j < N; j++)
				; A[i] += x + j * x;
				; A[x] += x * x;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A, i32 %N) {
				entry:
				%tmp = sext i32 %N to i64
				br label %for.cond

				for.cond: ; preds = %for.inc.10, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc.10 ], [ 0, %entry ]
				%cmp = icmp slt i64 %indvars.iv, %tmp
				br i1 %cmp, label %for.body, label %for.end.12

				for.body: ; preds = %for.cond
				%tmp2 = add nuw nsw i64 %indvars.iv, 3
				br label %for.cond.1

				for.cond.1: ; preds = %for.inc, %for.body
				%j.0 = phi i32 [ 0, %for.body ], [ %inc, %for.inc ]
				%exitcond = icmp ne i32 %j.0, %N
				br i1 %exitcond, label %for.body.3, label %for.end

				for.body.3: ; preds = %for.cond.1
				%tmp3 = trunc i64 %tmp2 to i32
				%mul = mul nsw i32 %j.0, %tmp3
				%tmp4 = trunc i64 %tmp2 to i32
				%add4 = add nsw i32 %tmp4, %mul
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				%tmp5 = load i32, i32* %arrayidx, align 4
				%add5 = add nsw i32 %tmp5, %add4
				store i32 %add5, i32* %arrayidx, align 4
				br label %for.inc

				for.inc: ; preds = %for.body.3
				%inc = add nuw nsw i32 %j.0, 1
				br label %for.cond.1

				for.end: ; preds = %for.cond.1
				%tmp6 = trunc i64 %tmp2 to i32
				%mul6 = mul nsw i32 %tmp6, %tmp6
				%arrayidx8 = getelementptr inbounds i32, i32* %A, i64 %tmp2
				%tmp7 = load i32, i32* %arrayidx8, align 4
				%add9 = add nsw i32 %tmp7, %mul6
				store i32 %add9, i32* %arrayidx8, align 4
				br label %for.inc.10

				for.inc.10: ; preds = %for.end
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond

				for.end.12: ; preds = %for.cond
				ret void
				}

test/ScopInfo/unneeded_scalar_dependences-4.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -analyze < %s
				;
				; CHECK-NOT: MemRef_cond
				; CHECK-NOT: Scalar: 1
				;
				; void f(int *A, int N) {
				; for (int i = 0; i < N; i++) {
				; int x = i > 42 ? i - 3 : i;
				; for (int j = 0; j < N; j++)
				; A[i] += x + j * x;
				; A[i] += x * x;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A, i32 %N) {
				entry:
				%tmp = sext i32 %N to i64
				br label %for.cond

				for.cond: ; preds = %for.inc.10, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc.10 ], [ 0, %entry ]
				%i.0 = phi i32 [ 0, %entry ], [ %inc11, %for.inc.10 ]
				%cmp = icmp slt i64 %indvars.iv, %tmp
				br i1 %cmp, label %for.body, label %for.end.12

				for.body: ; preds = %for.cond
				%cmp1 = icmp sgt i64 %indvars.iv, 42
				%tmp2 = trunc i64 %indvars.iv to i32
				%sub = add nsw i32 %i.0, -3
				%cond = select i1 %cmp1, i32 %sub, i32 %tmp2
				br label %for.cond.2

				for.cond.2: ; preds = %for.inc, %cond.end
				%j.0 = phi i32 [ 0, %for.body ], [ %inc, %for.inc ]
				%cmp3 = icmp slt i32 %j.0, %N
				br i1 %cmp3, label %for.body.4, label %for.end

				for.body.4: ; preds = %for.cond.2
				%mul = mul nsw i32 %j.0, %cond
				%add = add nsw i32 %cond, %mul
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				%tmp4 = load i32, i32* %arrayidx, align 4
				%add5 = add nsw i32 %tmp4, %add
				store i32 %add5, i32* %arrayidx, align 4
				br label %for.inc

				for.inc: ; preds = %for.body.4
				%inc = add nuw nsw i32 %j.0, 1
				br label %for.cond.2

				for.end: ; preds = %for.cond.2
				%mul6 = mul nsw i32 %cond, %cond
				%arrayidx8 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				%tmp5 = load i32, i32* %arrayidx8, align 4
				%add9 = add nsw i32 %tmp5, %mul6
				store i32 %add9, i32* %arrayidx8, align 4
				br label %for.inc.10

				for.inc.10: ; preds = %for.end
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%inc11 = add nuw nsw i32 %i.0, 1
				br label %for.cond

				for.end.12: ; preds = %for.cond
				ret void
				}

test/ScopInfo/unneeded_scalar_dependences-5.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -analyze < %s
				;
				; CHECK-NOT: MemRef_conv
				; CHECK-NOT: Scalar: 1
				;
				; void f(int *A, int N) {
				; for (int i = 0; i < N; i++) {
				; float x = i + 3.3;
				; for (int j = 0; j < N; j++)
				; A[i] += x + j * x;
				; A[i] += x * x;
				; }
				; }
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A, i32 %N) {
				entry:
				%tmp = sext i32 %N to i64
				br label %for.cond

				for.cond: ; preds = %for.inc.17, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc.17 ], [ 0, %entry ]
				%cmp = icmp slt i64 %indvars.iv, %tmp
				br i1 %cmp, label %for.body, label %for.end.19

				for.body: ; preds = %for.cond
				%tmp1 = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp1 to double
				%add = fadd double %conv, 3.300000e+00
				%conv1 = fptrunc double %add to float
				br label %for.cond.2

				for.cond.2: ; preds = %for.inc, %for.body
				%j.0 = phi i32 [ 0, %for.body ], [ %inc, %for.inc ]
				%exitcond = icmp ne i32 %j.0, %N
				br i1 %exitcond, label %for.body.5, label %for.end

				for.body.5: ; preds = %for.cond.2
				%conv6 = sitofp i32 %j.0 to float
				%mul = fmul float %conv6, %conv1
				%add7 = fadd float %conv1, %mul
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				%tmp2 = load i32, i32* %arrayidx, align 4
				%conv8 = sitofp i32 %tmp2 to float
				%add9 = fadd float %conv8, %add7
				%conv10 = fptosi float %add9 to i32
				store i32 %conv10, i32* %arrayidx, align 4
				br label %for.inc

				for.inc: ; preds = %for.body.5
				%inc = add nuw nsw i32 %j.0, 1
				br label %for.cond.2

				for.end: ; preds = %for.cond.2
				%mul11 = fmul float %conv1, %conv1
				%arrayidx13 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				%tmp3 = load i32, i32* %arrayidx13, align 4
				%conv14 = sitofp i32 %tmp3 to float
				%add15 = fadd float %conv14, %mul11
				%conv16 = fptosi float %add15 to i32
				store i32 %conv16, i32* %arrayidx13, align 4
				br label %for.inc.17

				for.inc.17: ; preds = %for.end
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond

				for.end.19: ; preds = %for.cond
				ret void
				}

test/ScopInfo/unneeded_scalar_dependences-6.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -analyze < %s
				;
				; void f(int *A, int N) {
				; for (int i = 1; i < N; i++) {
				; int preloaded = A[0];
				; /* split BB */
				; A[i] += preloaded;
				; }
				; }
				;
				; CHECK: Invariant Accesses: {
				; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 0]
				; CHECK: [N] -> { Stmt_for_body[i0] -> MemRef_A[0] };
				; CHECK: Execution Context: [N] -> { : N >= 2 }
				; CHECK: }
				;
				; CHECK-NOT: Scalar: 1
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A, i32 %N) {
				entry:
				%tmp = sext i32 %N to i64
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 1, %entry ]
				%cmp = icmp slt i64 %indvars.iv, %tmp
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%tmp1 = load i32, i32* %A, align 4
				br label %for.body.split

				for.body.split:
				%arrayidx1 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				%tmp2 = load i32, i32* %arrayidx1, align 4
				%add = add nsw i32 %tmp2, %tmp1
				store i32 %add, i32* %arrayidx1, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret void
				}

test/ScopInfo/unneeded_scalar_dependences-7.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -analyze < %s
				;
				; void f(int *A, int N) {
				; for (int i = 0; i < N; i++) {
				; int scevable_addr = 2 * i + N;
				; /* split BB */
				; A[scevable_addr] += i;
				; }
				; }
				;
				; CHECK-NOT: Scalar: 1
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A, i32 %N) {
				entry:
				%tmp = sext i32 %N to i64
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 0, %entry ]
				%cmp = icmp slt i64 %indvars.iv, %tmp
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%tmp1 = trunc i64 %indvars.iv to i32
				%mul = shl nsw i32 %tmp1, 1
				%add = add nsw i32 %mul, %N
				%idxprom = sext i32 %add to i64
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %idxprom
				br label %for.body.split

				for.body.split:
				%tmp2 = load i32, i32* %arrayidx, align 4
				%tmp3 = trunc i64 %indvars.iv to i32
				%add1 = add nsw i32 %tmp2, %tmp3
				store i32 %add1, i32* %arrayidx, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret void
				}

test/ScopInfo/unneeded_scalar_dependences-8.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -analyze < %s
				;
				; void f(int *A, int N) {
				; int pre_scevable_addr = 2 * N + (N - 1);
				; for (int i = 0; i < N; i++) {
				; int scevable_addr = 2 * i + pre_scevable_addr;
				; /* split BB */
				; A[scevable_addr] += i;
				; }
				; }
				;
				; CHECK-NOT: Scalar: 1
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A, i32 %N) {
				entry:
				%tmp = sext i32 %N to i64
				%m = mul nsw i32 %N, 2
				%s = sub nsw i32 %N, 1
				%pre_scevable_addr = add nsw i32 %m, %s
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 0, %entry ]
				%cmp = icmp slt i64 %indvars.iv, %tmp
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%tmp1 = trunc i64 %indvars.iv to i32
				%mul = shl nsw i32 %tmp1, 1
				%add = add nsw i32 %mul, %pre_scevable_addr
				%idxprom = sext i32 %add to i64
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %idxprom
				br label %for.body.split

				for.body.split:
				%tmp2 = load i32, i32* %arrayidx, align 4
				%tmp3 = trunc i64 %indvars.iv to i32
				%add1 = add nsw i32 %tmp2, %tmp3
				store i32 %add1, i32* %arrayidx, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret void
				}

test/ScopInfo/unneeded_scalar_dependences-9.ll

This file was added.

				; RUN: opt %loadPolly -polly-scops -analyze < %s | FileCheck %s
				; RUN: opt %loadPolly -polly-codegen -analyze < %s
				;
				; void f(int *A, int N) {
				; if (N > 42) {
				; int pre_scevable_addr = 2 * N + (N - 1);
				; for (int i = 0; i < N; i++) {
				; int scevable_addr = 2 * i + pre_scevable_addr;
				; /* split BB */
				; A[scevable_addr] += i;
				; }
				; }
				; }
				;
				; CHECK-NOT: Scalar: 1
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @f(i32* %A, i32 %N) {
				entry:
				%tmp = sext i32 %N to i64
				%m = mul nsw i32 %N, 2
				br label %if.cond

				if.cond:
				%s = sub nsw i32 %N, 1
				%c = icmp sgt i32 %N, 42
				br i1 %c, label %if.then, label %for.end

				if.then:
				%pre_scevable_addr = add nsw i32 %m, %s
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 0, %if.then ]
				%cmp = icmp slt i64 %indvars.iv, %tmp
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%tmp1 = trunc i64 %indvars.iv to i32
				%mul = shl nsw i32 %tmp1, 1
				%add = add nsw i32 %mul, %pre_scevable_addr
				%idxprom = sext i32 %add to i64
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %idxprom
				br label %for.body.split

				for.body.split:
				%tmp2 = load i32, i32* %arrayidx, align 4
				%tmp3 = trunc i64 %indvars.iv to i32
				%add1 = add nsw i32 %tmp2, %tmp3
				store i32 %add1, i32* %arrayidx, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret void
				}

Revision Contents

Diff 48642

include/polly/CodeGen/BlockGenerators.h

include/polly/ScopInfo.h

lib/Analysis/ScopInfo.cpp

lib/CodeGen/BlockGenerators.cpp

test/Isl/CodeGen/OpenMP/invariant_base_pointer_preloaded_different_bb.ll

test/Isl/CodeGen/eliminate-multiple-scalar-fp-reads.ll

test/Isl/CodeGen/eliminate-multiple-scalar-reads.ll

test/Isl/CodeGen/eliminate-scalars-with-outside-load.ll

test/Isl/CodeGen/srem-in-other-bb.ll

test/ScopInfo/eliminate-scalar-caused-by-load-reduction-2.ll

test/ScopInfo/eliminate-scalar-caused-by-load-reduction.ll

test/ScopInfo/inter_bb_scalar_dep.ll

test/ScopInfo/intra_and_inter_bb_scalar_dep.ll

test/ScopInfo/invariant-loads-leave-read-only-statements.ll

test/ScopInfo/multidim_fortran_srem.ll

test/ScopInfo/out-of-scop-use-in-region-entry-phi-node.ll

test/ScopInfo/scalar_dependence_cond_br.ll

test/ScopInfo/schedule-const-post-dominator-walk.ll

test/ScopInfo/tempscop-printing.ll

test/ScopInfo/unneeded_scalar_dependences-1.ll

test/ScopInfo/unneeded_scalar_dependences-2.ll

test/ScopInfo/unneeded_scalar_dependences-3.ll

test/ScopInfo/unneeded_scalar_dependences-4.ll

test/ScopInfo/unneeded_scalar_dependences-5.ll

test/ScopInfo/unneeded_scalar_dependences-6.ll

test/ScopInfo/unneeded_scalar_dependences-7.ll

test/ScopInfo/unneeded_scalar_dependences-8.ll

test/ScopInfo/unneeded_scalar_dependences-9.ll
