Even with the scalar/PHI modeling and code generation, we still used the independent blocks pass to move scalars around in order to reduce the dependences they cause. However, this has two major drawbacks:

  o Polly could not be used as an analysis-only pass by default, and without the independent blocks pass the results suffered.

  o The independent blocks pass had limited knowledge about domains as well as accesses. Both limited the pass to simple movement/re-computation of scalars that can be trivially moved.

This patch is a drop-in replacement for the independent blocks pass that first virtually moves/re-computes scalars in the SCoP description. Only during code generation does the re-computation actually take place. The logic is split into three traversals of the scalars in the SCoP:

  1) Collect interesting operands for each scalar definition that escapes the block it resides in.

  2) For each scalar use, check if the definition can be recomputed in its statement based on the interesting operands of the definition. If so, remove the access and remember that the definition has to be recomputed later.

  3) For each scalar definition, check if all uses have been eliminated through virtual re-computation. If so, remove the access, as it does not need to be communicated anymore.

Subsequent patches can easily extend these capabilities, as the domain and access information is already available.

--------------------------
       REVIEW NOTES
--------------------------

- lnt & unit tests are green.

- We had 245 scalar accesses in the test suite before, now 236. In total, this logic eliminated 21 trivially recomputable scalars and allowed us to eliminate 54 statements instead of 49.

- The buildbots show the following compile/execution time changes for a similar but prior version of this patch:

  Compile time:

    Improvements          :        Δ : Previous :  Current :      σ
    .../bmm               :  -71.43% :   0.6440 :   0.1840 : 0.0054
    .../paq8p             :  -12.48% :   5.8644 :   5.1323 : 0.0181
    .../consumer-lame     :  -10.13% :  10.3444 :   9.2961 : 0.0376
    .../sieve             :   -5.61% :   0.4280 :   0.4040 : 0.0093
    .../espresso          :   -1.35% :  10.3841 :  10.2441 : 0.0395
    ..//Recurrences-dbl   :   -1.24% :   4.1923 :   4.1403 : 0.0150
    .../471_omnetpp       :   -1.08% :  42.1606 :  41.7046 : 0.1013

  Execution time:

    Regressions            :      Δ : Previous : Current :      σ
    .../salsa20            :  4.32% :   6.6684 :  6.9564 : 0.0059
    .../GlobalDataFlow-dbl :  1.53% :   4.6963 :  4.7683 : 0.0047

    Improvements           :      Δ : Previous : Current :      σ
    ..//viterbi            : -1.61% :   2.7402 :  2.6962 : 0.0072
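[Editorial illustration] The following is a minimal, standalone sketch of traversals 2) and 3) described above; traversal 1) is assumed to have already produced the map of interesting operands. The types and names (ToyStmt, ToyAccess, and the toy signature of simplifyScalarAccesses) are invented for this sketch and are not Polly's actual classes or API.

```cpp
// Toy model only: ToyAccess/ToyStmt stand in for Polly's MemoryAccess/ScopStmt.
#include <map>
#include <set>
#include <string>
#include <vector>

struct ToyAccess {
  std::string Scalar; // Name of the scalar value that is communicated.
  bool IsWrite;       // True for the definition (write), false for a use (read).
};

struct ToyStmt {
  std::string Name;
  std::vector<ToyAccess> Accesses;
};

// Traversal 2: remove reads whose definition is trivially recomputable
// (i.e., has an empty set of interesting operands). Traversal 3: remove the
// corresponding writes once no read of that scalar remains anywhere.
void simplifyScalarAccesses(
    std::vector<ToyStmt> &Stmts,
    const std::map<std::string, std::set<std::string>> &InterestingOperands) {
  std::set<std::string> Recomputed;

  // Traversal 2: eliminate reads of trivially recomputable definitions.
  for (ToyStmt &Stmt : Stmts)
    for (auto It = Stmt.Accesses.begin(); It != Stmt.Accesses.end();) {
      auto Def = InterestingOperands.find(It->Scalar);
      bool TriviallyRecomputable = !It->IsWrite &&
                                   Def != InterestingOperands.end() &&
                                   Def->second.empty();
      if (TriviallyRecomputable) {
        Recomputed.insert(It->Scalar);
        It = Stmt.Accesses.erase(It);
      } else {
        ++It;
      }
    }

  // Traversal 3: a write is obsolete if the scalar was recomputed somewhere
  // and no read of it is left.
  std::set<std::string> StillRead;
  for (const ToyStmt &Stmt : Stmts)
    for (const ToyAccess &A : Stmt.Accesses)
      if (!A.IsWrite)
        StillRead.insert(A.Scalar);

  for (ToyStmt &Stmt : Stmts)
    for (auto It = Stmt.Accesses.begin(); It != Stmt.Accesses.end();) {
      if (It->IsWrite && Recomputed.count(It->Scalar) &&
          !StillRead.count(It->Scalar))
        It = Stmt.Accesses.erase(It);
      else
        ++It;
    }
}
```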
Reviewers: etherzhhb, Meinersbur, grosser, bollu
Hi Johannes,
great to see we will finally get rid of the IndependentBlocks pass. I just had a very first skim over this patch, but did not manage to do a full review yet.
The two main items I would still like to look into:
- This seems to conflict with http://reviews.llvm.org/D12975 (Michael's DE-LICM)
I need to get a better understanding in which ways these patches possibly conflict (as well as the overall design).
- Post-processing statements vs. instruction-list ScopStmts
Your current approach of first building all ScopStmts and then post-process them to eliminate scalar dependences is interesting. However, it requires a special post-processing phase in ScopInfo and additional special-purpose code in ScopStmt and BlockGenerator. I wonder if it may make sense to build ScopStmts from the beginning just as a list of instructions (instead of a basic block) which could then be uniformly processed in the BlockGenerator.
Best,
Tobias
include/polly/ScopInfo.h | ||
---|---|---|
970 | that needs to be recomputed | |
lib/Analysis/ScopInfo.cpp | ||
1409–1410 | use is / uses are | |
2794 | "The latter to add" does not seem to make sense grammatically. | |
2919 | grammar | |
2924 | I am not yet convinced an exit ScopStmt is something we would like to have. Maybe leave this for another patch review. |
To add some details:
D12975 tracks which values flow between ScopStmts, which this patch modifies at a very late phase. There should be fewer such implicit flows, but some of those flows might have already been created with origin MAPPED. The array element occupied by such a MAPPED flow might be used for some other implicit flow, meaning De-LICM before virtual blocks is ineffective. It is also more difficult to modify MAPPED accesses because lifetime issues have to be considered and there is no distinction between SCALAR and PHI origins (I am not sure how much of a problem this is, though).
One of my rationales for doing De-LICM before even creating MemoryAccesses is that it is easier to implement (no moving, copying, or deleting of MemoryAccesses) and reduces overhead because fewer MemoryAccesses are created in the first place.
So we have two possibilities: either do "simplifyScalarAccesses" before creating MemoryAccesses (e.g., by marking Instructions with where they should be copied), or do "De-LICM" after simplifyScalarAccesses.
lib/Analysis/ScopInfo.cpp | ||
---|---|---|
1420 | I do not understand this code. Why did you even remove any comment about what it is doing? | |
2782 | God function antipattern |
Yeah, at the time this was written it was fine, but as far as I understand it, the MAPPED accesses (and possibly more) implicitly change the semantics of the SCoP description without the description actually being changed (except for the "MAPPED" flag). One could argue that this code should look for MAPPED accesses, but I won't; instead I will argue that "MAPPED" should not be an access property that is determined on the CFG and that is only a flag in the access description which turns on some alternate path in the code generation. An access that does not actually happen (for which no code is generated) should not be in the SCoP description, and the ones that are in there should happen where they are and read/write the location they define. If that is the case, this patch should work properly.
- Post-processing statements vs. instruction-list ScopStmts
Your current approach of first building all ScopStmts and then post-process them to eliminate scalar dependences is interesting. However, it requires a special post-processing phase in ScopInfo and additional special-purpose code in ScopStmt and BlockGenerator. I wonder if it may make sense to build ScopStmts from the beginning just as a list of instructions (instead of a basic block) which could then be uniformly processed in the BlockGenerator.
Processing the SCoP allows general solutions which are not (easily) reproducible on the CFG; e.g., determining if a load is overwritten is easy and precise here but hard on the CFG. The instruction list is a different issue, and it would not remove that much code/logic anyway. With the pending patches by Michael we will reload scalars at the beginning of a statement and, as I pointed out on that review, almost the same is added here. While I did not get a valuable response on the idea (and implementation sketch) to merge the recomputation and the scalar reload at the beginning of a statement, I think it is what we ultimately want.
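[Editorial illustration] To make the "easy and precise on the SCoP, hard on the CFG" point concrete, here is a toy sketch (not Polly code, and deliberately ignoring isl): once every access carries a schedule position, checking whether a loaded value is overwritten before a later reuse point reduces to a simple query, whereas on the CFG it requires path-sensitive reasoning. All names below are invented for this sketch.

```cpp
#include <vector>

// Toy model: each access records the array element it touches and its
// position in the (already computed) schedule.
struct ScheduledAccess {
  int Element;  // Which array element is accessed.
  long Time;    // Schedule position (toy: a single integer timestamp).
  bool IsWrite;
};

// Returns true if Element is written strictly between the load and the point
// where we would like to reuse the loaded value, i.e. reloading/forwarding
// the value at ReuseTime would be wrong.
bool isOverwritten(const std::vector<ScheduledAccess> &Accesses, int Element,
                   long LoadTime, long ReuseTime) {
  for (const ScheduledAccess &A : Accesses)
    if (A.IsWrite && A.Element == Element && A.Time > LoadTime &&
        A.Time < ReuseTime)
      return true;
  return false;
}
```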
See the comment on
One of my rationales for doing De-LICM before even creating MemoryAccesses is that it is easier to implement (no moving, copying, or deleting of MemoryAccesses) and reduces overhead because fewer MemoryAccesses are created in the first place.
While moving, copying, and deleting MemoryAccesses sounds like a lot, one has to remember the number of times this can happen and the cost of each operation.
I think the fact that we get generally better results with this patch than without (see commit message) shows that the "overhead" is low enough.
The implementation difficulty is another issue. While this patch currently only forwards/copies scalars, its general structure allows more, e.g., reloading values instead of communicating them (see the subsequent patch) or removing conditional PHI nodes and replacing them by selects. While, e.g., reloading is generally hard and costly on the CFG level (one has to determine if the value has been overwritten somewhere in between), the SCoP abstraction allows us to reason about it in a simple and concise way.
So we have two possibilities: either do "simplifyScalarAccesses" before creating MemoryAccesses (e.g., by marking Instructions with where they should be copied), or do "De-LICM" after simplifyScalarAccesses.
I already proposed the latter in a review last week but I don't recall a positive answer to that.
lib/Analysis/ScopInfo.cpp | ||
---|---|---|
1420 | This code was basically moved and the original comment was left behind, hence the answer to your question is simple: The comment has to be moved too. | |
2782 | I take it this comment should read as: Please split the function into three parts. Sure, can do. |
The semantics of a Scop do not change. What does change is where implicits are stored. Like AccessInstruction, AccessValue, and non-affine subregions, it is not information required by the polyhedral model itself, but it is required to associate polyhedral statements with the IR behind them.
To avoid modifying IR directly, we "virtualize" the relevant metainformation such that the Scop description and IR code do not map directly. Having to redirect loads/stores becomes unavoidable.
- Post-processing statements vs. instruction-list ScopStmts
Your current approach of first building all ScopStmts and then post-process them to eliminate scalar dependences is interesting. However, it requires a special post-processing phase in ScopInfo and additional special-purpose code in ScopStmt and BlockGenerator. I wonder if it may make sense to build ScopStmts from the beginning just as a list of instructions (instead of a basic block) which could then be uniformly processed in the BlockGenerator.
Processing the SCoP allows general solutions which are not (easily) reproducible on the CFG; e.g., determining if a load is overwritten is easy and precise here but hard on the CFG. The instruction list is a different issue, and it would not remove that much code/logic anyway. With the pending patches by Michael we will reload scalars at the beginning of a statement and, as I pointed out on that review, almost the same is added here.
I can't find such code in this patch. Where is it?
While I did not get a valuable response on the idea (and implementation sketch) to merge the recomputation and the scalar reload at the beginning of a statement, I think it is what we ultimately want.
What proposal are you referring to? What scalar reload?
Which comment?
One of my rationales for doing De-LICM before even creating MemoryAccesses is that it is easier to implement (no moving, copying, or deleting of MemoryAccesses) and reduces overhead because fewer MemoryAccesses are created in the first place.
While moving, copying, and deleting MemoryAccesses sounds like a lot, one has to remember the number of times this can happen and the cost of each operation.
I think the fact that we get generally better results with this patch than without (see commit message) shows that the "overhead" is low enough.
I have no general objection to it. However, you occasionally raise such concerns yourself (e.g. D13341) so I try to take them into account.
The implementation difficulty is another issue. While this patch currently only forwards/copies scalars, its general structure allows more, e.g., reloading values instead of communicating them (see the subsequent patch) or removing conditional PHI nodes and replacing them by selects. While, e.g., reloading is generally hard and costly on the CFG level (one has to determine if the value has been overwritten somewhere in between), the SCoP abstraction allows us to reason about it in a simple and concise way.
Such analysis on the polyhedral model requires expensive ISL operations to compute the dependences. One of our intents in simplifying the SCoP description is to make its analysis run faster. We would still do the complicated analysis on the large SCoP description to get a simpler one and then redo the same analysis. I am not saying that this cannot make sense, but the simplifications we want to perform (IndependentBlocks, DeLICM) are both on the IR level, on which reasoning is much cheaper.
So we have two possibilities: either do "simplifyScalarAccesses" before creating MemoryAccesses (e.g., by marking Instructions with where they should be copied), or do "De-LICM" after simplifyScalarAccesses.
I already proposed the latter in a review last week but I don't recall a positive answer to that.
AFAIU we are about to discuss this.
I have the impression that this very much relies on accesses (mostly SCALAR READs, but also PHI writes in non-affine subregions) being computed per instruction, whereas in D13762 I tried to make them per-ScopStmt. The difference is that, e.g., if there are two reads in the same BB, two loads would be generated. In D13762 I tried to ensure that there is only one load.
This patch seems to skip PHI accesses completely. In San Jose you mentioned that only loop-carried PHIs would be skipped. Where is that part? Where would DeLICM tell us what memory can be reused?
I got the impression we could generalize this within the SCEV expander. A SCEV-like object (we already support SDiv/SRem) could store an entire operand tree of movable instructions, and CodeGen would synthesize/materialize it on request. This looks more streamlined with the current mechanisms than introducing a new one. What do you think?
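[Editorial illustration] A toy sketch of the per-instruction vs. per-statement difference described above (names invented, not Polly's or D13762's actual API): when scalar reads are recorded per use, the same value can end up being loaded twice within one statement; deduplicating per statement keeps a single read access.

```cpp
#include <set>
#include <string>
#include <vector>

// One entry per scalar use inside a statement.
struct ToyRead {
  std::string Scalar;
};

// Keep only the first read access per scalar, mimicking a per-statement model.
std::vector<ToyRead> dedupePerStatement(const std::vector<ToyRead> &PerUseReads) {
  std::set<std::string> Seen;
  std::vector<ToyRead> PerStmtReads;
  for (const ToyRead &R : PerUseReads)
    if (Seen.insert(R.Scalar).second)
      PerStmtReads.push_back(R);
  return PerStmtReads;
}
```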
include/polly/CodeGen/BlockGenerators.h | ||
---|---|---|
422 | "all" does not match the description below | |
426 | ... where they are used. | |
428 | ... thus ensures that all ... | |
483 | llvm_unreachable is not defined to trigger a trap. | |
488 | Missing documentation? | |
546 | Flag to indicate that the instruction ... | |
include/polly/LinkAllPasses.h | ||
36 ↗ | (On Diff #37007) | Separate patch to remove IndependentBlocks? |
include/polly/ScopInfo.h | ||
516 | We can get the AccList from getStatement()->getParent()->AccFuncMap. Not necessary to pass it as parameter. Rename to "copyTo"? | |
541 | Unrelated? | |
932 | I think we should not depend on the order of accesses. ISL does not understand such an order and depending on it will likely cause some bugs (Think of a load after storing to the same location; do we even handle that case yet?). AFAIK ISL assumes that all loads happen before the stores. | |
940 | OnlyMA=false removes multiple accesses, so move to a "removeAccessesCausedBy" function? | |
974 | It would be better if the class would not expose its implementation as an API. It could return iterators or an ArrayRef. | |
1275 | It would be more idiomatic if each call would check only a single Instruction instead of passing a set implementation. | |
1291 | I think you should clarify that you actually mean a flattened operand _tree_. I think LLVM uses the term operands as the direct arguments/parameters of an instruction. | |
1295 | AFAICS it is only used during the execution of simplifyScalarAccesses. You could make it local and pass it to the functions that need it. | |
1330 | The name "simplifyScalarAccesses" is very general. How about "recomputableDependentScalars"? | |
lib/Analysis/ScopInfo.cpp | ||
667 | What if it has fewer dimensions? If it is impossible, could you add an assertion to make it clear? | |
673 | Doesn't this update InstructionToAccesses? | |
1420 | I'd still ask you to refactor this code. It is hard to understand and looks fragile as it depends on order. Maybe you could use some functions from <algorithm> | |
2778 | What are your plans here? When can instructions with side effects be recomputed? | |
2783 | What is there to see? | |
2788 | Why are exit node PHIs processed, but not standard PHIs? | |
2800 | Why for each access separately? Shouldn't this be per-ScopStmt because it only matters for cross-stmt operands? | |
2815 | auto -> Use ? | |
2817 | This probably will have issues with PHIs in the scop's entry, just as in PR25394, i.e., if InstOpInst is a PHI with an incoming block from outside the scop. | |
2824 | and continue ? | |
2876 | Wouldn't it be possible to recompute some of the dependent values, even if some operands have side effects? | |
2880 | Why only instructions that have operands from outside? For movable instructions (DependentScalar) without such operands but with implicit accesses, don't we need to remove their memory accesses as well? | |
2885 | This is using the list of accesses this function modifies on the fly ("InstructionToAccess"). As such, it seems to me to depend on the order in which we iterate over all statements. E.g., the previous iteration already removed the read access of "OutsideUser", hence it will not be added to AdditionalAccesses anymore, and hence we miss some read accesses. Or am I wrong? Why is it that this cannot happen? | |
2890 | What does this condition mean? I think we generally are misusing "BaseAddr" for scalar accesses, because it is actually not an address. | |
2894 | I'd expect this to be computed in collectNonTrivialOperands(), like a partition of instructions in the operand tree into
I understand your algorithm works differently, but to understand it: what is the rationale for only having direct operands in DependentScalars while we want to copy/move an entire operand tree? | |
2932 | If a user has no more accesses, is it because it is intended to be removed? | |
2935 | auto -> MemoryAccess ? | |
2940 | Is this break meant to leave both for-loops? | |
2947 | This seems to be a different kind of "resolved" than in removeRecomputableScalarReads. | |
2963 | collectNonTrivialOperands actually skips PHIs which are also implicit accesses. | |
2972 | hence -> i.e. | |
lib/CodeGen/BlockGenerators.cpp | ||
67 | When does a dependent scalar become a global? | |
86 | Shouldn't these be in the BBMap after recomputeDependentScalars() executed? | |
130 | Why does this become necessary? Shouldn't "trySynthesizeNewValue" determine itself whether it is synthesizable? | |
147 | This essentially becomes: assert(TryOnly); return nullptr; | |
169 | What is the rationale for this condition? | |
171 | Shouldn't this have been handled by recomputeDependentScalars() already? How do we know that the operand is even a DependentScalar? If this is because DependentScalars is unordered, wouldn't it be cleaner to ensure they are in their order of dependence? | |
226 | I like this change, but could be committed independently. Maybe also as a member function of ScopStmt? | |
228 | This idiom appears multiple times. Introduce a helper function? | |
334 | Any reason to do this before generateScalarLoads()? Naively, I'd think it belongs after it because some of the dependent instructions might still use scalars (e.g. read-only ones). | |
test/Isl/CodeGen/srem-in-other-bb.ll | ||
1 | Nice, but unrelated |
One vs. multiple loads/stores of scalars makes only a small difference to us, as it only affects code generation. Everything before that, especially dependence analysis, will be the same. Hence, consolidating scalar accesses only to save duplicate loads/stores in the generated IR (which will be cleaned up anyway) is not a good reason to make any kind of scalar elimination harder.
This patch seems to skip PHI accesses completely. In San Joe you mentioned that only loop-carried phis would be skipped. Where is that part? Where would DeLICM tell what memory can be reused?
There are 5 patches. This is the first and it only deals with trivially recomputable scalars, thus the ones without any "interesting/complicated" operands.
The following patches, as well as DeLICM, come into play in the canRecomputeInStmt function, which is (almost) the only place that needs to be updated to allow more functionality.
I got the impression we could generalize this with in SCEV expander. A SCEV-like object (we already support SDiv/SRem) could store an entire operand tree of movable instructions and CodeGen would synthesize/materialize it on request. This looks like more streamlined with current mechanisms instead of introducing a new one. What do you think?
If SCEV allowed floating point values, maybe; until then we would have to implement everything we have here somehow in our SCoPExpander, which is not really helpful. Additionally, during the SCoP generation (e.g., when we check if we can synthesize scalars instead of communicating them) we cannot check for overwritten loads and free memory locations we can reuse.
include/polly/CodeGen/BlockGenerators.h | ||
---|---|---|
480 | When we virtually move instructions we are not really interested in the loop surrounding them but in their new parent block/statement, thus the change. |
I tried to address (in words) all your comments. Once we agree on how to proceed I will make the code changes.
include/polly/CodeGen/BlockGenerators.h | ||
---|---|---|
426 | I will check that. | |
428 | I will check that. | |
546 | I will check that. | |
include/polly/ScopInfo.h | ||
940 | We could do that. | |
1275 | We could do that. | |
1291 | Where do you expect a tree here? It actually contains transitive operands and I could add that word. | |
1295 | We could do that. | |
lib/Analysis/ScopInfo.cpp | ||
2815 | We could do that. | |
2824 | This looks weird without the follow up commits that introduce some code here. | |
2935 | We could do that. | |
lib/CodeGen/BlockGenerators.cpp | ||
226 | I can commit it a priori but I don't see the need for a ScopStmt member function at the moment. | |
228 | That's something not conflicting or required for this commit we can do. | |
334 | First, by construction we can place it here as all recomputable scalars do not depend on anything we cannot recompute. Second, we can probably move it after the scalar loads generation but it should be all the same. | |
test/Isl/CodeGen/srem-in-other-bb.ll | ||
1 | I probably forgot that while I prepared this commit and all the test cases. I'm sorry. |
include/polly/CodeGen/BlockGenerators.h | ||
---|---|---|
422 | All that are not communicated via memory but need to be recomputed. | |
483 | I can use the original message instead. If you have a better way to describe llvm_unreachable I can put it here alternatively. | |
488 | Yeah, somebody should add documentation to all these undocumented functions at some point. | |
include/polly/ScopInfo.h | ||
516 | I would like to avoid accessing members of a different class. | |
541 | It is needed somewhere, but it was quite a while ago. If people need to know where, I can check. | |
932 | I'm not sure about your isl assumption but in any case, I'm not assuming anything here. The different input locations are purely for user experience (as the accesses are ordered in a more natural fashion). | |
1330 | We do not recompute here, and later we do not always recompute in order to simplify scalars, so I think the name actually fits pretty well. | |
lib/Analysis/ScopInfo.cpp | ||
667 | Sure we can add an assertion. | |
673 | Mh, there was no need but we can do that too I guess. | |
1420 | This code (already in the current HEAD) is ugly because it tries to be smart. While it heavily depends on the order, it might need a lot more iterations if it did not. I don't argue we have to keep it, only that it works in Polly HEAD for now and can be tackled afterwards if we find a nice and comparable way to do it. | |
2778 | That was what I was telling you about in SJ. I even pushed the first patch that implements this TODO to reviews, and it depends on this one. | |
2783 | I'm not sure how to answer that, as I expected the link to the function that contains the comment for this one to be self-explanatory. | |
2788 | I'm not sure why you ask this but as you can probably see in the code no "PHI access" is processed. Though, if something that is a PHI does not introduce a "PHI access" it might be processed here. | |
2800 | I'm not sure what you mean. Why do we look at each access separately, or why do we look at the non-trivial operands of each access separately? | |
2817 | Mh, if we have a test case I'll check that. | |
2876 | Might be, but it won't help with removing the scalar dependences. Actually, it might make them worse. | |
2880 | Outside operands are the read-only scalars we track to ease OpenMP code generation. The scalar access we recompute is removed later. | |
2885 | No read access is/should be removed if it is still connected to anything else. Hence, your scenario should never happen. Additionally, we actually iterate over accesses here, thus the actual removal happens only at the end (after we have traversed all accesses). | |
2890 | It depends on how you see it. I think the (virtual) register is a unique address for a scalar, thus it fits well here. | |
2894 | We copy/move the entire operand tree but for that I do not need to traverse it and put all instructions into a list. I simply store the root of the tree and traverse it during the actual copy/move. | |
2932 | Actually, it has already been removed, hence the statement exists but there is no read access. | |
2940 | It could be but doesn't need to. | |
2947 | No, it is the same but for the write accesses instead of the reads. Or, to be more precise, it is the passive version, as writes are (passively) resolved if all their reads have been (actively) resolved. | |
2963 | It does not skip them but stops at them, as they are non-trivial operands. | |
lib/CodeGen/BlockGenerators.cpp | ||
67 | If it is hoisted. | |
86 | After recomputeDependentScalars has been executed they are in the BBMap. However, while we are in recomputeDependentScalars, our synthesizing code sometimes thinks it can synthesize something although it actually refers to old values that we have to recompute first. | |
130 | Good question. I don't remember anymore. I'll check once the general patch is agreed on. | |
147 | Seems fine to me. | |
169 | If we could not generate a new operand but we need to, because the old one is an instruction in the region, we have to do something. | |
171 |
This code is only triggered while we execute recomputeDependentScalars.
Which order of dependence? We have a DAG here and have to generate code for it. One way is to find some topological order on the whole DAG and then generate code; the other (which is implemented) is recursive, demand-driven code generation (see the sketch after this table). As we do the second everywhere else, I figured it makes sense to do it here too. |
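[Editorial illustration] A minimal, standalone sketch of the demand-driven recursion mentioned in the comment on line 171 above. Node and ValueMap are toy stand-ins for LLVM instructions and the BlockGenerator's BBMap; this is not the actual Polly code.

```cpp
#include <map>
#include <memory>
#include <vector>

// Toy stand-in for an instruction in the operand DAG of a recomputable scalar.
struct Node {
  int Id;
  std::vector<Node *> Operands;
};

using ValueMap = std::map<Node *, Node *>; // Toy stand-in for BBMap.

// Demand-driven copy: instead of topologically sorting the DAG up front,
// recursively copy the operands of a node right before the node itself.
// Because trivially recomputable scalars form a DAG (no loop-carried PHIs),
// the recursion terminates.
Node *copyNode(Node *N, ValueMap &BBMap,
               std::vector<std::unique_ptr<Node>> &Storage) {
  auto It = BBMap.find(N);
  if (It != BBMap.end()) // Already copied (or mapped to an existing value).
    return It->second;

  Storage.push_back(std::make_unique<Node>());
  Node *Copy = Storage.back().get();
  Copy->Id = N->Id;
  for (Node *Op : N->Operands)
    Copy->Operands.push_back(copyNode(Op, BBMap, Storage));

  BBMap[N] = Copy;
  return Copy;
}
```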
include/polly/ScopInfo.h | ||
---|---|---|
1291 | What is a "transitive operand"? | |
lib/Analysis/ScopInfo.cpp | ||
2800 | This is nested inside a "for (MemoryAccess *MA : Stmt)" loop and NonTrivialOperandsMap is indexed by its AccessInst. | |
include/polly/ScopInfo.h | ||
---|---|---|
946 | Does "Caused by the instruction of MA" mean the user of instruction of MA? | |
1190 | SE is a member of Scop, no need to pass it as a parameter. I think it is reasonable to carry it around because we store SCEVs with Scop, e.g. Parameters. | |
1275 |
+1 | |
1277 | Why do we need the in-SCoP users here? | |
1291 |
  a = b + c
  d = a + e
I guess b, c, and hopefully a are the transitive operands of d. They are also the leaves of the operand DAG rooted at d. When we recompute d, the recomputation stops at b, c, e, right? That is, all instructions inside the DAG are recomputed. So NoTrivialInstructionSet and OutsideOperandsSet together are the boundaries/leaves of the recomputation DAG, and NoTrivialInstructionSet contains the instructions inside the Scop while OutsideOperandsSet contains the instructions outside the Scop, right? | |
1295 | I read:
  <NoTrivialInstructions (first of NonTrivialOperandsPairTy)>   <OutsideOperandsSetTy (second of NonTrivialOperandsPairTy)>
                              \                                /
                                      Recomputable DAG
                                             |
                          (root) Instruction (the key of NonTrivialOperandsMap)
Actually, can we check if an operand is outside the Scop on the fly, such that only one operand set is needed? | |
lib/Analysis/ScopInfo.cpp | ||
2509 | SE is a member of Scop | |
2790 | // Now AccessInst is a write scalar access (and not a PHI). | |
2805–2830 | How about using the following approach, which is used all over the LLVM codebase, and we can create a function for it:
  // Depth-first traverse the operand DAG of AccessInst, until we reach:
  //  * an instruction that is defined outside the current Scop (OutsideOperands)
  //  * an instruction with side effects, or a non-synthesizable PHI (SideEffectOperands)
  SmallPtrSet<Instruction *, 8> Visited;
  SmallVector<std::pair<Instruction *, Instruction::op_iterator>, 8> VisitStack;
  VisitStack.push_back(std::make_pair(AccessInst, AccessInst->op_begin()));

  while (!VisitStack.empty()) {
    Instruction *CurNode = VisitStack.back().first;
    Instruction::op_iterator &CurChildIt = VisitStack.back().second;

    // All operands of CurNode have been handled; backtrack.
    if (CurChildIt == CurNode->op_end()) {
      VisitStack.pop_back();
      continue;
    }

    Instruction *ChildInst = dyn_cast<Instruction>(*CurChildIt);
    ++CurChildIt;

    // Stop at invariants (constants, arguments, ...).
    if (!ChildInst)
      continue;

    // Skip operands we have already visited.
    if (!Visited.insert(ChildInst).second)
      continue;

    if (!R.contains(ChildInst)) {
      OutsideOperands.push_back(std::make_pair(CurNode, ChildInst));
      continue;
    }

    if (ChildInst->mayHaveSideEffects() || ChildInst->mayReadFromMemory()) {
      SideEffectOperands.insert(ChildInst);
      continue;
    }

    // Synthesizable values need no recomputation.
    if (canSynthesize(ChildInst, &LI, &SE, &R))
      continue;

    // Non-synthesizable PHIs are non-trivial operands; stop at them.
    if (isa<PHINode>(ChildInst)) {
      SideEffectOperands.insert(ChildInst);
      continue;
    }

    // Otherwise descend into the operands of ChildInst.
    VisitStack.push_back(std::make_pair(ChildInst, ChildInst->op_begin()));
  }
| |
2884 | This user is actually inside the Scop, right? | |
2885 | So what we need is only the parent block of the user that is inside the Scop? | |
2890 | I read: if (Stmt.containValueReadOf(OutsideOperand)) | |
lib/CodeGen/BlockGenerators.cpp | ||
172 | If copyInstruction returned the newly copied instruction, the code
  copyInstruction(Stmt, OldOperandInst, BBMap, LTS, nullptr, Recompute);
  NewOperand = BBMap[OldOperand];
would become:
  NewOperand = copyInstruction(Stmt, OldOperandInst, BBMap, LTS, nullptr, Recompute);
which looks better and is easier to understand. But this can be done in another patch. |
I addressed all of the comments except one (the new algorithm to collect non-trivial operands).
@etherzhhb, @Meinersbur, @grosser, @sebpop
Since this patch caused so much discussion I guess it would be nice to get two people to tell me it's fine to commit.
include/polly/ScopInfo.h | ||
---|---|---|
946 | I created two functions now. One to remove all accesses caused by an instruction and one to remove exactly the accesses passed as arguments. | |
1277 | To find the access in the SCoP (if there is one). Only knowing the outside operand doesn't suffice to identify the MemoryAccess objects that we already created because of this outside operand. | |
1291 |
In the code above, "operands" would mean {b, c, e} for the instruction d.
True, but with PHI nodes we might not have a DAG and the PHIs are not necessarily leaves.
Yes. And since we do not recompute loop-carried PHIs here, we always have this nice DAG.
Yes, except in the PHI case, where we do not have a DAG and, even though PHIs are not leaves, we put them in the NoTrivialInstructionSet. | |
1295 | Good point, I adjusted the code to use only one set of non-trivial operands and then compute the outside operands + inside users on the fly. Thanks! | |
lib/Analysis/ScopInfo.cpp | ||
2790 | Added a comment. | |
2805–2830 | I don't get the problem you are trying to solve here. Why do we need a different algorithm, and what is it that it does better? | |
2884 | Renamed. | |
2885 | True. I changed this code to use only one set of operands and compute everything else on the fly. | |
2890 | Yes. Is that a problem? | |
lib/CodeGen/BlockGenerators.cpp | ||
172 | We could do that later, true. |
include/polly/ScopInfo.h | ||
---|---|---|
1291 | Just to confirm, we do not recompute loop-carried PHIs, right? | |
lib/Analysis/ScopInfo.cpp | ||
2805–2830 | It is almost the same algorithm, but in a different 'style'. I think I should not push too hard on the "style", so I think we are done with this. But we should put some comments in to describe what the algorithm is doing, like:
  // Traverse the operand DAG of AccessInst, until we reach:
  // * An instruction that is defined outside the current Scop (OutsideOperands)
  // * An instruction with side effects (SideEffectOperands) | |
2890 | It is not entirely straightforward, but with the comment above I think it is OK. | |
lib/CodeGen/BlockGenerators.cpp | ||
143–147 | TryOnly is only used here, and this block is equivalent to:
  assert(TryOnly && "Unexpected scalar dependence in region!");
  return nullptr;
Instead of passing TryOnly as a function parameter, can we move the assertion out of this function, like:
  auto *V = getNewValue(....);
  assert((V || TryOnly) && "Unexpected scalar dependence in region!");
Now we can remove this confusing TryOnly. | |
334 |
It would be good to put your explanation above as comments in the code. | |
1071–1072 | Interesting duplication :) |
etherzhhb requested changes to this revision
This revision now requires changes to proceed.
Do you want other changes than the missing comments and the TryOnly thing? Both seem to be minor but the status change seems to indicate a bigger problem with this patch.
include/polly/ScopInfo.h | ||
---|---|---|
1291 | Not in this patch (or the following three that are more or less ready); however, we are working on loop-carried PHIs too. | |
lib/Analysis/ScopInfo.cpp | ||
2805–2830 | I can do that. | |
lib/CodeGen/BlockGenerators.cpp | ||
334 | I can add a comment, sure. | |
1071–1072 | I am confused. Are you hinting at the fact that we call "recomputeDependentScalars" here too, or that we call it again before "generateScalarLoads"? I talked about the order already in the comment above. And the duplication of the "recomputeDependentScalars" call has the same reason as the duplication of the "generateScalarLoads" call: this is the starting point of a "copyStmt" function. While above it is for block statements, here it is for region statements. If you want, I can extract both calls into an "initializeScalars" method that is called before a statement is copied. |
include/polly/ScopInfo.h | ||
---|---|---|
1291 | ok | |
lib/CodeGen/BlockGenerators.cpp | ||
1071–1072 | I just think the duplication here is interesting/funny ... My intention is to suggest that we put the same comments as in line 344 here to explain the order; otherwise people may be confused when they see this before they see the comments in line 344.
This is great but can be done in another patch. |
I changed the status so that this patch does not block on me. There is no major issue.
I think this patch is good to go once the comments are fixed.
I still want to have a look at this, but did not manage to yet. I am a little delayed on the larger patches, but first I will need to have a look at Johannes' SSA patch.
Best,
Tobias
"all" does not match the description below