This is a proof of concept of how to port polly-passes to the new PassManager architecture. This approach works ootb for Function-Passes, but might not be directly applicable to Scop/Region-Passes. While we could just run the Analyses/Transforms over functions instead, we'd surrender the nice pipelining behaviour we have now.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Thanks for getting this for started. The part of adding those passes to the new pass manager's pipeline must be done in RegisterPasses.cpp, but all passes must be registered to the new pass manager, otherwise it cannot use them.
Note that we have two correctness issues that are fragile with the pass managers:
- Polly's codegen can change the IR of other SCoPs. Particularly it can make previously detected scops invalid. Thats why the sequence must be:
ScopDetection: A and B
ScopInfo on A, verify whether A is still a SCoP
CodeGeneration of A
ScopInfo on B, verify whether B is still a SCoP
CodeGeneration of B
but not:
ScopDetection: A and B
ScopInfo on A
ScopInfo on B
CodeGeneration of A
CodeGeneration of B
CodeGeneration of A can invalidate B, resulting in miscompiles because the IR of B was changed and does not correspond to its ScopInfo anymore.
This is way we rely on the RegionPass to run all passes on region, the continue with the next region.
- Polly does a bad job preserving analyses. It does not create new regions and loops in generated code. There is currently an ugly hack called NoopBarrier added to the pass pipeline that effectively throws away all analyses with the legacy pass manager. I get miscompiles when that barrier pass is removed. The new pass manager just caches all analyses.
Before these are not resolved, we cannot really use the new pass manager in production.
include/polly/ScopDetection.h | ||
---|---|---|
204–208 ↗ | (On Diff #93365) | My personal opinion is to prefer pointers over references when the class is not copyable/value-like. References cannot be used in all contexts. The reason is consistency of how an object of a type is used independent from which function it is used in. It also avoids global rewriting such as this one. However, this is my personal opinion and I see not everyone agrees with it. |
lib/Analysis/ScopDetection.cpp | ||
272–321 ↗ | (On Diff #93365) | I do not recommend having a lot of logic in a constructor:
LLVM has exception switched off and ScopDetection is not derived from, so stricly speaking the 2. and 3. points do not appy. However, there is still the first and IMHO it is a good coding practice to only initialize members in constructors and nothing else. Expecially be careful when calling other members. |
Pass Registration is still a pending issue. Until D11032 is ready to land, there is no defined way to set up a polly pipeline.
Note that we have two correctness issues that are fragile with the pass managers:
- Polly's codegen can change the IR of other SCoPs. Particularly it can make previously detected scops invalid. Thats why the sequence must be:
This is precisely the problem a ScopXManager would solve...
CodeGeneration of A can invalidate B, resulting in miscompiles because the IR of B was changed and does not correspond to its ScopInfo anymore.
... however this violates fundamental assumptions both the new and(!) the old pass managers make. It's an upstream miscompile or crash waiting to happen. In what way do SCoPs specifically interact? If there is no way to break that dependence, we can't run our own SCoP pipeline and must widen the IRUnit of the analyses to Function.
- Polly does a bad job preserving analyses. It does not create new regions and loops in generated code. There is currently an ugly hack called NoopBarrier added to the pass pipeline that effectively throws away all analyses with the legacy pass manager. I get miscompiles when that barrier pass is removed. The new pass manager just caches all analyses.
This is a non-issue. The new PM pessimistically invalidates all cached analysis results after a transfrom by default.
include/polly/ScopDetection.h | ||
---|---|---|
204–208 ↗ | (On Diff #93365) | I tentatively disagree. References are perfectly copyable. In turn, pointers convey one extra semantic, namely being nullable. This is of course nowhere being checked here. It made sense to use pointers before, because they were lazily initialized. But now they aren't anymore. |
lib/Analysis/ScopDetection.cpp | ||
272–321 ↗ | (On Diff #93365) | I will add a factory. |
I was trying to ask Chandler Carruth ask about this in the Hacker's Lab. Unfortunately I already asked too many questions and he preferred to let other people ask questions as well.
Note that we have two correctness issues that are fragile with the pass managers:
- Polly's codegen can change the IR of other SCoPs. Particularly it can make previously detected scops invalid. Thats why the sequence must be:
This is precisely the problem a ScopXManager would solve...
What is ScopXManager? If it is a single pass (Tobias suggested "-polly-sched") that would run all the other passes itself, then we have full control and can run in any order we like. However, we then also don't need to convert to existing passes to the new pass manager, they would be not be scheduled by any LLVM's pass manager anymore.
CodeGeneration of A can invalidate B, resulting in miscompiles because the IR of B was changed and does not correspond to its ScopInfo anymore.
... however this violates fundamental assumptions both the new and(!) the old pass managers make. It's an upstream miscompile or crash waiting to happen.
This happens with Loop passes all the time. Loop passes can create new loops (loop distribution) or remove them as it they like, as long as the LoopInfo remains consistent.
In what way do SCoPs specifically interact? If there is no way to break that dependence, we can't run our own SCoP pipeline and must widen the IRUnit of the analyses to Function.
I did not observe by myself (probably because it is already handled), but according to Tobias it happened in the past.
What I can think of is that the exit block of one SCoP is the entry block of the next one (and they are not combined to a single SCoP, e.g. because there is another edge from somewhere to that block). After codegen, there will be a merge block (polly.merge_new_and_old), combining the control flow from the the loop versioning branch, which would be the new entry block of the second. However, if ScopInfo already ran on the second SCoP, it will still reference the original entry block, which landed somewhere else, e.g. as the original code section.
- Polly does a bad job preserving analyses. It does not create new regions and loops in generated code. There is currently an ugly hack called NoopBarrier added to the pass pipeline that effectively throws away all analyses with the legacy pass manager. I get miscompiles when that barrier pass is removed. The new pass manager just caches all analyses.
This is a non-issue. The new PM pessimistically invalidates all cached analysis results after a transfrom by default.
Because pass dependencies are mostly transitive, invalidating all analysis would mean it also invalidates ScopDetection and we would try to re-detect the output of CodeGeneration. This is actually because polly passes have to preserve _all_ analyses used by any polly pass. It is also inefficient because we'd need to run all these analyses multiple times.
There is already ScopInfoWrapperPass which tries this. It as the aforementioned problem that generating one SCoP may invalidate another.
I've discussed this a bit with Chandler. If my patch lands it will enable pass registration. Hopefully that'll happen soonish.
What is ScopXManager? If it is a single pass (Tobias suggested "-polly-sched") that would run all the other passes itself, then we have full control and can run in any order we like. However, we then also don't need to convert to existing passes to the new pass manager, they would be not be scheduled by any LLVM's pass manager anymore.
The X stands for Pass and Analysis. I was thinking about adding these Managers to keep the pipelining behavior as it was before. But this depends on the issues below.
This happens with Loop passes all the time. Loop passes can create new loops (loop distribution) or remove them as it they like, as long as the LoopInfo remains consistent.
Loop passes modify the current loop nest, add loops to it or delete it or its children. They don't affect entirely different loops in the same function, I think. Either way, the PM is flexible enough to allow for adding and removing Scops from the current worklist.
What I can think of is that the exit block of one SCoP is the entry block of the next one (and they are not combined to a single SCoP, e.g. because there is another edge from somewhere to that block). After codegen, there will be a merge block (polly.merge_new_and_old), combining the control flow from the the loop versioning branch, which would be the new entry block of the second. However, if ScopInfo already ran on the second SCoP, it will still reference the original entry block, which landed somewhere else, e.g. as the original code section.
If we keep ScopInfo as a ScopPass, this cannot happen.
Because pass dependencies are mostly transitive, invalidating all analysis would mean it also invalidates ScopDetection and we would try to re-detect the output of CodeGeneration. This is actually because polly passes have to preserve _all_ analyses used by any polly pass. It is also inefficient because we'd need to run all these analyses multiple times.
It wouldn't necessarily mean that. If, as discussed above, I add a ScopPass pipeline, ScopDetection would naturally be preserved. Inside a ScopPass individual analyses can be preserved, if we can pessimistically guarantee that that's sound. Frequent cache invalidation may happen, and it does in the current O2, O3 pipelines. However, so far the performance impact is negligible.
Fundamentally I don't see anything of this as a blocker to using a ScopPassManager. I'll thus move forward with building one and porting the existing passes over to the new Scop pipeline. Any further pointers with regard to SCoP interaction would much appreciated!
This patch is currently blocked by an ugly bug in RegionInfo, which I'll have to sort out first before I can move this further. Apologies!
lib/Analysis/ScopDetection.cpp | ||
---|---|---|
272–321 ↗ | (On Diff #93365) | Some bikeshedding on this:
In summary: I'd still like to do the full initialization in the constructor, because the full knowledge of the ValidRegions is the specific internal state that defines an object of this class (a factory would externalize this). To deal with the missing virtual dispatch in the constructor, I however propose moving a lot of the private interface of the ScopDetection class and its implementation out out of the type. This would further satisfy your composability concerns. Thoughts? |
Out of interest: What is the RegionInfo bug you are talking about?
lib/Analysis/ScopDetection.cpp | ||
---|---|---|
1615 ↗ | (On Diff #93365) | Is this change related. If it is not, I suggest to commit this as-obvious without further review. (Do you have commit access. If not please ask Chris for commit access. Your code is great and I believe you have plans for further contributions). |
Region-objects capture the address of the creating RegionInfo instance. RegionInfo is movable, and after a move performed by the AnalysisManager, most accesses to the Region objects segfault. This will be a rather noisy fix, because I'll probably need to strictly decouple the Regions from RegionInfo.
Is the RegionInfoAnalysis actually maintained by the Polly community?
lib/Analysis/ScopDetection.cpp | ||
---|---|---|
1615 ↗ | (On Diff #93365) | It's not related. Will commit seperately! |
Polly is one (but not the only) user of the region infrastructure. So yes, a portion of the bugs is resolved by us. Would be great if this could be fixed, though. If the change is noisy, but simple, the review should be easy.
Hi Philip,
thanks for working on this. I just had a first view and I think the direction looks very good. There are still some style discussions open, but otherwise the patch looks great. I would like to get it in soon. The final thing missing is to actually be able to run and test this. What is needed to actually make this run and testable? I would really like to see a test case that verifies this is working. Also, could you add a comment in the commit message that explains how to run the viewers in the new pass manager?
include/polly/ScopDetection.h | ||
---|---|---|
204–208 ↗ | (On Diff #93365) | Even though I dislike global rewrites (as most of us), LLVM has a long tradition of not letting such dislikes prevent progress in terms of code quality and functionality. Hence, if there is an argument for a rewrite -- at best if it is mostly mechanical and just a single kind of change -- I am happy even with larger changes. Now, I think Philip has a good point. If we can ensure the reference is never null, expressing this semantic by using references makes sense. Hence, I would agree with him and suggest to proceed with his choice assuming Michael is OK with this. If we do this, I would just suggest to mention the reason why this change was applied in the commit message. Now, this is very subjective. Neither of the two choices is a lot better. One of the typical ways to get around bikesheding is to use 'grep' to get simple statistics what LLVM is doing. Sometimes this gives a clear picture, what we can follow. Otherwise, I suggest to agree on something and then follow this in Polly consistently. |
What is needed to actually make this run and testable? I would really like to see a test case that verifies this is working.
While the 'right' way to do this is still depending on the outcome of D11032, what I could do right now is add a set of unit tests which hand-craft pipelines. That way I can at least demonstrate that the passes work with the new PM as well.
Also, could you add a comment in the commit message that explains how to run the viewers in the new pass manager?
What exactly do you mean by 'viewers' here?
Hey Philipp,
if it is easy to add these unit tests, this would be really helpful. I started to review some of your later patches and some unit tests that show how to run these passes would be really helpful.
Best,
Tobias
include/polly/ScopDetection.h | ||
---|---|---|
204–208 ↗ | (On Diff #93365) | I am OK with it. |
lib/Analysis/ScopDetection.cpp | ||
272–321 ↗ | (On Diff #93365) |
This is not a blocking issue for me, so if this is important for you, feel free to ignore my opinion. However, I think avoiding large constructors is preferable in general for the reasons already mentioned, even if these do not apply for a particulat case. |
Hi Philip,
I am not sure what / if the update changed here. Is this a pure rebase with no functional change. In my last comment I asked if it would be possible to add test cases, are you currently working on this or is this not possible? It would be really great to move this patch upstream.
Best,
Tobias
This patch looks good to me. I understand, we can currently not have execution test coverage. I suggest to commit this exceptionally without test cases to at least get compile time test coverage and to also facilitate further testing with the pass manager plugins.