This is an archive of the discontinued LLVM Phabricator instance.

[Polly][NewPM] Port ScopDetection to the new PassManager
ClosedPublic

Authored by philip.pfaffe on Mar 29 2017, 6:47 AM.

Details

Summary

This is a proof of concept of how to port polly-passes to the new PassManager architecture. This approach works ootb for Function-Passes, but might not be directly applicable to Scop/Region-Passes. While we could just run the Analyses/Transforms over functions instead, we'd surrender the nice pipelining behaviour we have now.

Diff Detail

Repository
rL LLVM

Event Timeline

philip.pfaffe created this revision.Mar 29 2017, 6:47 AM
Meinersbur edited edge metadata.Mar 29 2017, 7:44 AM

Thanks for getting this for started. The part of adding those passes to the new pass manager's pipeline must be done in RegisterPasses.cpp, but all passes must be registered to the new pass manager, otherwise it cannot use them.

Note that we have two correctness issues that are fragile with the pass managers:

  1. Polly's codegen can change the IR of other SCoPs. Particularly it can make previously detected scops invalid. Thats why the sequence must be:

ScopDetection: A and B
ScopInfo on A, verify whether A is still a SCoP
CodeGeneration of A
ScopInfo on B, verify whether B is still a SCoP
CodeGeneration of B

but not:

ScopDetection: A and B
ScopInfo on A
ScopInfo on B
CodeGeneration of A
CodeGeneration of B

CodeGeneration of A can invalidate B, resulting in miscompiles because the IR of B was changed and does not correspond to its ScopInfo anymore.

This is way we rely on the RegionPass to run all passes on region, the continue with the next region.

  1. Polly does a bad job preserving analyses. It does not create new regions and loops in generated code. There is currently an ugly hack called NoopBarrier added to the pass pipeline that effectively throws away all analyses with the legacy pass manager. I get miscompiles when that barrier pass is removed. The new pass manager just caches all analyses.

Before these are not resolved, we cannot really use the new pass manager in production.

include/polly/ScopDetection.h
204–208 ↗(On Diff #93365)

My personal opinion is to prefer pointers over references when the class is not copyable/value-like. References cannot be used in all contexts. The reason is consistency of how an object of a type is used independent from which function it is used in. It also avoids global rewriting such as this one.

However, this is my personal opinion and I see not everyone agrees with it.

lib/Analysis/ScopDetection.cpp
272–321 ↗(On Diff #93365)

I do not recommend having a lot of logic in a constructor:

  • Makes composability more difficult. Assume we have to create this object in a different way. Then we'd have to clone the complete constructor although most members still have the 'obvious' initialization. Here: the analyses. This happened aready to the ScopStmt and MemoryAccess ctors.
  • If an exception is raised, the destructors are called, on possibly uninitialized variables. This is problaby not what you expect.
  • Calling an overridded virtual method calls the non-overridden method. This way it is possible to call an unimplemented method of an abstract class. Even if it is not an abstract class, if you are running an algorithm, you want the instantiated class' methods to be called, not one of the base classes. That is, better don't implement an algorithm in a constructor.

LLVM has exception switched off and ScopDetection is not derived from, so stricly speaking the 2. and 3. points do not appy. However, there is still the first and IMHO it is a good coding practice to only initialize members in constructors and nothing else. Expecially be careful when calling other members.

Thanks for getting this for started. The part of adding those passes to the new pass manager's pipeline must be done in RegisterPasses.cpp, but all passes must be registered to the new pass manager, otherwise it cannot use them.

Pass Registration is still a pending issue. Until D11032 is ready to land, there is no defined way to set up a polly pipeline.

Note that we have two correctness issues that are fragile with the pass managers:

  1. Polly's codegen can change the IR of other SCoPs. Particularly it can make previously detected scops invalid. Thats why the sequence must be:

This is precisely the problem a ScopXManager would solve...

CodeGeneration of A can invalidate B, resulting in miscompiles because the IR of B was changed and does not correspond to its ScopInfo anymore.

... however this violates fundamental assumptions both the new and(!) the old pass managers make. It's an upstream miscompile or crash waiting to happen. In what way do SCoPs specifically interact? If there is no way to break that dependence, we can't run our own SCoP pipeline and must widen the IRUnit of the analyses to Function.

  1. Polly does a bad job preserving analyses. It does not create new regions and loops in generated code. There is currently an ugly hack called NoopBarrier added to the pass pipeline that effectively throws away all analyses with the legacy pass manager. I get miscompiles when that barrier pass is removed. The new pass manager just caches all analyses.

This is a non-issue. The new PM pessimistically invalidates all cached analysis results after a transfrom by default.

philip.pfaffe added inline comments.Mar 29 2017, 11:56 AM
include/polly/ScopDetection.h
204–208 ↗(On Diff #93365)

I tentatively disagree. References are perfectly copyable. In turn, pointers convey one extra semantic, namely being nullable. This is of course nowhere being checked here.
Unless there's a reason to allow this extra state, I find references to be the better defalt.

It made sense to use pointers before, because they were lazily initialized. But now they aren't anymore.

lib/Analysis/ScopDetection.cpp
272–321 ↗(On Diff #93365)

I will add a factory.

Thanks for getting this for started. The part of adding those passes to the new pass manager's pipeline must be done in RegisterPasses.cpp, but all passes must be registered to the new pass manager, otherwise it cannot use them.

Pass Registration is still a pending issue. Until D11032 is ready to land, there is no defined way to set up a polly pipeline.

I was trying to ask Chandler Carruth ask about this in the Hacker's Lab. Unfortunately I already asked too many questions and he preferred to let other people ask questions as well.

Note that we have two correctness issues that are fragile with the pass managers:

  1. Polly's codegen can change the IR of other SCoPs. Particularly it can make previously detected scops invalid. Thats why the sequence must be:

This is precisely the problem a ScopXManager would solve...

What is ScopXManager? If it is a single pass (Tobias suggested "-polly-sched") that would run all the other passes itself, then we have full control and can run in any order we like. However, we then also don't need to convert to existing passes to the new pass manager, they would be not be scheduled by any LLVM's pass manager anymore.

CodeGeneration of A can invalidate B, resulting in miscompiles because the IR of B was changed and does not correspond to its ScopInfo anymore.

... however this violates fundamental assumptions both the new and(!) the old pass managers make. It's an upstream miscompile or crash waiting to happen.

This happens with Loop passes all the time. Loop passes can create new loops (loop distribution) or remove them as it they like, as long as the LoopInfo remains consistent.

In what way do SCoPs specifically interact? If there is no way to break that dependence, we can't run our own SCoP pipeline and must widen the IRUnit of the analyses to Function.

I did not observe by myself (probably because it is already handled), but according to Tobias it happened in the past.

What I can think of is that the exit block of one SCoP is the entry block of the next one (and they are not combined to a single SCoP, e.g. because there is another edge from somewhere to that block). After codegen, there will be a merge block (polly.merge_new_and_old), combining the control flow from the the loop versioning branch, which would be the new entry block of the second. However, if ScopInfo already ran on the second SCoP, it will still reference the original entry block, which landed somewhere else, e.g. as the original code section.

  1. Polly does a bad job preserving analyses. It does not create new regions and loops in generated code. There is currently an ugly hack called NoopBarrier added to the pass pipeline that effectively throws away all analyses with the legacy pass manager. I get miscompiles when that barrier pass is removed. The new pass manager just caches all analyses.

This is a non-issue. The new PM pessimistically invalidates all cached analysis results after a transfrom by default.

Because pass dependencies are mostly transitive, invalidating all analysis would mean it also invalidates ScopDetection and we would try to re-detect the output of CodeGeneration. This is actually because polly passes have to preserve _all_ analyses used by any polly pass. It is also inefficient because we'd need to run all these analyses multiple times.

If there is no way to break that dependence, we can't run our own SCoP pipeline and must widen the IRUnit of the analyses to Function.

There is already ScopInfoWrapperPass which tries this. It as the aforementioned problem that generating one SCoP may invalidate another.

I was trying to ask Chandler Carruth ask about this in the Hacker's Lab. Unfortunately I already asked too many questions and he preferred to let other people ask questions as well.

I've discussed this a bit with Chandler. If my patch lands it will enable pass registration. Hopefully that'll happen soonish.

What is ScopXManager? If it is a single pass (Tobias suggested "-polly-sched") that would run all the other passes itself, then we have full control and can run in any order we like. However, we then also don't need to convert to existing passes to the new pass manager, they would be not be scheduled by any LLVM's pass manager anymore.

The X stands for Pass and Analysis. I was thinking about adding these Managers to keep the pipelining behavior as it was before. But this depends on the issues below.

This happens with Loop passes all the time. Loop passes can create new loops (loop distribution) or remove them as it they like, as long as the LoopInfo remains consistent.

Loop passes modify the current loop nest, add loops to it or delete it or its children. They don't affect entirely different loops in the same function, I think. Either way, the PM is flexible enough to allow for adding and removing Scops from the current worklist.

What I can think of is that the exit block of one SCoP is the entry block of the next one (and they are not combined to a single SCoP, e.g. because there is another edge from somewhere to that block). After codegen, there will be a merge block (polly.merge_new_and_old), combining the control flow from the the loop versioning branch, which would be the new entry block of the second. However, if ScopInfo already ran on the second SCoP, it will still reference the original entry block, which landed somewhere else, e.g. as the original code section.

If we keep ScopInfo as a ScopPass, this cannot happen.

Because pass dependencies are mostly transitive, invalidating all analysis would mean it also invalidates ScopDetection and we would try to re-detect the output of CodeGeneration. This is actually because polly passes have to preserve _all_ analyses used by any polly pass. It is also inefficient because we'd need to run all these analyses multiple times.

It wouldn't necessarily mean that. If, as discussed above, I add a ScopPass pipeline, ScopDetection would naturally be preserved. Inside a ScopPass individual analyses can be preserved, if we can pessimistically guarantee that that's sound. Frequent cache invalidation may happen, and it does in the current O2, O3 pipelines. However, so far the performance impact is negligible.

Fundamentally I don't see anything of this as a blocker to using a ScopPassManager. I'll thus move forward with building one and porting the existing passes over to the new Scop pipeline. Any further pointers with regard to SCoP interaction would much appreciated!

This patch is currently blocked by an ugly bug in RegionInfo, which I'll have to sort out first before I can move this further. Apologies!

lib/Analysis/ScopDetection.cpp
272–321 ↗(On Diff #93365)

Some bikeshedding on this:

  • I disagree with your Composiability point. Right now this type serves a specific and narrow purpose. Widening this always requires some redesign.
  • Exception (un)safety is an issue, but only when there are managed resources. In the constructor, the object is fully initialized an in a defined state. Thus, if we stick to exception safe code (i.e., using RAII and/or exception safe resource management), we need not worry about exceptions.
  • This point I actually agree with fully. Right now there's no virtual dispatch here, but who knows if that might change in the future. So accepting this hazard right now might bite us in the future.

In summary: I'd still like to do the full initialization in the constructor, because the full knowledge of the ValidRegions is the specific internal state that defines an object of this class (a factory would externalize this). To deal with the missing virtual dispatch in the constructor, I however propose moving a lot of the private interface of the ScopDetection class and its implementation out out of the type. This would further satisfy your composability concerns. Thoughts?

grosser edited edge metadata.Apr 5 2017, 5:59 AM

Out of interest: What is the RegionInfo bug you are talking about?

lib/Analysis/ScopDetection.cpp
1615 ↗(On Diff #93365)

Is this change related. If it is not, I suggest to commit this as-obvious without further review.

(Do you have commit access. If not please ask Chris for commit access. Your code is great and I believe you have plans for further contributions).

Out of interest: What is the RegionInfo bug you are talking about?

Region-objects capture the address of the creating RegionInfo instance. RegionInfo is movable, and after a move performed by the AnalysisManager, most accesses to the Region objects segfault. This will be a rather noisy fix, because I'll probably need to strictly decouple the Regions from RegionInfo.

Is the RegionInfoAnalysis actually maintained by the Polly community?

lib/Analysis/ScopDetection.cpp
1615 ↗(On Diff #93365)

It's not related. Will commit seperately!

Polly is one (but not the only) user of the region infrastructure. So yes, a portion of the bugs is resolved by us. Would be great if this could be fixed, though. If the change is noisy, but simple, the review should be easy.

Add the as of yet missing printing facilities as a PrinterPass.

Hi Philip,

thanks for working on this. I just had a first view and I think the direction looks very good. There are still some style discussions open, but otherwise the patch looks great. I would like to get it in soon. The final thing missing is to actually be able to run and test this. What is needed to actually make this run and testable? I would really like to see a test case that verifies this is working. Also, could you add a comment in the commit message that explains how to run the viewers in the new pass manager?

include/polly/ScopDetection.h
204–208 ↗(On Diff #93365)

Even though I dislike global rewrites (as most of us), LLVM has a long tradition of not letting such dislikes prevent progress in terms of code quality and functionality. Hence, if there is an argument for a rewrite -- at best if it is mostly mechanical and just a single kind of change -- I am happy even with larger changes.

Now, I think Philip has a good point. If we can ensure the reference is never null, expressing this semantic by using references makes sense. Hence, I would agree with him and suggest to proceed with his choice assuming Michael is OK with this. If we do this, I would just suggest to mention the reason why this change was applied in the commit message.

Now, this is very subjective. Neither of the two choices is a lot better. One of the typical ways to get around bikesheding is to use 'grep' to get simple statistics what LLVM is doing. Sometimes this gives a clear picture, what we can follow. Otherwise, I suggest to agree on something and then follow this in Polly consistently.

What is needed to actually make this run and testable? I would really like to see a test case that verifies this is working.

While the 'right' way to do this is still depending on the outcome of D11032, what I could do right now is add a set of unit tests which hand-craft pipelines. That way I can at least demonstrate that the passes work with the new PM as well.

Also, could you add a comment in the commit message that explains how to run the viewers in the new pass manager?

What exactly do you mean by 'viewers' here?

Yes, please add such unit tests

Hey Philipp,

if it is easy to add these unit tests, this would be really helpful. I started to review some of your later patches and some unit tests that show how to run these passes would be really helpful.

Best,
Tobias

Meinersbur added inline comments.May 4 2017, 4:03 AM
include/polly/ScopDetection.h
204–208 ↗(On Diff #93365)

I am OK with it.

lib/Analysis/ScopDetection.cpp
272–321 ↗(On Diff #93365)
  • Composability: Whether a class has one or more constructors does not depend on how specific the class'es purpose is. A new/different constructor might be required in different contexts it is used in or even for even narrower contexts. For instance, a ScopDetection might be instantiated only to verify that a given Region is a SCoP.
  • Exceptions: While this is true, it is easy to forget (especially by new contributors) and then add a constructor to manually free some resource. Again, this is theoretical (e.g. someone loading Polly in context where exceptions) since LLVM has exceptions disabled.
  • A factory is not the only possibility to avoid large constructors. You can also create a public method findScops(Function &) where the main work happens. llvm::IDFCalculator is an example of using this.

This is not a blocking issue for me, so if this is important for you, feel free to ignore my opinion. However, I think avoiding large constructors is preferable in general for the reasons already mentioned, even if these do not apply for a particulat case.

philip.pfaffe edited the summary of this revision. (Show Details)

Rebase. NFC.

Rebase.

[Fixing the previous update, I failed at arc ...]

Hi Philip,

I am not sure what / if the update changed here. Is this a pure rebase with no functional change. In my last comment I asked if it would be possible to add test cases, are you currently working on this or is this not possible? It would be really great to move this patch upstream.

Best,
Tobias

grosser accepted this revision.May 12 2017, 5:48 AM

This patch looks good to me. I understand, we can currently not have execution test coverage. I suggest to commit this exceptionally without test cases to at least get compile time test coverage and to also facilitate further testing with the pass manager plugins.

This revision is now accepted and ready to land.May 12 2017, 5:48 AM

Final rebase and clang-format. NFC.

This revision was automatically updated to reflect the committed changes.