Enable DivergenceAnalysis with non-trivial Simple Loop Unswitch.
Because DivergenceAnalysis is a function pass, it's not possible
to add the analysis to the pass manager pipeline when it's used
by a loop pass. This patch creates a new DivergenceInfo instance
only when needed by Simple Loop Unswitch. The PostDominatorTree
is needed as well. This adds DivergenceAnalysis when using the new
pass manager only and does not change the legacy pass manager.
Details
Diff Detail
Unit Tests
Event Timeline
lgtm but wait for other lgtms
llvm/test/Transforms/SimpleLoopUnswitch/AMDGPU/nontrivial-unswitch.ll | ||
---|---|---|
1 ↗ | (On Diff #372481) | not a big deal, but we could just do REQUIRES: amdgpu-registered-target right? unless you're planning on adding more of these sorts of tests |
you should probably check that this doesn't affect compile times too much since we're creating some new (potentially expensive) analyses without any sort of caching
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 | Constructing your own PostDominatorTree seems not good. Should this be a pass dependency for divergent targets? |
I've run a few test cases, but I don't know of a straightforward way to evaluate a large set of tests for compile time. It would be nice to be be able to cache the analysis.
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 | I agree that it's not good, but I believe the only way to do this as a dependency/required is to have all the loop passes update the post-dominator tree info when needed (which is what's done for dominators and several other loop analyses). |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 | See the loop analyses part of https://llvm.org/docs/NewPassManager.html#using-analyses for limitations on accessing function analyses in a loop pass. |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 | Thanks for that reference. If I understand correctly, another approach to adding divergence analysis would be to add an entry to LoopStandardAnalysisResults for DivergenceAnalysisInfo (but, at least initially, without the guarantee that loop passes update it). That way it can be accessed by SimpleLoopUnswtich. Then, in PassBuilder.cpp add an instance of the DivergeranceAnalysis prior to SimpleLoopUnswitch. This also requires a call to createFunctionToLoopPassAdaptor between DivergenceAnalysis and SimpleLoopUnswitch. We could do this only for targets that have branch divergence so that the pass pipeline remains the same for other targets. This approach seems most similar to how the legacy pass manager works. My main hesitation is adding DivergenceAnalysis to LoopStandardAnalysisResults. |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 | I don't think splitting up the loop pipeline to add in a function pass (to request a function analysis) is ok I think the proper way is to relax the constraint that a loop pass can't request a function analysis on demand. Part of the reason we don't allow this for other IR units is potential future concurrency and determinism, but I don't think we will ever do concurrency for loop passes (as in run loop passes on multiple loops in a function concurrently). I believe @asbirlea has mentioned something like this in the past, any thoughts? Just curious, are you actually seeing benchmarks where this nontrivial loop unswitching matters? |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 |
Yes, I've been tracking down the source of some performance regressions that we're seeing after the switch from the legacy pass manager to the new pass manager. I've seen a couple instances where non-trivial loop unswitching does improve performance (though, in one instance, I've noticed that the threshold needs to increase in simple loop unswitch to recover performance). |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 | Just to add some detail, here's one idea of "splitting | |
2803 |
Just FYI, here's one idea of how "splitting the loop pipeline" might look like if we attempted that: https://lists.llvm.org/pipermail/llvm-dev/2021-February/148619.html |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 | I'm currently talking to some people to see if we can do some redesigning and allow loop passes to request function analyses on demand. |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 | Great to hear. Let me know if there is anything I can do to help. Do you think it's worthwhile to merge this patch now or wait until the outcome of the redesign? |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2803 | Merging this now is fine. (and still LGTM) |
I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.
Thanks for the update. I've waited to commit this patch because the motivating code that is improved by non-trivial unswitching requires additional changes (that are in the regular LoopUnswitch pass but not the SimpleLoopUnswitch pass).
Will that always work as expected? The real dependency is that the divergence analysis is not incrementally updated. So even if this is a function pass, we may want to rerun it on the whole function every time it manages to unswitch a loop.
With this patch, we have to rerun DivergenceAnalysis every time we run the pass on every loop. If we change nontrivial unswitching into a function pass, we can upgrade that to only having to be rerun every time we actually unswitch something.
To be more specific, with the function pass, we'd have to invalidate most analyses anyway after every successful unswitch.
for every loop DA = FAM.getAnalysis<DivergenceAnalysis>(); // this will not rerun the analysis if we haven't invalidated below because we didn't successfully unswitch if successful unswitch auto PA = PreservedAnalyases::none(); PA.preserve<SCEV>(); // and all the other existing analyses we preserved FAM.invalidate(PA);
There's also the option to update all analysis inside the new function pass that's doing non-trivial unswitching. This is currently the case with all other analyses (they are updated), so the only one that needs to be added is DA.
This limits the need to update DA to just the new Function pass, vs all Loop passes that are part of the same LPM as SimpleLoopUnswitch.
The alternative is, as you both already said, for DA to be recomputed when it is invalidated.
Constructing your own PostDominatorTree seems not good. Should this be a pass dependency for divergent targets?