This is an archive of the discontinued LLVM Phabricator instance.

[NewPM][SimpleLoopUnswitch] Add DivergenceInfo
AcceptedPublic

Authored by bcahoon on Sep 14 2021, 7:44 AM.

Details

Summary

Enable DivergenceAnalysis with non-trivial Simple Loop Unswitch.
Because DivergenceAnalysis is a function pass, it's not possible
to add the analysis to the pass manager pipeline when it's used
by a loop pass. This patch creates a new DivergenceInfo instance
only when needed by Simple Loop Unswitch. The PostDominatorTree
is needed as well. This adds DivergenceAnalysis when using the new
pass manager only and does not change the legacy pass manager.

Diff Detail

Event Timeline

bcahoon created this revision.Sep 14 2021, 7:44 AM
bcahoon requested review of this revision.Sep 14 2021, 7:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2021, 7:44 AM
aeubanks accepted this revision.Sep 14 2021, 9:49 AM

lgtm but wait for other lgtms

llvm/test/Transforms/SimpleLoopUnswitch/AMDGPU/nontrivial-unswitch.ll
1 ↗(On Diff #372481)

not a big deal, but we could just do REQUIRES: amdgpu-registered-target right? unless you're planning on adding more of these sorts of tests

This revision is now accepted and ready to land.Sep 14 2021, 9:49 AM
asbirlea accepted this revision.Sep 14 2021, 9:51 AM

lgtm.

you should probably check that this doesn't affect compile times too much since we're creating some new (potentially expensive) analyses without any sort of caching

bcahoon updated this revision to Diff 372610.Sep 14 2021, 6:37 PM

Added amdgpu-registered-target to test case.

arsenm added inline comments.Sep 14 2021, 6:41 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

Constructing your own PostDominatorTree seems not good. Should this be a pass dependency for divergent targets?

you should probably check that this doesn't affect compile times too much since we're creating some new (potentially expensive) analyses without any sort of caching

I've run a few test cases, but I don't know of a straightforward way to evaluate a large set of tests for compile time. It would be nice to be be able to cache the analysis.

bcahoon added inline comments.Sep 14 2021, 6:47 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

I agree that it's not good, but I believe the only way to do this as a dependency/required is to have all the loop passes update the post-dominator tree info when needed (which is what's done for dominators and several other loop analyses).

aeubanks added inline comments.Sep 15 2021, 4:34 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

See the loop analyses part of https://llvm.org/docs/NewPassManager.html#using-analyses for limitations on accessing function analyses in a loop pass.

bcahoon added inline comments.Sep 16 2021, 9:59 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

Thanks for that reference. If I understand correctly, another approach to adding divergence analysis would be to add an entry to LoopStandardAnalysisResults for DivergenceAnalysisInfo (but, at least initially, without the guarantee that loop passes update it). That way it can be accessed by SimpleLoopUnswtich.

Then, in PassBuilder.cpp add an instance of the DivergeranceAnalysis prior to SimpleLoopUnswitch. This also requires a call to createFunctionToLoopPassAdaptor between DivergenceAnalysis and SimpleLoopUnswitch. We could do this only for targets that have branch divergence so that the pass pipeline remains the same for other targets.

This approach seems most similar to how the legacy pass manager works. My main hesitation is adding DivergenceAnalysis to LoopStandardAnalysisResults.

aeubanks added inline comments.Sep 16 2021, 10:49 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

I don't think splitting up the loop pipeline to add in a function pass (to request a function analysis) is ok
also, adding DivergenceAnalysis to LoopStandardAnalysisResults doesn't seem right since expecting all loop passes to update DivergenceAnalysis doesn't seem right

I think the proper way is to relax the constraint that a loop pass can't request a function analysis on demand. Part of the reason we don't allow this for other IR units is potential future concurrency and determinism, but I don't think we will ever do concurrency for loop passes (as in run loop passes on multiple loops in a function concurrently). I believe @asbirlea has mentioned something like this in the past, any thoughts?

Just curious, are you actually seeing benchmarks where this nontrivial loop unswitching matters?

bcahoon added inline comments.Sep 16 2021, 12:31 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

Just curious, are you actually seeing benchmarks where this nontrivial loop unswitching matters?

Yes, I've been tracking down the source of some performance regressions that we're seeing after the switch from the legacy pass manager to the new pass manager. I've seen a couple instances where non-trivial loop unswitching does improve performance (though, in one instance, I've noticed that the threshold needs to increase in simple loop unswitch to recover performance).

sameerds added inline comments.Sep 16 2021, 7:17 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

Just to add some detail, here's one idea of "splitting

2803

I don't think splitting up the loop pipeline to add in a function pass (to request a function analysis) is ok
also, adding DivergenceAnalysis to LoopStandardAnalysisResults doesn't seem right since expecting all loop passes to update DivergenceAnalysis doesn't seem right

I think the proper way is to relax the constraint that a loop pass can't request a function analysis on demand.

Just FYI, here's one idea of how "splitting the loop pipeline" might look like if we attempted that:

https://lists.llvm.org/pipermail/llvm-dev/2021-February/148619.html

aeubanks added inline comments.Sep 17 2021, 4:25 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

I'm currently talking to some people to see if we can do some redesigning and allow loop passes to request function analyses on demand.

bcahoon added inline comments.Sep 19 2021, 6:31 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

Great to hear. Let me know if there is anything I can do to help. Do you think it's worthwhile to merge this patch now or wait until the outcome of the redesign?

aeubanks added inline comments.Sep 19 2021, 9:49 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803

Merging this now is fine. (and still LGTM)

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

Thanks for the update. I've waited to commit this patch because the motivating code that is improved by non-trivial unswitching requires additional changes (that are in the regular LoopUnswitch pass but not the SimpleLoopUnswitch pass).

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

Will that always work as expected? The real dependency is that the divergence analysis is not incrementally updated. So even if this is a function pass, we may want to rerun it on the whole function every time it manages to unswitch a loop.

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

Will that always work as expected? The real dependency is that the divergence analysis is not incrementally updated. So even if this is a function pass, we may want to rerun it on the whole function every time it manages to unswitch a loop.

With this patch, we have to rerun DivergenceAnalysis every time we run the pass on every loop. If we change nontrivial unswitching into a function pass, we can upgrade that to only having to be rerun every time we actually unswitch something.
To be more specific, with the function pass, we'd have to invalidate most analyses anyway after every successful unswitch.

for every loop
  DA = FAM.getAnalysis<DivergenceAnalysis>(); // this will not rerun the analysis if we haven't invalidated below because we didn't successfully unswitch
  if successful unswitch
    auto PA = PreservedAnalyases::none();
    PA.preserve<SCEV>();
    // and all the other existing analyses we preserved
    FAM.invalidate(PA);

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

Will that always work as expected? The real dependency is that the divergence analysis is not incrementally updated. So even if this is a function pass, we may want to rerun it on the whole function every time it manages to unswitch a loop.

With this patch, we have to rerun DivergenceAnalysis every time we run the pass on every loop. If we change nontrivial unswitching into a function pass, we can upgrade that to only having to be rerun every time we actually unswitch something.
To be more specific, with the function pass, we'd have to invalidate most analyses anyway after every successful unswitch.

for every loop
  DA = FAM.getAnalysis<DivergenceAnalysis>(); // this will not rerun the analysis if we haven't invalidated below because we didn't successfully unswitch
  if successful unswitch
    auto PA = PreservedAnalyases::none();
    PA.preserve<SCEV>();
    // and all the other existing analyses we preserved
    FAM.invalidate(PA);

There's also the option to update all analysis inside the new function pass that's doing non-trivial unswitching. This is currently the case with all other analyses (they are updated), so the only one that needs to be added is DA.
This limits the need to update DA to just the new Function pass, vs all Loop passes that are part of the same LPM as SimpleLoopUnswitch.
The alternative is, as you both already said, for DA to be recomputed when it is invalidated.

Has this patch been merged into the main branch yet?

Herald added a project: Restricted Project. · View Herald TranscriptNov 6 2022, 7:10 PM