This is an archive of the discontinued LLVM Phabricator instance.

[Passes] Only run extra vector passes if loops have been vectorized.
ClosedPublic

Authored by fhahn on Dec 3 2021, 10:39 AM.

Details

Summary

This patch uses a similar trick as in D113947 to only run the extra
passes after vectorization on functions where loops have been
vectorized.

The reason for running the 'extra vector passes' is
simplification/unswitching of the runtime checks created by LV, there
should be no need to run them if nothing got vectorized

To do that, a new dummy analysis ShouldRunExtraVectorPasses has been
added. If loops have been vectorized for a function, LV will cache the
analysis. At the moment it uses MadeCFGChanges as proxy for loop
vectorized, which isn't perfect (it could be too aggressive, e.g.
because no runtime checks have been added), but should be good enough
for now.

The extra passes are then managed by a new FunctionPassManager that
collects 2 sets of passes: one to run unconditionally and one to run
only if ShouldRunExtraVectorPasses has been cached.

The reason it manages 2 sets of passes is that there are some
unconditional passes between LV and the 'extra' passes. Having the pass
manager manage both allows us to query the cache before the
unconditional passes might invalidate it.

Without this patch, -extra-vectorizer-passes has the following
compile-time impact:

NewPM-O3: +4.86%
NewPM-ReleaseThinLTO: +3.56%
NewPM-ReleaseLTO-g: +7.17%

http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions

With this patch, that gets reduced to

NewPM-O3: +1.43%
NewPM-ReleaseThinLTO: +1.00%
NewPM-ReleaseLTO-g: +1.58%

http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions

It is probably still too high to enable by default, but much better.

Diff Detail

Event Timeline

fhahn created this revision.Dec 3 2021, 10:39 AM
fhahn requested review of this revision.Dec 3 2021, 10:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 3 2021, 10:39 AM
aeubanks added inline comments.Dec 3 2021, 11:34 AM
llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h
98

can ExtraVectorPassManager only contain ConditionalPasses and we have a separate FPM for the normal passes? separation of concerns, seems like this is doing two things that could be separated

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10754

add a TODO for more precise vectorization tracking?

llvm/test/Other/opt-pipeline-vector-passes.ll
3

this should check that we actually don't run the extra passes when there's no vectorization, e.g.
O2-NOT: Running pass: ...

fhahn added inline comments.Dec 3 2021, 12:08 PM
llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h
98

The reason why ExtraVectorPassManager contains both the conditional and unconditional passes is that this way we can check whether ShouldRunExtraVectorPasses is available *before* the unconditional passes run.

If we would run the unconditional passes outside of ExtraVectorPassManager, then any change they make would invalidate ShouldRunExtraVectorPasses IIUC. Unless we teach all those passes to preserve it. Or perhaps there's a different alternative?

llvm/test/Other/opt-pipeline-vector-passes.ll
3

Yes, the original test doesn't really check for that. I'll push a commit to improve the test in that regard separately and will also include the vector loop.

fhahn updated this revision to Diff 391709.Dec 3 2021, 12:33 PM

rebased on top of test changes, added TODO

It would be good to mimic what the module equivalence check does
(when no changes were reported, the module should not have been changed),
and e.g. when in EXPENSIVE_CHECKS mode, if we weren't going to run those
extra vectorization passes because we predicted they wouldn't do anything,
to still run them, and check that they actually didn't do anything.

fhahn added a comment.Dec 3 2021, 1:33 PM

It would be good to mimic what the module equivalence check does
(when no changes were reported, the module should not have been changed),
and e.g. when in EXPENSIVE_CHECKS mode, if we weren't going to run those
extra vectorization passes because we predicted they wouldn't do anything,
to still run them, and check that they actually didn't do anything.

Unfortunately I don't think this will be feasible. The extra passes may simplify things in functions where nothing gets vectorized, *but* I think this is only a side-effect of the current implementation. The comment for the extra passes states that they are specifically intended to simplify runtime checks generated by the vectorizer.

It is probably still too high to enable by default, but much better.

It depends, for -O3, this may not be an issue. Maybe worth to look at "mafft"? (+4.17%)

this is a little different from D113947 in that we're using an analysis to keep track of something that must happen if a pass did something, whereas D113947 uses an analysis to keep track of if we *don't* have to do something
the only other thing I can think of that's somewhat similar is the devirtualization wrapper where we keep track of the number of direct/indirect calls, and also using value handles to keep track of if any devirtualization happened, and rerun passes if it did

I guess there isn't a great alternative to something like this unless we have some sort of marker within the IR itself, which doesn't really make any sense
so something along these lines seems fine to me

llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h
98

you could override the invalidate() method of Result to be stateless, e.g. following

bool GlobalsAAResult::invalidate(Module &, const PreservedAnalyses &PA,
                                 ModuleAnalysisManager::Invalidator &) {
  // Check whether the analysis has been explicitly invalidated. Otherwise, it's
  // stateless and remains preserved.
  auto PAC = PA.getChecker<GlobalsAA>();
  return !PAC.preservedWhenStateless();
}

this makes it so it's only invalidated if it's specifically abandoned

this should make it so this only needs to wrap the ConditionaPasses and not be its own FunctionPassManager

fhahn updated this revision to Diff 393130.Dec 9 2021, 5:51 AM

Updated to only invalidate the marker analysis when explicitly invalidated as suggested.

I also run a set of benchmarks including SPEC2006/Geekbench on AArch64 (extra vector passes enabled) with and without this patch and there were no notable performance differences.

It is probably still too high to enable by default, but much better.

It depends, for -O3, this may not be an issue. Maybe worth to look at "mafft"? (+4.17%)

Possibly, but I rather not get into this discussion at least until the major compile-time issue this patch addresses has been fixed. I suspect there is potential to tune the 'extra' vector pass sequence a bit more, but unfortunately I won't be able to drive towards enabling by default in the near future.

this is a little different from D113947 in that we're using an analysis to keep track of something that must happen if a pass did something, whereas D113947 uses an analysis to keep track of if we *don't* have to do something
the only other thing I can think of that's somewhat similar is the devirtualization wrapper where we keep track of the number of direct/indirect calls, and also using value handles to keep track of if any devirtualization happened, and rerun passes if it did

I guess there isn't a great alternative to something like this unless we have some sort of marker within the IR itself, which doesn't really make any sense
so something along these lines seems fine to me

Yeah, at the moment I don't think there's a general alternative to optionally running passes and using a marker analysis seems like a very convenient way to use the new pass manager to do that.

A related issue was discussed a while ago on llvm-dev ('Execute intrinsic lowering passes on demand' thread) [1] . It might be possible to use a similar approach to conditionally execute various intrinsic lowering passes.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-March/140531.html

fhahn updated this revision to Diff 393131.Dec 9 2021, 5:56 AM
fhahn marked an inline comment as done.

Also update doc-comment for ExtraVectorPassManager after recent change.

fhahn added inline comments.Dec 9 2021, 6:18 AM
llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h
98

Done, thanks!

aeubanks accepted this revision.Dec 9 2021, 11:32 AM
This revision is now accepted and ready to land.Dec 9 2021, 11:32 AM
Unknown Object (User) added a subscriber: Unknown Object (User).Oct 19 2022, 6:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 6:53 PM
Unknown Object (User) added a comment.Oct 19 2022, 6:53 PM

thank you for the post! It helps me tremendously!
krunker