This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Guard verifyFunction with EXPENSIVE_CHECKS macro
AbandonedPublic

Authored by tislam on Jan 12 2021, 6:36 PM.

Details

Summary

We have found that the calls to verifyFunction can take up to 66% of the time spent in llvm::LoopVectorizePass::processLoop. For some source files, verifyFunction took around 10% of the total compile time.

This patch guards the calls to verifyFunction with the EXPENSIVE_CHECKS macro.

Diff Detail

Event Timeline

tislam created this revision.Jan 12 2021, 6:36 PM
tislam requested review of this revision.Jan 12 2021, 6:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2021, 6:36 PM
fhahn added a subscriber: fhahn.Jan 13 2021, 2:00 AM

We have found that the calls to verifyFunction can take up to 66% of the time spent in llvm::LoopVectorizePass::processLoop. For some source files, verifyFunction took around 10% of the total compile time.

See also https://bugs.llvm.org/show_bug.cgi?id=47712. I'm curious, where is the time spent in the verifier? PR47712 mentions a regression in the verifier where the majority of time is spent in visitMDNode. Is this a similar issue?

In general, I think the function verification has been very valuable in terms of surfacing bugs and I would be very reluctant to move it behind expensive checks, especially since we had this check for a long time. As mentioned in PR47712, I think verifyFunction should be O(# instructions in function) and not behind EXPENSIVE_CHECKS.

tislam added a comment.EditedJan 13 2021, 6:42 AM

We have found that the calls to verifyFunction can take up to 66% of the time spent in llvm::LoopVectorizePass::processLoop. For some source files, verifyFunction took around 10% of the total compile time.

See also https://bugs.llvm.org/show_bug.cgi?id=47712. I'm curious, where is the time spent in the verifier? PR47712 mentions a regression in the verifier where the majority of time is spent in visitMDNode. Is this a similar issue?

In general, I think the function verification has been very valuable in terms of surfacing bugs and I would be very reluctant to move it behind expensive checks, especially since we had this check for a long time. As mentioned in PR47712, I think verifyFunction should be O(# instructions in function) and not behind EXPENSIVE_CHECKS.

Thanks @fhahn for the information on PR47712.

The above stats are for cases where the instructions do not have any metadata (like debug or alias) in general. Here are some profile snapshots.

|      |             |     --3.96%--llvm::LoopVectorizePass::runImpl
|      |             |      |
|      |             |       --3.96%--llvm::LoopVectorizePass::processLoop
|      |             |        |
|      |             |        |--2.63%--llvm::verifyFunction
|      |             |        |    |
|      |             |        |     --2.54%--(anonymous namespace)::Verifier::verify
|      |             |        |      |
|      |             |        |      |--1.95%--llvm::InstVisitor<(anonymous namespace)::Verifier, void>::visit<llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock, true, false, void>, false, false> >
|      |             |        |      |
|      |             |        |       --0.54%--llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::BasicBlock, false> >
|      |             |        |
|      |             |         --1.17%--llvm::LoopVectorizationPlanner::executePlan

With our local branch, we saw the following.

|      |      |     --14.69%--llvm::LoopVectorizePass::runImpl
|      |      |      |
|      |      |       --14.69%--llvm::LoopVectorizePass::processLoop
|      |      |        |
|      |      |        |--9.98%--llvm::verifyFunction
|      |      |        |    |
|      |      |        |     --9.59%--(anonymous namespace)::Verifier::verify
|      |      |        |      |
|      |      |        |      |--2.49%--(anonymous namespace)::Verifier::visitInstruction
|      |      |        |      |    |
|      |      |        |      |     --1.28%--(anonymous namespace)::Verifier::visitInstruction
|      |      |        |      |
|      |      |        |      |--2.35%--llvm::InstVisitor<(anonymous namespace)::Verifier, void>::visit
|      |      |        |      |    |
|      |      |        |      |     --2.04%--(anonymous namespace)::Verifier::visitInstruction
|      |      |        |      |      |
|      |      |        |      |       --1.14%--(anonymous namespace)::Verifier::visitInstruction
|      |      |        |      |
|      |      |        |      |--2.12%--(anonymous namespace)::Verifier::verify
|      |      |        |      |
|      |      |        |       --2.06%--llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::CalculateFromScratch
|      |      |        |        |
|      |      |        |        |--0.89%--llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::runDFS<false, bool (*)(llvm::BasicBlock*, llvm::BasicBlock*)>
|      |      |        |        |
|      |      |        |        |--0.60%--llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::attachNewSubtree
|      |      |        |        |
|      |      |        |         --0.52%--llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::runSemiNCA
|      |      |        |
|      |      |         --4.22%--llvm::LoopVectorizationPlanner::executePlan

So are the compilation time issues reproducible with vanilla llvm, or only in your local branch?
If latter, shouldn't the fix (which isn't hiding the verification under) belong to the branch?

So are the compilation time issues reproducible with vanilla llvm, or only in your local branch?
If latter, shouldn't the fix (which isn't hiding the verification under) belong to the branch?

Hi @lebedev.ri, the first profile is with the vanilla llvm.

So are the compilation time issues reproducible with vanilla llvm, or only in your local branch?
If latter, shouldn't the fix (which isn't hiding the verification under) belong to the branch?

Hi @lebedev.ri, the first profile is with the vanilla llvm.

That is what the message said yes.

Thanks - if we can fix/limit verifyFunction itself instead of the callers, that seems like a better solution (and we would revert that change in SLP).

So are the compilation time issues reproducible with vanilla llvm, or only in your local branch?
If latter, shouldn't the fix (which isn't hiding the verification under) belong to the branch?

Hi @lebedev.ri, the first profile is with the vanilla llvm.

That is what the message said yes.

@tislam to make progress here, can you share a reproducer that shows the problematic compile time implications on vanilla llvm?

kkwli0 added a subscriber: kkwli0.Jan 13 2021, 10:58 AM
tislam added a comment.EditedJan 14 2021, 6:36 AM

Hi @lebedev.ri, I have uploaded an IR file that demonstrates the compile-time implications on vanilla llvm on our system. I used the following invocation.

opt  -passes=default\<O3\> -aa-pipeline=default -mcpu=pwr9 -o ./t.O3.bc ./t.ll

@fhahn, @lebedev.ri

Thank you for your input. Do you have any suggestion regarding this change? Please let me know if you need additional information from my side.

That's end-to-end test.
Since you are changing LV, can you give a test that *only* involves LV?

tislam added a comment.Feb 2 2021, 8:01 AM

opt -passes='loop-vectorize' -aa-pipeline=default -o ./tmp.bc D94576_LV_only.ll

Hi @lebedev.ri, I have uploaded a second IR file for testing the loop-vectorize pass only.

tislam added a comment.Mar 8 2021, 7:47 AM

Hi @lebedev.ri, did the second example work for you? Please let me know if you need any additional information.

Hi @lebedev.ri, did the second example work for you? Please let me know if you need any additional information.

Sorry, this completely dropped off my radar.
My comment is still the same, it would be good to to understand,
is there some particular reason why verification is slow for that sample?
Do you have a bisection as to when it became slow?

Hiding verification under EXPENSIVE_CHECKS is somewhat problematic,
it will effectively disable verification.

Hi @lebedev.ri, did the second example work for you? Please let me know if you need any additional information.

Sorry, this completely dropped off my radar.
My comment is still the same, it would be good to to understand,
is there some particular reason why verification is slow for that sample?
Do you have a bisection as to when it became slow?

Hiding verification under EXPENSIVE_CHECKS is somewhat problematic,
it will effectively disable verification.

This verification gets expensive when there are a lot of loops that are processed in a function. It seems kind of odd to run the verification on the entire function on each loop that is processed. Since the cost of verification is something like O(numloops x funcsize), it stands to reason that functions with many loops (which will also be quite large) will cause a huge cost to be paid for verification.
Simply modifying the vectorizer as:

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index bef39a56d9f2..220977f91cf3 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -10243,7 +10243,6 @@ static bool processLoopInVPlanNativePath(
 
   // Mark the loop as already vectorized to avoid vectorizing again.
   Hints.setAlreadyVectorized();
-  assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));
   return true;
 }
 
@@ -10672,7 +10671,6 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     Hints.setAlreadyVectorized();
   }
 
-  assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));
   return true;
 }
 
@@ -10738,6 +10736,8 @@ LoopVectorizeResult LoopVectorizePass::runImpl(
     Changed |= CFGChanged |= processLoop(L);
   }
 
+  assert(!Changed || !verifyFunction(F, &dbgs()));
+
   // Process each loop nest in the function.
   return LoopVectorizeResult(Changed, CFGChanged);
 }

cuts down the compile time by a factor of around 3.

The disadvantage of course is that in case of a broken function, the developer won't know which loop is transformed incorrectly. However, paying such a huge compile-time cost for something that bugpoint should easily help the developer narrow down in the rare case of a broken transformation, seems completely unnecessary.

lebedev.ri resigned from this revision.Jan 12 2023, 5:29 PM

This review may be stuck/dead, consider abandoning if no longer relevant.
Removing myself as reviewer in attempt to clean dashboard.

tislam added a comment.Feb 1 2023, 6:56 AM

Thanks very much to all reviewers for your inputs. Unfortunately, I have moved to a different project. I am abandoning this review.

In case anyone wants to revisit the problem. the following observation might be helpful.

For my case, a given metadata appeared in multiple places. I noticed that the verification of a given metadata is done redundantly for each invocation to verify. The redundancy can be avoided by maintaining a verification status on the metadata. After the first verification, a metadata remains in verified state until the next update operation. Subsequent verifications can simply check and return the verfied status, in case the state is not stale.

tislam abandoned this revision.Feb 1 2023, 6:59 AM