This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/lib/Transforms/Vectorize/
-
lib/
-
Transforms/
-
Vectorize/
-
LoopVectorize.cpp

Differential D94576

[LoopVectorize] Guard verifyFunction with EXPENSIVE_CHECKS macro
AbandonedPublic

Authored by tislam on Jan 12 2021, 6:36 PM.

Download Raw Diff

Details

Reviewers

mkuper
jdoerfert
fhahn
lebedev.ri

Summary

We have found that the calls to verifyFunction can take up to 66% of the time spent in llvm::LoopVectorizePass::processLoop. For some source files, verifyFunction took around 10% of the total compile time.

This patch guards the calls to verifyFunction with the EXPENSIVE_CHECKS macro.

Diff Detail

Event Timeline

tislam created this revision.Jan 12 2021, 6:36 PM

Herald added a subscriber: hiraditya. · View Herald TranscriptJan 12 2021, 6:36 PM

tislam requested review of this revision.Jan 12 2021, 6:36 PM

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2021, 6:36 PM

Herald added a subscriber: llvm-commits. · View Herald Transcript

tislam added a reviewer: mkuper.Jan 12 2021, 6:41 PM

tislam added a reviewer: jdoerfert.Jan 12 2021, 6:54 PM

Harbormaster completed remote builds in B84964: Diff 316294.Jan 12 2021, 7:28 PM

We have found that the calls to verifyFunction can take up to 66% of the time spent in llvm::LoopVectorizePass::processLoop. For some source files, verifyFunction took around 10% of the total compile time.

See also https://bugs.llvm.org/show_bug.cgi?id=47712. I'm curious, where is the time spent in the verifier? PR47712 mentions a regression in the verifier where the majority of time is spent in visitMDNode. Is this a similar issue?

In general, I think the function verification has been very valuable in terms of surfacing bugs and I would be very reluctant to move it behind expensive checks, especially since we had this check for a long time. As mentioned in PR47712, I think verifyFunction should be O(# instructions in function) and not behind EXPENSIVE_CHECKS.

In D94576#2495433, @fhahn wrote:

We have found that the calls to verifyFunction can take up to 66% of the time spent in llvm::LoopVectorizePass::processLoop. For some source files, verifyFunction took around 10% of the total compile time.

See also https://bugs.llvm.org/show_bug.cgi?id=47712. I'm curious, where is the time spent in the verifier? PR47712 mentions a regression in the verifier where the majority of time is spent in visitMDNode. Is this a similar issue?

In general, I think the function verification has been very valuable in terms of surfacing bugs and I would be very reluctant to move it behind expensive checks, especially since we had this check for a long time. As mentioned in PR47712, I think verifyFunction should be O(# instructions in function) and not behind EXPENSIVE_CHECKS.

Thanks @fhahn for the information on PR47712.

The above stats are for cases where the instructions do not have any metadata (like debug or alias) in general. Here are some profile snapshots.

|      |             |     --3.96%--llvm::LoopVectorizePass::runImpl
|      |             |      |
|      |             |       --3.96%--llvm::LoopVectorizePass::processLoop
|      |             |        |
|      |             |        |--2.63%--llvm::verifyFunction
|      |             |        |    |
|      |             |        |     --2.54%--(anonymous namespace)::Verifier::verify
|      |             |        |      |
|      |             |        |      |--1.95%--llvm::InstVisitor<(anonymous namespace)::Verifier, void>::visit<llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock, true, false, void>, false, false> >
|      |             |        |      |
|      |             |        |       --0.54%--llvm::DomTreeBuilder::Calculate<llvm::DominatorTreeBase<llvm::BasicBlock, false> >
|      |             |        |
|      |             |         --1.17%--llvm::LoopVectorizationPlanner::executePlan

With our local branch, we saw the following.

|      |      |     --14.69%--llvm::LoopVectorizePass::runImpl
|      |      |      |
|      |      |       --14.69%--llvm::LoopVectorizePass::processLoop
|      |      |        |
|      |      |        |--9.98%--llvm::verifyFunction
|      |      |        |    |
|      |      |        |     --9.59%--(anonymous namespace)::Verifier::verify
|      |      |        |      |
|      |      |        |      |--2.49%--(anonymous namespace)::Verifier::visitInstruction
|      |      |        |      |    |
|      |      |        |      |     --1.28%--(anonymous namespace)::Verifier::visitInstruction
|      |      |        |      |
|      |      |        |      |--2.35%--llvm::InstVisitor<(anonymous namespace)::Verifier, void>::visit
|      |      |        |      |    |
|      |      |        |      |     --2.04%--(anonymous namespace)::Verifier::visitInstruction
|      |      |        |      |      |
|      |      |        |      |       --1.14%--(anonymous namespace)::Verifier::visitInstruction
|      |      |        |      |
|      |      |        |      |--2.12%--(anonymous namespace)::Verifier::verify
|      |      |        |      |
|      |      |        |       --2.06%--llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::CalculateFromScratch
|      |      |        |        |
|      |      |        |        |--0.89%--llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::runDFS<false, bool (*)(llvm::BasicBlock*, llvm::BasicBlock*)>
|      |      |        |        |
|      |      |        |        |--0.60%--llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::attachNewSubtree
|      |      |        |        |
|      |      |        |         --0.52%--llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::runSemiNCA
|      |      |        |
|      |      |         --4.22%--llvm::LoopVectorizationPlanner::executePlan

So are the compilation time issues reproducible with vanilla llvm, or only in your local branch?
If latter, shouldn't the fix (which isn't hiding the verification under) belong to the branch?

In D94576#2495786, @lebedev.ri wrote:

So are the compilation time issues reproducible with vanilla llvm, or only in your local branch?
If latter, shouldn't the fix (which isn't hiding the verification under) belong to the branch?

Hi @lebedev.ri, the first profile is with the vanilla llvm.

(cc @spatel, re rG0aa75fb12faa04e07ba1a6e334605357b6a159c9)

In D94576#2495787, @tislam wrote:

In D94576#2495786, @lebedev.ri wrote:

So are the compilation time issues reproducible with vanilla llvm, or only in your local branch?
If latter, shouldn't the fix (which isn't hiding the verification under) belong to the branch?

Hi @lebedev.ri, the first profile is with the vanilla llvm.

That is what the message said yes.

In D94576#2495788, @lebedev.ri wrote:

(cc @spatel, re rG0aa75fb12faa04e07ba1a6e334605357b6a159c9)

Thanks - if we can fix/limit verifyFunction itself instead of the callers, that seems like a better solution (and we would revert that change in SLP).

In D94576#2495791, @lebedev.ri wrote:

In D94576#2495787, @tislam wrote:

In D94576#2495786, @lebedev.ri wrote:

So are the compilation time issues reproducible with vanilla llvm, or only in your local branch?
If latter, shouldn't the fix (which isn't hiding the verification under) belong to the branch?

Hi @lebedev.ri, the first profile is with the vanilla llvm.

That is what the message said yes.

@tislam to make progress here, can you share a reproducer that shows the problematic compile time implications on vanilla llvm?

kkwli0 added a subscriber: kkwli0.Jan 13 2021, 10:58 AM

D94576.tar.gz3 MBDownload

Hi @lebedev.ri, I have uploaded an IR file that demonstrates the compile-time implications on vanilla llvm on our system. I used the following invocation.

opt  -passes=default\<O3\> -aa-pipeline=default -mcpu=pwr9 -o ./t.O3.bc ./t.ll

tislam added reviewers: fhahn, lebedev.ri.Jan 14 2021, 4:13 PM

@fhahn, @lebedev.ri

Thank you for your input. Do you have any suggestion regarding this change? Please let me know if you need additional information from my side.

That's end-to-end test.
Since you are changing LV, can you give a test that *only* involves LV?

D94576_LV_only.ll.tar.gz915 KBDownload

opt -passes='loop-vectorize' -aa-pipeline=default -o ./tmp.bc D94576_LV_only.ll

Hi @lebedev.ri, I have uploaded a second IR file for testing the loop-vectorize pass only.

Hi @lebedev.ri, did the second example work for you? Please let me know if you need any additional information.

In D94576#2611255, @tislam wrote:

Hi @lebedev.ri, did the second example work for you? Please let me know if you need any additional information.

Sorry, this completely dropped off my radar.
My comment is still the same, it would be good to to understand,
is there some particular reason why verification is slow for that sample?
Do you have a bisection as to when it became slow?

Hiding verification under EXPENSIVE_CHECKS is somewhat problematic,
it will effectively disable verification.

In D94576#3269996, @lebedev.ri wrote:

In D94576#2611255, @tislam wrote:

Hi @lebedev.ri, did the second example work for you? Please let me know if you need any additional information.

Sorry, this completely dropped off my radar.
My comment is still the same, it would be good to to understand,
is there some particular reason why verification is slow for that sample?
Do you have a bisection as to when it became slow?

Hiding verification under EXPENSIVE_CHECKS is somewhat problematic,
it will effectively disable verification.

This verification gets expensive when there are a lot of loops that are processed in a function. It seems kind of odd to run the verification on the entire function on each loop that is processed. Since the cost of verification is something like O(numloops x funcsize), it stands to reason that functions with many loops (which will also be quite large) will cause a huge cost to be paid for verification.
Simply modifying the vectorizer as:

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index bef39a56d9f2..220977f91cf3 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -10243,7 +10243,6 @@ static bool processLoopInVPlanNativePath(
 
   // Mark the loop as already vectorized to avoid vectorizing again.
   Hints.setAlreadyVectorized();
-  assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));
   return true;
 }
 
@@ -10672,7 +10671,6 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     Hints.setAlreadyVectorized();
   }
 
-  assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));
   return true;
 }
 
@@ -10738,6 +10736,8 @@ LoopVectorizeResult LoopVectorizePass::runImpl(
     Changed |= CFGChanged |= processLoop(L);
   }
 
+  assert(!Changed || !verifyFunction(F, &dbgs()));
+
   // Process each loop nest in the function.
   return LoopVectorizeResult(Changed, CFGChanged);
 }

cuts down the compile time by a factor of around 3.

The disadvantage of course is that in case of a broken function, the developer won't know which loop is transformed incorrectly. However, paying such a huge compile-time cost for something that bugpoint should easily help the developer narrow down in the rare case of a broken transformation, seems completely unnecessary.

This review may be stuck/dead, consider abandoning if no longer relevant.
Removing myself as reviewer in attempt to clean dashboard.

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 5:29 PM

Herald added subscribers: • pcwang-thead, shiva0217, StephenFan. · View Herald Transcript

Thanks very much to all reviewers for your inputs. Unfortunately, I have moved to a different project. I am abandoning this review.

In case anyone wants to revisit the problem. the following observation might be helpful.

For my case, a given metadata appeared in multiple places. I noticed that the verification of a given metadata is done redundantly for each invocation to verify. The redundancy can be avoided by maintaining a verification status on the metadata. After the first verification, a metadata remains in verified state until the next update operation. Subsequent verifications can simply check and return the verfied status, in case the state is not stale.

tislam abandoned this revision.Feb 1 2023, 6:59 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Vectorize/

LoopVectorize.cpp

6 lines

Diff 316294

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 9,094 Lines • ▼ Show 20 Lines	InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, 1, LVL,
&CM, BFI, PSI);		&CM, BFI, PSI);
LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""		LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""
<< L->getHeader()->getParent()->getName() << "\"\n");		<< L->getHeader()->getParent()->getName() << "\"\n");
LVP.executePlan(LB, DT);		LVP.executePlan(LB, DT);

// Mark the loop as already vectorized to avoid vectorizing again.		// Mark the loop as already vectorized to avoid vectorizing again.
Hints.setAlreadyVectorized();		Hints.setAlreadyVectorized();

		#ifdef EXPENSIVE_CHECKS
assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));		assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));
		#endif

return true;		return true;
}		}

LoopVectorizePass::LoopVectorizePass(LoopVectorizeOptions Opts)		LoopVectorizePass::LoopVectorizePass(LoopVectorizeOptions Opts)
: InterleaveOnlyWhenForced(Opts.InterleaveOnlyWhenForced \|\|		: InterleaveOnlyWhenForced(Opts.InterleaveOnlyWhenForced \|\|
!EnableLoopInterleaving),		!EnableLoopInterleaving),
VectorizeOnlyWhenForced(Opts.VectorizeOnlyWhenForced \|\|		VectorizeOnlyWhenForced(Opts.VectorizeOnlyWhenForced \|\|
!EnableLoopVectorization) {}		!EnableLoopVectorization) {}
▲ Show 20 Lines • Show All 320 Lines • ▼ Show 20 Lines	#endif /* NDEBUG */
} else {		} else {
if (DisableRuntimeUnroll)		if (DisableRuntimeUnroll)
AddRuntimeUnrollDisableMetaData(L);		AddRuntimeUnrollDisableMetaData(L);

// Mark the loop as already vectorized to avoid vectorizing again.		// Mark the loop as already vectorized to avoid vectorizing again.
Hints.setAlreadyVectorized();		Hints.setAlreadyVectorized();
}		}

		#ifdef EXPENSIVE_CHECKS
assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));		assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));
		#endif

return true;		return true;
}		}

LoopVectorizeResult LoopVectorizePass::runImpl(		LoopVectorizeResult LoopVectorizePass::runImpl(
Function &F, ScalarEvolution &SE_, LoopInfo &LI_, TargetTransformInfo &TTI_,		Function &F, ScalarEvolution &SE_, LoopInfo &LI_, TargetTransformInfo &TTI_,
DominatorTree &DT_, BlockFrequencyInfo &BFI_, TargetLibraryInfo *TLI_,		DominatorTree &DT_, BlockFrequencyInfo &BFI_, TargetLibraryInfo *TLI_,
DemandedBits &DB_, AAResults &AA_, AssumptionCache &AC_,		DemandedBits &DB_, AAResults &AA_, AssumptionCache &AC_,
std::function<const LoopAccessInfo &(Loop &)> &GetLAA_,		std::function<const LoopAccessInfo &(Loop &)> &GetLAA_,
▲ Show 20 Lines • Show All 107 Lines • Show Last 20 Lines