This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Analysis/
-
Analysis/
2/9
CodeMetrics.cpp

Differential D23396

[Assumptions] Make collecting ephemeral values not quadratic in the number of assume intrinsics.
ClosedPublic

Authored by chandlerc on Aug 11 2016, 2:48 AM.

Download Raw Diff

Details

Reviewers

dberris
sanjoy
hfinkel

Commits

rGe2f36bcb8421: [Assumptions] Make collecting ephemeral values not quadratic in the number of…
rL279099: [Assumptions] Make collecting ephemeral values not quadratic in the

Summary

The classical way to have a cache-friendly vector style container when
we need queue semantics for BFS instead of stack semantics for DFS is to
use an ever-growing vector and an index. Erasing from the front requires
O(size) work, and unless we expect the worklist to grow *very* large,
its probably cheaper to just grow and race down the list.

But that makes it more bad that we're putting the assume intrinsics in
this at all. We end up looking at the (by definition empty) use list to
see if they're ephemeral (when we've already put them in that set), etc.

Instead, directly populate the worklist with the operands when we mark
the assume intrinsics as ephemeral. Also, test the visited set *before*
putting things into the worklist so we don't accumulate the same value
in the list 100s of times.

It would be nice to use a set-vector for this but I think its useful to
test the set earlier to avoid repeatedly querying whether the same
instruction is safe to speculate.

Hopefully with these changes the number of values pushed onto the
worklist is smaller, and we avoid quadratic work by letting it grow as
necessary.

Diff Detail

Event Timeline

chandlerc updated this revision to Diff 67665.Aug 11 2016, 2:48 AM

chandlerc retitled this revision from to [Assumptions] Make collecting ephemeral values not quadratic in the number of assume intrinsics..

chandlerc updated this object.

chandlerc added reviewers: sanjoy, hfinkel.

chandlerc added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. · View Herald TranscriptAug 11 2016, 2:48 AM

hfinkel added inline comments.Aug 12 2016, 2:49 PM

lib/Analysis/CodeMetrics.cpp
54	Is there a reason to use the index here? Why not write this as: for (auto I = Worklist.begin(); I != Worklist.end(); ++I) { const Value V = I;
85	to a -> do a

chandlerc marked an inline comment as done.Aug 16 2016, 7:55 PM

chandlerc added inline comments.

lib/Analysis/CodeMetrics.cpp
54	Because that isn't valid. When you grow the worklist you invalidate all iterators (because the iterators are just pointers and its a vector that may have to re-allocate).

How hard would it be to amortize the removal from the queue grows to a certain size? I ask because at some point the cost of copying around chunks of memory when a vector grows might outweigh the cost of just regularly trimming the queue if there are visited/stale entries at the beginning to the "current head" iterator.

In D23396#517618, @dberris wrote:

How hard would it be to amortize the removal from the queue grows to a certain size? I ask because at some point the cost of copying around chunks of memory when a vector grows might outweigh the cost of just regularly trimming the queue if there are visited/stale entries at the beginning to the "current head" iterator.

I suspect this would hard to make worthwhile...

Because the growth is amortized already, the copies are too. So you're not going to see something *really* hot here no matter what. The real risk is that this uses gobs and gobs of memory.

But it still seems like a very low risk. This is strictly bounded on the number of instructions in a function. I think we are always going to be OK creating a vector with that many pointers in it (and even the visited set of those pointers) in terms of memory usage, so I'm not sure there is really an effective way to make this substantially cheaper.

If you have a concrete suggestion though, I'm happy to look at it. We have lots of worklists like this in LLVM.

If you have a concrete suggestion though, I'm happy to look at it. We have lots of worklists like this in LLVM.

Not at this moment -- but I do think it's worthwhile to have an ADT to implement worklists and algorithms around it, if only to make it easier to reason about the code utilising them (and be able to consolidate the efficiency work to a smaller set of places).

This revision is now accepted and ready to land.Aug 16 2016, 8:31 PM

majnemer added a subscriber: majnemer.Aug 16 2016, 10:37 PM

majnemer added inline comments.

lib/Analysis/CodeMetrics.cpp
32–33	I think you could use a `SetVector` here because you never pop from `Worklist`, right?
34	`const auto *`?

hfinkel added inline comments.Aug 16 2016, 10:56 PM

lib/Analysis/CodeMetrics.cpp
54	Ah, right ;) -- Why don't we use a std::deque for these things?

chandlerc marked an inline comment as done.Aug 17 2016, 12:24 AM

chandlerc added inline comments.

lib/Analysis/CodeMetrics.cpp
32–33	I started with that, but we add things to the visited set that we don't put on the worklist which allows us to avoid repeatedly querying isSafeToSpeculativelyExecute. While micro-optimizing the data structure seems unlikely to be worthwhile (see my comment below), avoiding repeated calls to this fairly heavy weight routine do seem worthwhile when the cost is just keeping the set separate.
54	Because std::deque is amazingly slower, especially for the common case of a small number of values. And any deque implementation will have this property really. Too much book-keeping for too little value. As Danny pointed out, far and away the best data structure patterns we will come up with will look roughly like a double stack or a vector that auto-shrinks the front when forced to re-allocate due to growth at the back. But I strongly, strongly suspect that this is not the code path so hot to motivate adding a highly tuned data structure. So I don't think we should wait for that here, I think we should go with the normal, boring solution that is used pretty much everywhere else already. And if/when we find something that is hot enough to really care, let's build a nicely tuned thing. All I really want is to make this not quadratic. ;]

hfinkel added inline comments.Aug 17 2016, 8:29 AM

lib/Analysis/CodeMetrics.cpp
54	Because std::deque is amazingly slower, especially for the common case of a small number of values. ... But I strongly, strongly suspect that this is not the code path so hot to motivate adding a highly tuned data structure. So this is kind of my point ;) -- but fair enough, if this is the useful general pattern, this LGTM.

Closed by commit rL279099: [Assumptions] Make collecting ephemeral values not quadratic in the (authored by chandlerc). · Explain WhyAug 18 2016, 10:59 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

Analysis/

CodeMetrics.cpp

59 lines

Diff 67665

lib/Analysis/CodeMetrics.cpp

	Show All 21 Lines
	#include "llvm/IR/IntrinsicInst.h"			#include "llvm/IR/IntrinsicInst.h"
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"

	#define DEBUG_TYPE "code-metrics"			#define DEBUG_TYPE "code-metrics"

	using namespace llvm;			using namespace llvm;

	static void completeEphemeralValues(SmallVector<const Value *, 16> &WorkSet,			static void
	SmallPtrSetImpl<const Value*> &EphValues) {			appendSpeculatableOperands(const Value *V,
	SmallPtrSet<const Value *, 32> Visited;			SmallPtrSetImpl<const Value *> &Visited,
				SmallVectorImpl<const Value *> &Worklist) {
				majnemerUnsubmitted Not Done Reply Inline Actions I think you could use a `SetVector` here because you never pop from `Worklist`, right? majnemer: I think you could use a `SetVector` here because you never pop from `Worklist`, right?
				chandlercAuthorUnsubmitted Not Done Reply Inline Actions I started with that, but we add things to the visited set that we don't put on the worklist which allows us to avoid repeatedly querying isSafeToSpeculativelyExecute. While micro-optimizing the data structure seems unlikely to be worthwhile (see my comment below), avoiding repeated calls to this fairly heavy weight routine do seem worthwhile when the cost is just keeping the set separate. chandlerc: I started with that, but we add things to the visited set that we don't put on the worklist…
	// Make sure that all of the items in WorkSet are in our EphValues set.			const User *U = dyn_cast<User>(V);
				majnemerUnsubmitted Done Reply Inline Actions `const auto `? majnemer:* `const auto *`?
	EphValues.insert(WorkSet.begin(), WorkSet.end());			if (!U)
				return;

				for (const Value *Operand : U->operands())
				if (Visited.insert(Operand).second)
				if (isSafeToSpeculativelyExecute(Operand))
				Worklist.push_back(Operand);
				}

				static void completeEphemeralValues(SmallPtrSetImpl<const Value *> &Visited,
				SmallVectorImpl<const Value *> &Worklist,
				SmallPtrSetImpl<const Value *> &EphValues) {
	// Note: We don't speculate PHIs here, so we'll miss instruction chains kept			// Note: We don't speculate PHIs here, so we'll miss instruction chains kept
	// alive only by ephemeral values.			// alive only by ephemeral values.

	while (!WorkSet.empty()) {			// Walk the worklist using an index but without caching the size so we can
	const Value *V = WorkSet.front();			// append more entries as we process the worklist. This forms a queue without
	WorkSet.erase(WorkSet.begin());			// quadratic behavior by just leaving processed nodes at the head of the
				// worklist forever.
				for (int i = 0; i < (int)Worklist.size(); ++i) {
				hfinkelUnsubmitted Not Done Reply Inline Actions Is there a reason to use the index here? Why not write this as: for (auto I = Worklist.begin(); I != Worklist.end(); ++I) { const Value V = I; hfinkel: Is there a reason to use the index here? Why not write this as: for (auto I = Worklist.begin…
				chandlercAuthorUnsubmitted Not Done Reply Inline Actions Because that isn't valid. When you grow the worklist you invalidate all iterators (because the iterators are just pointers and its a vector that may have to re-allocate). chandlerc: Because that isn't valid. When you grow the worklist you invalidate all iterators (because the…
				hfinkelUnsubmitted Not Done Reply Inline Actions Ah, right ;) -- Why don't we use a std::deque for these things? hfinkel: Ah, right ;) -- Why don't we use a std::deque for these things?
				chandlercAuthorUnsubmitted Not Done Reply Inline Actions Because std::deque is amazingly slower, especially for the common case of a small number of values. And any deque implementation will have this property really. Too much book-keeping for too little value. As Danny pointed out, far and away the best data structure patterns we will come up with will look roughly like a double stack or a vector that auto-shrinks the front when forced to re-allocate due to growth at the back. But I strongly, strongly suspect that this is not the code path so hot to motivate adding a highly tuned data structure. So I don't think we should wait for that here, I think we should go with the normal, boring solution that is used pretty much everywhere else already. And if/when we find something that is hot enough to really care, let's build a nicely tuned thing. All I really want is to make this not quadratic. ;] chandlerc: Because std::deque is amazingly slower, especially for the common case of a small number of…
				hfinkelUnsubmitted Not Done Reply Inline Actions Because std::deque is amazingly slower, especially for the common case of a small number of values. ... But I strongly, strongly suspect that this is not the code path so hot to motivate adding a highly tuned data structure. So this is kind of my point ;) -- but fair enough, if this is the useful general pattern, this LGTM. hfinkel: > Because std::deque is amazingly slower, especially for the common case of a small number of…
				const Value *V = Worklist[i];

	if (!Visited.insert(V).second)			assert(Visited.count(V) &&
	continue;			"Failed to add a worklist entry to our visited set!");

	// If all uses of this value are ephemeral, then so is this value.			// If all uses of this value are ephemeral, then so is this value.
	if (!std::all_of(V->user_begin(), V->user_end(),			if (!std::all_of(V->user_begin(), V->user_end(),
	[&](const User *U) { return EphValues.count(U); }))			[&](const User *U) { return EphValues.count(U); }))
	continue;			continue;

	EphValues.insert(V);			EphValues.insert(V);
	DEBUG(dbgs() << "Ephemeral Value: " << *V << "\n");			DEBUG(dbgs() << "Ephemeral Value: " << *V << "\n");

	if (const User *U = dyn_cast<User>(V))			// Append any more operands to consider.
	for (const Value *J : U->operands()) {			appendSpeculatableOperands(V, Visited, Worklist);
	if (isSafeToSpeculativelyExecute(J))
	WorkSet.push_back(J);
	}
	}			}
	}			}

	// Find all ephemeral values.			// Find all ephemeral values.
	void CodeMetrics::collectEphemeralValues(			void CodeMetrics::collectEphemeralValues(
	const Loop L, AssumptionCache AC,			const Loop L, AssumptionCache AC,
	SmallPtrSetImpl<const Value *> &EphValues) {			SmallPtrSetImpl<const Value *> &EphValues) {
	SmallVector<const Value *, 16> WorkSet;			SmallPtrSet<const Value *, 32> Visited;
				SmallVector<const Value *, 16> Worklist;

	for (auto &AssumeVH : AC->assumptions()) {			for (auto &AssumeVH : AC->assumptions()) {
	if (!AssumeVH)			if (!AssumeVH)
	continue;			continue;
	Instruction *I = cast<Instruction>(AssumeVH);			Instruction *I = cast<Instruction>(AssumeVH);

	// Filter out call sites outside of the loop so we don't to a function's			// Filter out call sites outside of the loop so we don't to a function's
				hfinkelUnsubmitted Done Reply Inline Actions to a -> do a hfinkel: to a -> do a
	// worth of work for each of its loops (and, in the common case, ephemeral			// worth of work for each of its loops (and, in the common case, ephemeral
	// values in the loop are likely due to @llvm.assume calls in the loop).			// values in the loop are likely due to @llvm.assume calls in the loop).
	if (!L->contains(I->getParent()))			if (!L->contains(I->getParent()))
	continue;			continue;

	WorkSet.push_back(I);			if (EphValues.insert(I).second)
				appendSpeculatableOperands(I, Visited, Worklist);
	}			}

	completeEphemeralValues(WorkSet, EphValues);			completeEphemeralValues(Visited, Worklist, EphValues);
	}			}

	void CodeMetrics::collectEphemeralValues(			void CodeMetrics::collectEphemeralValues(
	const Function F, AssumptionCache AC,			const Function F, AssumptionCache AC,
	SmallPtrSetImpl<const Value *> &EphValues) {			SmallPtrSetImpl<const Value *> &EphValues) {
	SmallVector<const Value *, 16> WorkSet;			SmallPtrSet<const Value *, 32> Visited;
				SmallVector<const Value *, 16> Worklist;

	for (auto &AssumeVH : AC->assumptions()) {			for (auto &AssumeVH : AC->assumptions()) {
	if (!AssumeVH)			if (!AssumeVH)
	continue;			continue;
	Instruction *I = cast<Instruction>(AssumeVH);			Instruction *I = cast<Instruction>(AssumeVH);
	assert(I->getParent()->getParent() == F &&			assert(I->getParent()->getParent() == F &&
	"Found assumption for the wrong function!");			"Found assumption for the wrong function!");
	WorkSet.push_back(I);
				if (EphValues.insert(I).second)
				appendSpeculatableOperands(I, Visited, Worklist);
	}			}

	completeEphemeralValues(WorkSet, EphValues);			completeEphemeralValues(Visited, Worklist, EphValues);
	}			}

	/// Fill in the current structure with information gleaned from the specified			/// Fill in the current structure with information gleaned from the specified
	/// block.			/// block.
	void CodeMetrics::analyzeBasicBlock(const BasicBlock *BB,			void CodeMetrics::analyzeBasicBlock(const BasicBlock *BB,
	const TargetTransformInfo &TTI,			const TargetTransformInfo &TTI,
	const SmallPtrSetImpl<const Value*> &EphValues) {			const SmallPtrSetImpl<const Value*> &EphValues) {
	++NumBlocks;			++NumBlocks;
	▲ Show 20 Lines • Show All 78 Lines • Show Last 20 Lines