This is an archive of the discontinued LLVM Phabricator instance.

[Assumptions] Make collecting ephemeral values not quadratic in the number of assume intrinsics.
ClosedPublic

Authored by chandlerc on Aug 11 2016, 2:48 AM.

Details

Summary

The classical way to get a cache-friendly vector-style container when we
need queue semantics for BFS instead of stack semantics for DFS is to
use an ever-growing vector and an index. Erasing from the front requires
O(size) work, and unless we expect the worklist to grow *very* large,
it's probably cheaper to just let the vector grow and race down the list.
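
A minimal sketch of that pattern (illustrative names only, not this patch):

#include <vector>

// Queue semantics from an ever-growing vector plus an index: "popping"
// the front is just advancing i, so each element costs O(1) instead of
// the O(size) of erasing from the front.
template <typename T, typename VisitFn>
void processQueue(std::vector<T> Worklist, VisitFn Visit) {
  for (size_t i = 0; i != Worklist.size(); ++i) {
    T V = Worklist[i]; // indexing stays valid even if Visit grows the vector
    Visit(V, Worklist); // Visit may push_back new work at the back
  }
}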

But that makes it even worse that we're putting the assume intrinsics
into this worklist at all. We end up scanning their (by definition
empty) use lists to see if they're ephemeral even though we've already
put them in that set, etc.

Instead, directly populate the worklist with the operands when we mark
the assume intrinsics as ephemeral. Also, test the visited set *before*
putting things into the worklist so we don't accumulate the same value
in the list 100s of times.

It would be nice to use a set-vector for this, but I think it's useful
to test the set earlier to avoid repeatedly querying whether the same
instruction is safe to speculate.

Hopefully with these changes the number of values pushed onto the
worklist is smaller, and we avoid the quadratic work by letting the
vector grow as necessary rather than erasing from the front.
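
A rough sketch of the shape this takes (illustrative names, and assuming
Assumes holds the function's @llvm.assume calls; not the patch verbatim):

SmallPtrSet<const Value *, 32> EphValues, Visited;
SmallVector<const Value *, 16> Worklist;
for (const CallInst *Assume : Assumes) {
  EphValues.insert(Assume); // the assume itself is ephemeral by definition
  for (const Value *Op : Assume->operands()) // seed directly with operands
    if (isa<Instruction>(Op) && Visited.insert(Op).second)
      Worklist.push_back(Op); // test Visited *before* pushing
}
for (size_t i = 0; i != Worklist.size(); ++i) {
  const Value *V = Worklist[i];
  // ... if V is used only ephemerally and is safe to speculate, add it
  // to EphValues and push its not-yet-visited operands ...
}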

Diff Detail

Event Timeline

chandlerc updated this revision to Diff 67665.Aug 11 2016, 2:48 AM
chandlerc retitled this revision from to [Assumptions] Make collecting ephemeral values not quadratic in the number of assume intrinsics..
chandlerc updated this object.
chandlerc added reviewers: sanjoy, hfinkel.
chandlerc added a subscriber: llvm-commits.
hfinkel added inline comments.Aug 12 2016, 2:49 PM
lib/Analysis/CodeMetrics.cpp
54

Is there a reason to use the index here? Why not write this as:

for (auto I = Worklist.begin(); I != Worklist.end(); ++I) {
  const Value *V = *I;
  // ...
}
85

to a -> do a

chandlerc marked an inline comment as done.Aug 16 2016, 7:55 PM
chandlerc added inline comments.
lib/Analysis/CodeMetrics.cpp
54

Because that isn't valid.

When you grow the worklist you invalidate all iterators (because the iterators are just pointers and it's a vector that may have to re-allocate).
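
To illustrate the hazard (hypothetical values, not code from the patch):

std::vector<int> Worklist = {1, 2, 3};
for (auto I = Worklist.begin(); I != Worklist.end(); ++I)
  if (*I == 2)
    Worklist.push_back(4); // may re-allocate; I and end() now dangle: UB

// The index form survives growth, because Worklist[i] is re-evaluated
// against the current buffer on every iteration:
for (size_t i = 0; i != Worklist.size(); ++i)
  if (Worklist[i] == 2)
    Worklist.push_back(4); // safe: growth never invalidates an index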

How hard would it be to amortize the removal once the queue grows to a certain size? I ask because at some point the cost of copying around chunks of memory when a vector grows might outweigh the cost of just regularly trimming the visited/stale entries at the beginning of the queue, up to the "current head" iterator.

I suspect this would be hard to make worthwhile...

Because the growth is amortized already, the copies are too. So you're not going to see something *really* hot here no matter what. The real risk is that this uses gobs and gobs of memory.

But it still seems like a very low risk. This is strictly bounded by the number of instructions in a function. I think we are always going to be OK creating a vector with that many pointers in it (and even the visited set of those pointers) in terms of memory usage, so I'm not sure there is really an effective way to make this substantially cheaper.

If you have a concrete suggestion though, I'm happy to look at it. We have lots of worklists like this in LLVM.

dberris accepted this revision.Aug 16 2016, 8:31 PM
dberris added a reviewer: dberris.

If you have a concrete suggestion though, I'm happy to look at it. We have lots of worklists like this in LLVM.

Not at this moment -- but I do think it's worthwhile to have an ADT that implements worklists and the algorithms around them, if only to make it easier to reason about the code utilising them (and to be able to consolidate the efficiency work into a smaller set of places).

This revision is now accepted and ready to land.Aug 16 2016, 8:31 PM
majnemer added inline comments.
lib/Analysis/CodeMetrics.cpp
32–33

I think you could use a SetVector here because you never pop from Worklist, right?

34

const auto *?

hfinkel added inline comments.Aug 16 2016, 10:56 PM
lib/Analysis/CodeMetrics.cpp
54

Ah, right ;) -- Why don't we use a std::deque for these things?

chandlerc marked an inline comment as done.Aug 17 2016, 12:24 AM
chandlerc added inline comments.
lib/Analysis/CodeMetrics.cpp
32–33

I started with that, but we add things to the visited set that we don't put on the worklist, which allows us to avoid repeatedly querying isSafeToSpeculativelyExecute. While micro-optimizing the data structure seems unlikely to be worthwhile (see my comment below), avoiding repeated calls to this fairly heavyweight routine does seem worthwhile when the cost is just keeping the set separate.
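
Concretely, the shape is roughly this (hypothetical, not the patch verbatim): a value that fails the speculation check still lands in Visited, so the expensive query runs at most once per value.

if (!Visited.insert(I).second)
  continue; // already decided about I; never re-query
if (!isSafeToSpeculativelyExecute(I))
  continue; // remembered in Visited, but never queued
Worklist.push_back(I); // the worklist holds a strict subset of Visited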

54

Because std::deque is *amazingly* slower, especially for the common case of a small number of values. And any deque implementation will have this property, really: too much book-keeping for too little value.

As Danny pointed out, far and away the best data structure patterns we will come up with will look roughly like a double stack, or a vector that auto-shrinks the front when forced to re-allocate due to growth at the back (see the sketch after this comment).

But I strongly, strongly suspect that *this* is not a code path hot enough to motivate adding a highly tuned data structure. So I don't think we should wait for that here; I think we should go with the normal, boring solution that is used pretty much everywhere else already. And if/when we find something hot enough to *really* care about, let's build a nicely tuned thing.

All I really want is to make this not quadratic. ;]
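
For concreteness, a rough sketch of that shrink-the-front-on-growth idea (an illustration of the pattern, not proposed code):

#include <vector>

template <typename T>
class TrimmingWorklist {
  std::vector<T> Buf;
  size_t Head = 0; // index of the current front element
public:
  void push(T V) {
    // If we're about to re-allocate anyway and at least half the buffer
    // is consumed prefix, trim it first: we pay the O(n) copy for the
    // growth regardless, so the trim is amortized into it.
    if (Buf.size() == Buf.capacity() && Head > Buf.size() / 2) {
      Buf.erase(Buf.begin(), Buf.begin() + Head);
      Head = 0;
    }
    Buf.push_back(V);
  }
  bool empty() const { return Head == Buf.size(); }
  T pop() { return Buf[Head++]; } // front "removal" is just an increment
};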

hfinkel added inline comments.Aug 17 2016, 8:29 AM
lib/Analysis/CodeMetrics.cpp
54

Because std::deque is *amazingly* slower, especially for the common case of a small number of values.
...
But I strongly, strongly suspect that *this* is not a code path hot enough to motivate adding a highly tuned data structure.

So this is kind of my point ;) -- but fair enough, if this is the useful general pattern, this LGTM.

This revision was automatically updated to reflect the committed changes.