This is an archive of the discontinued LLVM Phabricator instance.

[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace SetVector with custom BitVectorVector
AbandonedPublic

Authored by lebedev.ri on Nov 12 2018, 3:02 AM.

Details

Summary

As was suggested in D54381, a Set is not the optimal data structure here, given what we know about the data.
We know that the values are in the range [0, Points_.size()).
Thus, we really only need a single bit to track whether we have a given value or not.
Naturally, BitVector fits the bill.
We can't simply plug BitVector in as the Set of a SetVector, so I have copied SetVector
into the llvm-exegesis directory, pruned everything that isn't needed, and replaced the Set terminology with BitVector.
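As a rough illustration of the idea only (not the patch's actual code; the class name, method names, and the std::vector<bool> stand-in for llvm::BitVector are all mine), a bit-vector-backed set-vector might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a SetVector whose membership set is a bit vector instead of a
// hash set. std::vector<bool> stands in for llvm::BitVector here; the real
// patch keeps the SetVector interface and swaps only the Set representation.
class BitVectorVector {
public:
  // The universe of values is known up front: [0, UniverseSize).
  explicit BitVectorVector(size_t UniverseSize) : InSet(UniverseSize, false) {}

  // Returns true iff the value was newly inserted (SetVector-like contract).
  bool insert(size_t V) {
    if (InSet[V])
      return false;
    InSet[V] = true;
    Order.push_back(V);
    return true;
  }

  bool count(size_t V) const { return InSet[V]; }
  const std::vector<size_t> &vector() const { return Order; }

private:
  std::vector<bool> InSet;   // one bit per candidate value in [0, N)
  std::vector<size_t> Order; // insertion order, as SetVector preserves it
};
```

Membership tests become a single bit lookup instead of a hash-table probe, which is where the speedup comes from.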

Old: (D54415)

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       6564.110422      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.19% )
...
            6.5657 +- 0.0129 seconds time elapsed  ( +-  0.20% )

New:

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       5553.640326      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.27% )
...
            5.5545 +- 0.0148 seconds time elapsed  ( +-  0.27% )

So another -15%.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Nov 12 2018, 3:02 AM

I'm not convinced you need a new type here - why don't you just use BitVector (or SparseBitVector?) and create a helper function for the insert/erase?

I'm not sure I follow your suggestion.
In SetVector, set_ is a private member variable, so I can't reach it from my custom function.

I'm also not following... DBSCAN is a BFS/Dijkstra-like algorithm; a BitVector plus a reserved std::vector suffice.

I'm sorry that I'm not following what you do not follow :)

Which is exactly what this BitVectorVector implements, as far as I can tell.
I do not understand what the question is. I'm sorry. Can you reformulate the question, please?
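For concreteness, the BFS-like expansion both sides seem to be describing might be sketched as follows; rangeQuery, the names, and the std::vector<bool> stand-in for llvm::BitVector are illustrative, not the actual llvm-exegesis code:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// BFS-style cluster expansion as used by DBSCAN: a bit vector marks points
// already queued, and a pre-reserved std::vector serves as the worklist.
// RangeQuery stands in for the real epsilon-neighborhood search.
std::vector<size_t>
expandCluster(size_t Seed, size_t NumPoints,
              const std::function<std::vector<size_t>(size_t)> &RangeQuery) {
  std::vector<bool> Queued(NumPoints, false); // BitVector stand-in
  std::vector<size_t> Worklist;
  Worklist.reserve(NumPoints); // reserve once; no reallocation during the BFS

  Queued[Seed] = true;
  Worklist.push_back(Seed);
  // Classic BFS with a cursor: popped indices stay in the vector, so the
  // final contents double as the cluster membership list, in visit order.
  for (size_t I = 0; I < Worklist.size(); ++I)
    for (size_t Neighbor : RangeQuery(Worklist[I]))
      if (!Queued[Neighbor]) {
        Queued[Neighbor] = true;
        Worklist.push_back(Neighbor);
      }
  return Worklist;
}
```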

Maybe I should quote "talk is cheap" here... My idea is https://reviews.llvm.org/D54442 (I hope it won't interfere your other improvement)

I still do not understand what the question is.
Yes, that is pretty much the same as this code.
Differences:

  • It uses pop_back instead of pop_front; my measurements so far have always shown pop_back to be worse.
  • Both data structures are inline, instead of being abstracted behind BitVectorVector.
  • One very interesting point: the initial insertion is a special case, so there is no need to check the set first. I do expect that to be faster; let's see.
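To make the first difference concrete, here is a hypothetical sketch of the two worklist disciplines (names are mine); both visit the same elements, differing only in traversal order and memory-access pattern, which is presumably where the measured difference comes from:

```cpp
#include <cstddef>
#include <vector>

// Pop-front discipline: advance a cursor over a stable vector; nothing is
// actually removed, so iteration is a linear forward scan (BFS-like order).
template <typename Visit>
void drainFIFO(std::vector<size_t> Work, Visit V) {
  for (size_t I = 0; I < Work.size(); ++I)
    V(Work[I]);
}

// Pop-back discipline: take from the tail, shrinking the vector (DFS-like
// order). Elements pushed during draining would be visited immediately.
template <typename Visit>
void drainLIFO(std::vector<size_t> Work, Visit V) {
  while (!Work.empty()) {
    V(Work.back());
    Work.pop_back();
  }
}
```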

Can you suggest some way to get a benchmark file as large as the one you have in /tmp/benchmarks.yaml? I need to experiment with it, but it is still unclear to me why pop_back is slower.

Please note that it is quite literally just the first 500'000 lines of the full uop benchmark, which is ~10x as large.
It is that big because I was experimenting with repeating each benchmark 10 times, and then being able to visually determine the correct value when the benchmark ends up being noisy.

Thanks for your benchmark file!

llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/Downloads/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html > /tmp/new

https://reviews.llvm.org/D54442 alone (without your other optimization) reduces the time from 16s to 4.2s on my machine. I'll leave the const auto Neighbors = rangeQuery(Q); inlining and other stuff to you :)

After thinking about it, I see what the question was: why use abstractions when you can avoid them.
Well, because they are abstractions :) I have seen rather more code than I would have liked that intentionally avoided abstractions.
So I personally would not write everything inline as it is in D54442, and I would not approve that in review.
If it is abstracted, it is more readable and actually testable, and one day it might even be moved to a more common place (e.g. ADT) should others need it.
If it is all inline, someone *will* have to refactor it to use abstractions later on.
So I guess the question boils down to: why bother writing good code when one can write "C spaghetti".

I'm not sure whether D54442 is just a proof of concept or actually intended for review,
but I would personally guess that small incremental patches
(with the actual perf changes stated, since this is a perf-improvement patch set) are better,
since small incremental changes are recommended by the coding standards anyway.

D54442 is also for review. It uses an inlined bit vector, Added (std::vector<char> is faster than BitVector in this case), as I don't think the abstraction is really necessary. It is used much as it would be in BFS or Dijkstra's algorithm.
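As a hypothetical sketch of that std::vector<char> approach (the Added name follows the comment above; the struct and method names are illustrative), the trade-off is a whole byte per point in exchange for cheaper accesses:

```cpp
#include <cstddef>
#include <vector>

// A byte-per-point flag array: each test/set is a plain byte load/store,
// whereas a packed bit vector needs a shift-and-mask on every access.
// The cost is 8x the memory; which representation wins is workload-dependent,
// and the thread reports vector<char> measuring slightly faster here.
struct ByteFlags {
  std::vector<char> Added;
  explicit ByteFlags(size_t NumPoints) : Added(NumPoints, 0) {}

  // Returns true iff the flag was newly set (insert-like contract).
  bool testAndSet(size_t I) {
    if (Added[I])
      return false;
    Added[I] = 1;
    return true;
  }
};
```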

lebedev.ri abandoned this revision.Nov 14 2018, 12:41 AM

Okay, if everyone hates abstractions, so be it: D54514 :)
I did benchmarks; std::vector<char> is indeed slightly better here.