This is an archive of the discontinued LLVM Phabricator instance.

[llvm-exegesis] BitVectorVector: provide assign() method, use it.
AbandonedPublic

Authored by lebedev.ri on Nov 12 2018, 1:31 PM.

Details

Summary

Original idea by @MaskRay in D54442.
When doing the initial filing of the BitVectorVector, we know beforehand that
we will not have duplicates in the input values, and the storage is empty,
so we don't have to check before assigning.

Old: (D54418)

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       5597.034043      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.35% )
...
            5.5975 +- 0.0198 seconds time elapsed  ( +-  0.35% )

New:

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       5578.307677      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.18% )
...
            5.5800 +- 0.0101 seconds time elapsed  ( +-  0.18% )

-0.3%, not that much, but not nothing, too.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Nov 12 2018, 1:31 PM

I will cut this patchseries here for now.
Old walltime was around ~25sec, new is ~5.6sec, so -77.58%.

5 patches remain in need of review:

  • D54383 [llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time. <- self-contained, pretty trivial
  • D54388 [llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use llvm::SmallVector<size_t, 0> for storage. <- was essentially replaced by D54415, but that was only possible after D54390, so it kind-of makes sense as an intermediate step.
  • Small incremental improvements to the dbScan():
    • D54381 [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<> <- very straight-forward replacement, nothing complicated
    • D54418 [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace SetVector with custom BitVectorVector
    • D54445 [llvm-exegesis] BitVectorVector: provide assign() method, use it. <- trivial

The biggest remaining problem is within isNeighbour. It contains 99% of cache misses, due to that vector-within-vector storage.
I believe i will be able to get some improvement by

  1. Precalculating (into BitVector) the is neighbor matrix. I actually already tried that, not much of an improvement, anoter 100ms i think, not worth it on it's own.

and

  1. Parallelizing this computation. I have not *yet* tried this, but i'm optimistic about it.

Thanks!

lebedev.ri abandoned this revision.Nov 14 2018, 12:43 AM

Replaced by D54514