This is an archive of the discontinued LLVM Phabricator instance.

[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
ClosedPublic

Authored by lebedev.ri on Mar 26 2019, 7:36 AM.

Details

Summary

This is an alternative to D59539.

Let's suppose we have measured 4 different opcodes and got: 0.5, 1.0, 1.5, 2.0.
Let's suppose we are using -analysis-clustering-epsilon=0.5.
Right now, by default, we will start processing the 0.5 point, find that 1.0 is its neighbor, and add them to a new cluster.
Then we will notice that 1.5 is a neighbor of 1.0 and add it to that same cluster.
Then we will notice that 2.0 is a neighbor of 1.5 and add it to that same cluster.
So all these points end up in the same cluster.
This may or may not be a correct implementation of the dbscan clustering algorithm.
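To make the chaining concrete, here is a minimal toy sketch (plain 1-D points and an invented function name, not the actual llvm-exegesis code): once a point joins a cluster, its own neighbors are pulled in too, so points chained at distance <= epsilon all end up in one cluster.

#include <cmath>
#include <cstddef>
#include <deque>
#include <vector>

// Toy dbscan-style expansion on 1-D points.
std::vector<int> chainCluster(const std::vector<double> &Points, double Eps) {
  std::vector<int> ClusterOf(Points.size(), -1); // -1 == not clustered yet
  int NumClusters = 0;
  for (std::size_t I = 0; I < Points.size(); ++I) {
    if (ClusterOf[I] != -1)
      continue;
    const int Id = NumClusters++;
    ClusterOf[I] = Id;
    std::deque<std::size_t> Worklist{I};
    while (!Worklist.empty()) {
      const std::size_t P = Worklist.front();
      Worklist.pop_front();
      for (std::size_t J = 0; J < Points.size(); ++J)
        if (ClusterOf[J] == -1 && std::fabs(Points[J] - Points[P]) <= Eps) {
          ClusterOf[J] = Id; // a neighbor of a neighbor joins too
          Worklist.push_back(J);
        }
    }
  }
  return ClusterOf;
}

// chainCluster({0.5, 1.0, 1.5, 2.0}, 0.5) assigns all four points to cluster 0.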

But this is rather horribly broken when it comes to comparing the clusters against the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If I specify -analysis-inconsistency-epsilon=0.5, then no matter what values
LLVM has checked in, this cluster will never match them,
and thus this cluster will always be displayed as inconsistent.

The solution is obviously to split some of these opcodes off into a different sched cluster.
But how do I do that? Out of the 4 opcodes displayed in the inconsistency report,
which ones are the "bad" ones? Which ones differ the most from the checked-in data?
I'd need to go into the .yaml and look it up manually.

The trivial solution is, when creating clusters, not to run the full dbscan algorithm,
but instead to "pick some unclustered point, pick all unclustered points that are its neighbors,
put them all into a new cluster, repeat". As it happens, we can arrive
at that algorithm simply by not performing the "add neighbors of a neighbor to the cluster" step.
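A minimal sketch of that variant, under the same toy 1-D setup as above (names invented, not the patch's actual API): only direct neighbors of the seed point join its cluster, with no neighbor-of-a-neighbor expansion.

#include <cmath>
#include <cstddef>
#include <vector>

std::vector<int> naiveCluster(const std::vector<double> &Points, double Eps) {
  std::vector<int> ClusterOf(Points.size(), -1); // -1 == not clustered yet
  int NumClusters = 0;
  for (std::size_t I = 0; I < Points.size(); ++I) {
    if (ClusterOf[I] != -1)
      continue;
    const int Id = NumClusters++;
    ClusterOf[I] = Id;
    for (std::size_t J = 0; J < Points.size(); ++J)
      if (ClusterOf[J] == -1 && std::fabs(Points[J] - Points[I]) <= Eps)
        ClusterOf[J] = Id;
  }
  return ClusterOf;
}

// naiveCluster({0.5, 1.0, 1.5, 2.0}, 0.5) produces two clusters:
// {0.5, 1.0} (seeded at 0.5) and {1.5, 2.0} (seeded at 1.5).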

But that won't work well once we teach the analysis mode to operate in non-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
then depend on the order of the measurements.

Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
At the same time, check that every point in that cluster is a neighbor of every other point in the cluster;
if any are not, the cluster (== opcode) is unstable.
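A hedged sketch of that per-opcode scheme (scalar measurements and invented names, for illustration only; the real patch works on vectors of BenchmarkMeasure):

#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct OpcodeCluster {
  std::vector<double> Points;
  bool Unstable = false; // set if some pair of points is not within Eps
};

// Group every measurement under its opcode, then flag the cluster as
// unstable unless all of its points are pairwise neighbors.
std::map<unsigned, OpcodeCluster>
clusterPerOpcode(const std::vector<std::pair<unsigned, double>> &Measurements,
                 double Eps) {
  std::map<unsigned, OpcodeCluster> Clusters;
  for (const auto &M : Measurements)
    Clusters[M.first].Points.push_back(M.second);
  for (auto &KV : Clusters) {
    OpcodeCluster &C = KV.second;
    for (std::size_t I = 0; I < C.Points.size() && !C.Unstable; ++I)
      for (std::size_t J = I + 1; J < C.Points.size(); ++J)
        if (std::fabs(C.Points[I] - C.Points[J]) > Eps) {
          C.Unstable = true;
          break;
        }
  }
  return Clusters;
}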

This is yet another step to bring me closer to being able to continue the cleanup of the bdver2 sched model.

Fixes PR40880.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision. Mar 26 2019, 7:36 AM
courbet added inline comments. Mar 26 2019, 8:28 AM
tools/llvm-exegesis/lib/Clustering.cpp
56 ↗(On Diff #192272)

"are neighbours up to AnalysisClusteringEpsilon"

57 ↗(On Diff #192272)

This is O(N^2). You can do it in O(N): compute the cluster centroid (O(N)), then compute the distance from each point to the centroid (O(N)).

This relies on the fact that if there exist p and q such that d(p,q) > e, then either d(p, centroid) > e/2 or d(q, centroid) > e/2.

Proof (ad absurdum):
Assume both d(p, centroid) <= e/2 and d(q, centroid) <= e/2. Then:

d(p, centroid) + d(q, centroid) <= e

By symmetry:

d(p, centroid) + d(centroid, q) <= e

By the triangle inequality:

d(p, q) <= d(p, centroid) + d(centroid, q) <= e

which contradicts d(p, q) > e.
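As a minimal sketch (scalar measurements and an invented helper name, purely for illustration, not the llvm-exegesis API), the centroid-based check would look roughly like this: reject the cluster if any point is farther than e/2 from the centroid.

#include <cmath>
#include <numeric>
#include <vector>

// Assumes Points is non-empty. Returns false (i.e. "reject the cluster")
// if some point is farther than Eps/2 from the centroid.
bool allNearCentroid(const std::vector<double> &Points, double Eps) {
  const double Centroid =
      std::accumulate(Points.begin(), Points.end(), 0.0) /
      static_cast<double>(Points.size());
  for (const double P : Points)
    if (std::fabs(P - Centroid) > Eps / 2.0)
      return false;
  return true;
}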
lebedev.ri marked 3 inline comments as done.

Apply premature optimizations.

Use llvm::all_of().

courbet added inline comments. Mar 27 2019, 2:13 AM
tools/llvm-exegesis/lib/Clustering.h
149 ↗(On Diff #192323)

You're not using PointIds in BenchmarkGroup, so please move PointIds and getPointIds back to SchedClassCluster.

At that point it makes more sense to call this SchedClassClusterCentroid and make it a member of SchedClassCluster, with no virtual inheritance or protected section. Something like:

class SchedClassClusterCentroid {
public:
  const std::vector<PerInstructionStats> &getStats() const { return Representative; }
  std::vector<BenchmarkMeasure> getAsPoint() const;
  void addPoint(ArrayRef<BenchmarkMeasure> Point);

private:
  // Measurement stats for the points in the SchedClassCluster.
  std::vector<PerInstructionStats> Representative;
};

class SchedClassCluster {
public:
  const InstructionBenchmarkClustering::ClusterId &id() const;
  const std::vector<size_t> &getPointIds() const;
  void addPoint(size_t PointId, const InstructionBenchmarkClustering &Clustering);
  const SchedClassClusterCentroid& getCentroid() const { return Centroid; }
  bool measurementsMatch(...);

private:
  std::vector<size_t> PointIds;
  InstructionBenchmarkClustering::ClusterId ClusterId;
  SchedClassClusterCentroid Centroid;
};

And then only use SchedClassClusterCentroid in areAllNeighbours().

courbet added inline comments. Mar 27 2019, 2:57 AM
tools/llvm-exegesis/lib/Clustering.cpp
57 ↗(On Diff #192272)

Oops, I just realized that I proved the opposite direction of what I wanted. Two options:

  • We consider that this criterion is as good as the other one for deciding whether to reject the cluster, or
  • We take another approach, such as computing the bounding box of the cluster (O(N)) and then comparing its diagonal (the distance between the two extremal points, O(1)) to the rejection threshold.
lebedev.ri marked an inline comment as done.

Yes, indeed, this layering is better, thank you.
It was pretty obvious; I just hadn't had a clear moment to think about it.

lebedev.ri added inline comments. Mar 27 2019, 3:05 AM
tools/llvm-exegesis/lib/Clustering.cpp
57 ↗(On Diff #192272)

This appears to be sufficient.

Though, I think we can replace getAsPoint() with getMinPoint()+getMaxPoint(),
and just check that Pmin and Pmax are neighbours up to AnalysisClusteringEpsilon (not halved!).
Maybe that is even less controversial?
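As a minimal 1-D sketch of that idea (the helper name is invented; the real code would take per-dimension minima/maxima of the BenchmarkMeasure vectors):

#include <algorithm>
#include <cmath>
#include <vector>

// Assumes Points is non-empty. In 1-D the farthest-apart pair is exactly
// (min, max), so this matches the brute-force all-pairs check.
bool extremesAreNeighbours(const std::vector<double> &Points, double Eps) {
  const auto MinMax = std::minmax_element(Points.begin(), Points.end());
  return std::fabs(*MinMax.second - *MinMax.first) <= Eps;
}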

courbet added inline comments. Mar 27 2019, 3:10 AM
tools/llvm-exegesis/lib/Clustering.cpp
57 ↗(On Diff #192272)

Sounds perfect to me.

lebedev.ri added inline comments. Mar 27 2019, 3:31 AM
tools/llvm-exegesis/lib/Clustering.cpp
57 ↗(On Diff #192272)

Hm, I should have specified: I mostly only considered the current 1D case.
I did not think about the situation with more dimensions.

I believe that for the current 1D situation, the original brute-force approach,
this O(2*N) approach, and the O(N+1) approach are all identical.
I'm not sure about the 2D case, especially since it is presently theoretical; I can't test it.

Thinking about it more, I'm not fully convinced that the Pmin/Pmax solution would be correct for 2D.

I think we should keep this at least for now, unless you don't believe that it is correct?
I don't expect it to be the performance bottleneck.

courbet added inline comments. Mar 27 2019, 7:28 AM
tools/llvm-exegesis/lib/Clustering.cpp
57 ↗(On Diff #192272)

> I believe that for the current 1D situation, the original brute-force approach, this O(2*N) approach, and the O(N+1) approach are all identical.

If there are D dimensions and N points, the original approach would be O(D*N^2) (all pairs of points * one distance computation per pair), the second one O(D*N) (all points * accumulate each coordinate, plus all points * one distance computation), and the third one O(D*N + D) (all points * {min+max computation over each coordinate}, plus one distance computation between the min bound and the max bound).

> I'm not sure about the 2D case, especially since it is presently theoretical; I can't test it.

-mode=uops gives you a 9-dimensional vector (ports 0-7 + NumMicroOps).

> Thinking about it more, I'm not fully convinced that the Pmin/Pmax solution would be correct for 2D.

I don't think there is really a "correct" way - eventually this is just a heuristic to mark clusters as noise; anything reasonable works :)

lebedev.ri added inline comments. Mar 27 2019, 8:08 AM
tools/llvm-exegesis/lib/Clustering.cpp
57 ↗(On Diff #192272)

Sorry, I lost track of this thread.

> I don't think there is really a "correct" way - eventually this is just a heuristic to mark clusters as noise; anything reasonable works :)

Yes.

What I guess I'm trying to say is that this current implementation appears to be eps-compatible
with the brute-force approach. I'm not sure the min-max approach would be, and I have not really
thought about whether it could be made compatible by auto-adjusting the eps.
More importantly, I'm not yet sure we want to switch this to the min-max approach.

If there are remaining issues here, please say so; thank you for looking!

courbet accepted this revision. Mar 28 2019, 12:48 AM
This revision is now accepted and ready to land. Mar 28 2019, 12:48 AM

@courbet thank you for the review!

This revision was automatically updated to reflect the committed changes.