Details

Reviewers

courbet
RKSimon
gchatelet
john.brawn
lebedev.ri
dexonsmith

Commits

rG96408bb04a03: Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan
rG92537ccc7e21: [llvm-exegesis] Optimize ToProcess in dbScan
rL349139: Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan
rL349136: [llvm-exegesis] Optimize ToProcess in dbScan

Summary

Use vector<char> Added + vector<size_t> ToProcess to replace SetVector ToProcess

We also check Added[P] to avoid enqueueing a point more than once, which
also saves us a ClusterIdForPoint_[Q].isUndef() check.
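
The replacement described in this summary can be sketched roughly as follows. This is a minimal illustration on a toy adjacency list, not the code from the patch: `visitOnce` and `Adj` are hypothetical names introduced here, and only the `ToProcess` / `Added` / `Head` / `Tail` pattern mirrors the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of llvm-exegesis: returns every node reachable
// from Start in FIFO order, enqueueing each node at most once.
std::vector<size_t> visitOnce(const std::vector<std::vector<size_t>> &Adj,
                              size_t Start) {
  const size_t NumPoints = Adj.size();
  assert(Start < NumPoints && "Start must be a valid node index");
  std::vector<size_t> ToProcess(NumPoints); // Flat queue, sized once.
  std::vector<char> Added(NumPoints);       // Added[P] != 0 once P is queued.
  size_t Tail = 0;
  ToProcess[Tail++] = Start;
  Added[Start] = 1;
  // Head chases Tail; the loop ends when nothing new has been enqueued.
  for (size_t Head = 0; Head < Tail; ++Head) {
    const size_t P = ToProcess[Head];
    for (size_t Q : Adj[P])
      if (!Added[Q]) {
        ToProcess[Tail++] = Q;
        Added[Q] = 1;
      }
  }
  ToProcess.resize(Tail); // Keep only the entries that were actually queued.
  return ToProcess;
}
```

Because every index is written to `ToProcess` at most once, `Tail` can never exceed `NumPoints`, which is why a fixed-size vector is enough and no element ever has to be erased from the front.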

Diff Detail

Repository

rL LLVM

Build Status

Buildable 26032
Build 26031: arc lint + arc unit

Event Timeline

MaskRay created this revision.Nov 12 2018, 12:56 PM

Herald added subscribers: llvm-commits, tschuett.Nov 12 2018, 12:56 PM

Harbormaster completed remote builds in B24901: Diff 173738.Nov 12 2018, 12:56 PM

Harbormaster completed remote builds in B24902: Diff 173739.Nov 12 2018, 12:57 PM

Harbormaster completed remote builds in B24903: Diff 173740.Nov 12 2018, 12:58 PM

That is more or less how it ends up looking in D54418, just without abstraction.

tools/llvm-exegesis/lib/Clustering.cpp
128–129	I did try this, at different stages of the patchset. It always measured to be slower than deque + pop front.

lebedev.ri mentioned this in D54445: [llvm-exegesis] BitVectorVector: provide assign() method, use it..Nov 12 2018, 1:31 PM

Optimize

Harbormaster completed remote builds in B24908: Diff 173753.Nov 12 2018, 1:38 PM

MaskRay marked an inline comment as done.Nov 12 2018, 1:40 PM

MaskRay added inline comments.

tools/llvm-exegesis/lib/Clustering.cpp
128–129	I moved the vector definition outside to avoid construction/destruction. `Added` is not really necessary but it is faster than doing two checks: if an element is either Undef or Noise.

lebedev.ri mentioned this in D54418: [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace SetVector with custom BitVectorVector.Nov 12 2018, 11:22 PM

lebedev.ri mentioned this in D54514: [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use manual std::deque<size_t> + std::vector<char> instead of SetVector..Nov 14 2018, 12:41 AM

Looking forward to further improvements!

rebase

I've rebased the revision.

Old:

% perf stat -r 10 ~/llvm/Release/bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/Downloads/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html > /dev/null
4356.618418 task-clock (msec)
...

New:

Use std::vector<size_t> ToProcess(NumPoints);

sorry messed it up while rebasing

Harbormaster completed remote builds in B25155: Diff 174641.Nov 19 2018, 10:25 AM

courbet added inline comments.Nov 21 2018, 2:21 AM

tools/llvm-exegesis/lib/Clustering.cpp

120–126

All of this makes the dbscan implementation harder to follow. Please move all this to a separate struct:

// This is long-lived.
struct WorkSetStorage {
  public:
   WorkSetStorage(size_t NumPoints) : ToProcess(NumPoints), Added(NumPoints) {}

   void setProcessed(size_t P) { Added[P] = 1; }

  private:
   friend class WorkSet;
   std::vector<size_t> ToProcess;
   std::vector<char> Added;
};

// This is short-lived and replaces  llvm::SetVector<size_t, std::deque<size_t>> ToProcess;
class WorkSet {
  public:
   WorkSet(WorkSetStorage& Storage) : Storage(Storage), Head(0), Tail(0) {}

   // Inserts a point if not already processed.
   void insert(size_t P);

   // returns the first element from the work set and pops it.
   size_t pop();

  private:
   WorkSetStorage& Storage;
   size_t Head;
   size_t Tail;
};
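
The two members declared above, `insert()` and `pop()`, are left undefined in the suggestion. One way they might be filled in is sketched below. The method bodies and the `empty()` helper are assumptions added for illustration, not something proposed in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class WorkSet;

// Long-lived buffers, allocated once and shared by all WorkSet instances.
struct WorkSetStorage {
public:
  explicit WorkSetStorage(size_t NumPoints)
      : ToProcess(NumPoints), Added(NumPoints) {}

  void setProcessed(size_t P) { Added[P] = 1; }

private:
  friend class WorkSet;
  std::vector<size_t> ToProcess;
  std::vector<char> Added;
};

// Short-lived view over the storage; each instance starts with an empty queue.
class WorkSet {
public:
  explicit WorkSet(WorkSetStorage &Storage) : Storage(Storage) {}

  // Assumed addition: lets callers write `while (!Set.empty())`.
  bool empty() const { return Head == Tail; }

  // Inserts a point if not already processed.
  void insert(size_t P) {
    if (Storage.Added[P])
      return;
    Storage.Added[P] = 1;
    Storage.ToProcess[Tail++] = P;
  }

  // Returns the first element from the work set and pops it.
  size_t pop() {
    assert(!empty() && "pop() on an empty WorkSet");
    return Storage.ToProcess[Head++];
  }

private:
  WorkSetStorage &Storage;
  size_t Head = 0;
  size_t Tail = 0;
};
```

With this shape the inner loop of the algorithm could read `while (!Set.empty()) { size_t Q = Set.pop(); ... }`, while the underlying storage stays the same pair of vectors used by the inline version.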

136–145

Did you just remove the rangeQuery() call on Q? You have the neighbours from P here.

Reinstated rangeQuery(Q, Neighbors)
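
To illustrate why the per-point query matters, here is a small self-contained DBSCAN sketch over 1-D points. It is a simplified stand-in, not the llvm-exegesis code: `dbScan1D`, the integer labels `Undef` and `Noise`, and the brute-force `rangeQuery` are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative labels; the real code uses ClusterId.
constexpr int Undef = -2;
constexpr int Noise = -1;

// Brute-force stand-in for rangeQuery(): indices within Eps of point P,
// excluding P itself.
static void rangeQuery(const std::vector<double> &Pts, double Eps, size_t P,
                       std::vector<size_t> &Neighbors) {
  Neighbors.clear();
  for (size_t Q = 0; Q < Pts.size(); ++Q)
    if (Q != P && std::fabs(Pts[Q] - Pts[P]) <= Eps)
      Neighbors.push_back(Q);
}

std::vector<int> dbScan1D(const std::vector<double> &Pts, double Eps,
                          size_t MinPts) {
  const size_t NumPoints = Pts.size();
  std::vector<int> Label(NumPoints, Undef);
  std::vector<size_t> Neighbors, ToProcess(NumPoints);
  std::vector<char> Added(NumPoints);
  int NextCluster = 0;
  for (size_t P = 0; P < NumPoints; ++P) {
    if (Label[P] != Undef)
      continue; // Previously processed in the inner loop.
    rangeQuery(Pts, Eps, P, Neighbors);
    if (Neighbors.size() + 1 < MinPts) {
      Label[P] = Noise; // May later become a border point.
      continue;
    }
    const int Cluster = NextCluster++;
    Label[P] = Cluster;
    Added[P] = 1;
    size_t Tail = 0;
    for (size_t Q : Neighbors)
      if (!Added[Q]) {
        ToProcess[Tail++] = Q;
        Added[Q] = 1;
      }
    for (size_t Head = 0; Head < Tail; ++Head) {
      const size_t Q = ToProcess[Head];
      const int Old = Label[Q];
      Label[Q] = Cluster;
      if (Old == Noise)
        continue; // Border point: joins the cluster but does not expand it.
      assert(Old == Undef);
      // Query Q's own neighborhood; reusing P's would stop the expansion.
      rangeQuery(Pts, Eps, Q, Neighbors);
      if (Neighbors.size() + 1 >= MinPts)
        for (size_t R : Neighbors)
          if (!Added[R]) {
            ToProcess[Tail++] = R;
            Added[R] = 1;
          }
    }
  }
  return Label;
}
```

For the points 0, 1, 2, 3, 10 with `Eps = 1` and `MinPts = 3`, point 3 is only reachable through the neighborhood of point 2, so it joins the cluster only if the query is re-run for each dequeued point.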

rebase made me sad

Harbormaster completed remote builds in B25233: Diff 174953.Nov 21 2018, 10:42 AM

MaskRay marked an inline comment as done.Nov 21 2018, 10:45 AM

MaskRay added inline comments.

tools/llvm-exegesis/lib/Clustering.cpp
120–126	Sorry I don't think making the abstraction is very necessary. The use of `Head/Tail` is very local and takes very few lines of code. The dbscan algorithm is in essence a variant of Dijkstra or FIFO Bellman-Ford. There is not much fantasy here. Keeping these lines inline is IMHO more readable.

lebedev.ri added inline comments.Nov 21 2018, 10:57 AM

tools/llvm-exegesis/lib/Clustering.cpp
120–126	(FWIW i have initially implemented this in D54418 with (but different) abstractions, before this review was submitted)

Friendly ping

This revision was not accepted when it landed; it landed in state Needs Review.Dec 14 2018, 12:30 AM

Closed by commit rL349136: [llvm-exegesis] Optimize ToProcess in dbScan (authored by MaskRay).

This revision was automatically updated to reflect the committed changes.

Reverted at rL349139

I don't think we've reached a consensus here. Sorry for missing the ping.

tools/llvm-exegesis/lib/Clustering.cpp
120–126	The abstraction makes it clear that we're working with a set. I'm not saying we should have exactly this abstraction (actually D54418 had another approach), but clearly no abstraction makes it harder to read, at least for me...

lebedev.ri added inline comments.Dec 14 2018, 2:22 AM

tools/llvm-exegesis/lib/Clustering.cpp
120–126	I can reopen that one, or change it to such manual deque.

Never mind. Someone asked me for reviews of the llvm-exegesis patch series. I said it didn't necessarily have to be done that way, and thus created this one. Later that patch series got accepted and merged and they even mentioned . Yes, I don't understand why that patch series changed the same place back and forth.

It looks like I missed the unit test, sorry for that. And I somehow messed up ToProcess[Tail++] = Q;. The test can be simply repaired. What should I do if I don't agree that an additional abstraction layer should be added?

Harbormaster completed remote builds in B26031: Diff 178246.Dec 14 2018, 10:12 AM

simplify

Harbormaster completed remote builds in B26032: Diff 178247.Dec 14 2018, 10:15 AM

MaskRay marked 2 inline comments as done.Dec 14 2018, 10:19 AM

MaskRay added inline comments.

tools/llvm-exegesis/lib/Clustering.cpp
120–126	I have said I don't think the abstraction is necessary. It just makes a simple algorithm complicated and harder to understand as developers have to read 2 separate places for the single usage.

MaskRay added a reviewer: dexonsmith.Dec 14 2018, 8:36 PM

MaskRay marked 2 inline comments as done.Dec 17 2018, 5:43 PM

MaskRay added inline comments.

tools/llvm-exegesis/lib/Clustering.cpp
120–126	Gentle ping. Please take another look. I've even simplified the logic. The additional abstraction (which will be one-shot and takes tens of lines) will assuredly make the code harder to read.

In D54442#1331367, @MaskRay wrote:

It looks like I missed the unit test, sorry for that. And I somehow messed up ToProcess[Tail++] = Q;.

An abstraction would also be testable independently of the algorithm that uses it, so it might not have had this issue.

tools/llvm-exegesis/lib/Clustering.cpp
120–126	Let me explain my rationale: As a developer, I read the code, the abstraction says it has set semantics, I don't read the implementation and carry on with reading the code keeping in mind that this is a set. I'll only look at the actual implementation if I want to understand more. With the implementation in this patch the set semantics are not obvious. The pattern:

while (!set.empty()) { auto elem = set.top(); set.pop(); ... }

will be extremely familiar to anyone. On the other hand, in:

for (size_t Head = 0; Head < Tail; ++Head) { P = ToProcess[Head]; }

it's not trivial what's happening.

In D54442#1331367, @MaskRay wrote:

It looks like I missed the unit test, sorry for that.

I don't see the unit test in the patch.

In D54442#1334641, @courbet wrote:

In D54442#1331367, @MaskRay wrote:

It looks like I missed the unit test, sorry for that.

I don't see the unit test in the patch.

LLVMExegesisTests catches this. I didn't notice it.

MaskRay marked an inline comment as done.Dec 18 2018, 9:40 AM

MaskRay added inline comments.

tools/llvm-exegesis/lib/Clustering.cpp
120–126	> As a developer, I read the code, the abstraction says it has set semantics, I don't read the implementation and carry on with reading the code keeping in mind that this is a set.

Thanks for expressing your argument. I would say the existing style may not be familiar to users, as a point may be added to the queue multiple times, while the patch changes it to a strictly BFS manner.

for (size_t Head = 0; Head < Tail; ++Head) { P = ToProcess[Head]; }

The for loop is not a foreign BFS style coined by me. It is well established and also used elsewhere:

https://github.com/llvm-mirror/llvm/tree/master/lib/Target/X86/X86ISelLowering.cpp#L18472
https://github.com/llvm-mirror/llvm/tree/master/lib/Transforms/Utils/LoopUtils.cpp#L470
https://github.com/llvm-mirror/llvm/tree/master/lib/CodeGen/LiveRangeCalc.cpp#L360

MaskRay marked 3 inline comments as done.Dec 18 2018, 9:41 AM

Looks like this again got committed in rL350035.

@MaskRay, can you help me understand why you decided to submit this when there are still disagreements in the review thread ?

In D54442#1343382, @courbet wrote:

@MaskRay, can you help me understand why you decided to submit this when there are still disagreements in the review thread ?

Ping? Is this going to proceed? Shall i resurrect D54514 in some form instead?

In D54442#1377373, @lebedev.ri wrote:

In D54442#1343382, @courbet wrote:

@MaskRay, can you help me understand why you decided to submit this when there are still disagreements in the review thread ?

Ping? Is this going to proceed? Shall i resurrect D54514 in some form instead?

@MaskRay, are you still planning to work on this ?

Herald added a project: Restricted Project.Feb 1 2019, 12:08 AM

Yep, looks stuck. I'm still deeply unhappy with the deeply non-linear performance characteristics
with a large number of samples, so I will try to come up with another take on this.

What you've reverted is not the version shown in this diff. It included the ideas from Roman. I feel that I should no longer work on this.

Diff 178247

tools/llvm-exegesis/lib/Clustering.cpp

//===-- Clustering.cpp ------------------------------------------- C++ --===//		//===-- Clustering.cpp ------------------------------------------- C++ --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "Clustering.h"		#include "Clustering.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include <string>		#include <string>

namespace llvm {		namespace llvm {
namespace exegesis {		namespace exegesis {

// The clustering problem has the following characteristics:		// The clustering problem has the following characteristics:
// (A) - Low dimension (dimensions are typically proc resource units,		// (A) - Low dimension (dimensions are typically proc resource units,
(67 lines not shown)	llvm::Error InstructionBenchmarkClustering::validateAndSetup() {
}		}
if (LastMeasurement) {		if (LastMeasurement) {
NumDimensions_ = LastMeasurement->size();		NumDimensions_ = LastMeasurement->size();
}		}
return llvm::Error::success();		return llvm::Error::success();
}		}

void InstructionBenchmarkClustering::dbScan(const size_t MinPts) {		void InstructionBenchmarkClustering::dbScan(const size_t MinPts) {
std::vector<size_t> Neighbors; // Persistent buffer to avoid allocs.		const size_t NumPoints = Points_.size();
for (size_t P = 0, NumPoints = Points_.size(); P < NumPoints; ++P) {
		// Persistent buffers to avoid allocs.
		std::vector<size_t> Neighbors;
		std::vector<size_t> ToProcess(NumPoints);
		std::vector<char> Added(NumPoints);

		for (size_t P = 0; P < NumPoints; ++P) {
if (!ClusterIdForPoint_[P].isUndef())		if (!ClusterIdForPoint_[P].isUndef())
continue; // Previously processed in inner loop.		continue; // Previously processed in inner loop.
rangeQuery(P, Neighbors);		rangeQuery(P, Neighbors);
if (Neighbors.size() + 1 < MinPts) { // Density check.		if (Neighbors.size() + 1 < MinPts) { // Density check.
// The region around P is not dense enough to create a new cluster, mark		// The region around P is not dense enough to create a new cluster, mark
// as noise for now.		// as noise for now.
ClusterIdForPoint_[P] = ClusterId::noise();		ClusterIdForPoint_[P] = ClusterId::noise();
continue;		continue;
}		}

// Create a new cluster, add P.		// Create a new cluster, add P.
Clusters_.emplace_back(ClusterId::makeValid(Clusters_.size()));		Clusters_.emplace_back(ClusterId::makeValid(Clusters_.size()));
Cluster &CurrentCluster = Clusters_.back();		Cluster &CurrentCluster = Clusters_.back();
ClusterIdForPoint_[P] = CurrentCluster.Id; /* Label initial point */		ClusterIdForPoint_[P] = CurrentCluster.Id; /* Label initial point */
CurrentCluster.PointIndices.push_back(P);		CurrentCluster.PointIndices.push_back(P);
		Added[P] = 1;

// Process P's neighbors.		// Process P's neighbors.
llvm::SetVector<size_t, std::deque<size_t>> ToProcess;		size_t Tail = 0;
ToProcess.insert(Neighbors.begin(), Neighbors.end());		for (size_t Q : Neighbors)
while (!ToProcess.empty()) {		if (!Added[Q]) {
		ToProcess[Tail++] = Q;
		Added[Q] = 1;
		}
		for (size_t Head = 0; Head < Tail; ++Head) {
// Retrieve a point from the set.		// Retrieve a point from the set.
const size_t Q = *ToProcess.begin();		P = ToProcess[Head];
ToProcess.erase(ToProcess.begin());

if (ClusterIdForPoint_[Q].isNoise()) {		// Add P to the current custer.
// Change noise point to border point.		ClusterId CID = ClusterIdForPoint_[P];
ClusterIdForPoint_[Q] = CurrentCluster.Id;		ClusterIdForPoint_[P] = CurrentCluster.Id;
CurrentCluster.PointIndices.push_back(Q);		CurrentCluster.PointIndices.push_back(P);
		if (CID.isNoise())
continue;		continue;
}		assert(CID.isUndef());
if (!ClusterIdForPoint_[Q].isUndef()) {
continue; // Previously processed.		// And extend to the neighbors of P if the region is dense enough.
}		rangeQuery(P, Neighbors);
// Add Q to the current custer.		if (Neighbors.size() + 1 >= MinPts)
ClusterIdForPoint_[Q] = CurrentCluster.Id;		for (size_t Q : Neighbors)
CurrentCluster.PointIndices.push_back(Q);		if (!Added[Q]) {
// And extend to the neighbors of Q if the region is dense enough.		ToProcess[Tail++] = Q;
rangeQuery(Q, Neighbors);		Added[Q] = 1;
if (Neighbors.size() + 1 >= MinPts) {
ToProcess.insert(Neighbors.begin(), Neighbors.end());
}		}
}		}
}		}
// assert(Neighbors.capacity() == (Points_.size() - 1));		// assert(Neighbors.capacity() == (Points_.size() - 1));
// ^ True, but it is not quaranteed to be true in all the cases.		// ^ True, but it is not quaranteed to be true in all the cases.

// Add noisy points to noise cluster.		// Add noisy points to noise cluster.
for (size_t P = 0, NumPoints = Points_.size(); P < NumPoints; ++P) {		for (size_t P = 0, NumPoints = Points_.size(); P < NumPoints; ++P) {
if (ClusterIdForPoint_[P].isNoise()) {		if (ClusterIdForPoint_[P].isNoise()) {
(23 lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[llvm-exegesis] Optimize ToProcess in dbScan
AbandonedPublic
