This is an archive of the discontinued LLVM Phabricator instance.

[llvm-exegesis] Optimize ToProcess in dbScan
AbandonedPublic

Authored by MaskRay on Nov 12 2018, 12:56 PM.

Details

Summary

Use vector<char> Added + vector<size_t> ToProcess to replace SetVector ToProcess

We also check Added[P] to enqueueing a point more than once, which
also saves us a ClusterIdForPoint_[Q].isUndef() check.

Diff Detail

Event Timeline

MaskRay created this revision.Nov 12 2018, 12:56 PM

That is more or less how it ends up looking in D54418, just without abstraction.

tools/llvm-exegesis/lib/Clustering.cpp
128–129

I did try this, at different stages of the patchset.
It always measured to be slower than deque + pop front.

MaskRay marked an inline comment as done.Nov 12 2018, 1:40 PM
MaskRay added inline comments.
tools/llvm-exegesis/lib/Clustering.cpp
128–129

I moved the vector definition outside to avoid construction/destruction.

Added is not really necessary but it is faster than doing two checks: if an element is either Undef or Noise.

Looking forward further improvements!

MaskRay updated this revision to Diff 174637.Nov 19 2018, 10:09 AM
MaskRay marked an inline comment as done.

rebase

MaskRay updated this revision to Diff 174638.Nov 19 2018, 10:10 AM
MaskRay edited the summary of this revision. (Show Details)

.

I've rebased the revision.

Old:

% perf stat -r 10 ~/llvm/Release/bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/Downloads/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html > /dev/null
4356.618418 task-clock (msec)
...

New:

% perf stat -r 10 ~/llvm/Release/bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/Downloads/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html > /dev/null
4225.986843 task-clock (msec)

MaskRay updated this revision to Diff 174641.Nov 19 2018, 10:25 AM

Use std::vector<size_t> ToProcess(NumPoints);

sorry messed it up while rebasing

courbet added inline comments.Nov 21 2018, 2:21 AM
tools/llvm-exegesis/lib/Clustering.cpp
120–126

All of this makes the dbscan implementation harder to follow. Please move all this to a separate struct:

// This is long-lived.
struct WorkSetStorage {
  public:
   WorkSetStorage(size_t NumPoints) : ToProcess(NumPoints), Added(NumPoints) {}

   void setProcessed(size_t P) { Added[P] = 1; }

  private:
   friend class WorkSet;
   std::vector<size_t> ToProcess;
   std::vector<char> Added;
};

// This is short-lived and replaces  llvm::SetVector<size_t, std::deque<size_t>> ToProcess;
class WorkSet {
  public:
   WorkSet(WorkSetStorage& Storage) : Storage(Storage), Head(0), Tail(0) {}

   // Inserts a point if not already processed.
   void insert(size_t P);

   // returns the first element from the work set and pops it.
   size_t pop();

  private:
   WorkSetStorage& Storage;
   size_t Head;
   size_t Tail;
};
136–145

did you just remove the rangeQuery() call on Q ? You have the neighbours from P here.

MaskRay updated this revision to Diff 174953.Nov 21 2018, 10:42 AM

Reinstated rangeQuery(Q, Neighbors)

rebase made me sad

MaskRay marked an inline comment as done.Nov 21 2018, 10:45 AM
MaskRay added inline comments.
tools/llvm-exegesis/lib/Clustering.cpp
120–126

Sorry I don't think making the abstraction is very necessary. The use of Head/Tail is very local and takes very few lines of code. The dbscan algorithm is in essence a variant of Dijkstra or FIFO Bellman-Ford. There is not much fantasy here. Keeping these lines inline is IMHO more readable.

lebedev.ri added inline comments.Nov 21 2018, 10:57 AM
tools/llvm-exegesis/lib/Clustering.cpp
120–126

(FWIW i have initially implemented this in D54418 with (but different) abstractions, before this review was submitted)

Friendly ping

This revision was not accepted when it landed; it landed in state Needs Review.Dec 14 2018, 12:30 AM
This revision was automatically updated to reflect the committed changes.
RKSimon reopened this revision.Dec 14 2018, 1:28 AM

Reverted at rL349139

I don't think we've reached a consensus here. Sorry for missing the ping.

tools/llvm-exegesis/lib/Clustering.cpp
120–126

The abstraction makes it clear that we're working with a set. I'm no saying we should have exactly this abstraction (actually D54418 had another approach), but clearly no abstraction makes it harder to read at least for me...

lebedev.ri added inline comments.Dec 14 2018, 2:22 AM
tools/llvm-exegesis/lib/Clustering.cpp
120–126

I can reopen that one, or change it to such manual deque.

MaskRay abandoned this revision.Dec 14 2018, 9:53 AM

Never mind. Someone asked me for reviews of the llvm-exegesis patch series. I said it wasn't necessarily to be done that way and thus created this one. Later that patch series got accepted and merged and they even mentioned . Yes, I don't understand why that patch series changed the same place back and forth.

MaskRay reclaimed this revision.Dec 14 2018, 10:11 AM

It looks I missed the unittest, sorry for that. And I somehow messed ToProcess[Tail++] = Q;. The test can be simply repaired. What should I do if I don't agree that an additional abstraction layer should be added?

MaskRay marked 2 inline comments as done.Dec 14 2018, 10:19 AM
MaskRay added inline comments.
tools/llvm-exegesis/lib/Clustering.cpp
120–126

I have said I don't think the abstraction is necessary. It just makes a simple algorithm complicated and harder to understand as developers have to read 2 separate places for the single usage.

MaskRay marked 2 inline comments as done.Dec 17 2018, 5:43 PM
MaskRay added inline comments.
tools/llvm-exegesis/lib/Clustering.cpp
120–126

Gentle ping.

Please take another look. I've even simplified the logic. The additional abstraction (which will be one-shot and takes tens of lines) will assuredly make the coder harder to read.

It looks I missed the unittest, sorry for that. And I somehow messed ToProcess[Tail++] = Q;.

An abstraction would also be testable independently of the algorithm that uses it, so it might not have had this issue.

tools/llvm-exegesis/lib/Clustering.cpp
120–126

Let me explain my rationale: As a developer, I read the code, the abstraction says it has set semantics, I don't read the implementation and carry on with reading the code keeping in mind that this is a set. I'll only look at the actual implementation if I want to understand more. With the implementation in this patch the set semantics are not obvious.

The pattern:

while (!set.empty()) {
  auto elem = set.top();
  set.pop();
  ...
}

will be *extremely* familiar to anyone.

On the other hand, in:

for (size_t Head = 0; Head < Tail; ++Head) {
  P = ToProcess[Head];
}

it's not trivial what's happening.

It looks I missed the unittest, sorry for that.

I don;t see the unit test in the patch.

It looks I missed the unittest, sorry for that.

I don;t see the unit test in the patch.

LLVMExegesisTests catches this. I didn't notice it.

MaskRay marked an inline comment as done.Dec 18 2018, 9:40 AM
MaskRay added inline comments.
tools/llvm-exegesis/lib/Clustering.cpp
120–126

As a developer, I read the code, the abstraction says it has set semantics, I don't read the implementation and carry on with reading the code keeping in mind that this is a set.

Thanks for expressing your argument. I would say the existing style may not be familiar to users as a point may be added to the queue multiple times, while the patch changes it to a strictly BFS manner.

for (size_t Head = 0; Head < Tail; ++Head) {
P = ToProcess[Head];
}

The for loop is not a foreign BFS style coined by me. It is well established and also used otherwhere.

https://github.com/llvm-mirror/llvm/tree/master/lib/Target/X86/X86ISelLowering.cpp#L18472
https://github.com/llvm-mirror/llvm/tree/master/lib/Transforms/Utils/LoopUtils.cpp#L470
https://github.com/llvm-mirror/llvm/tree/master/lib/CodeGen/LiveRangeCalc.cpp#L360

MaskRay marked 3 inline comments as done.Dec 18 2018, 9:41 AM

Looks like this again got committed in rL350035.

@MaskRay, can you help me understand why you decided to submit this when there are still disagreements in the review thread ?

@MaskRay, can you help me understand why you decided to submit this when there are still disagreements in the review thread ?

Ping? Is this going to proceed? Shall i resurrect D54514 in some form instead?

@MaskRay, can you help me understand why you decided to submit this when there are still disagreements in the review thread ?

Ping? Is this going to proceed? Shall i resurrect D54514 in some form instead?

@MaskRay, are you still planning to work on this ?

Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2019, 12:08 AM

Yep, looks stuck. I'm still deeply unhappy with the deeply non-linear performance characteristics
with large number of samples, so i will try to come up with some another take on this..

MaskRay abandoned this revision.Feb 9 2019, 9:45 PM

What you've reverted is not the version shown in this diff. It included the ideas from Roman. I feel that I should no longer work on this.