This is an archive of the discontinued LLVM Phabricator instance.

[llvm-exegesis] Option to lobotomize dbscan (PR40880)
Abandoned · Public

Authored by lebedev.ri on Mar 19 2019, 3:42 AM.

Details

Reviewers
courbet
gchatelet
Summary

Let's suppose we have measured 4 different opcodes and got: 0.5, 1.0, 1.5, 2.0.
Let's suppose we are using -analysis-clustering-epsilon=0.5.
By default we will now start processing the 0.5 point, find that 1.0 is its neighbor, and add them to a new cluster.
Then we will notice that 1.5 is a neighbor of 1.0 and add it to that same cluster.
Then we will notice that 2.0 is a neighbor of 1.5 and add it to that same cluster.
So all these points end up in the same cluster.
This may or may not be a correct implementation of the DBSCAN clustering algorithm.

But this is rather horribly broken for the purpose of comparing clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If I specify -analysis-inconsistency-epsilon=0.5, then no matter what the
checked-in LLVM values are, this cluster will never match them,
and thus it will always be displayed as inconsistent.

The obvious solution is to split some of these opcodes off into a different sched cluster.
But how do I do that? Out of the 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones", i.e. the most different from the checked-in data?
I'd need to go into the .yaml and look it up manually.

The trivial solution is, when creating clusters, not to run the full DBSCAN algorithm,
but instead to "pick some unclustered point, pick all unclustered points that are its
neighbors, put them all into a new cluster, repeat". And as it happens, we arrive
at that algorithm simply by not performing the "add neighbors of a neighbor to the cluster" step.
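
To make that concrete, here is a minimal, self-contained sketch of the two behaviors (illustrative only, not the actual llvm-exegesis code; it assumes one dimension, an absolute-difference metric, and ignores DBSCAN's minPts/noise handling):

```cpp
// Toy comparison of full DBSCAN-style neighbor expansion vs. the proposed
// "no expansion" variant. Illustrative only; not the llvm-exegesis code.
#include <cmath>
#include <cstdio>
#include <vector>

// Indices of all still-unclustered points within Epsilon of Points[I].
static std::vector<size_t> findNeighbors(const std::vector<double> &Points,
                                         const std::vector<int> &ClusterId,
                                         size_t I, double Epsilon) {
  std::vector<size_t> Neighbors;
  for (size_t J = 0; J != Points.size(); ++J)
    if (J != I && ClusterId[J] < 0 &&
        std::fabs(Points[J] - Points[I]) <= Epsilon)
      Neighbors.push_back(J);
  return Neighbors;
}

// Expand == true: transitive growth, neighbors of neighbors join the cluster.
// Expand == false: the variant proposed here; a cluster is one seed point
// plus its immediate neighbors, and nothing more.
static std::vector<int> cluster(const std::vector<double> &Points,
                                double Epsilon, bool Expand) {
  std::vector<int> ClusterId(Points.size(), -1); // -1 means unclustered
  int NextId = 0;
  for (size_t I = 0; I != Points.size(); ++I) {
    if (ClusterId[I] >= 0)
      continue; // already assigned
    const int Id = NextId++;
    ClusterId[I] = Id;
    std::vector<size_t> Queue = findNeighbors(Points, ClusterId, I, Epsilon);
    while (!Queue.empty()) {
      const size_t J = Queue.back();
      Queue.pop_back();
      if (ClusterId[J] >= 0)
        continue;
      ClusterId[J] = Id;
      if (Expand) { // this is exactly the step the patch skips
        const std::vector<size_t> More =
            findNeighbors(Points, ClusterId, J, Epsilon);
        Queue.insert(Queue.end(), More.begin(), More.end());
      }
    }
  }
  return ClusterId;
}

int main() {
  const std::vector<double> Points = {0.5, 1.0, 1.5, 2.0};
  for (const bool Expand : {true, false}) {
    const std::vector<int> Ids = cluster(Points, /*Epsilon=*/0.5, Expand);
    std::printf("%s:", Expand ? "full expansion" : "no expansion");
    for (const int Id : Ids)
      std::printf(" %d", Id);
    std::printf("\n");
  }
  // Prints:
  //   full expansion: 0 0 0 0   (one chained cluster)
  //   no expansion:   0 0 1 1   (clusters {0.5, 1.0} and {1.5, 2.0})
}
```

On the 0.5/1.0/1.5/2.0 example from above, the full variant chains all four points into one cluster, while the no-expansion variant yields {0.5, 1.0} and {1.5, 2.0}.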

(This will also help with opcode denoising/stabilization)

While the current default is good for the abstract goal of 'analyse the clustering of measurements',
I'm not sure how often that is the actual goal, as opposed to 'compare LLVM data with measurements'.
So I'm not sure what the default should be.

Thoughts?

This is yet another step to bring me closer to being able to continue the cleanup of the bdver2 sched model.

Fixes PR40880.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision. Mar 19 2019, 3:42 AM
lebedev.ri removed a project: Restricted Project. Mar 19 2019, 3:42 AM
Herald added a project: Restricted Project. Mar 19 2019, 3:42 AM
lebedev.ri changed the repository for this revision from rLLDB LLDB to rL LLVM. Mar 19 2019, 3:43 AM
lebedev.ri removed a project: Restricted Project.

This may or may not be a correct implementation of the DBSCAN clustering algorithm.

Yes, that's what we want. This really is one cluster.

While the current default is good for the abstract goal of 'analyse the clustering of measurements',
I'm not sure how often that is the actual goal, as opposed to 'compare LLVM data with measurements'.
So I'm not sure what the default should be.

I think that's a fair point: I originally intended one cluster to be, roughly, one sched class. But why do you need clusters at all if all you care about is the mismatch between each instruction and its scheduling data?

Then why do you need clustering at all? Why not create one cluster per instruction?

This may or may not be a correct implementation of the DBSCAN clustering algorithm.

Yes, that's what we want. This really is one cluster.

Yes, I do think that is the correct implementation of the algorithm.

While the current default is good for the abstract goal of 'analyse the clustering of measurements',
I'm not sure how often that is the actual goal, as opposed to 'compare LLVM data with measurements'.
So I'm not sure what the default should be.

I think that's a fair point: I originally intended one cluster to be, roughly, one sched class.
But why do you need clusters at all if all you care about is the mismatch between each instruction and its scheduling data?

Then why do you need clustering at all? Why not create one cluster per instruction?

No, the clustering is actually good.
If I only take one measurement set and then dumbly change all the sched values to match,
then when I take a new measurement set I will be "very" surprised that
it all of a sudden no longer matches the checked-in values.

Thus I take multiple measurements (3..10), and if those measurements are different,
the "cluster stabilization" (D58355) kicks in and I only get non-flaky clusters by default.

I'm not saying that this *is* the solution, but I do believe the problem exists,
and I'm not sure what other approaches there are to solve it.

There is another bug there, I suppose: we really should sort the measurements we loaded.
E.g. given measurements 0.5, 1.0, 1.5, if they are in that order then
you get (I think, didn't check) two clusters: 0.5+1.0 and 1.5 (correct).
But if they are ordered `1.0, 0.5, 1.5`, then they all end up in a single cluster.
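
(With the hypothetical cluster() helper from the sketch in the summary, and Epsilon = 0.5, the order dependence looks like this:)

```cpp
// The no-expansion variant is order dependent; the full one is not.
cluster({0.5, 1.0, 1.5}, 0.5, /*Expand=*/false); // ids: 0 0 1 -> two clusters
cluster({1.0, 0.5, 1.5}, 0.5, /*Expand=*/false); // ids: 0 0 0 -> one cluster
```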

I'm not saying that this *is* the solution, but I do believe the problem exists,
and I'm not sure what other approaches there are to solve it.

There is another bug there, I suppose: we really should sort the measurements we loaded.

Sorting for clustering only works in one dimension. As soon as you have several dimensions (e.g. uops), it becomes more complex (hence DBSCAN & others).

E.g. given measurements 0.5, 1.0, 1.5, if they are in that order then
you get (I think, didn't check) two clusters: 0.5+1.0 and 1.5 (correct).
But if they are ordered `1.0, 0.5, 1.5`, then they all end up in a single cluster.

DBSCAN without lobotomization is mostly insensitive to point ordering (see https://reviews.llvm.org/D59693).

To make my suggestion clearer: if you just want to compare measured instruction data to its checked-in data, why not cluster by instruction opcode (to merge the several measurements for a given instruction) and run the analysis on that?

Essentially: clusterid == opcode; you do away with DBSCAN entirely.
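
(A minimal sketch of what that could look like, with illustrative names only; this is not actual llvm-exegesis code:)

```cpp
// clusterid == opcode: merge all measurements of one instruction into one
// cluster and skip DBSCAN entirely. Hypothetical sketch.
#include <vector>

std::vector<int> clusterByOpcode(const std::vector<unsigned> &Opcodes) {
  std::vector<int> ClusterId(Opcodes.size());
  for (size_t I = 0; I != Opcodes.size(); ++I)
    ClusterId[I] = static_cast<int>(Opcodes[I]); // the opcode is the cluster id
  return ClusterId;
}
```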


To make my suggestion clearer: if you just want to compare measured instruction data to its checked-in data, why not cluster by instruction opcode (to merge the several measurements for a given instruction) and run the analysis on that?

Essentially: clusterid == opcode; you do away with DBSCAN entirely.

I think that's a fair point: I originally intended one cluster to be, roughly, one sched class.
But why do you need clusters at all if all you care about is the mismatch between each instruction and its scheduling data?

Then why do you need clustering at all? Why not create one cluster per instruction?

No, the clustering is actually good.
If I only take one measurement set and then dumbly change all the sched values to match,
then when I take a new measurement set I will be "very" surprised that
it all of a sudden no longer matches the checked-in values.

Thus I take multiple measurements (3..10), and if those measurements are different,
the "cluster stabilization" (D58355) kicks in and I only get non-flaky clusters by default.

To reword: if I do simple clustering by opcode, I will then need to add yet another
"stabilization" step: for each cluster, check that every measurement is a neighbor of all
the other points in that cluster, and if not, mark the cluster as noise.
(Well, not every point vs. every point, just the lower/upper triangle excluding the diagonal.)
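
(A sketch of that check, under the same illustrative one-dimensional assumptions as the earlier sketch:)

```cpp
// Every pair of points in the cluster must be within Epsilon of each other;
// otherwise the whole cluster should be marked as noise. Only the lower
// triangle of the pairwise-distance matrix is visited, diagonal excluded.
#include <cmath>
#include <vector>

bool isTightCluster(const std::vector<double> &ClusterPoints, double Epsilon) {
  for (size_t I = 1; I != ClusterPoints.size(); ++I)
    for (size_t J = 0; J != I; ++J)
      if (std::fabs(ClusterPoints[I] - ClusterPoints[J]) > Epsilon)
        return false; // found a too-distant pair: mark the cluster as noise
  return true;
}
```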

I can do that instead; maybe that would even be better than this (no dependency on measurement ordering).

Any advice on how to proceed? I would really love to see this issue resolved :)

To reword: if I do simple clustering by opcode, I will then need to add yet another
"stabilization" step: for each cluster, check that every measurement is a neighbor of all
the other points in that cluster, and if not, mark the cluster as noise.
(Well, not every point vs. every point, just the lower/upper triangle excluding the diagonal.)

OK I see, thanks. To sum up my understanding: there are some areas where two clusters that should be separate are so noisy that there is a dense region connecting them, so even taking a small epsilon will not separate them. You want to reject these merged clusters based on the variance of the points within the cluster.

One suggestion I have is to compute the variance within the cluster (this can be done incrementally when adding points to the cluster) and reject clusters where the variance is more than a certain threshold. What do you think?
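
(For the incremental part, Welford's online algorithm is one standard way to update a variance as points are added; a hedged sketch with illustrative names:)

```cpp
// Online (incremental) variance via Welford's algorithm. Illustrative sketch.
#include <cstddef>

struct RunningVariance {
  std::size_t Count = 0;
  double Mean = 0.0;
  double M2 = 0.0; // running sum of squared deviations from the mean

  // O(1) update when a new point joins the cluster.
  void add(double X) {
    ++Count;
    const double Delta = X - Mean;
    Mean += Delta / Count;
    M2 += Delta * (X - Mean);
  }

  // Population variance; use M2 / (Count - 1) for the sample variance.
  double variance() const { return Count > 1 ? M2 / Count : 0.0; }
};
// A cluster would then be rejected when variance() exceeds the threshold.
```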

I can do that instead; maybe that would even be better than this (no dependency on measurement ordering).

Yes, I would really like to avoid the dependence on the ordering.

To reword: if I do simple clustering by opcode, I will then need to add yet another
"stabilization" step: for each cluster, check that every measurement is a neighbor of all
the other points in that cluster, and if not, mark the cluster as noise.
(Well, not every point vs. every point, just the lower/upper triangle excluding the diagonal.)

OK I see, thanks. To sum up my understanding: there are some areas where two clusters that should be separate are so noisy that there is a dense region connecting them, so even taking a small epsilon will not separate them. You want to reject these merged clusters based on the variance of the points within the cluster.

There are two situations, as far as I can tell
(also, I'm only looking at the case with a single dimension - latency/uops/rthroughput - not a combination of measurements):

  1. Let's suppose we have measurements 0.5, 1.0, 1.5. If they are all from the same opcode, they will currently be put into the same cluster. This is unwanted (at least for me).
  2. If you have measurements 0.5 (opcode a) and 3.5 (opcode a), they will be put into different clusters, which, while correct, is also not quite what we want, because they are from the same opcode. That should be an "unstable" cluster. (It is unspecified why that happened: it could be noisy measurements, CPU pipeline quirks, a register fastpath, a dependence on the register values, etc.)

The second issue on its own I have resolved with D58355, but the first issue remains.
So I'm trying to solve the first issue without regressing the second.

One suggestion I have is to compute the variance within the cluster (this can be done incrementally when adding points to the cluster) and reject clusters where the variance is more than a certain threshold. What do you think?

I can do that instead; maybe that would even be better than this (no dependency on measurement ordering).

Yes, I would really like to avoid the dependence on the ordering.

Okay, I will try the "cluster by opcode + stabilize" approach, thanks!