This is an archive of the discontinued LLVM Phabricator instance.

Differential D58355

[llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
Closed, Public

Authored by lebedev.ri on Feb 18 2019, 8:04 AM.

Details

Reviewers

courbet
gchatelet

Commits

rG69716394f3d6: [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
rL354441: [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)

Summary

Given an instruction Opcode, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction Opcode, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction Opcode may end up being
clustered into *different* clusters. This is not great for further analysis.

We shall find every Opcode with benchmarks not in just one cluster, and move
*all* the benchmarks of said Opcode into one new unstable cluster per Opcode.
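
The stabilization step described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `Point`, `Cluster` and `stabilize` are hypothetical stand-ins invented for this example, not the types or functions used in the patch, and emptied clusters are simply left in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the real llvm-exegesis types.
struct Point {
  unsigned Opcode; // Opcode of the benchmarked instruction.
};

struct Cluster {
  bool IsUnstable = false;
  std::vector<std::size_t> PointIndices; // Indices into the point list.
};

// For every opcode whose points are spread over more than one cluster, moves
// all of that opcode's points into one new cluster marked as unstable.
// Emptied clusters are left in place so that existing indices stay valid.
void stabilize(const std::vector<Point> &Points,
               std::vector<Cluster> &Clusters) {
  // Opcode -> indices of the clusters holding at least one of its points.
  // std::map/std::set iterate in sorted order, so the result is deterministic.
  std::map<unsigned, std::set<std::size_t>> OpcodeToClusters;
  for (std::size_t C = 0; C < Clusters.size(); ++C) {
    for (const std::size_t P : Clusters[C].PointIndices) {
      assert(P < Points.size() && "point index out of range");
      OpcodeToClusters[Points[P].Opcode].insert(C);
    }
  }

  for (const auto &Entry : OpcodeToClusters) {
    const unsigned Opcode = Entry.first;
    if (Entry.second.size() < 2)
      continue; // All points of this opcode already share one cluster.

    Cluster Unstable;
    Unstable.IsUnstable = true;
    for (const std::size_t C : Entry.second) {
      std::vector<std::size_t> &Old = Clusters[C].PointIndices;
      // Keep other opcodes in front; points of `Opcode` end up in [Mid, end).
      const auto Mid = std::stable_partition(
          Old.begin(), Old.end(),
          [&](std::size_t P) { return Points[P].Opcode != Opcode; });
      Unstable.PointIndices.insert(Unstable.PointIndices.end(), Mid, Old.end());
      Old.erase(Mid, Old.end());
    }
    Clusters.push_back(std::move(Unstable));
  }
}
```

With three points (one of opcode A, two of opcode B) where A and the first B share a cluster and the second B sits alone, this sketch leaves A in its original cluster and puts both B points into a new cluster flagged as unstable.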

I have solved this by making ClusterId a bit field, adding an IsUnstable bit,
and introducing the -analysis-display-unstable-clusters switch to toggle between
displaying stable-only clusters and unstable-only clusters.
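
As a rough illustration of that idea (not the code from the patch), a cluster id can pack the instability flag next to the numeric id, and the display switch then reduces to a single comparison. The class below mirrors the names visible in the diff (`makeValid`, `makeValidUnstable`, `isUnstable`), but its layout, the `getId` accessor and the `shouldSkip` helper are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical sketch of a cluster id that packs an "unstable" flag next to
// the numeric id; not the actual llvm-exegesis class.
class ClusterId {
public:
  static ClusterId makeValid(std::size_t Id) { return ClusterId(Id, false); }
  static ClusterId makeValidUnstable(std::size_t Id) {
    return ClusterId(Id, true);
  }

  std::size_t getId() const { return Id_; }
  bool isUnstable() const { return IsUnstable_; }

  // Compares ids only; the instability bit is deliberately ignored.
  bool operator==(const ClusterId &O) const { return Id_ == O.Id_; }

private:
  ClusterId(std::size_t Id, bool IsUnstable)
      : Id_(Id), IsUnstable_(IsUnstable) {
    assert(Id_ == Id && "id does not fit into the bit field");
  }

  // All bits of size_t except one are available for the id itself.
  static constexpr unsigned kIdBits =
      std::numeric_limits<std::size_t>::digits - 1;
  std::size_t Id_ : kIdBits;
  std::size_t IsUnstable_ : 1;
};

// Returns true if a cluster must be skipped for the requested display mode:
// stable clusters are shown by default, unstable ones only when requested.
inline bool shouldSkip(ClusterId Id, bool DisplayUnstable) {
  return Id.isUnstable() != DisplayUnstable;
}
```

For two `bool` operands, `!=` gives the same result as the `^` used in the patch's filter, so exactly one of the two kinds of cluster is shown in either mode.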

The reclusterization is deterministic: it produces identical reports
between runs. (Or at least that is what I'm seeing; maybe it isn't.)

Timings/comparisons:
old (current trunk/head)

clusters-old.html (6 MB)

$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

           6624.73 msec task-clock                #    0.999 CPUs utilized            ( +-  0.53% )
               172      context-switches          #   25.965 M/sec                    ( +- 29.89% )
                 0      cpu-migrations            #    0.042 M/sec                    ( +- 56.54% )
             31073      page-faults               # 4690.754 M/sec                    ( +-  0.08% )
       26538711696      cycles                    # 4006230.292 GHz                   ( +-  0.53% )  (83.31%)
        2017496807      stalled-cycles-frontend   #    7.60% frontend cycles idle     ( +-  0.93% )  (83.32%)
       13403650062      stalled-cycles-backend    #   50.51% backend cycles idle      ( +-  0.33% )  (33.37%)
       19770706799      instructions              #    0.74  insn per cycle         
                                                  #    0.68  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4419821812      branches                  # 667207369.714 M/sec               ( +-  0.03% )  (66.69%)
         121741669      branch-misses             #    2.75% of all branches          ( +-  0.28% )  (83.34%)

            6.6283 +- 0.0358 seconds time elapsed  ( +-  0.54% )

patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters)

clusters-new-all.html (6 MB)

$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):

           6475.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
               213      context-switches          #   32.952 M/sec                    ( +- 23.81% )
                 1      cpu-migrations            #    0.130 M/sec                    ( +- 43.84% )
             31287      page-faults               # 4832.057 M/sec                    ( +-  0.08% )
       25939086577      cycles                    # 4006160.279 GHz                   ( +-  0.31% )  (83.31%)
        1958812858      stalled-cycles-frontend   #    7.55% frontend cycles idle     ( +-  0.68% )  (83.32%)
       13218961512      stalled-cycles-backend    #   50.96% backend cycles idle      ( +-  0.29% )  (33.37%)
       19752995402      instructions              #    0.76  insn per cycle         
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4417079244      branches                  # 682195472.305 M/sec               ( +-  0.03% )  (66.70%)
         121510065      branch-misses             #    2.75% of all branches          ( +-  0.19% )  (83.34%)

            6.4832 +- 0.0229 seconds time elapsed  ( +-  0.35% )

Funnily, *this* measurement shows that said reclustering actually improved performance.

patch, with reclustering, only the stable clusters

clusters-new-stable.html (6 MB)

$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):

           6387.71 msec task-clock                #    0.999 CPUs utilized            ( +-  0.13% )
               133      context-switches          #   20.792 M/sec                    ( +- 23.39% )
                 0      cpu-migrations            #    0.063 M/sec                    ( +- 61.24% )
             31318      page-faults               # 4903.256 M/sec                    ( +-  0.08% )
       25591984967      cycles                    # 4006786.266 GHz                   ( +-  0.13% )  (83.31%)
        1881234904      stalled-cycles-frontend   #    7.35% frontend cycles idle     ( +-  0.25% )  (83.33%)
       13209749965      stalled-cycles-backend    #   51.62% backend cycles idle      ( +-  0.16% )  (33.36%)
       19767554347      instructions              #    0.77  insn per cycle         
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.03%)
        4417480305      branches                  # 691618858.046 M/sec               ( +-  0.03% )  (66.68%)
         118676358      branch-misses             #    2.69% of all branches          ( +-  0.07% )  (83.33%)

            6.3954 +- 0.0118 seconds time elapsed  ( +-  0.18% )

Performance improved even further?! Makes sense I guess, fewer clusters to print.

patch, with reclustering, only the unstable clusters

clusters-new-unstable.html (46 KB)

$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):

           6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
               194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
                 0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
             31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
       24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
        1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
       13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
       18260877653      instructions              #    0.74  insn per cycle         
                                                  #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
        4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
         114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)

            6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )

This tells us that actually writing the -analysis-inconsistencies-output-file= report only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps).
(Also, wow, this is fast; it used to take several minutes originally.)

Fixes PR40715.

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. Feb 18 2019, 8:04 AM

Herald added subscribers: jdoerfert, tschuett. Feb 18 2019, 8:04 AM

courbet added inline comments. Feb 19 2019, 4:49 AM

tools/llvm-exegesis/lib/Clustering.cpp
169	"The list of opcodes that have more than one cluster".
173	Why not `const auto&` ?
193	`for (const size_t UnstableOpcode` for safety.
215	at least, not at most.
217	This for loop + CleanedPointIndices could be removed using `std::remove_if`:
	    // Find which points should be moved to the new cluster.
	    const auto it = std::remove_if(OldCluster.PointIndices.begin(), OldCluster.PointIndices.end(),
	                                   [this, UnstableOpcode](size_t P) {
	                                     return Points_[P].keyInstruction().getOpcode() == UnstableOpcode;
	                                   });
	    // Move removed points to the new cluster:
	    UnstableCluster.PointIndices.insert(it, OldCluster.PointIndices.end());
	    // Remove points from the old cluster.
	    OldCluster.PointIndices.erase(it, OldCluster.PointIndices.end());
tools/llvm-exegesis/llvm-exegesis.cpp
444–450	nit: `InstrInfo`
445	Why not: `std::unique_ptr<llvm::MCInstrInfo> InstrInfo(TheTarget->createMCInstrInfo());` ?

Address most of @courbet's review notes.

tools/llvm-exegesis/lib/Clustering.cpp
173	Hm, I see all three variants within the codebase. I guess `const auto&` is better.
215	Sure. I meant that any other assumption would be too optimistic, and we would end up reallocating if we guessed wrong. A guess of one will never be too optimistic, and won't cause reallocations. At worst, we will over-allocate a bit.
217	Hmm, are you sure? http://cpp.sh/7dfh5
	If that were so, then that snippet should have printed `Textwithsomewhitespaces`, but it prints `Textwithsomewhitespacesespaces`.
	https://en.cppreference.com/w/cpp/algorithm/remove says:
	"Iterators pointing to an element between the new logical end and the physical end of the range are still dereferenceable, but the elements themselves have unspecified values (as per MoveAssignable post-condition)."
	`unspecified values` is pretty self-explanatory.

courbet added inline comments. Feb 19 2019, 7:10 AM

tools/llvm-exegesis/lib/Clustering.cpp
217	Yes, sorry. `std::stable_partition` should work.
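
To make the difference concrete: `std::remove_if` leaves the tail of the range with unspecified values, so the "removed" points cannot be copied out of it, whereas `std::stable_partition` keeps every element intact and merely regroups them. A minimal sketch of the move-then-erase idiom follows; `movePointsIf` is a hypothetical helper written for this illustration, not the code that landed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Moves every element of `Old` for which `ShouldMove` returns true to the end
// of `New`, preserving relative order in both vectors.
template <typename Pred>
void movePointsIf(std::vector<std::size_t> &Old, std::vector<std::size_t> &New,
                  Pred ShouldMove) {
  assert(&Old != &New && "source and destination must be distinct");
  // Elements to keep go to the front; elements to move end up, intact and in
  // their original relative order, in [Mid, end).
  const auto Mid =
      std::stable_partition(Old.begin(), Old.end(),
                            [&](std::size_t P) { return !ShouldMove(P); });
  // Copy the selected elements out, then drop them from the old vector.
  New.insert(New.end(), Mid, Old.end());
  Old.erase(Mid, Old.end());
}
```

Called with a predicate that selects the points of one opcode, this moves exactly those points into the new cluster and leaves the remaining points of the old cluster in their original order.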

Address @courbet's review notes.

tools/llvm-exegesis/lib/Clustering.cpp
217	Aha. Not as simple as that snippet, but works.

lebedev.ri added inline comments. Feb 19 2019, 11:25 PM

tools/llvm-exegesis/lib/Clustering.cpp
217	Thinking about this a bit more, do we really care that "Relative order of the elements is preserved."? I don't think so. Only that the new order is deterministic. Can we just use `std::partition` (https://en.cppreference.com/w/cpp/algorithm/partition) instead?

lebedev.ri marked 2 inline comments as done. Feb 20 2019, 12:19 AM

lebedev.ri added inline comments.

tools/llvm-exegesis/lib/Clustering.cpp
217	Actually, hmm, I'm not confident `std::partition` is deterministic. Never mind.
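
For context on that hesitation: the C++ standard only guarantees which group each element lands in after `std::partition`; the relative order inside each group is unspecified and may differ between standard library implementations, while `std::stable_partition` guarantees it. The sketch below contrasts the two guarantees; `Split`, `splitOddEven` and `sorted` are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Split {
  std::vector<int> Kept;  // Elements satisfying the predicate.
  std::vector<int> Moved; // Everything else.
};

// Splits V into odd and even values. With Stable == true the relative order
// inside each group is guaranteed; otherwise only group membership is.
Split splitOddEven(std::vector<int> V, bool Stable) {
  const auto IsOdd = [](int X) { return X % 2 != 0; };
  const auto Mid = Stable ? std::stable_partition(V.begin(), V.end(), IsOdd)
                          : std::partition(V.begin(), V.end(), IsOdd);
  Split S{std::vector<int>(V.begin(), Mid), std::vector<int>(Mid, V.end())};
  assert(S.Kept.size() + S.Moved.size() == V.size());
  return S;
}

// Helper to compare two groups regardless of element order.
std::vector<int> sorted(std::vector<int> V) {
  std::sort(V.begin(), V.end());
  return V;
}
```

Code that depends on byte-identical reports across toolchains can therefore rely on the element order only in the stable variant; with `std::partition` it is safe to compare the groups only after sorting them, as `sorted` does here.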

courbet accepted this revision. Feb 20 2019, 12:43 AM

This revision is now accepted and ready to land. Feb 20 2019, 12:43 AM

Yay, thank you for the review!

Closed by commit rL354441: [llvm-exegesis] Opcode stabilization / reclusterization (PR40715) (authored by lebedevri). Feb 20 2019, 1:13 AM

This revision was automatically updated to reflect the committed changes.

lebedev.ri mentioned this in D59539: [llvm-exegesis] Option to lobotomize dbscan (PR40880). Mar 20 2019, 5:26 AM

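Revision Contents

Path                                                              Size
docs/CommandGuide/llvm-exegesis.rst                               7 lines
test/tools/llvm-exegesis/X86/analysis-cluster-stabilization.test  82 lines
tools/llvm-exegesis/lib/Analysis.h                                5 lines
tools/llvm-exegesis/lib/Analysis.cpp                              14 lines
tools/llvm-exegesis/lib/BenchmarkResult.h                         2 lines
tools/llvm-exegesis/lib/Clustering.h                              28 lines
tools/llvm-exegesis/lib/Clustering.cpp                            101 lines
tools/llvm-exegesis/llvm-exegesis.cpp                             20 lines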

Diff 187379

docs/CommandGuide/llvm-exegesis.rst

[... 218 lines hidden ...]
  Specify the numPoints parameters to be used for DBSCAN clustering
  (`analysis` mode).

 .. option:: -analysis-espilon=<dbscan epsilon parameter>

  Specify the numPoints parameters to be used for DBSCAN clustering
  (`analysis` mode).

+.. option:: -analysis-display-unstable-clusters
+
+ If there is more than one benchmark for an opcode, said benchmarks may end up
+ not being clustered into the same cluster if the measured performance
+ characteristics are different. by default all such opcodes are filtered out.
+ This flag will instead show only such unstable opcodes.
+
 .. option:: -ignore-invalid-sched-class=false

  If set, ignore instructions that do not have a sched class (class idx = 0).

 .. option:: -mcpu=<cpu name>

  If set, measure the cpu characteristics using the counters for this CPU. This
  is useful when creating new sched models (the host CPU is unknown to LLVM).

 EXIT STATUS
 -----------

 :program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is
 printed to standard error, and the tool returns a non 0 value.

test/tools/llvm-exegesis/X86/analysis-cluster-stabilization.test

This file was added.

				# RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-clusters-output-file=- -analysis-epsilon=0.1 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-CLUSTERS %s
				# RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-epsilon=0.5 -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-STABLE %s
				# RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-epsilon=0.5 -analysis-display-unstable-clusters -analysis-numpoints=1 | FileCheck -check-prefixes=CHECK-INCONSISTENCIES-ALL,CHECK-INCONSISTENCIES-UNSTABLE %s

				# We have one ADD32rr measurement, and two measurements for SQRTSSr.
				# The ADD32rr measurement and one of the SQRTSSr measurements are identical,
				# and thus will be be in the same cluster. But the second SQRTSSr measurement
				# is different from the first SQRTSSr measurement, and thus it will be in it's
				# own cluster. We do reclusterization, and thus since there is more than one
				# measurement from SQRTSSr, and they are not in the same cluster, we move
				# all two SQRTSSr measurements into their own cluster, and mark it as unstable.
				# By default, we do not show such unstable clusters.
				# If told to show, we only show such unstable clusters.

				# CHECK-CLUSTERS: {{^}}cluster_id,opcode_name,config,sched_class,latency{{$}}
				# CHECK-CLUSTERS-NEXT: {{^}}0,
				# CHECK-CLUSTERS-SAME: ,90.00{{$}}
				# CHECK-CLUSTERS: {{^}}3,
				# CHECK-CLUSTERS-SAME: ,90.11{{$}}
				# CHECK-CLUSTERS-NEXT: {{^}}3,
				# CHECK-CLUSTERS-SAME: ,100.00{{$}}

				# CHECK-INCONSISTENCIES-STABLE: ADD32rr
				# CHECK-INCONSISTENCIES-STABLE-NOT: ADD32rr
				# CHECK-INCONSISTENCIES-STABLE-NOT: SQRTSSr

				# CHECK-INCONSISTENCIES-UNSTABLE: SQRTSSr
				# CHECK-INCONSISTENCIES-UNSTABLE: SQRTSSr
				# CHECK-INCONSISTENCIES-UNSTABLE-NOT: SQRTSSr
				# CHECK-INCONSISTENCIES-UNSTABLE-NOT: ADD32rr

				---
				mode: latency
				key:
				instructions:
				- 'ADD32rr EDX EDX EAX'
				config: ''
				register_initial_values:
				- 'EDX=0x0'
				- 'EAX=0x0'
				cpu_name: bdver2
				llvm_triple: x86_64-unknown-linux-gnu
				num_repetitions: 10000
				measurements:
				- { key: latency, value: 90.0000, per_snippet_value: 90.0000 }
				error: ''
				info: Repeating a single implicitly serial instruction
				assembled_snippet: BA00000000B80000000001C201C201C201C201C201C201C201C201C201C201C201C201C201C201C201C2C3
				---
				mode: latency
				key:
				instructions:
				- 'SQRTSSr XMM11 XMM11'
				config: ''
				register_initial_values:
				- 'XMM11=0x0'
				cpu_name: bdver2
				llvm_triple: x86_64-unknown-linux-gnu
				num_repetitions: 10000
				measurements:
				- { key: latency, value: 90.1111, per_snippet_value: 90.1111 }
				error: ''
				info: Repeating a single explicitly serial instruction
				assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C410F3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBC3
				...
				---
				mode: latency
				key:
				instructions:
				- 'SQRTSSr XMM11 XMM11'
				config: ''
				register_initial_values:
				- 'XMM11=0x0'
				cpu_name: bdver2
				llvm_triple: x86_64-unknown-linux-gnu
				num_repetitions: 10000
				measurements:
				- { key: latency, value: 100, per_snippet_value: 100 }
				error: ''
				info: Repeating a single explicitly serial instruction
				assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C410F3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBF3450F51DBC3
				...

tools/llvm-exegesis/lib/Analysis.h

[... 30 lines hidden ...]

 namespace llvm {
 namespace exegesis {

 // A helper class to analyze benchmark results for a target.
 class Analysis {
 public:
   Analysis(const llvm::Target &Target,
-           const InstructionBenchmarkClustering &Clustering);
+           std::unique_ptr<llvm::MCInstrInfo> InstrInfo,
+           const InstructionBenchmarkClustering &Clustering,
+           bool AnalysisDisplayUnstableOpcodes);

   // Prints a csv of instructions for each cluster.
   struct PrintClusters {};
   // Find potential errors in the scheduling information given measurements.
   struct PrintSchedClassInconsistencies {};

   template <typename Pass> llvm::Error run(llvm::raw_ostream &OS) const;

[... 72 lines hidden; enclosing context: private: ...]
   llvm::MCObjectFileInfo ObjectFileInfo_;
   std::unique_ptr<llvm::MCContext> Context_;
   std::unique_ptr<llvm::MCSubtargetInfo> SubtargetInfo_;
   std::unique_ptr<llvm::MCInstrInfo> InstrInfo_;
   std::unique_ptr<llvm::MCRegisterInfo> RegInfo_;
   std::unique_ptr<llvm::MCAsmInfo> AsmInfo_;
   std::unique_ptr<llvm::MCInstPrinter> InstPrinter_;
   std::unique_ptr<llvm::MCDisassembler> Disasm_;
+  const bool AnalysisDisplayUnstableOpcodes_;
 };

 // Computes the idealized ProcRes Unit pressure. This is the expected
 // distribution if the CPU scheduler can distribute the load as evenly as
 // possible.
 std::vector<std::pair<uint16_t, float>> computeIdealizedProcResPressure(
     const llvm::MCSchedModel &SM,
     llvm::SmallVector<llvm::MCWriteProcResEntry, 8> WPRS);

 } // namespace exegesis
 } // namespace llvm

 #endif // LLVM_TOOLS_LLVM_EXEGESIS_CLUSTERING_H

tools/llvm-exegesis/lib/Analysis.cpp

[... 143 lines hidden; enclosing context: void Analysis::printInstructionRowCsv(const size_t PointId, ...]
   const InstructionBenchmark &Point = Clustering_.getPoints()[PointId];
   writeClusterId<kEscapeCsv>(OS, Clustering_.getClusterIdForPoint(PointId));
   OS << kCsvSep;
   writeSnippet<EscapeTag, kEscapeCsv>(OS, Point.AssembledSnippet, "; ");
   OS << kCsvSep;
   writeEscaped<kEscapeCsv>(OS, Point.Key.Config);
   OS << kCsvSep;
   assert(!Point.Key.Instructions.empty());
-  const llvm::MCInst &MCI = Point.Key.Instructions[0];
+  const llvm::MCInst &MCI = Point.keyInstruction();
   const unsigned SchedClassId = resolveSchedClassId(
       *SubtargetInfo_, InstrInfo_->get(MCI.getOpcode()).getSchedClass(), MCI);

 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
   const llvm::MCSchedClassDesc *const SCDesc =
       SubtargetInfo_->getSchedModel().getSchedClassDesc(SchedClassId);
   writeEscaped<kEscapeCsv>(OS, SCDesc->Name);
 #else
   OS << SchedClassId;
 #endif
   for (const auto &Measurement : Point.Measurements) {
     OS << kCsvSep;
     writeMeasurementValue<kEscapeCsv>(OS, Measurement.PerInstructionValue);
   }
   OS << "\n";
 }

 Analysis::Analysis(const llvm::Target &Target,
-                   const InstructionBenchmarkClustering &Clustering)
-    : Clustering_(Clustering) {
+                   std::unique_ptr<llvm::MCInstrInfo> InstrInfo,
+                   const InstructionBenchmarkClustering &Clustering,
+                   bool AnalysisDisplayUnstableOpcodes)
+    : Clustering_(Clustering), InstrInfo_(std::move(InstrInfo)),
+      AnalysisDisplayUnstableOpcodes_(AnalysisDisplayUnstableOpcodes) {
   if (Clustering.getPoints().empty())
     return;

   const InstructionBenchmark &FirstPoint = Clustering.getPoints().front();
-  InstrInfo_.reset(Target.createMCInstrInfo());
   RegInfo_.reset(Target.createMCRegInfo(FirstPoint.LLVMTriple));
   AsmInfo_.reset(Target.createMCAsmInfo(*RegInfo_, FirstPoint.LLVMTriple));
   SubtargetInfo_.reset(Target.createMCSubtargetInfo(FirstPoint.LLVMTriple,
                                                     FirstPoint.CpuName, ""));
   InstPrinter_.reset(Target.createMCInstPrinter(
       llvm::Triple(FirstPoint.LLVMTriple), 0 /*default variant*/, *AsmInfo_,
       *InstrInfo_, *RegInfo_));

[... 42 lines hidden; enclosing context: Analysis::makePointsPerSchedClass() const { ...]
   const auto &Points = Clustering_.getPoints();
   for (size_t PointId = 0, E = Points.size(); PointId < E; ++PointId) {
     const InstructionBenchmark &Point = Points[PointId];
     if (!Point.Error.empty())
       continue;
     assert(!Point.Key.Instructions.empty());
     // FIXME: we should be using the tuple of classes for instructions in the
     // snippet as key.
-    const llvm::MCInst &MCI = Point.Key.Instructions[0];
+    const llvm::MCInst &MCI = Point.keyInstruction();
     unsigned SchedClassId = InstrInfo_->get(MCI.getOpcode()).getSchedClass();
     const bool WasVariant = SchedClassId && SubtargetInfo_->getSchedModel()
                                                 .getSchedClassDesc(SchedClassId)
                                                 ->isVariant();
     SchedClassId = resolveSchedClassId(*SubtargetInfo_, SchedClassId, MCI);
     const auto IndexIt = SchedClassIdToIndex.find(SchedClassId);
     if (IndexIt == SchedClassIdToIndex.end()) {
       // Create a new entry.
[... 418 lines hidden; enclosing context: for (const auto &RSCAndPoints : makePointsPerSchedClass()) { ...]
     if (!RSCAndPoints.RSC.SCDesc)
       continue;
     // Bucket sched class points into sched class clusters.
     std::vector<SchedClassCluster> SchedClassClusters;
     for (const size_t PointId : RSCAndPoints.PointIds) {
       const auto &ClusterId = Clustering_.getClusterIdForPoint(PointId);
       if (!ClusterId.isValid())
         continue; // Ignore noise and errors. FIXME: take noise into account ?
+      if (ClusterId.isUnstable() ^ AnalysisDisplayUnstableOpcodes_)
+        continue; // Either display stable or unstable clusters only.
       auto SchedClassClusterIt =
           std::find_if(SchedClassClusters.begin(), SchedClassClusters.end(),
                        [ClusterId](const SchedClassCluster &C) {
                          return C.id() == ClusterId;
                        });
       if (SchedClassClusterIt == SchedClassClusters.end()) {
         SchedClassClusters.emplace_back();
         SchedClassClusterIt = std::prev(SchedClassClusters.end());
[... 146 lines hidden ...]

tools/llvm-exegesis/lib/BenchmarkResult.h

[... 55 lines hidden ...]

 // The result of an instruction benchmark.
 struct InstructionBenchmark {
   InstructionBenchmarkKey Key;
   enum ModeE { Unknown, Latency, Uops, InverseThroughput };
   ModeE Mode;
   std::string CpuName;
   std::string LLVMTriple;
+  // Which instruction is being benchmarked here?
+  const llvm::MCInst &keyInstruction() const { return Key.Instructions[0]; }
   // The number of instructions inside the repeated snippet. For example, if a
   // snippet of 3 instructions is repeated 4 times, this is 12.
   int NumRepetitions = 0;
   // Note that measurements are per instruction.
   std::vector<BenchmarkMeasure> Measurements;
   std::string Error;
   std::string Info;
   std::vector<uint8_t> AssembledSnippet;
[... 45 lines hidden ...]

tools/llvm-exegesis/lib/Clustering.h

Show All 9 Lines
/// Utilities to compute benchmark result clusters.		/// Utilities to compute benchmark result clusters.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_EXEGESIS_CLUSTERING_H		#ifndef LLVM_TOOLS_LLVM_EXEGESIS_CLUSTERING_H
#define LLVM_TOOLS_LLVM_EXEGESIS_CLUSTERING_H		#define LLVM_TOOLS_LLVM_EXEGESIS_CLUSTERING_H

#include "BenchmarkResult.h"		#include "BenchmarkResult.h"
		#include "llvm/ADT/Optional.h"
#include "llvm/Support/Error.h"		#include "llvm/Support/Error.h"
		#include <limits>
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {
namespace exegesis {		namespace exegesis {

class InstructionBenchmarkClustering {		class InstructionBenchmarkClustering {
public:		public:
// Clusters `Points` using DBSCAN with the given parameters. See the cc file		// Clusters `Points` using DBSCAN with the given parameters. See the cc file
// for more explanations on the algorithm.		// for more explanations on the algorithm.
static llvm::Expected<InstructionBenchmarkClustering>		static llvm::Expected<InstructionBenchmarkClustering>
create(const std::vector<InstructionBenchmark> &Points, size_t MinPts,		create(const std::vector<InstructionBenchmark> &Points, size_t MinPts,
double Epsilon);		double Epsilon, llvm::Optional<unsigned> NumOpcodes = llvm::None);

class ClusterId {		class ClusterId {
public:		public:
static ClusterId noise() { return ClusterId(kNoise); }		static ClusterId noise() { return ClusterId(kNoise); }
static ClusterId error() { return ClusterId(kError); }		static ClusterId error() { return ClusterId(kError); }
static ClusterId makeValid(size_t Id) { return ClusterId(Id); }		static ClusterId makeValid(size_t Id) { return ClusterId(Id); }
ClusterId() : Id_(kUndef) {}		static ClusterId makeValidUnstable(size_t Id) {
		return ClusterId(Id, /*IsUnstable=*/true);
		}

		ClusterId() : Id_(kUndef), IsUnstable_(false) {}

		// Compare id's, ignoring the 'unstability' bit.
bool operator==(const ClusterId &O) const { return Id_ == O.Id_; }		bool operator==(const ClusterId &O) const { return Id_ == O.Id_; }
bool operator<(const ClusterId &O) const { return Id_ < O.Id_; }		bool operator<(const ClusterId &O) const { return Id_ < O.Id_; }

bool isValid() const { return Id_ <= kMaxValid; }		bool isValid() const { return Id_ <= kMaxValid; }
bool isUndef() const { return Id_ == kUndef; }		bool isUnstable() const { return IsUnstable_; }
bool isNoise() const { return Id_ == kNoise; }		bool isNoise() const { return Id_ == kNoise; }
bool isError() const { return Id_ == kError; }		bool isError() const { return Id_ == kError; }
		bool isUndef() const { return Id_ == kUndef; }

// Precondition: isValid().		// Precondition: isValid().
size_t getId() const {		size_t getId() const {
assert(isValid());		assert(isValid());
return Id_;		return Id_;
}		}

private:		private:
explicit ClusterId(size_t Id) : Id_(Id) {}		ClusterId(size_t Id, bool IsUnstable = false)
		: Id_(Id), IsUnstable_(IsUnstable) {}

static constexpr const size_t kMaxValid =		static constexpr const size_t kMaxValid =
std::numeric_limits<size_t>::max() - 4;		(std::numeric_limits<size_t>::max() >> 1) - 4;
static constexpr const size_t kNoise = kMaxValid + 1;		static constexpr const size_t kNoise = kMaxValid + 1;
static constexpr const size_t kError = kMaxValid + 2;		static constexpr const size_t kError = kMaxValid + 2;
static constexpr const size_t kUndef = kMaxValid + 3;		static constexpr const size_t kUndef = kMaxValid + 3;
size_t Id_;
		size_t Id_ : (std::numeric_limits<size_t>::digits - 1);
		size_t IsUnstable_ : 1;
};		};
		static_assert(sizeof(ClusterId) == sizeof(size_t), "should be a bit field.");

struct Cluster {		struct Cluster {
Cluster() = delete;		Cluster() = delete;
explicit Cluster(const ClusterId &Id) : Id(Id) {}		explicit Cluster(const ClusterId &Id) : Id(Id) {}

const ClusterId Id;		const ClusterId Id;
// Indices of benchmarks within the cluster.		// Indices of benchmarks within the cluster.
std::vector<int> PointIndices;		std::vector<int> PointIndices;
Show All 27 Lines	for (size_t I = 0, E = P.size(); I < E; ++I) {
DistanceSquared += Diff * Diff;		DistanceSquared += Diff * Diff;
}		}
return DistanceSquared <= EpsilonSquared_;		return DistanceSquared <= EpsilonSquared_;
}		}

private:		private:
InstructionBenchmarkClustering(		InstructionBenchmarkClustering(
const std::vector<InstructionBenchmark> &Points, double EpsilonSquared);		const std::vector<InstructionBenchmark> &Points, double EpsilonSquared);

llvm::Error validateAndSetup();		llvm::Error validateAndSetup();
void dbScan(size_t MinPts);		void dbScan(size_t MinPts);
		void stabilize(unsigned NumOpcodes);
void rangeQuery(size_t Q, std::vector<size_t> &Scratchpad) const;		void rangeQuery(size_t Q, std::vector<size_t> &Scratchpad) const;

const std::vector<InstructionBenchmark> &Points_;		const std::vector<InstructionBenchmark> &Points_;
const double EpsilonSquared_;		const double EpsilonSquared_;
int NumDimensions_ = 0;		int NumDimensions_ = 0;
// ClusterForPoint_[P] is the cluster id for Points[P].		// ClusterForPoint_[P] is the cluster id for Points[P].
std::vector<ClusterId> ClusterIdForPoint_;		std::vector<ClusterId> ClusterIdForPoint_;
std::vector<Cluster> Clusters_;		std::vector<Cluster> Clusters_;
Cluster NoiseCluster_;		Cluster NoiseCluster_;
Cluster ErrorCluster_;		Cluster ErrorCluster_;
};		};

} // namespace exegesis		} // namespace exegesis
} // namespace llvm		} // namespace llvm

#endif // LLVM_TOOLS_LLVM_EXEGESIS_CLUSTERING_H		#endif // LLVM_TOOLS_LLVM_EXEGESIS_CLUSTERING_H
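
The bit-field packing used by `ClusterId` above can be tried in isolation. The following is a minimal sketch, not the patch's code: `PackedId` is a hypothetical stand-in that keeps only the parts needed to show the idea (an id one bit narrower than `size_t`, a one-bit flag, and a comparison that ignores the flag). It assumes an ABI where two adjacent `size_t` bit-fields share one storage unit, which is what the `static_assert` checks.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical, reduced stand-in for ClusterId: the id occupies all but one
// bit of a size_t, and the remaining bit records whether the cluster is
// unstable.
class PackedId {
public:
  static PackedId makeValid(std::size_t Id) { return PackedId(Id, false); }
  static PackedId makeValidUnstable(std::size_t Id) {
    return PackedId(Id, true);
  }
  static PackedId noise() { return PackedId(kNoise, false); }

  // Compares ids only; the unstable bit is deliberately ignored.
  bool operator==(const PackedId &O) const { return Id_ == O.Id_; }

  bool isValid() const { return Id_ <= kMaxValid; }
  bool isUnstable() const { return IsUnstable_; }
  bool isNoise() const { return Id_ == kNoise; }

  // Precondition: isValid().
  std::size_t getId() const {
    assert(isValid());
    return Id_;
  }

private:
  PackedId(std::size_t Id, bool IsUnstable)
      : Id_(Id), IsUnstable_(IsUnstable) {}

  // The sentinels must fit into the narrowed id, hence the shift by one.
  static constexpr std::size_t kMaxValid =
      (std::numeric_limits<std::size_t>::max() >> 1) - 4;
  static constexpr std::size_t kNoise = kMaxValid + 1;

  std::size_t Id_ : (std::numeric_limits<std::size_t>::digits - 1);
  std::size_t IsUnstable_ : 1;
};
static_assert(sizeof(PackedId) == sizeof(std::size_t),
              "id and flag should share one size_t");
```

With this layout, `PackedId::makeValid(3) == PackedId::makeValidUnstable(3)` holds even though only the second reports `isUnstable()`, so existing lookups keyed on the id keep working after a cluster is flagged.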

tools/llvm-exegesis/lib/Clustering.cpp

//===-- Clustering.cpp ------------------------------------------- C++ --===//		//===-- Clustering.cpp ------------------------------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "Clustering.h"		#include "Clustering.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
		#include <algorithm>
#include <string>		#include <string>
		#include <vector>

namespace llvm {		namespace llvm {
namespace exegesis {		namespace exegesis {

// The clustering problem has the following characteristics:		// The clustering problem has the following characteristics:
// (A) - Low dimension (dimensions are typically proc resource units,		// (A) - Low dimension (dimensions are typically proc resource units,
// typically < 10).		// typically < 10).
// (B) - Number of points : ~thousands (points are measurements of an MCInst)		// (B) - Number of points : ~thousands (points are measurements of an MCInst)
▲ Show 20 Lines • Show All 121 Lines • ▼ Show 20 Lines	void InstructionBenchmarkClustering::dbScan(const size_t MinPts) {
// Add noisy points to noise cluster.		// Add noisy points to noise cluster.
for (size_t P = 0, NumPoints = Points_.size(); P < NumPoints; ++P) {		for (size_t P = 0, NumPoints = Points_.size(); P < NumPoints; ++P) {
if (ClusterIdForPoint_[P].isNoise()) {		if (ClusterIdForPoint_[P].isNoise()) {
NoiseCluster_.PointIndices.push_back(P);		NoiseCluster_.PointIndices.push_back(P);
}		}
}		}
}		}

		// Given an instruction Opcode, we can make benchmarks (measurements) of the
		// instruction characteristics/performance. Then, to facilitate further analysis
		// we group the benchmarks with similar characteristics into clusters.
		// Now, this is all not entirely deterministic. Some instructions have variable
		// characteristics, depending on their arguments. And thus, if we do several
		// benchmarks of the same instruction Opcode, we may end up with different
		// performance characteristics measurements. And when we then do clustering,
		// these several benchmarks of the same instruction Opcode may end up being
		// clustered into different clusters. This is not great for further analysis.
		// We shall find every opcode with benchmarks not in just one cluster, and move
		// all the benchmarks of said Opcode into one new unstable cluster per Opcode.
		void InstructionBenchmarkClustering::stabilize(unsigned NumOpcodes) {
		// Given an instruction Opcode, in which clusters do benchmarks of this
		// instruction lie? Normally, they all should be in the same cluster.
		std::vector<llvm::SmallSet<ClusterId, 1>> OpcodeToClusterIDs;
		OpcodeToClusterIDs.resize(NumOpcodes);
		// The list of opcodes that have more than one cluster.
		courbet: "The list of opcodes that have more than one cluster".
		llvm::SetVector<size_t> UnstableOpcodes;
		// Populate OpcodeToClusterIDs and UnstableOpcodes data structures.
		assert(ClusterIdForPoint_.size() == Points_.size() && "size mismatch");
		for (const auto &Point : zip(Points_, ClusterIdForPoint_)) {
		courbet: Why not `const auto&` ?
		lebedev.ri: Hm, i see all three variants within the codebase. I guess `const auto&` is better.
		const ClusterId &ClusterIdOfPoint = std::get<1>(Point);
		if (!ClusterIdOfPoint.isValid())
		continue; // Only process fully valid clusters.
		const unsigned Opcode = std::get<0>(Point).keyInstruction().getOpcode();
		assert(Opcode < NumOpcodes && "NumOpcodes is incorrect (too small)");
		llvm::SmallSet<ClusterId, 1> &ClusterIDsOfOpcode =
		OpcodeToClusterIDs[Opcode];
		ClusterIDsOfOpcode.insert(ClusterIdOfPoint);
		// Is there more than one ClusterID for this opcode?
		if (ClusterIDsOfOpcode.size() < 2)
		continue; // If not, then at this moment this Opcode is stable.
		// Else let's record this unstable opcode for future use.
		UnstableOpcodes.insert(Opcode);
		}
		assert(OpcodeToClusterIDs.size() == NumOpcodes && "sanity check");

		// We know how many [new] clusters we will end up with.
		const auto NewTotalClusterCount = Clusters_.size() + UnstableOpcodes.size();
		Clusters_.reserve(NewTotalClusterCount);
		for (const size_t UnstableOpcode : UnstableOpcodes.getArrayRef()) {
		courbet: `for (const size_t UnstableOpcode` for safety.
		const llvm::SmallSet<ClusterId, 1> &ClusterIDs =
		OpcodeToClusterIDs[UnstableOpcode];
		assert(ClusterIDs.size() > 1 &&
		"Should only have Opcodes with more than one cluster.");

		// Create a new unstable cluster, one per Opcode.
		Clusters_.emplace_back(ClusterId::makeValidUnstable(Clusters_.size()));
		Cluster &UnstableCluster = Clusters_.back();
		// We will find at least one point in each of these clusters.
		UnstableCluster.PointIndices.reserve(ClusterIDs.size());

		// Go through every cluster which we recorded as containing benchmarks
		// of this UnstableOpcode. NOTE: we only recorded valid clusters.
		for (const ClusterId &CID : ClusterIDs) {
		assert(CID.isValid() &&
		"We only recorded valid clusters, not noise/error clusters.");
		Cluster &OldCluster = Clusters_[CID.getId()]; // Valid clusters storage.
		// Within each cluster, go through each point, and either move it to the
		// new unstable cluster, or 'keep' it.
		// In this case, we'll reshuffle OldCluster.PointIndices vector
		// so that all the points that are not for UnstableOpcode are first,
		// and the rest of the points is for the UnstableOpcode.
		courbet: at least, not at most.
		lebedev.ri: Sure. I have meant that any other assumption would be too optimistic, and we would end up reallocating if we guessed wrong. A guess of one will never be too optimistic, and won't cause reallocations. At worst, we will over-allocate a bit.
		const auto it = std::stable_partition(
		OldCluster.PointIndices.begin(), OldCluster.PointIndices.end(),
		[this, UnstableOpcode](size_t P) {
		return Points_[P].keyInstruction().getOpcode() != UnstableOpcode;
		});
		courbet: This for loop + CleanedPointIndices could be removed using `std::remove_if`:
		    // Find which points should be moved to the new cluster.
		    const auto it = std::remove_if(OldCluster.PointIndices.begin(), OldCluster.PointIndices.end(),
		        [this, UnstableOpcode](size_t P){ return Points_[P].keyInstruction().getOpcode() == UnstableOpcode; });
		    // Move removed points to the new cluster:
		    UnstableCluster.PointIndices.insert(it, OldCluster.PointIndices.end());
		    // Remove points from the old cluster.
		    OldCluster.PointIndices.erase(it, OldCluster.PointIndices.end());
		lebedev.ri: Hmm, are you sure? http://cpp.sh/7dfh5 If that is so, then that ^ should have printed `Textwithsomewhitespaces` but it prints `Textwithsomewhitespacesespaces`. From https://en.cppreference.com/w/cpp/algorithm/remove: "Iterators pointing to an element between the new logical end and the physical end of the range are still dereferenceable, but the elements themselves have unspecified values (as per MoveAssignable post-condition)." `unspecified values` is pretty self-explanatory.
		courbet: Yes, sorry. `std::stable_partition` should work.
		lebedev.ri: Aha. Not as simple as that snippet, but works.
		lebedev.ri: Thinking about this a bit more, do we really care that "Relative order of the elements is preserved."? I don't think so. Only that the new order is deterministic. Can we just use `std::partition` (https://en.cppreference.com/w/cpp/algorithm/partition) instead?
		lebedev.ri: Actually, hmm, i'm not confident `std::partition` is deterministic. Never mind.
		assert(std::distance(it, OldCluster.PointIndices.end()) > 0 &&
		"Should have found at least one bad point");
		// Mark to-be-moved points as belonging to the new cluster.
		std::for_each(it, OldCluster.PointIndices.end(),
		[this, &UnstableCluster](size_t P) {
		ClusterIdForPoint_[P] = UnstableCluster.Id;
		});
		// Actually append to-be-moved points to the new cluster.
		UnstableCluster.PointIndices.insert(UnstableCluster.PointIndices.cend(),
		it, OldCluster.PointIndices.end());
		// And finally, remove "to-be-moved" points from the old cluster.
		OldCluster.PointIndices.erase(it, OldCluster.PointIndices.cend());
		// Now, the old cluster may end up being empty, but let's just keep it
		// in whatever state it ended up. Purging empty clusters isn't worth it.
		};
		assert(UnstableCluster.PointIndices.size() > 1 &&
		"New unstable cluster should end up with more than one point.");
		assert(UnstableCluster.PointIndices.size() >= ClusterIDs.size() &&
		"New unstable cluster should end up with no fewer points than there "
		"were clusters");
		}
		assert(Clusters_.size() == NewTotalClusterCount && "sanity check");
		}
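
The inline thread above settles on `std::stable_partition` because `std::remove_if` leaves the tail of the range with unspecified values, so the "removed" elements cannot be copied out of it. The sketch below isolates that move step under simplifying assumptions: `movePointsOfOpcode` is a hypothetical helper, and a plain `std::vector<unsigned>` of opcodes stands in for `Points_[P].keyInstruction().getOpcode()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. Moves every point index P in Old whose opcode equals
// Unstable to the end of New, preserving relative order in both vectors.
// Opcodes[P] is assumed to hold the opcode of point P.
void movePointsOfOpcode(const std::vector<unsigned> &Opcodes,
                        unsigned Unstable, std::vector<std::size_t> &Old,
                        std::vector<std::size_t> &New) {
  assert(std::all_of(Old.begin(), Old.end(),
                     [&](std::size_t P) { return P < Opcodes.size(); }) &&
         "point index out of range");
  // Points to keep come first; points to move form the tail. Unlike the tail
  // left by std::remove_if, this tail holds well-defined values.
  const auto It = std::stable_partition(
      Old.begin(), Old.end(),
      [&](std::size_t P) { return Opcodes[P] != Unstable; });
  // Append the tail to the new cluster, then drop it from the old one.
  New.insert(New.end(), It, Old.end());
  Old.erase(It, Old.end());
}
```

For opcodes `{7, 9, 7, 9, 5}` and an old cluster holding points `{0, 1, 2, 3, 4}`, moving opcode `9` leaves `{0, 2, 4}` in the old cluster and puts `{1, 3}` in the new one.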

llvm::Expected<InstructionBenchmarkClustering>		llvm::Expected<InstructionBenchmarkClustering>
InstructionBenchmarkClustering::create(		InstructionBenchmarkClustering::create(
const std::vector<InstructionBenchmark> &Points, const size_t MinPts,		const std::vector<InstructionBenchmark> &Points, const size_t MinPts,
const double Epsilon) {		const double Epsilon, llvm::Optional<unsigned> NumOpcodes) {
InstructionBenchmarkClustering Clustering(Points, Epsilon * Epsilon);		InstructionBenchmarkClustering Clustering(Points, Epsilon * Epsilon);
if (auto Error = Clustering.validateAndSetup()) {		if (auto Error = Clustering.validateAndSetup()) {
return std::move(Error);		return std::move(Error);
}		}
if (Clustering.ErrorCluster_.PointIndices.size() == Points.size()) {		if (Clustering.ErrorCluster_.PointIndices.size() == Points.size()) {
return Clustering; // Nothing to cluster.		return Clustering; // Nothing to cluster.
}		}

Clustering.dbScan(MinPts);		Clustering.dbScan(MinPts);

		if (NumOpcodes.hasValue())
		Clustering.stabilize(NumOpcodes.getValue());

return Clustering;		return Clustering;
}		}

} // namespace exegesis		} // namespace exegesis
} // namespace llvm		} // namespace llvm
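
As a rough illustration of what the stabilization pass in this file does end to end, here is a simplified sketch that works on plain vectors instead of `InstructionBenchmark` and `Cluster`. Everything in it is an assumption made for illustration: `Stabilized` and `stabilizeSketch` are hypothetical names, cluster ids are plain `int`s with `-1` standing in for noise and error points, and a `std::map` of `std::set`s replaces the patch's `OpcodeToClusterIDs` vector.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

struct Stabilized {
  std::vector<int> ClusterOf;   // Cluster id per point after stabilization.
  std::vector<bool> IsUnstable; // Per-cluster flag, indexed by cluster id.
};

// Hypothetical, simplified restatement of the pass. Opcodes[P] is the opcode
// of point P; ClusterOf[P] is its cluster id, or -1 for noise/error points.
Stabilized stabilizeSketch(const std::vector<unsigned> &Opcodes,
                           std::vector<int> ClusterOf,
                           std::size_t NumClusters) {
  assert(Opcodes.size() == ClusterOf.size() && "size mismatch");

  // Step 1: record in which valid clusters each opcode appears, and note an
  // opcode as unstable the moment it is seen in a second cluster.
  std::map<unsigned, std::set<int>> ClustersOfOpcode;
  std::vector<unsigned> UnstableOpcodes; // In order of discovery.
  for (std::size_t P = 0; P < Opcodes.size(); ++P) {
    if (ClusterOf[P] < 0)
      continue; // Only valid clusters are considered.
    std::set<int> &Seen = ClustersOfOpcode[Opcodes[P]];
    if (Seen.insert(ClusterOf[P]).second && Seen.size() == 2)
      UnstableOpcodes.push_back(Opcodes[P]);
  }

  // Step 2: create one new unstable cluster per unstable opcode and move all
  // validly clustered points of that opcode into it.
  Stabilized R{std::move(ClusterOf), std::vector<bool>(NumClusters, false)};
  for (unsigned Opcode : UnstableOpcodes) {
    const int NewId = static_cast<int>(R.IsUnstable.size());
    R.IsUnstable.push_back(true);
    for (std::size_t P = 0; P < Opcodes.size(); ++P)
      if (Opcodes[P] == Opcode && R.ClusterOf[P] >= 0)
        R.ClusterOf[P] = NewId;
  }
  return R;
}
```

For example, with opcodes `{1, 1, 2, 2, 3, 1}` assigned to clusters `{0, 1, 0, 0, 1, -1}`, opcode `1` spans clusters 0 and 1, so its two clustered points move to a new cluster 2 flagged unstable, while its noise point stays where it was. As in the patch, old clusters that lose points are kept even if they end up empty.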

tools/llvm-exegesis/llvm-exegesis.cpp


static cl::opt<std::string>		static cl::opt<std::string>
AnalysisClustersOutputFile("analysis-clusters-output-file", cl::desc(""),		AnalysisClustersOutputFile("analysis-clusters-output-file", cl::desc(""),
cl::init(""));		cl::init(""));
static cl::opt<std::string>		static cl::opt<std::string>
AnalysisInconsistenciesOutputFile("analysis-inconsistencies-output-file",		AnalysisInconsistenciesOutputFile("analysis-inconsistencies-output-file",
cl::desc(""), cl::init(""));		cl::desc(""), cl::init(""));

		static cl::opt<bool> AnalysisDisplayUnstableOpcodes(
		"analysis-display-unstable-clusters",
		cl::desc("if there is more than one benchmark for an opcode, said "
		"benchmarks may end up not being clustered into the same cluster "
		"if the measured performance characteristics are different. by "
		"default all such opcodes are filtered out. this flag will "
		"instead show only such unstable opcodes"),
		cl::init(false));

static cl::opt<std::string>		static cl::opt<std::string>
CpuName("mcpu",		CpuName("mcpu",
cl::desc(		cl::desc(
"cpu name to use for pfm counters, leave empty to autodetect"),		"cpu name to use for pfm counters, leave empty to autodetect"),
cl::init(""));		cl::init(""));


static ExitOnError ExitOnErr;		static ExitOnError ExitOnErr;

#ifdef LLVM_EXEGESIS_INITIALIZE_NATIVE_TARGET		#ifdef LLVM_EXEGESIS_INITIALIZE_NATIVE_TARGET
void LLVM_EXEGESIS_INITIALIZE_NATIVE_TARGET();		void LLVM_EXEGESIS_INITIALIZE_NATIVE_TARGET();
#endif		#endif

// Checks that only one of OpcodeNames, OpcodeIndex or SnippetsFile is provided,		// Checks that only one of OpcodeNames, OpcodeIndex or SnippetsFile is provided,
// and returns the opcode indices or {} if snippets should be read from		// and returns the opcode indices or {} if snippets should be read from
▲ Show 20 Lines • Show All 313 Lines • ▼ Show 20 Lines	static void analysisMain() {

std::string Error;		std::string Error;
const auto *TheTarget =		const auto *TheTarget =
llvm::TargetRegistry::lookupTarget(Points[0].LLVMTriple, Error);		llvm::TargetRegistry::lookupTarget(Points[0].LLVMTriple, Error);
if (!TheTarget) {		if (!TheTarget) {
llvm::errs() << "unknown target '" << Points[0].LLVMTriple << "'\n";		llvm::errs() << "unknown target '" << Points[0].LLVMTriple << "'\n";
return;		return;
}		}

		std::unique_ptr<llvm::MCInstrInfo> InstrInfo(TheTarget->createMCInstrInfo());

		courbet: Why not: `std::unique_ptr<llvm::MCInstrInfo> InstrInfo(TheTarget->createMCInstrInfo());` ?
const auto Clustering = ExitOnErr(InstructionBenchmarkClustering::create(		const auto Clustering = ExitOnErr(InstructionBenchmarkClustering::create(
Points, AnalysisNumPoints, AnalysisEpsilon));		Points, AnalysisNumPoints, AnalysisEpsilon, InstrInfo->getNumOpcodes()));

const Analysis Analyzer(*TheTarget, Clustering);		const Analysis Analyzer(*TheTarget, std::move(InstrInfo), Clustering,
		AnalysisDisplayUnstableOpcodes);
		courbet: nit: `InstrInfo`

maybeRunAnalysis<Analysis::PrintClusters>(Analyzer, "analysis clusters",		maybeRunAnalysis<Analysis::PrintClusters>(Analyzer, "analysis clusters",
AnalysisClustersOutputFile);		AnalysisClustersOutputFile);
maybeRunAnalysis<Analysis::PrintSchedClassInconsistencies>(		maybeRunAnalysis<Analysis::PrintSchedClassInconsistencies>(
Analyzer, "sched class consistency analysis",		Analyzer, "sched class consistency analysis",
AnalysisInconsistenciesOutputFile);		AnalysisInconsistenciesOutputFile);
}		}
