This is an archive of the discontinued LLVM Phabricator instance.

Differential D81050

[llvm-exegesis] Let Counter returns up to 16 entries.
ClosedPublic

Authored by oontvoo on Jun 2 2020, 8:17 PM.

Details

Reviewers

ondrasej
courbet

Summary

The LBR contains (up to) 16 entries for the last 16 branches, and the X86LBRCounter (from D77422) should be able to return all of them.
Currently, it returns only the latest entry, which could lead to misleading measurements.
This patch also changes the LatencyBenchmarkRunner to accommodate multi-value readings.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

oontvoo created this revision. Jun 2 2020, 8:17 PM

Herald added a project: Restricted Project. Jun 2 2020, 8:17 PM

Herald added subscribers: llvm-commits, mstojanovic, courbet.

Harbormaster failed remote builds in B58846: Diff 268044! Jun 2 2020, 9:21 PM

courbet added inline comments. Jun 2 2020, 11:29 PM

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
73	Please get rid of the magic number. Why 16 ? what about `Counter->numValues()` ?
110–114	Please split this off to `void accumulateCounterValues(ValueOrError.get(), CounterValues)`
111	[style] if (!ValueOrError) return ValueOrError.takeError(); ...
117	Why not `+=` ?
llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp
33	`ComputeVariance` ?
43	This is missing a square.
66	So computing the min and stddev across values in `runAndMeasureMulti()` requires them to be measuring the same thing. From the documentation it's not clear to me what the values represent. Are they always homogeneous ? Can you give an example of what they are in the LBR case.
72	why not `std::numeric_limits<double>` ?
77	Technically you're computing a variance.
78	Because of short-circuiting, if `WithMinStdev.empty()`, then `CurStdev` will not be evaluated, and `CurStdev` will be zero. Then `Stdev` will be set to `0`, preventing any further updates.
79	Was this supposed to be the other way around ? Here we are selecting the largest stddev.
llvm/tools/llvm-exegesis/lib/PerfHelper.cpp
137–139	This returns a vector with `Count` elements set to zero. If this passes all tests then we are clearly missing tests :(

This looks good in general, but we should be careful about aggregating the values from the measurements (and aggregation when the counter returns multiple values). In particular for the LBR, we'd be losing interesting and potentially useful information by aggregating all the values into a single number.

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
72	This should be llvm::SmallVector<int64_t, N> with some small N (e.g. 4) to avoid memory allocations when there is just one return value. This is the case for virtually all counters except the LBR.
73	Consider calling this runAndSample().
llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp
38	You could do const double Sum = std::accumulate(Values.begin(), Values.end(), 0.0); instead.
90	I'd prefer to have more flexibility about the numbers that are returned and reported by the tool. The original code collapsed the measurements to a single value (the minimum). This is useful when looking for lower bounds/optimistic numbers, but other processing would also make sense:
- the mean of the measurements - this might give a better idea of the actual performance when running in the loop.
- the list of all measured values - so that the user can analyze the distribution of the measurements by themselves (and go beyond the mean/variance).
In particular for the LBR measurements: being able to see the raw timings of individual loop iterations is what makes that measurement method so appealing, and we'd lose all that information if we just took an aggregate value, be it a min or a mean. The min and mean from the LBR do have their value and I'd expect them to be more precise than the measurements over an unrolled loop, so ideally we'd have a way to see both.
I think a good solution would be to add an argument to this method that determines how the values are aggregated:
- over the measurements: min, mean, min variance, keep all.
- over the contents of the returned vector: min, mean, keep all.
This will mean more changes up the call stack, but it would give us a lot more flexibility in using the tool. What do you think?
llvm/tools/llvm-exegesis/lib/PerfHelper.cpp
121–122	You might want to check that there is at least one element.
131	This should be llvm::SmallVector<int64_t, some small N> - most counters return just a single value.
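
The second aggregation level proposed in the comment on line 90 (reducing the contents of one returned vector by min, mean, or keeping everything) could look roughly like the sketch below. This is not code from the patch: `ReduceMode` and `reduceValues` are hypothetical names, and `std::vector` stands in for `llvm::SmallVector<int64_t, 4>` so the example compiles without LLVM headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical mode selector; the patch under review uses a different enum.
enum class ReduceMode { Min, Mean, KeepAll };

// Reduces one counter reading (a vector of values) according to Mode.
// An empty reading yields an empty result in every mode.
std::vector<double> reduceValues(const std::vector<int64_t> &Values,
                                 ReduceMode Mode) {
  if (Values.empty())
    return {};
  switch (Mode) {
  case ReduceMode::Min:
    return {static_cast<double>(
        *std::min_element(Values.begin(), Values.end()))};
  case ReduceMode::Mean:
    // Accumulate in double to avoid integer truncation of the mean.
    return {std::accumulate(Values.begin(), Values.end(), 0.0) /
            Values.size()};
  case ReduceMode::KeepAll:
    return std::vector<double>(Values.begin(), Values.end());
  }
  return {};
}
```

Keeping `KeepAll` as an explicit mode preserves the per-iteration LBR timings that the comment argues should not be lost.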

oontvoo added reviewers: ondrasej, courbet. Jun 3 2020, 6:32 AM

oontvoo marked an inline comment as done. Jun 3 2020, 6:39 AM

oontvoo added inline comments.

llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp
90	Yes, that's a good idea. I've thought about this a bit more and realised even keeping min-variance (of each read) could still be completely wrong. Imagine your benchmarked code has a number of distinct branches; then there is no reason to expect the cycles from these branches to correlate. The min-variance approach is only meaningful if it's the same code-path.

oontvoo marked an inline comment as not done. Jun 3 2020, 6:46 AM

Updated diff

llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp
66	Yes, they're supposed to be homogeneous, representing the measurements of the same block sampled at a fixed rate. For the LBR specifically: each value in the vector is the number of elapsed cycles since the last retired branch. When we take a sample, we have 16 such values, representing the last 16 branches. If we have a loop whose body is a basic block, then effectively these are the measurements for the last 16 iterations [0]. If the loop body has some branches, then we'd need a different aggregation strategy. [0] I think this is where it's a bit "wrong" right now. We always only read the last 16 branches. What we probably want is to be able to have the BenchmarkRunner pause at a fixed period and take a sample. (Just made a note so we could talk about this tomorrow @courbet @ondrasej)
77	¯\_(ツ)_/¯ indeed!

Harbormaster failed remote builds in B59915: Diff 270029! Jun 10 2020, 10:06 PM

Fix clang-tidy issues

Harbormaster failed remote builds in B59970: Diff 270129! Jun 11 2020, 8:14 AM

I've added a couple of style comments + one bigger comment on the aggregation of results from multiple runs/counter buffer.

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
56	NewValues should be a const reference, no?
65	This is probably not a big deal, but consider: const size_t NumValues = std::max(NewValues.size(), Result->size()); if (NumValues > Result->size()) Result->resize(NumValues, 0); for (size_t i = 0, End = NewValues.size(); i < End; ++i) { (*Result)[i] += NewValues[i]; } for unrolled + vectorized code.
73	s/reserved/Reserved/
87	I'd prefer not using assignment in function arguments - this might be easily overlooked or mistaken for a bug (assignment in place of equality comparison).
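
As a self-contained illustration of the resize-then-add loop suggested in the comment on line 65, here is a minimal sketch. It assumes `std::vector` in place of `llvm::SmallVector<int64_t, 4>` and takes the result by reference rather than by pointer, which differs from the signature in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds NewValues element-wise into Result. If NewValues is longer than
// Result, Result is first padded with zeros so that every index is valid.
void accumulateCounterValues(const std::vector<int64_t> &NewValues,
                             std::vector<int64_t> &Result) {
  if (NewValues.size() > Result.size())
    Result.resize(NewValues.size(), 0);
  // A plain indexed loop with no branch in the body, which is easier for
  // the compiler to unroll and vectorize than a push_back/else-add loop.
  for (std::size_t I = 0, End = NewValues.size(); I < End; ++I)
    Result[I] += NewValues[I];
}
```

Taking `NewValues` by const reference also addresses the comment on line 56, since the original signature copies the vector on every call.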
llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp
41	std::pow is somewhat heavy-weight as it is not optimized for integer exponents, const double Delta = V - Mean; Ret += Delta * Delta; will likely be significantly faster.
53	A shorter way to write this: return std::accumulate( Values.begin(), Values.end(), std::numeric_limits<int64_t>::max(), [](int64_t A, int64_t B) { return A < B ? A : B; }); And similar for FindMax below.
81	Nit: Since we have WithMinVariance, consider renaming Variance to MinVariance, and CurVariance to just Variance.
82	Ideally, this should use move semantics (if they are supported by SmallVector).
83	This would update Variance (MinVariance with the rename proposed above) even if the variance increased. What you probably want is if (Variance < MinVariance) { WithMinVariance = *ExpectedCounterValues; MinVariance = Variance; }
98	I'd still split this into two arguments: One that decides what happens with the return values of runAndSample() with {Concatenate, MinVariance}. And another one that decides what to do with the results of the previous step {Min, Max, Mean, return as is}. With the current MinVariance filtering in all cases, we're changing the behavior for scalar counters, where we might drop some values (and we might drop some values also in the LBR case).
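
The comments on lines 41, 81, 82 and 83 can be read together as the following sketch: a population variance computed with `Delta * Delta` instead of `std::pow`, and a selection loop that only updates the tracked minimum inside the guard. This is an illustration under assumptions, not the committed code: `selectMinVariance` is a hypothetical helper, `std::vector` replaces `llvm::SmallVector`, and the initial value uses `infinity()` as suggested later in the review.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <numeric>
#include <utility>
#include <vector>

// Population variance of Values; returns 0.0 for an empty input.
double computeVariance(const std::vector<int64_t> &Values) {
  if (Values.empty())
    return 0.0;
  const double Sum = std::accumulate(Values.begin(), Values.end(), 0.0);
  const double Mean = Sum / Values.size();
  double SumOfSquares = 0.0;
  for (const int64_t V : Values) {
    const double Delta = V - Mean;
    SumOfSquares += Delta * Delta; // cheaper than std::pow(Delta, 2)
  }
  return SumOfSquares / Values.size();
}

// Returns the sample with the lowest variance. Both the sample and the
// tracked minimum are updated together, and only when the variance drops.
std::vector<int64_t>
selectMinVariance(std::vector<std::vector<int64_t>> Samples) {
  std::vector<int64_t> WithMinVariance;
  double MinVariance = std::numeric_limits<double>::infinity();
  for (std::vector<int64_t> &Sample : Samples) {
    const double Variance = computeVariance(Sample);
    if (Variance < MinVariance) {
      MinVariance = Variance;
      WithMinVariance = std::move(Sample);
    }
  }
  return WithMinVariance;
}
```

Note that an empty sample has variance 0.0 in this sketch and would therefore win the selection; a caller would need to filter empty readings first if that is not wanted.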
llvm/tools/llvm-exegesis/lib/PerfHelper.cpp
155	This should also be llvm::SmallVector (please test with HAVE_LIBPFM undefined/false).

Fixed another clang-tidy issue

Harbormaster completed remote builds in B59991: Diff 270164. Jun 11 2020, 11:02 AM

Updated diff

llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp
98	Ah, I think we can infer how to accumulate the values returned by runAndSample(). If the returned vector has more than 1 element, then keep the set with min variance (because it wouldn't make sense to concat all the values from different runs together). If the vector only has 1 element, then concat them together to find Min/Max/Mean.
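
One way to read the inference rule in this reply is sketched below. It is an assumption-laden illustration: `combineRuns` and `variance` are hypothetical names, `std::vector` stands in for `llvm::SmallVector`, and the sketch assumes every run returns the same number of values (the diff rejects mismatched counter sizes elsewhere).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using Sample = std::vector<int64_t>;

// Population variance; 0.0 for an empty sample.
static double variance(const Sample &S) {
  if (S.empty())
    return 0.0;
  double Mean = 0.0;
  for (const int64_t V : S)
    Mean += V;
  Mean /= S.size();
  double Acc = 0.0;
  for (const int64_t V : S)
    Acc += (V - Mean) * (V - Mean);
  return Acc / S.size();
}

// Scalar readings (at most one value per run) are concatenated so that a
// later Min/Max/Mean sees all of them. Multi-value readings are not mixed
// across runs; only the run with the lowest variance is kept.
Sample combineRuns(const std::vector<Sample> &Runs) {
  Sample Combined;
  double MinVariance = std::numeric_limits<double>::infinity();
  for (const Sample &Run : Runs) {
    if (Run.size() <= 1) {
      Combined.insert(Combined.end(), Run.begin(), Run.end());
      continue;
    }
    const double Variance = variance(Run);
    if (Variance < MinVariance) {
      MinVariance = Variance;
      Combined = Run;
    }
  }
  return Combined;
}
```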

Harbormaster completed remote builds in B60014: Diff 270202. Jun 11 2020, 1:14 PM

One last comment, but otherwise this looks good. I'll leave the approval to Clement.

llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp
72	Very nit: if we're working with doubles, std::numeric_limits<double>::infinity() might be even better.

oontvoo marked an inline comment as done. Jun 18 2020, 1:00 PM

Harbormaster completed remote builds in B60884: Diff 271816. Jun 18 2020, 2:15 PM

courbet added inline comments. Jun 19 2020, 5:28 AM

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
62	[style] no braces
llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp
53	Why not `std::min_element` ?
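
For reference, the `std::min_element` form asked about here could be written as in the sketch below, with the matching `std::max_element` version for `findMax`. It keeps the diff's convention of returning 0 for an empty input, and uses `std::vector` as a stand-in for `llvm::SmallVector<int64_t, 4>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest value, or 0 when Values is empty (dereferencing the end
// iterator returned for an empty range would be undefined behaviour).
int64_t findMin(const std::vector<int64_t> &Values) {
  if (Values.empty())
    return 0;
  return *std::min_element(Values.begin(), Values.end());
}

// Largest value, or 0 when Values is empty.
int64_t findMax(const std::vector<int64_t> &Values) {
  if (Values.empty())
    return 0;
  return *std::max_element(Values.begin(), Values.end());
}
```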
llvm/tools/llvm-exegesis/lib/Target.cpp
74	Let's do this now to avoid being in an inconsistent state.

update diff

Harbormaster completed remote builds in B61656: Diff 273211. Jun 24 2020, 8:05 PM

courbet added inline comments. Jun 25 2020, 12:26 AM

llvm/tools/llvm-exegesis/lib/Target.cpp

121

If the default Target implementation does not use it, let's just do:

std::unique_ptr<BenchmarkRunner> ExegesisTarget::createUopsBenchmarkRunner(
    const LLVMState &State,
    InstructionBenchmark::ResultAggregationModeE /*unused*/) const {
  return std::make_unique<UopsBenchmarkRunner>(State);
}

oontvoo marked an inline comment as done. Jun 25 2020, 7:14 AM

oontvoo added inline comments.

llvm/tools/llvm-exegesis/lib/Target.cpp
121	Should we change the impl to use it?

Removed unused ResultAggMode

update diff

Harbormaster failed remote builds in B61740: Diff 273373! Jun 25 2020, 10:14 AM

Harbormaster failed remote builds in B61741: Diff 273374!

courbet accepted this revision. Jun 26 2020, 12:27 AM

This revision is now accepted and ready to land. Jun 26 2020, 12:27 AM

oontvoo closed this revision. Jun 26 2020, 8:50 AM

Revision Contents

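Path                                                       Size
llvm/tools/llvm-exegesis/lib/BenchmarkResult.h             3 lines
llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h             5 lines
llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp           42 lines
llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.h      7 lines
llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp    96 lines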

6 lines

22 lines

10 lines

13 lines

18 lines

Diff 270129

llvm/tools/llvm-exegesis/lib/BenchmarkResult.h

Show First 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	struct InstructionBenchmark {
// snippet of 3 instructions is repeated 4 times, this is 12.		// snippet of 3 instructions is repeated 4 times, this is 12.
int NumRepetitions = 0;		int NumRepetitions = 0;
enum RepetitionModeE { Duplicate, Loop, AggregateMin };		enum RepetitionModeE { Duplicate, Loop, AggregateMin };
// Note that measurements are per instruction.		// Note that measurements are per instruction.
std::vector<BenchmarkMeasure> Measurements;		std::vector<BenchmarkMeasure> Measurements;
std::string Error;		std::string Error;
std::string Info;		std::string Info;
std::vector<uint8_t> AssembledSnippet;		std::vector<uint8_t> AssembledSnippet;
		// How to aggregate measurements.
		enum ResultAggregationModeE { Min, Max, MinVariance };
// Read functions.		// Read functions.
static Expected<InstructionBenchmark> readYaml(const LLVMState &State,		static Expected<InstructionBenchmark> readYaml(const LLVMState &State,
StringRef Filename);		StringRef Filename);

static Expected<std::vector<InstructionBenchmark>>		static Expected<std::vector<InstructionBenchmark>>
readYamls(const LLVMState &State, StringRef Filename);		readYamls(const LLVMState &State, StringRef Filename);

class Error readYamlFrom(const LLVMState &State, StringRef InputContent);		class Error readYamlFrom(const LLVMState &State, StringRef InputContent);
Show All 36 Lines

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h

Show All 15 Lines
#define LLVM_TOOLS_LLVM_EXEGESIS_BENCHMARKRUNNER_H		#define LLVM_TOOLS_LLVM_EXEGESIS_BENCHMARKRUNNER_H

#include "Assembler.h"		#include "Assembler.h"
#include "BenchmarkCode.h"		#include "BenchmarkCode.h"
#include "BenchmarkResult.h"		#include "BenchmarkResult.h"
#include "LlvmState.h"		#include "LlvmState.h"
#include "MCInstrDescView.h"		#include "MCInstrDescView.h"
#include "SnippetRepetitor.h"		#include "SnippetRepetitor.h"
		#include "llvm/ADT/SmallVector.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/Support/Error.h"		#include "llvm/Support/Error.h"
#include <cstdlib>		#include <cstdlib>
#include <memory>		#include <memory>
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {
namespace exegesis {		namespace exegesis {
Show All 28 Lines	private:
char *const AlignedPtr;		char *const AlignedPtr;
};		};

// A helper to measure counters while executing a function in a sandboxed		// A helper to measure counters while executing a function in a sandboxed
// context.		// context.
class FunctionExecutor {		class FunctionExecutor {
public:		public:
virtual ~FunctionExecutor();		virtual ~FunctionExecutor();
		// FIXME deprecate this.
virtual Expected<int64_t> runAndMeasure(const char *Counters) const = 0;		virtual Expected<int64_t> runAndMeasure(const char *Counters) const = 0;

		virtual Expected<llvm::SmallVector<int64_t, 4>>
		ondrasej: This should be llvm::SmallVector<int64_t, N> with some small N (e.g. 4) to avoid memory allocations when there is just one return value. This is the case for virtually all counters except the LBR.
		runAndSample(const char *Counters) const = 0;
		ondrasej: Consider calling this runAndSample().
};		};

protected:		protected:
const LLVMState &State;		const LLVMState &State;
const InstructionBenchmark::ModeE Mode;		const InstructionBenchmark::ModeE Mode;

private:		private:
virtual Expected<std::vector<BenchmarkMeasure>>		virtual Expected<std::vector<BenchmarkMeasure>>
Show All 12 Lines

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp

Show All 40 Lines	public:
FunctionExecutorImpl(const LLVMState &State,		FunctionExecutorImpl(const LLVMState &State,
object::OwningBinary<object::ObjectFile> Obj,		object::OwningBinary<object::ObjectFile> Obj,
BenchmarkRunner::ScratchSpace *Scratch)		BenchmarkRunner::ScratchSpace *Scratch)
: State(State), Function(State.createTargetMachine(), std::move(Obj)),		: State(State), Function(State.createTargetMachine(), std::move(Obj)),
Scratch(Scratch) {}		Scratch(Scratch) {}

private:		private:
Expected<int64_t> runAndMeasure(const char *Counters) const override {		Expected<int64_t> runAndMeasure(const char *Counters) const override {
		auto ResultOrError = runAndSample(Counters);
		if (ResultOrError)
		return ResultOrError.get()[0];
		return ResultOrError.takeError();
		}

		static void
		accumulateCounterValues(const llvm::SmallVector<int64_t, 4> NewValues,
		ondrasej: NewValues should be a const reference, no?
		llvm::SmallVector<int64_t, 4> *Result) {
		size_t I = 0;
		for (const int64_t &Value : NewValues) {
		if (I >= Result->size())
		Result->push_back(Value);
		else
		courbet: [style] no braces
		(*Result)[I] += Value;
		++I;
		}
		ondrasej: This is probably not a big deal, but consider: const size_t NumValues = std::max(NewValues.size(), Result->size()); if (NumValues > Result->size()) Result->resize(NumValues, 0); for (size_t i = 0, End = NewValues.size(); i < End; ++i) { (*Result)[i] += NewValues[i]; } for unrolled + vectorized code.
		}

		Expected<llvm::SmallVector<int64_t, 4>>
		runAndSample(const char *Counters) const override {
// We sum counts when there are several counters for a single ProcRes		// We sum counts when there are several counters for a single ProcRes
// (e.g. P23 on SandyBridge).		// (e.g. P23 on SandyBridge).
int64_t CounterValue = 0;		llvm::SmallVector<int64_t, 4> CounterValues;
		int Reserved = 0;
		courbet: Please get rid of the magic number. Why 16 ? what about `Counter->numValues()` ?
		ondrasej: s/reserved/Reserved/
SmallVector<StringRef, 2> CounterNames;		SmallVector<StringRef, 2> CounterNames;
StringRef(Counters).split(CounterNames, '+');		StringRef(Counters).split(CounterNames, '+');
char *const ScratchPtr = Scratch->ptr();		char *const ScratchPtr = Scratch->ptr();
for (auto &CounterName : CounterNames) {		for (auto &CounterName : CounterNames) {
CounterName = CounterName.trim();		CounterName = CounterName.trim();
auto CounterOrError =		auto CounterOrError =
State.getExegesisTarget().createCounter(CounterName, State);		State.getExegesisTarget().createCounter(CounterName, State);

if (!CounterOrError)		if (!CounterOrError)
return CounterOrError.takeError();		return CounterOrError.takeError();

pfm::Counter *Counter = CounterOrError.get().get();		pfm::Counter *Counter = CounterOrError.get().get();
		if (Reserved == 0)
		CounterValues.reserve(Reserved = Counter->numValues());
		ondrasej: I'd prefer not using assignment in function arguments - this might be easily overlooked or mistaken for a bug (assignment in place of equality comparison).
		else if (Reserved != Counter->numValues())
		// It'd be wrong to accumulate vectors of different sizes.
		return make_error<Failure>(
		llvm::Twine("Inconsistent number of values for counter ")
		.concat(CounterName)
		.concat(std::to_string(Counter->numValues()))
		.concat(" vs expected of ")
		.concat(std::to_string(Reserved)));
Scratch->clear();		Scratch->clear();
{		{
CrashRecoveryContext CRC;		CrashRecoveryContext CRC;
CrashRecoveryContext::Enable();		CrashRecoveryContext::Enable();
const bool Crashed = !CRC.RunSafely([this, Counter, ScratchPtr]() {		const bool Crashed = !CRC.RunSafely([this, Counter, ScratchPtr]() {
Counter->start();		Counter->start();
this->Function(ScratchPtr);		this->Function(ScratchPtr);
Counter->stop();		Counter->stop();
});		});
CrashRecoveryContext::Disable();		CrashRecoveryContext::Disable();
// FIXME: Better diagnosis.		// FIXME: Better diagnosis.
if (Crashed)		if (Crashed)
return make_error<SnippetCrash>("snippet crashed while running");		return make_error<SnippetCrash>("snippet crashed while running");
}		}
CounterValue += Counter->read();		auto ValueOrError = Counter->readOrError();
		if (!ValueOrError)
		courbet: [style] if (!ValueOrError) return ValueOrError.takeError(); ...
		return ValueOrError.takeError();

		accumulateCounterValues(ValueOrError.get(), &CounterValues);
		courbet: Please split this off to `void accumulateCounterValues(ValueOrError.get(), CounterValues)`
}		}
return CounterValue;		return CounterValues;
}		}
		courbet: Why not `+=` ?

const LLVMState &State;		const LLVMState &State;
const ExecutableFunction Function;		const ExecutableFunction Function;
BenchmarkRunner::ScratchSpace *const Scratch;		BenchmarkRunner::ScratchSpace *const Scratch;
};		};
} // namespace		} // namespace

Expected<InstructionBenchmark> BenchmarkRunner::runConfiguration(		Expected<InstructionBenchmark> BenchmarkRunner::runConfiguration(
▲ Show 20 Lines • Show All 146 Lines • Show Last 20 Lines

llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.h

	Show All 15 Lines

	#include "BenchmarkRunner.h"			#include "BenchmarkRunner.h"

	namespace llvm {			namespace llvm {
	namespace exegesis {			namespace exegesis {

	class LatencyBenchmarkRunner : public BenchmarkRunner {			class LatencyBenchmarkRunner : public BenchmarkRunner {
	public:			public:
	LatencyBenchmarkRunner(const LLVMState &State,			LatencyBenchmarkRunner(
	InstructionBenchmark::ModeE Mode);			const LLVMState &State, InstructionBenchmark::ModeE Mode,
				InstructionBenchmark::ResultAggregationModeE ResultAggMode);
	~LatencyBenchmarkRunner() override;			~LatencyBenchmarkRunner() override;

	private:			private:
	Expected<std::vector<BenchmarkMeasure>>			Expected<std::vector<BenchmarkMeasure>>
	runMeasurements(const FunctionExecutor &Executor) const override;			runMeasurements(const FunctionExecutor &Executor) const override;

				InstructionBenchmark::ResultAggregationModeE ResultAggMode;
	};			};
	} // namespace exegesis			} // namespace exegesis
	} // namespace llvm			} // namespace llvm

	#endif // LLVM_TOOLS_LLVM_EXEGESIS_LATENCY_H			#endif // LLVM_TOOLS_LLVM_EXEGESIS_LATENCY_H

llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp

	//===-- LatencyBenchmarkRunner.cpp ------------------------------- C++ --===//			//===-- LatencyBenchmarkRunner.cpp ------------------------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "LatencyBenchmarkRunner.h"			#include "LatencyBenchmarkRunner.h"

	#include "Target.h"
	#include "BenchmarkRunner.h"			#include "BenchmarkRunner.h"
				#include "Target.h"
				#include "llvm/ADT/Twine.h"
				#include "llvm/Support/Error.h"
				#include <algorithm>
				#include <cmath>

	namespace llvm {			namespace llvm {
	namespace exegesis {			namespace exegesis {

	LatencyBenchmarkRunner::LatencyBenchmarkRunner(const LLVMState &State,			LatencyBenchmarkRunner::LatencyBenchmarkRunner(
	InstructionBenchmark::ModeE Mode)			const LLVMState &State, InstructionBenchmark::ModeE Mode,
				InstructionBenchmark::ResultAggregationModeE ResultAgg)
	: BenchmarkRunner(State, Mode) {			: BenchmarkRunner(State, Mode) {
	assert((Mode == InstructionBenchmark::Latency \|\|			assert((Mode == InstructionBenchmark::Latency \|\|
	Mode == InstructionBenchmark::InverseThroughput) &&			Mode == InstructionBenchmark::InverseThroughput) &&
	"invalid mode");			"invalid mode");
				ResultAggMode = ResultAgg;
	}			}

	LatencyBenchmarkRunner::~LatencyBenchmarkRunner() = default;			LatencyBenchmarkRunner::~LatencyBenchmarkRunner() = default;

				static double computeVariance(const llvm::SmallVector<int64_t, 4> &Values) {
				courbet: `ComputeVariance` ?
				if (Values.empty())
				return 0.0;
				double Sum = std::accumulate(Values.begin(), Values.end(), 0.0);

				const double Mean = Sum / Values.size();
				ondrasej: You could do const double Sum = std::accumulate(Values.begin(), Values.end(), 0.0); instead.
				double Ret = 0;
				for (const auto &V : Values)
				Ret += std::pow(V - Mean, 2);
				ondrasej: std::pow is somewhat heavy-weight as it is not optimized for integer exponents, const double Delta = V - Mean; Ret += Delta * Delta; will likely be significantly faster.

				return Ret / Values.size();
				courbet: This is missing a square.
				}

				static int64_t findMin(const llvm::SmallVector<int64_t, 4> &Values) {
				if (Values.empty())
				return 0;
				int64_t Min = std::numeric_limits<int64_t>::max();
				for (const int64_t &V : Values)
				if (V < Min)
				Min = V;
				return Min;
				ondrasej: A shorter way to write this: return std::accumulate( Values.begin(), Values.end(), std::numeric_limits<int64_t>::max(), [](int64_t A, int64_t B) { return A < B ? A : B; }); And similar for FindMax below.
				courbet: Why not `std::min_element` ?
				}

				static int64_t findMax(const llvm::SmallVector<int64_t, 4> &Values) {
				if (Values.empty())
				return 0;
				int64_t Max = std::numeric_limits<int64_t>::min();
				for (const int64_t &V : Values)
				if (V > Max)
				Max = V;
				return Max;
				}

	Expected<std::vector<BenchmarkMeasure>> LatencyBenchmarkRunner::runMeasurements(			Expected<std::vector<BenchmarkMeasure>> LatencyBenchmarkRunner::runMeasurements(
				courbet: So computing the min and stddev across values in `runAndMeasureMulti()` requires them to be measuring the same thing. From the documentation it's not clear to me what the values represent. Are they always homogeneous ? Can you give an example of what they are in the LBR case.
				oontvoo (Author): Yes, they're supposed to be homogeneous, representing the measurements of the same block sampled at a fixed rate. For the LBR specifically: each value in the vector is the number of elapsed cycles since the last retired branch. When we take a sample, we have 16 such values, representing the last 16 branches. If we have a loop whose body is a basic block, then effectively these are the measurements for the last 16 iterations [0]. If the loop body has some branches, then we'd need a different aggregation strategy. [0] I think this is where it's a bit "wrong" right now. We always only read the last 16 branches. What we probably want is to be able to have the BenchmarkRunner pause at a fixed period and take a sample. (Just made a note so we could talk about this tomorrow @courbet @ondrasej)
	const FunctionExecutor &Executor) const {			const FunctionExecutor &Executor) const {
	// Cycle measurements include some overhead from the kernel. Repeat the			// Cycle measurements include some overhead from the kernel. Repeat the
	// measure several times and take the minimum value.			// measure several times and take the minimum value.
	constexpr const int NumMeasurements = 30;			constexpr const int NumMeasurements = 30;
	int64_t MinValue = std::numeric_limits<int64_t>::max();			llvm::SmallVector<int64_t, 4> WithMinVariance;
				double Variance = std::numeric_limits<double>::max();
				courbet: why not `std::numeric_limits<double>` ?
				ondrasej: Very nit: if we're working with doubles, std::numeric_limits<double>::infinity() might be even better.
	const char *CounterName = State.getPfmCounters().CycleCounter;			const char *CounterName = State.getPfmCounters().CycleCounter;
	for (size_t I = 0; I < NumMeasurements; ++I) {			for (size_t I = 0; I < NumMeasurements; ++I) {
	auto ExpectedCounterValue = Executor.runAndMeasure(CounterName);			auto ExpectedCounterValues = Executor.runAndSample(CounterName);
	if (!ExpectedCounterValue)			if (!ExpectedCounterValues)
	return ExpectedCounterValue.takeError();			return ExpectedCounterValues.takeError();
				courbet: Technically you're computing a variance.
				oontvoo (Author): ¯\_(ツ)_/¯ indeed!
	if (*ExpectedCounterValue < MinValue)
				courbet: Because of short-circuiting, if `WithMinStdev.empty()`, then `CurStdev` will not be evaluated, and `CurStdev` will be zero. Then `Stdev` will be set to `0`, preventing any further updates.
	MinValue = *ExpectedCounterValue;			// We'll keep the reading with lowest variance (ie., most stable)
				courbet: Was this supposed to be the other way around ? Here we are selecting the largest stddev.
				double CurVariance = computeVariance(*ExpectedCounterValues);
				if (Variance > CurVariance)
				ondrasej: Nit: Since we have WithMinVariance, consider renaming Variance to MinVariance, and CurVariance to just Variance.
				WithMinVariance = *ExpectedCounterValues;
				ondrasej: Ideally, this should use move semantics (if they are supported by SmallVector).
				Variance = CurVariance;
				ondrasej: This would update Variance (MinVariance with the rename proposed above) even if the variance increased. What you probably want is if (Variance < MinVariance) { WithMinVariance = *ExpectedCounterValues; MinVariance = Variance; }
	}			}
	std::vector<BenchmarkMeasure> Result;
				std::string ModeName;
	switch (Mode) {			switch (Mode) {
	case InstructionBenchmark::Latency:			case InstructionBenchmark::Latency:
	Result = {BenchmarkMeasure::Create("latency", MinValue)};			ModeName = "latency";
	break;			break;
				ondrasej: I'd prefer to have more flexibility about the numbers that are returned and reported by the tool. The original code collapsed the measurements to a single value (the minimum). This is useful when looking for lower bounds/optimistic numbers, but other processing would also make sense: the mean of the measurements - this might give a better idea of the actual performance when running in the loop. the list of all measured values - so that the user can analyze the distribution of the measurements by themselves (and go beyond the mean/variance). In particular for the LBR measurements: being able to see the raw timings of individual loop iterations is what makes that measurement method so appealing, and we'd lose all that information if we just took an aggregate value, be it a min or a mean. The min and mean from the LBR do have their value and I'd expect them to be more precise than the measurements over an unrolled loop, so ideally we'd have a way to see both. I think a good solution would be to add an argument to this method that determines how the values are aggregated over the measurements: min, mean, min variance, keep all. and over the contents of the returned vector: min, mean, keep all. This will mean more changes up the call stack, but it would give us a lot more flexibility in using the tool. What do you think?
				oontvoo (Author): Yes, that's a good idea. I've thought about this a bit more and realised even keeping min-variance (of each read) could still be completely wrong. Imagine your benchmarked code has a number of distinct branches; then there is no reason to expect the cycles from these branches to correlate. The min-variance approach is only meaningful if it's the same code-path.
	case InstructionBenchmark::InverseThroughput:			case InstructionBenchmark::InverseThroughput:
	Result = {BenchmarkMeasure::Create("inverse_throughput", MinValue)};			ModeName = "inverse_throughput";
	break;			break;
	default:			default:
	break;			break;
	}			}

				switch (ResultAggMode) {
				ondrasej: I'd still split this into two arguments: one that decides what happens with the return values of runAndSample(), with {Concatenate, MinVariance}, and another one that decides what to do with the results of the previous step, with {Min, Max, Mean, return as is}. With the current MinVariance filtering in all cases, we're changing the behavior for scalar counters, where we might drop some values (and we might drop some values also in the LBR case).
				oontvoo (author): Ah, I think we can infer how to accumulate the values returned by runAndSample():
				- If the returned vector has more than one element, keep the set with the minimum variance (because it wouldn't make sense to concatenate all the values from different runs together).
				- If the vector has only one element, concatenate the values together to find Min/Max/Mean.
				case InstructionBenchmark::MinVariance: {
				std::vector<BenchmarkMeasure> Result;
				Result.reserve(WithMinVariance.size());
				for (const int64_t Value : WithMinVariance)
				Result.push_back(BenchmarkMeasure::Create(ModeName, Value));
				return std::move(Result);
				}
				case InstructionBenchmark::Min: {
				std::vector<BenchmarkMeasure> Result;
				Result.push_back(
				BenchmarkMeasure::Create(ModeName, findMin(WithMinVariance)));
	return std::move(Result);			return std::move(Result);
	}			}
				case InstructionBenchmark::Max: {
				std::vector<BenchmarkMeasure> Result;
				Result.push_back(
				BenchmarkMeasure::Create(ModeName, findMax(WithMinVariance)));
				return std::move(Result);
				}
				}
				return llvm::make_error<Failure>(llvm::Twine("Unexpected benchmark mode(")
				.concat(std::to_string(Mode))
				.concat(" and unexpected ResultAggMode ")
				.concat(std::to_string(ResultAggMode)));
				}

	} // namespace exegesis			} // namespace exegesis
	} // namespace llvm			} // namespace llvm
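
The two-step aggregation discussed in the inline threads on this file can be sketched as follows. This is a minimal illustration, not the code from the patch: `AggMode`, `computeVariance`, `combineRuns` and `aggregate` are hypothetical names, `std::vector` stands in for the LLVM containers, and every run is assumed to return the same number of readings. Step 1 follows the inference rule oontvoo describes (concatenate scalar readings, otherwise keep the run with the smallest variance) and uses the `Variance < MinVariance` guard ondrasej asked for. Step 2 reduces the combined values.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical second-step modes; the patch itself only defines
// Min, Max and MinVariance.
enum class AggMode { Min, Max, Mean, KeepAll };

// Population variance of the readings from one run.
double computeVariance(const std::vector<int64_t> &Values) {
  if (Values.empty())
    return 0.0;
  double Mean = 0.0;
  for (int64_t V : Values)
    Mean += static_cast<double>(V);
  Mean /= Values.size();
  double SumSq = 0.0;
  for (int64_t V : Values) {
    const double D = V - Mean;
    SumSq += D * D; // squared deviation
  }
  return SumSq / Values.size();
}

// Step 1: combine the readings of several runs.
// Scalar runs are concatenated; multi-valued runs are filtered down to
// the single run with the smallest variance.
std::vector<int64_t>
combineRuns(const std::vector<std::vector<int64_t>> &Runs) {
  std::vector<int64_t> Combined;
  double MinVariance = std::numeric_limits<double>::infinity();
  for (const auto &Run : Runs) {
    if (Run.empty())
      continue;
    if (Run.size() == 1) {
      Combined.push_back(Run.front());
      continue;
    }
    const double Variance = computeVariance(Run);
    // Only replace the kept run when the variance actually decreased.
    if (Variance < MinVariance) {
      MinVariance = Variance;
      Combined = Run;
    }
  }
  return Combined;
}

// Step 2: reduce the combined values according to the requested mode.
std::vector<double> aggregate(const std::vector<int64_t> &Values,
                              AggMode Mode) {
  if (Values.empty())
    return {};
  switch (Mode) {
  case AggMode::Min:
    return {static_cast<double>(
        *std::min_element(Values.begin(), Values.end()))};
  case AggMode::Max:
    return {static_cast<double>(
        *std::max_element(Values.begin(), Values.end()))};
  case AggMode::Mean: {
    double Sum = 0.0;
    for (int64_t V : Values)
      Sum += static_cast<double>(V);
    return {Sum / Values.size()};
  }
  case AggMode::KeepAll:
    return std::vector<double>(Values.begin(), Values.end());
  }
  return {};
}
```

Keeping the two steps separate means scalar counters never lose readings to the min-variance filter, which is the behavior change ondrasej was concerned about.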

llvm/tools/llvm-exegesis/lib/PerfHelper.h

Show All 9 Lines
/// Helpers for measuring perf events.		/// Helpers for measuring perf events.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_EXEGESIS_PERFHELPER_H		#ifndef LLVM_TOOLS_LLVM_EXEGESIS_PERFHELPER_H
#define LLVM_TOOLS_LLVM_EXEGESIS_PERFHELPER_H		#define LLVM_TOOLS_LLVM_EXEGESIS_PERFHELPER_H

#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Config/config.h"		#include "llvm/Config/config.h"
#include "llvm/Support/Error.h"		#include "llvm/Support/Error.h"

#include <cstdint>		#include <cstdint>
#include <functional>		#include <functional>
#include <memory>		#include <memory>

struct perf_event_attr;		struct perf_event_attr;

namespace llvm {		namespace llvm {
namespace exegesis {		namespace exegesis {
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines	public:

/// Stops the measurement of the event.		/// Stops the measurement of the event.
void stop();		void stop();

/// Returns the current value of the counter or -1 if it cannot be read.		/// Returns the current value of the counter or -1 if it cannot be read.
int64_t read() const;		int64_t read() const;

/// Returns the current value of the counter or error if it cannot be read.		/// Returns the current value of the counter or error if it cannot be read.
virtual llvm::Expected<int64_t> readOrError() const;		virtual llvm::Expected<llvm::SmallVector<int64_t, 4>> readOrError() const;

		virtual int numValues() const;

private:		private:
PerfEvent Event;		PerfEvent Event;
#ifdef HAVE_LIBPFM		#ifdef HAVE_LIBPFM
int FileDescriptor = -1;		int FileDescriptor = -1;
#endif		#endif
};		};

} // namespace pfm		} // namespace pfm
} // namespace exegesis		} // namespace exegesis
} // namespace llvm		} // namespace llvm

#endif // LLVM_TOOLS_LLVM_EXEGESIS_PERFHELPER_H		#endif // LLVM_TOOLS_LLVM_EXEGESIS_PERFHELPER_H

llvm/tools/llvm-exegesis/lib/PerfHelper.cpp

	Show First 20 Lines • Show All 112 Lines • ▼ Show 20 Lines

	Counter::~Counter() { close(FileDescriptor); }			Counter::~Counter() { close(FileDescriptor); }

	void Counter::start() { ioctl(FileDescriptor, PERF_EVENT_IOC_RESET, 0); }			void Counter::start() { ioctl(FileDescriptor, PERF_EVENT_IOC_RESET, 0); }

	void Counter::stop() { ioctl(FileDescriptor, PERF_EVENT_IOC_DISABLE, 0); }			void Counter::stop() { ioctl(FileDescriptor, PERF_EVENT_IOC_DISABLE, 0); }

	int64_t Counter::read() const {			int64_t Counter::read() const {
	auto ValueOrError = readOrError();			auto ValueOrError = readOrError();
	if (ValueOrError)			if (ValueOrError) {
				ondrasej: You might want to check that there is at least one element.
	return ValueOrError.get();			if (!ValueOrError.get().empty())
				return ValueOrError.get()[0];
				errs() << "Counter has no reading\n";
				} else
	errs() << ValueOrError.takeError() << "\n";			errs() << ValueOrError.takeError() << "\n";
	return -1;			return -1;
	}			}

	llvm::Expected<int64_t> Counter::readOrError() const {			llvm::Expected<llvm::SmallVector<int64_t, 4>> Counter::readOrError() const {
				ondrasej: This should be `llvm::SmallVector<int64_t, some small N>`; most counters return just a single value.
	int64_t Count = 0;			int64_t Count = 0;
	ssize_t ReadSize = ::read(FileDescriptor, &Count, sizeof(Count));			ssize_t ReadSize = ::read(FileDescriptor, &Count, sizeof(Count));
	if (ReadSize != sizeof(Count))			if (ReadSize != sizeof(Count))
	return llvm::make_error<llvm::StringError>("Failed to read event counter",			return llvm::make_error<llvm::StringError>("Failed to read event counter",
	llvm::errc::io_error);			llvm::errc::io_error);
				llvm::SmallVector<int64_t, 4> Result;
	return Count;			Result.push_back(Count);
				return Result;
				courbet: This returns a vector with `Count` elements set to zero. If this passes all tests then we are clearly missing tests :(
	}			}

				int Counter::numValues() const { return 1; }
	#else			#else

	Counter::Counter(PerfEvent &&Event) : Event(std::move(Event)) {}			Counter::Counter(PerfEvent &&Event) : Event(std::move(Event)) {}

	Counter::~Counter() = default;			Counter::~Counter() = default;

	void Counter::start() {}			void Counter::start() {}

	void Counter::stop() {}			void Counter::stop() {}

	int64_t Counter::read() const { return 42; }			int64_t Counter::read() const { return 42; }

	llvm::Expected<int64_t> Counter::readOrError() const {			llvm::Expected<llvm : SmallVector<int64_t, 4>> Counter::readOrError() const {
				ondrasej: This should also be `llvm::SmallVector` (please test with HAVE_LIBPFM undefined/false).
	return llvm::make_error<llvm::StringError>("Not implemented",			return llvm::make_error<llvm::StringError>("Not implemented",
	llvm::errc::io_error);			llvm::errc::io_error);
	}			}

				int Counter::numValues() const { return 1; }

	#endif			#endif

	} // namespace pfm			} // namespace pfm
	} // namespace exegesis			} // namespace exegesis
	} // namespace llvm			} // namespace llvm
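
The pitfall courbet flagged in `Counter::readOrError()` and the empty-vector guard ondrasej requested in `Counter::read()` can be illustrated in isolation. This is a sketch under stated assumptions: `std::vector` stands in for `llvm::SmallVector` (whose sized constructor is assumed to behave the same way), and `wrapWithSizedCtor`, `wrapWithPushBack` and `firstOrMinusOne` are hypothetical helper names that do not appear in the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pitfall: the sized constructor creates Count zero-initialized
// elements instead of a single element holding Count.
std::vector<int64_t> wrapWithSizedCtor(int64_t Count) {
  return std::vector<int64_t>(static_cast<std::size_t>(Count));
}

// Intended behavior: one element whose value is the counter reading.
std::vector<int64_t> wrapWithPushBack(int64_t Count) {
  std::vector<int64_t> Result;
  Result.push_back(Count);
  return Result;
}

// Mirrors the guarded fallback in read(): return the first reading,
// or -1 when the counter produced no values.
int64_t firstOrMinusOne(const std::vector<int64_t> &Values) {
  return Values.empty() ? -1 : Values.front();
}
```

A unit test along these lines, checking that a reading of 3 comes back as one element equal to 3 rather than three zeros, is the kind of coverage the review comment says was missing.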

llvm/tools/llvm-exegesis/lib/Target.h

Show First 20 Lines • Show All 142 Lines • ▼ Show 20 Lines	public:
}		}

// Creates a snippet generator for the given mode.		// Creates a snippet generator for the given mode.
std::unique_ptr<SnippetGenerator>		std::unique_ptr<SnippetGenerator>
createSnippetGenerator(InstructionBenchmark::ModeE Mode,		createSnippetGenerator(InstructionBenchmark::ModeE Mode,
const LLVMState &State,		const LLVMState &State,
const SnippetGenerator::Options &Opts) const;		const SnippetGenerator::Options &Opts) const;
// Creates a benchmark runner for the given mode.		// Creates a benchmark runner for the given mode.
Expected<std::unique_ptr<BenchmarkRunner>>		Expected<std::unique_ptr<BenchmarkRunner>> createBenchmarkRunner(
createBenchmarkRunner(InstructionBenchmark::ModeE Mode,		InstructionBenchmark::ModeE Mode, const LLVMState &State,
const LLVMState &State) const;		InstructionBenchmark::ResultAggregationModeE ResultAggMode =
		InstructionBenchmark::Min) const;

// Returns the ExegesisTarget for the given triple or nullptr if the target		// Returns the ExegesisTarget for the given triple or nullptr if the target
// does not exist.		// does not exist.
static const ExegesisTarget *lookup(Triple TT);		static const ExegesisTarget *lookup(Triple TT);
// Returns the default (unspecialized) ExegesisTarget.		// Returns the default (unspecialized) ExegesisTarget.
static const ExegesisTarget &getDefault();		static const ExegesisTarget &getDefault();
// Registers a target. Not thread safe.		// Registers a target. Not thread safe.
static void registerTarget(ExegesisTarget *T);		static void registerTarget(ExegesisTarget *T);
Show All 9 Lines	private:

// Targets can implement their own snippet generators/benchmarks runners by		// Targets can implement their own snippet generators/benchmarks runners by
// implementing these.		// implementing these.
std::unique_ptr<SnippetGenerator> virtual createSerialSnippetGenerator(		std::unique_ptr<SnippetGenerator> virtual createSerialSnippetGenerator(
const LLVMState &State, const SnippetGenerator::Options &Opts) const;		const LLVMState &State, const SnippetGenerator::Options &Opts) const;
std::unique_ptr<SnippetGenerator> virtual createParallelSnippetGenerator(		std::unique_ptr<SnippetGenerator> virtual createParallelSnippetGenerator(
const LLVMState &State, const SnippetGenerator::Options &Opts) const;		const LLVMState &State, const SnippetGenerator::Options &Opts) const;
std::unique_ptr<BenchmarkRunner> virtual createLatencyBenchmarkRunner(		std::unique_ptr<BenchmarkRunner> virtual createLatencyBenchmarkRunner(
const LLVMState &State, InstructionBenchmark::ModeE Mode) const;		const LLVMState &State, InstructionBenchmark::ModeE Mode,
		InstructionBenchmark::ResultAggregationModeE ResultAggMode) const;
std::unique_ptr<BenchmarkRunner> virtual createUopsBenchmarkRunner(		std::unique_ptr<BenchmarkRunner> virtual createUopsBenchmarkRunner(
const LLVMState &State) const;		const LLVMState &State) const;

const ExegesisTarget *Next = nullptr;		const ExegesisTarget *Next = nullptr;
const ArrayRef<CpuAndPfmCounters> CpuPfmCounters;		const ArrayRef<CpuAndPfmCounters> CpuPfmCounters;
};		};

} // namespace exegesis		} // namespace exegesis
} // namespace llvm		} // namespace llvm

#endif // LLVM_TOOLS_LLVM_EXEGESIS_TARGET_H		#endif // LLVM_TOOLS_LLVM_EXEGESIS_TARGET_H

llvm/tools/llvm-exegesis/lib/Target.cpp

Show First 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	std::unique_ptr<SnippetGenerator> ExegesisTarget::createSnippetGenerator(
case InstructionBenchmark::Uops:		case InstructionBenchmark::Uops:
case InstructionBenchmark::InverseThroughput:		case InstructionBenchmark::InverseThroughput:
return createParallelSnippetGenerator(State, Opts);		return createParallelSnippetGenerator(State, Opts);
}		}
return nullptr;		return nullptr;
}		}

Expected<std::unique_ptr<BenchmarkRunner>>		Expected<std::unique_ptr<BenchmarkRunner>>
ExegesisTarget::createBenchmarkRunner(InstructionBenchmark::ModeE Mode,		ExegesisTarget::createBenchmarkRunner(
const LLVMState &State) const {		InstructionBenchmark::ModeE Mode, const LLVMState &State,
		InstructionBenchmark::ResultAggregationModeE ResultAggMode) const {
		// FIXME propagate ResultAggMode to other BenchmarkRunner.
		courbet: Let's do this now to avoid being in an inconsistent state.
PfmCountersInfo PfmCounters = State.getPfmCounters();		PfmCountersInfo PfmCounters = State.getPfmCounters();
switch (Mode) {		switch (Mode) {
case InstructionBenchmark::Unknown:		case InstructionBenchmark::Unknown:
return nullptr;		return nullptr;
case InstructionBenchmark::Latency:		case InstructionBenchmark::Latency:
case InstructionBenchmark::InverseThroughput:		case InstructionBenchmark::InverseThroughput:
if (!PfmCounters.CycleCounter) {		if (!PfmCounters.CycleCounter) {
const char *ModeName = Mode == InstructionBenchmark::Latency		const char *ModeName = Mode == InstructionBenchmark::Latency
? "latency"		? "latency"
: "inverse_throughput";		: "inverse_throughput";
return make_error<Failure>(		return make_error<Failure>(
Twine("can't run '")		Twine("can't run '")
.concat(ModeName)		.concat(ModeName)
.concat("' mode, sched model does not define a cycle counter."));		.concat("' mode, sched model does not define a cycle counter."));
}		}
return createLatencyBenchmarkRunner(State, Mode);		return createLatencyBenchmarkRunner(State, Mode, ResultAggMode);
case InstructionBenchmark::Uops:		case InstructionBenchmark::Uops:
if (!PfmCounters.UopsCounter && !PfmCounters.IssueCounters)		if (!PfmCounters.UopsCounter && !PfmCounters.IssueCounters)
return make_error<Failure>("can't run 'uops' mode, sched model does not "		return make_error<Failure>("can't run 'uops' mode, sched model does not "
"define uops or issue counters.");		"define uops or issue counters.");
return createUopsBenchmarkRunner(State);		return createUopsBenchmarkRunner(State);
}		}
return nullptr;		return nullptr;
}		}

std::unique_ptr<SnippetGenerator> ExegesisTarget::createSerialSnippetGenerator(		std::unique_ptr<SnippetGenerator> ExegesisTarget::createSerialSnippetGenerator(
const LLVMState &State, const SnippetGenerator::Options &Opts) const {		const LLVMState &State, const SnippetGenerator::Options &Opts) const {
return std::make_unique<SerialSnippetGenerator>(State, Opts);		return std::make_unique<SerialSnippetGenerator>(State, Opts);
}		}

std::unique_ptr<SnippetGenerator> ExegesisTarget::createParallelSnippetGenerator(		std::unique_ptr<SnippetGenerator> ExegesisTarget::createParallelSnippetGenerator(
const LLVMState &State, const SnippetGenerator::Options &Opts) const {		const LLVMState &State, const SnippetGenerator::Options &Opts) const {
return std::make_unique<ParallelSnippetGenerator>(State, Opts);		return std::make_unique<ParallelSnippetGenerator>(State, Opts);
}		}

std::unique_ptr<BenchmarkRunner> ExegesisTarget::createLatencyBenchmarkRunner(		std::unique_ptr<BenchmarkRunner> ExegesisTarget::createLatencyBenchmarkRunner(
const LLVMState &State, InstructionBenchmark::ModeE Mode) const {		const LLVMState &State, InstructionBenchmark::ModeE Mode,
return std::make_unique<LatencyBenchmarkRunner>(State, Mode);		InstructionBenchmark::ResultAggregationModeE ResultAggMode) const {
		return std::make_unique<LatencyBenchmarkRunner>(State, Mode, ResultAggMode);
}		}

std::unique_ptr<BenchmarkRunner>		std::unique_ptr<BenchmarkRunner>
ExegesisTarget::createUopsBenchmarkRunner(const LLVMState &State) const {		ExegesisTarget::createUopsBenchmarkRunner(const LLVMState &State) const {
return std::make_unique<UopsBenchmarkRunner>(State);		return std::make_unique<UopsBenchmarkRunner>(State);
}		}

static_assert(std::is_pod<PfmCountersInfo>::value,		static_assert(std::is_pod<PfmCountersInfo>::value,
		courbet: If the default Target implementation does not use it, let's just do: `std::unique_ptr<BenchmarkRunner> ExegesisTarget::createUopsBenchmarkRunner(const LLVMState &State, InstructionBenchmark::ResultAggregationModeE /*unused*/) const { return std::make_unique<UopsBenchmarkRunner>(State); }`
		oontvoo (author): Should we change the impl to use it?
"We shouldn't have dynamic initialization here");		"We shouldn't have dynamic initialization here");
const PfmCountersInfo PfmCountersInfo::Default = {nullptr, nullptr, nullptr,		const PfmCountersInfo PfmCountersInfo::Default = {nullptr, nullptr, nullptr,
0u};		0u};

const PfmCountersInfo &ExegesisTarget::getPfmCounters(StringRef CpuName) const {		const PfmCountersInfo &ExegesisTarget::getPfmCounters(StringRef CpuName) const {
assert(llvm::is_sorted(		assert(llvm::is_sorted(
CpuPfmCounters,		CpuPfmCounters,
[](const CpuAndPfmCounters &LHS, const CpuAndPfmCounters &RHS) {		[](const CpuAndPfmCounters &LHS, const CpuAndPfmCounters &RHS) {
▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines

llvm/tools/llvm-exegesis/llvm-exegesis.cpp

Show First 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	cl::values(clEnumValN(exegesis::InstructionBenchmark::Latency, "latency",
"Instruction Inverse Throughput"),		"Instruction Inverse Throughput"),
clEnumValN(exegesis::InstructionBenchmark::Uops, "uops",		clEnumValN(exegesis::InstructionBenchmark::Uops, "uops",
"Uop Decomposition"),		"Uop Decomposition"),
// When not asking for a specific benchmark mode,		// When not asking for a specific benchmark mode,
// we'll analyse the results.		// we'll analyse the results.
clEnumValN(exegesis::InstructionBenchmark::Unknown, "analysis",		clEnumValN(exegesis::InstructionBenchmark::Unknown, "analysis",
"Analysis")));		"Analysis")));

		static cl::opt<exegesis::InstructionBenchmark::ResultAggregationModeE>
		ResultAggMode(
		"result-aggregation-mode",
		cl::desc("How to aggregate multi-values result"), cl::cat(Options),
		cl::values(clEnumValN(exegesis::InstructionBenchmark::Min, "min",
		"Keep min reading"),
		clEnumValN(exegesis::InstructionBenchmark::Max, "max",
		"Keep max reading"),
		clEnumValN(exegesis::InstructionBenchmark::MinVariance,
		"min-variance",
		"Keep readings set with min-variance")),
		cl::init(exegesis::InstructionBenchmark::Min));

static cl::opt<exegesis::InstructionBenchmark::RepetitionModeE> RepetitionMode(		static cl::opt<exegesis::InstructionBenchmark::RepetitionModeE> RepetitionMode(
"repetition-mode", cl::desc("how to repeat the instruction snippet"),		"repetition-mode", cl::desc("how to repeat the instruction snippet"),
cl::cat(BenchmarkOptions),		cl::cat(BenchmarkOptions),
cl::values(		cl::values(
clEnumValN(exegesis::InstructionBenchmark::Duplicate, "duplicate",		clEnumValN(exegesis::InstructionBenchmark::Duplicate, "duplicate",
"Duplicate the snippet"),		"Duplicate the snippet"),
clEnumValN(exegesis::InstructionBenchmark::Loop, "loop",		clEnumValN(exegesis::InstructionBenchmark::Loop, "loop",
"Loop over the snippet"),		"Loop over the snippet"),
▲ Show 20 Lines • Show All 182 Lines • ▼ Show 20 Lines	#endif

InitializeNativeTarget();		InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();		InitializeNativeTargetAsmPrinter();
InitializeNativeTargetAsmParser();		InitializeNativeTargetAsmParser();
InitializeNativeExegesisTarget();		InitializeNativeExegesisTarget();

const LLVMState State(CpuName);		const LLVMState State(CpuName);

const std::unique_ptr<BenchmarkRunner> Runner = ExitOnErr(		const std::unique_ptr<BenchmarkRunner> Runner =
State.getExegesisTarget().createBenchmarkRunner(BenchmarkMode, State));		ExitOnErr(State.getExegesisTarget().createBenchmarkRunner(
		BenchmarkMode, State, ResultAggMode));
if (!Runner) {		if (!Runner) {
ExitWithError("cannot create benchmark runner");		ExitWithError("cannot create benchmark runner");
}		}

const auto Opcodes = getOpcodesOrDie(State.getInstrInfo());		const auto Opcodes = getOpcodesOrDie(State.getInstrInfo());

SmallVector<std::unique_ptr<const SnippetRepetitor>, 2> Repetitors;		SmallVector<std::unique_ptr<const SnippetRepetitor>, 2> Repetitors;
if (RepetitionMode != InstructionBenchmark::RepetitionModeE::AggregateMin)		if (RepetitionMode != InstructionBenchmark::RepetitionModeE::AggregateMin)
▲ Show 20 Lines • Show All 152 Lines • Show Last 20 Lines


Diff 270129
