This is an archive of the discontinued LLVM Phabricator instance.

Differential D76921

[llvm-exegesis] 'Min' repetition mode
ClosedPublic

Authored by lebedev.ri on Mar 27 2020, 6:33 AM.

Details

Reviewers

courbet
gchatelet

Commits

rGde22d7154b4a: [llvm-exegesis] 'Min' repetition mode

Summary

As noted in documentation, different repetition modes have different trade-offs:

.. option:: -repetition-mode=[duplicate|loop]

Specify the repetition mode. duplicate will create a large, straight line
basic block with num-repetitions copies of the snippet. loop will wrap
the snippet in a loop which will be run num-repetitions times. The loop
mode tends to better hide the effects of the CPU frontend on architectures
that cache decoded instructions, but consumes a register for counting
iterations.

Indeed. Example:

In D74156#1873657, @lebedev.ri wrote:

At least for CMOV, i'm seeing wildly different results

            Latency   RThroughput
duplicate   1         0.8
loop        2         0.6

where latency=1 seems correct, and i'd expect the throughput to be close to 1/2 (since there are two execution units).

This isn't great for analysis, at least for schedule model development.
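For reference, the two repetition modes described in the documentation excerpt above can be modelled roughly as follows. This is only a toy sketch in plain C++, not the llvm-exegesis implementation: instructions are plain strings, and the helper names `repeatDuplicate` and `repeatLoop` as well as the counter name `CTR` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model: an "instruction" is just its textual form.
using Snippet = std::vector<std::string>;

// 'duplicate': one straight-line block holding NumRepetitions copies of the
// snippet, as the quoted documentation describes it.
Snippet repeatDuplicate(const Snippet &S, std::size_t NumRepetitions) {
  assert(NumRepetitions > 0 && "invalid NumRepetitions");
  Snippet Out;
  Out.reserve(S.size() * NumRepetitions);
  for (std::size_t I = 0; I < NumRepetitions; ++I)
    Out.insert(Out.end(), S.begin(), S.end());
  return Out;
}

// 'loop': the snippet appears once, wrapped in a counted loop. The counter
// (hypothetical name "CTR") occupies a register that the snippet itself can
// then no longer use.
Snippet repeatLoop(const Snippet &S, std::size_t NumRepetitions) {
  assert(NumRepetitions > 0 && "invalid NumRepetitions");
  Snippet Out;
  Out.push_back("CTR = " + std::to_string(NumRepetitions));
  Out.push_back("loop:");
  Out.insert(Out.end(), S.begin(), S.end());
  Out.push_back("CTR = CTR - 1");
  Out.push_back("if CTR != 0 goto loop");
  return Out;
}
```

In this model the duplicated block grows linearly with the repetition count, while the looped form stays at a fixed size of the snippet plus loop overhead, which is one way to picture the frontend versus register trade-off.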

As discussed in excruciating detail in

In D74156#1924514, @gchatelet wrote:

In D74156#1920632, @lebedev.ri wrote:

... did that explanation of the question i'm having make any sense?

Thx for digging in the conversation !
Ok it makes more sense now.

I discussed it a bit with @courbet:

We want the analysis tool to stay simple so we'd rather not make it knowledgeable of the repetition mode.

We'd like to still be able to select either repetition mode to dig into special cases

So we could add a third min repetition mode that would run both and take the minimum. It could be the default option.
Would you have some time to look what it would take to add this third mode?

there appears to be an agreement that it is indeed sub-par,
and that we should provide an optional, measurement-time (not analysis-time!)
way to rectify the situation.

However, the solution isn't entirely straightforward.

We can't just add an actual 'multiplexer' MinSnippetRepetitor, because
if we just concatenate snippets produced by DuplicateSnippetRepetitor
and LoopSnippetRepetitor and run+measure that, the measurement will
naturally be different from what we'd get by running+measuring
them separately and taking the min.
(time(D+L)/2 != min(time(D), time(L)); see https://www.wolframalpha.com/input/?i=%28x%2By%29%2F2+%21%3D+min%28x%2C+y%29)
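A quick arithmetic check of that inequality, as a minimal sketch. It assumes both variants contribute equally to a concatenated run, so that the combined per-instruction cost is simply their mean; the function names are hypothetical, and the inputs 0.8 and 0.6 are the RThroughput values from the CMOV example quoted earlier.

```cpp
#include <algorithm>

// Per-instruction cost if both variants were concatenated and timed as one
// run (assumption: each contributes equally, so the result is the mean).
double concatenatedCost(double Duplicate, double Loop) {
  return (Duplicate + Loop) / 2.0;
}

// Per-instruction cost when each variant is timed separately and the
// smaller result is kept.
double minCost(double Duplicate, double Loop) {
  return std::min(Duplicate, Loop);
}
```

For 0.8 and 0.6 the mean is about 0.7 while the minimum is 0.6; the two only coincide when both modes measure the same value.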

Also, it seems best to me to have a single snippet instead of generating
a snippet per repetition mode, since the only difference here is that the
loop repetition mode reserves one register for loop counter.

As far as i can tell, we can either teach BenchmarkRunner::runConfiguration()
to produce a single report given multiple repetitors (as in the patch),
or do that one layer higher - don't modify BenchmarkRunner::runConfiguration(),
produce multiple reports, don't actually print each one, but aggregate them somehow
and only print the final one.

Initially i've gone ahead with the latter approach, but it didn't look like a natural fit;
the former (as in the diff) does seem like a better fit to me.
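The "single report from multiple repetitors" idea can be sketched outside of LLVM like this. It is a simplified model, not the code from the diff: `Measure`, `Repetitor`, and `runAll` are hypothetical stand-ins, and a repetitor is reduced to a callable that returns its measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Measure {
  std::string Key;
  double PerInstructionValue;
  double PerSnippetValue;
};

// Stand-in for one repetition strategy: running it yields a set of measures.
using Repetitor = std::function<std::vector<Measure>()>;

// Run every repetitor and fold the results into one report by keeping the
// element-wise minimum of each measurement.
std::vector<Measure> runAll(const std::vector<Repetitor> &Repetitors) {
  std::vector<Measure> Report;
  for (const Repetitor &Run : Repetitors) {
    std::vector<Measure> New = Run();
    if (Report.empty()) { // First repetitor: nothing to aggregate with yet.
      Report = std::move(New);
      continue;
    }
    assert(Report.size() == New.size() && "measurement counts must match");
    for (std::size_t I = 0; I < Report.size(); ++I) {
      assert(Report[I].Key == New[I].Key && "measurement keys must match");
      Report[I].PerInstructionValue = std::min(Report[I].PerInstructionValue,
                                               New[I].PerInstructionValue);
      Report[I].PerSnippetValue =
          std::min(Report[I].PerSnippetValue, New[I].PerSnippetValue);
    }
  }
  return Report;
}
```

With a single repetitor the loop degenerates to the old behaviour, so in this model the caller does not need a separate flag to say whether aggregation is wanted.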

There's also a question of the test coverage. It sure currently does work here:

$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8fb949.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R15 i_0x0'
    - 'CMOV64rr RBX RBX RBX i_0x0'
    - 'CMOV64rr RCX RCX RBX i_0x0'
    - 'CMOV64rr RDI RDI R10 i_0x0'
    - 'CMOV64rr RDX RDX RAX i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R8 R8 R8 i_0x0'
    - 'CMOV64rr R9 R9 RDX i_0x0'
    - 'CMOV64rr R10 R10 RBX i_0x0'
    - 'CMOV64rr R11 R11 R14 i_0x0'
    - 'CMOV64rr R12 R12 R9 i_0x0'
    - 'CMOV64rr R13 R13 R12 i_0x0'
    - 'CMOV64rr R14 R14 R15 i_0x0'
    - 'CMOV64rr R15 R15 R13 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R15=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'R10=0x0'
    - 'RDX=0x0'
    - 'RSI=0x0'
    - 'R8=0x0'
    - 'R9=0x0'
    - 'R14=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.819, per_snippet_value: 12.285 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BF000000000000000048BB000000000000000048B9000000000000000048BF000000000000000049BA000000000000000048BA000000000000000048BE000000000000000049B8000000000000000049B9000000000000000049BE000000000000000049BC000000000000000049BD0000000000000000490F40C3490F40EF480F40DB480F40CB490F40FA480F40D0480F40F04D0F40C04C0F40CA4C0F40D34D0F40DE4D0F40E14D0F40EC4D0F40F74D0F40FD490F40C35B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-051eb3.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP RSI i_0x0'
    - 'CMOV64rr RBX RBX R9 i_0x0'
    - 'CMOV64rr RCX RCX RSI i_0x0'
    - 'CMOV64rr RDI RDI RBP i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RDI i_0x0'
    - 'CMOV64rr R9 R9 R12 i_0x0'
    - 'CMOV64rr R10 R10 R11 i_0x0'
    - 'CMOV64rr R11 R11 R9 i_0x0'
    - 'CMOV64rr R12 R12 RBP i_0x0'
    - 'CMOV64rr R13 R13 RSI i_0x0'
    - 'CMOV64rr R14 R14 R14 i_0x0'
    - 'CMOV64rr R15 R15 R10 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'RSI=0x0'
    - 'RBX=0x0'
    - 'R9=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'RDX=0x0'
    - 'R12=0x0'
    - 'R10=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6083, per_snippet_value: 8.5162 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000048BE000000000000000048BB000000000000000049B9000000000000000048B9000000000000000048BF000000000000000048BA000000000000000049BC000000000000000049BA000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3480F40EE490F40D9480F40CE480F40FD490F40D1480F40F74D0F40CC4D0F40D34D0F40D94C0F40E54C0F40EE4D0F40F64D0F40FA4983C0FF75C25B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=min
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c7a47d.o
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2581f1.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R10 i_0x0'
    - 'CMOV64rr RBX RBX R10 i_0x0'
    - 'CMOV64rr RCX RCX RDX i_0x0'
    - 'CMOV64rr RDI RDI RAX i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R9 R9 RBX i_0x0'
    - 'CMOV64rr R10 R10 R12 i_0x0'
    - 'CMOV64rr R11 R11 RDI i_0x0'
    - 'CMOV64rr R12 R12 RDI i_0x0'
    - 'CMOV64rr R13 R13 RDI i_0x0'
    - 'CMOV64rr R14 R14 R9 i_0x0'
    - 'CMOV64rr R15 R15 RBP i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R10=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDX=0x0'
    - 'RDI=0x0'
    - 'R9=0x0'
    - 'RSI=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6073, per_snippet_value: 8.5022 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF0000000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD490F40C3490F40EA5B415C415D415E415F5DC35541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD4983C0FF75C25B415C415D415E415F5DC3
...

but i'm open to suggestions as to how to test that.

I also have gone with the suggestion to default to this new mode.
This was irking me for some time, so i'm happy to finally see progress here.
Looking forward to feedback.
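As a side note on reading the three reports above: `value` and `per_snippet_value` appear to be related through the number of instructions in the snippet, which is consistent with the scaling step in `BenchmarkRunner::runConfiguration()`. A minimal sketch of that scaling, assuming a single raw counter total per run; the `Scaled` struct and the `scale` helper are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct Scaled {
  double PerInstructionValue; // reported as 'value'
  double PerSnippetValue;     // reported as 'per_snippet_value'
};

// RawTotal: counter value measured over the whole run (assumed input).
// NumRepetitions: the num_repetitions field of the report.
// SnippetSize: number of instructions listed under key.instructions.
Scaled scale(double RawTotal, int NumRepetitions, std::size_t SnippetSize) {
  assert(NumRepetitions > 0 && "invalid NumRepetitions");
  Scaled S;
  // Scale the measurement by instruction.
  S.PerInstructionValue = RawTotal / NumRepetitions;
  // Scale the measurement by snippet.
  S.PerSnippetValue =
      RawTotal * (static_cast<double>(SnippetSize) / NumRepetitions);
  return S;
}
```

The reported numbers fit this: 0.819 × 15 = 12.285 for duplicate (15 instructions), 0.6083 × 14 = 8.5162 for loop, and 0.6073 × 14 = 8.5022 for min (14 instructions each).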

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. · Mar 27 2020, 6:33 AM

Herald added a subscriber: mstojanovic. · Mar 27 2020, 6:33 AM

lebedev.ri edited the summary of this revision. · Mar 27 2020, 6:34 AM

lebedev.ri edited the summary of this revision. · Mar 27 2020, 6:36 AM

Harbormaster failed remote builds in B50678: Diff 253097! · Mar 27 2020, 8:13 AM

lebedev.ri mentioned this in D74156: [llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE. · Mar 28 2020, 5:41 AM

Thx for the patch !
I have a couple of suggestions but I'm onboard with the approach.

Please wait for comments from @courbet as well.

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
108	The scope of the for loop is pretty big. I think it would be better to create a separate function:

for (const std::unique_ptr<const SnippetRepetitor> &Repetitor : Repetitors) {
  if (auto InstrBenchmarkOrErr = runOneBenchmark(InstrBenchmark)) {
    // On success, aggregate this run
    ...
  } else {
    // On error, extract the Error value and return it.
    return InstrBenchmarkOrErr.takeError();
  }
}

This way you also don't have to have the RAII cleaner.
llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp
113	I'd rather have an empty case for `InstructionBenchmark::AggregateMin` and not use `default`. This will help if we ever add a new mode.
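The point about spelling out every case instead of using `default` can be illustrated with a small standalone sketch. The enum mirrors the three modes, but `repetitorName` is a hypothetical helper and not the actual `SnippetRepetitor::Create`.

```cpp
#include <string>

enum class RepetitionMode { Duplicate, Loop, AggregateMin };

// No 'default:' label: if a new enumerator is added later, compilers that
// implement -Wswitch (GCC, Clang) can warn that it is unhandled here.
std::string repetitorName(RepetitionMode Mode) {
  switch (Mode) {
  case RepetitionMode::Duplicate:
    return "DuplicateSnippetRepetitor";
  case RepetitionMode::Loop:
    return "LoopSnippetRepetitor";
  case RepetitionMode::AggregateMin:
    break; // Not a concrete repetitor: it is served by running the others.
  }
  return ""; // Reached only for AggregateMin (or an out-of-range value).
}
```

With a `default:` label in place, the same switch would silently absorb any mode added in the future.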

lebedev.ri added inline comments. · Mar 30 2020, 11:35 AM

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
108	Hmm. I agree that the loop is getting big, but are you very sure about the fix? Like i said in the patch's description, i tried a variation of that approach initially, and i'd say the result is much uglier. Instead of having one place that fills a single `InstrBenchmark` and just ensuring it doesn't completely override previous results, we will now need to keep two places in sync. Please do note that we have two types of errors here, a fatal one that we signal via `Error`, and a measurement-time error, that we squash and stash into `InstrBenchmark.Error`. The latter must not be reported as a fatal error, but it must interrupt aggregation. So we won't avoid the RAII, just change its form.

lebedev.ri added inline comments. · Mar 30 2020, 11:48 AM

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
108	> override previous results, we will now need to keep two places in sync.

Err, to be more specific: We will then need to just rename the existing `BenchmarkRunner::runConfiguration()` as that `runOneBenchmark()`, and new `BenchmarkRunner::runConfiguration()` would need to know how to aggregate every field of fresh `InstructionBenchmark` returned from `BenchmarkRunner::runOneBenchmark()` into the 'aggregate' report. This seems more complex than teaching `BenchmarkRunner::runConfiguration()` itself to do just that within the per-Repetitor loop.

courbet added inline comments. · Mar 30 2020, 11:48 PM

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
33	Let's avoid pushing tools flags down into libraries like this. This hinders testability and ties the inner classes to the tool main. What about just using `repetitors.size() > 1` ?

@gchatelet @courbet
Thank you for taking a look!
Addressed review notes.

Harbormaster failed remote builds in B51100: Diff 253801! · Mar 31 2020, 2:10 AM

gchatelet added inline comments. · Mar 31 2020, 5:54 AM

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp

108

Acknowledge.
I'm not a huge fan of the RAII cleaner which spans the whole function and needs comments to be understood.
Maybe a custom struct would convey more semantics, it's not a lot more code.

struct ClearBenchmarkOnReturn {
  ClearBenchmarkOnReturn(InstructionBenchmark* IB) : IB(IB) {}
  ~ClearBenchmarkOnReturn() { if(Clear) IB->Measurements.clear(); }

  void disarm() { Clear = false; }
private:
  InstructionBenchmark* const IB;
  bool Clear = true;
};
----------------------

ClearBenchmarkOnReturn CBOR(&InstrBenchmark);

...

CBOR.disarm();
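To see the disarmable-guard pattern from the comment above in action without any LLVM types, here is a minimal self-contained sketch. `Report`, `ClearOnReturn`, and `collect` are hypothetical stand-ins, and a negative value is used purely as a placeholder for a measurement-time failure.

```cpp
#include <string>
#include <vector>

struct Report {
  std::vector<double> Measurements;
  std::string Error;
};

// Clears the measurements when it goes out of scope, unless disarmed.
class ClearOnReturn {
public:
  explicit ClearOnReturn(Report *R) : R(R) {}
  ~ClearOnReturn() {
    if (Clear)
      R->Measurements.clear();
  }
  ClearOnReturn(const ClearOnReturn &) = delete;
  ClearOnReturn &operator=(const ClearOnReturn &) = delete;

  void disarm() { Clear = false; }

private:
  Report *const R;
  bool Clear = true;
};

// Hypothetical driver: each entry of Runs is one repetitor's result; a
// negative value stands in for a measurement-time failure.
void collect(Report &Out, const std::vector<double> &Runs) {
  ClearOnReturn Guard(&Out);
  for (double Value : Runs) {
    if (Value < 0) {
      Out.Error = "snippet crashed";
      return; // Guard fires: earlier measurements are discarded.
    }
    Out.Measurements.push_back(Value);
  }
  Guard.disarm(); // Every run succeeded, so keep the results.
}
```

The sketch writes through an out-parameter on purpose. When the guarded object is itself the function's return value, the return value is initialized before local destructors run, so the interaction between the guard and the returned copy needs some care.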

Forgo llvm::make_scope_exit in favor of a more manual approach.

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
108	Ok, sure, that i can do.

Harbormaster failed remote builds in B51124: Diff 253863! · Mar 31 2020, 7:09 AM

gchatelet accepted this revision. · Apr 1 2020, 7:15 AM

This revision is now accepted and ready to land. · Apr 1 2020, 7:15 AM

@gchatelet thank you for the review!
@courbet any further comments?

courbet accepted this revision. · Apr 1 2020, 11:23 PM

Great, thank you for the reviews!

Closed by commit rGde22d7154b4a: [llvm-exegesis] 'Min' repetition mode (authored by lebedev.ri). · Apr 1 2020, 11:57 PM

This revision was automatically updated to reflect the committed changes.

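Revision Contents

Path                                                 Size
llvm/docs/CommandGuide/llvm-exegesis.rst             6 lines
llvm/tools/llvm-exegesis/lib/BenchmarkResult.h       3 lines
llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h       2 lines
llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp     155 lines
llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp    2 lines
llvm/tools/llvm-exegesis/llvm-exegesis.cpp           35 lines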

Diff 254436

llvm/docs/CommandGuide/llvm-exegesis.rst

	...
	Either `opcode-index`, `opcode-name` or `snippets-file` must be set.			Either `opcode-index`, `opcode-name` or `snippets-file` must be set.

	.. option:: -mode=[latency\|uops\|inverse_throughput\|analysis]			.. option:: -mode=[latency\|uops\|inverse_throughput\|analysis]

	Specify the run mode. Note that if you pick `analysis` mode, you also need			Specify the run mode. Note that if you pick `analysis` mode, you also need
	to specify at least one of the `-analysis-clusters-output-file=` and			to specify at least one of the `-analysis-clusters-output-file=` and
	`-analysis-inconsistencies-output-file=`.			`-analysis-inconsistencies-output-file=`.

	.. option:: -repetition-mode=[duplicate\|loop]			.. option:: -repetition-mode=[duplicate\|loop\|min]

	Specify the repetition mode. `duplicate` will create a large, straight line			Specify the repetition mode. `duplicate` will create a large, straight line
	basic block with `num-repetitions` copies of the snippet. `loop` will wrap			basic block with `num-repetitions` copies of the snippet. `loop` will wrap
	the snippet in a loop which will be run `num-repetitions` times. The `loop`			the snippet in a loop which will be run `num-repetitions` times. The `loop`
	mode tends to better hide the effects of the CPU frontend on architectures			mode tends to better hide the effects of the CPU frontend on architectures
	that cache decoded instructions, but consumes a register for counting			that cache decoded instructions, but consumes a register for counting
	iterations.			iterations. If performing an analysis over many opcodes, it may be best
				to instead use the `min` mode, which will run each other mode, and produce
				the minimal measured result.

	.. option:: -num-repetitions=<Number of repetitions>			.. option:: -num-repetitions=<Number of repetitions>

	Specify the number of repetitions of the asm snippet.			Specify the number of repetitions of the asm snippet.
	Higher values lead to more accurate measurements but lengthen the benchmark.			Higher values lead to more accurate measurements but lengthen the benchmark.

	.. option:: -max-configs-per-opcode=<value>			.. option:: -max-configs-per-opcode=<value>

	...

llvm/tools/llvm-exegesis/lib/BenchmarkResult.h

...
struct InstructionBenchmark {
ModeE Mode;		ModeE Mode;
std::string CpuName;		std::string CpuName;
std::string LLVMTriple;		std::string LLVMTriple;
// Which instruction is being benchmarked here?		// Which instruction is being benchmarked here?
const MCInst &keyInstruction() const { return Key.Instructions[0]; }		const MCInst &keyInstruction() const { return Key.Instructions[0]; }
// The number of instructions inside the repeated snippet. For example, if a		// The number of instructions inside the repeated snippet. For example, if a
// snippet of 3 instructions is repeated 4 times, this is 12.		// snippet of 3 instructions is repeated 4 times, this is 12.
int NumRepetitions = 0;		int NumRepetitions = 0;
enum RepetitionModeE { Duplicate, Loop };		enum RepetitionModeE { Duplicate, Loop, AggregateMin };
RepetitionModeE RepetitionMode;
// Note that measurements are per instruction.		// Note that measurements are per instruction.
std::vector<BenchmarkMeasure> Measurements;		std::vector<BenchmarkMeasure> Measurements;
std::string Error;		std::string Error;
std::string Info;		std::string Info;
std::vector<uint8_t> AssembledSnippet;		std::vector<uint8_t> AssembledSnippet;

// Read functions.		// Read functions.
static Expected<InstructionBenchmark> readYaml(const LLVMState &State,		static Expected<InstructionBenchmark> readYaml(const LLVMState &State,
...

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h

	...
	#include "llvm/Support/Error.h"			#include "llvm/Support/Error.h"
	#include <cstdlib>			#include <cstdlib>
	#include <memory>			#include <memory>
	#include <vector>			#include <vector>

	namespace llvm {			namespace llvm {
	namespace exegesis {			namespace exegesis {

	// Common code for all benchmark modes.			// Common code for all benchmark modes.
	class BenchmarkRunner {			class BenchmarkRunner {
	public:			public:
	explicit BenchmarkRunner(const LLVMState &State,			explicit BenchmarkRunner(const LLVMState &State,
	InstructionBenchmark::ModeE Mode);			InstructionBenchmark::ModeE Mode);

	virtual ~BenchmarkRunner();			virtual ~BenchmarkRunner();

	Expected<InstructionBenchmark>			Expected<InstructionBenchmark>
	runConfiguration(const BenchmarkCode &Configuration, unsigned NumRepetitions,			runConfiguration(const BenchmarkCode &Configuration, unsigned NumRepetitions,
	const SnippetRepetitor &Repetitor,			ArrayRef<std::unique_ptr<const SnippetRepetitor>> Repetitors,
	bool DumpObjectToDisk) const;			bool DumpObjectToDisk) const;

	// Scratch space to run instructions that touch memory.			// Scratch space to run instructions that touch memory.
	struct ScratchSpace {			struct ScratchSpace {
	static constexpr const size_t kAlignment = 1024;			static constexpr const size_t kAlignment = 1024;
	static constexpr const size_t kSize = 1 << 20; // 1MB.			static constexpr const size_t kSize = 1 << 20; // 1MB.
	ScratchSpace()			ScratchSpace()
	: UnalignedPtr(std::make_unique<char[]>(kSize + kAlignment)),			: UnalignedPtr(std::make_unique<char[]>(kSize + kAlignment)),
	...

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp

//===-- BenchmarkRunner.cpp -------------------------------------- C++ --===//		//===-- BenchmarkRunner.cpp -------------------------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include <array>		#include <array>
#include <string>		#include <string>

#include "Assembler.h"		#include "Assembler.h"
#include "BenchmarkRunner.h"		#include "BenchmarkRunner.h"
#include "Error.h"		#include "Error.h"
#include "MCInstrDescView.h"		#include "MCInstrDescView.h"
#include "PerfHelper.h"		#include "PerfHelper.h"
		#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Support/CrashRecoveryContext.h"		#include "llvm/Support/CrashRecoveryContext.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/Program.h"		#include "llvm/Support/Program.h"

...
private:

const ExecutableFunction Function;		const ExecutableFunction Function;
BenchmarkRunner::ScratchSpace *const Scratch;		BenchmarkRunner::ScratchSpace *const Scratch;
};		};
} // namespace		} // namespace

Expected<InstructionBenchmark> BenchmarkRunner::runConfiguration(		Expected<InstructionBenchmark> BenchmarkRunner::runConfiguration(
const BenchmarkCode &BC, unsigned NumRepetitions,		const BenchmarkCode &BC, unsigned NumRepetitions,
const SnippetRepetitor &Repetitor, bool DumpObjectToDisk) const {		ArrayRef<std::unique_ptr<const SnippetRepetitor>> Repetitors,
		bool DumpObjectToDisk) const {
InstructionBenchmark InstrBenchmark;		InstructionBenchmark InstrBenchmark;
InstrBenchmark.Mode = Mode;		InstrBenchmark.Mode = Mode;
InstrBenchmark.CpuName = std::string(State.getTargetMachine().getTargetCPU());		InstrBenchmark.CpuName = std::string(State.getTargetMachine().getTargetCPU());
InstrBenchmark.LLVMTriple =		InstrBenchmark.LLVMTriple =
State.getTargetMachine().getTargetTriple().normalize();		State.getTargetMachine().getTargetTriple().normalize();
InstrBenchmark.NumRepetitions = NumRepetitions;		InstrBenchmark.NumRepetitions = NumRepetitions;
InstrBenchmark.Info = BC.Info;		InstrBenchmark.Info = BC.Info;

const std::vector<MCInst> &Instructions = BC.Key.Instructions;		const std::vector<MCInst> &Instructions = BC.Key.Instructions;

InstrBenchmark.Key = BC.Key;		InstrBenchmark.Key = BC.Key;

// Assemble at least kMinInstructionsForSnippet instructions by repeating the		// If we end up having an error, and we've previously succeeded with
// snippet for debug/analysis. This is so that the user clearly understands		// some other Repetitor, we want to discard the previous measurements.
// that the inside instructions are repeated.		struct ClearBenchmarkOnReturn {
		ClearBenchmarkOnReturn(InstructionBenchmark *IB) : IB(IB) {}
		~ClearBenchmarkOnReturn() {
		if (Clear)
		IB->Measurements.clear();
		}
		void disarm() { Clear = false; }

		private:
		InstructionBenchmark *const IB;
		bool Clear = true;
		};
		ClearBenchmarkOnReturn CBOR(&InstrBenchmark);

		for (const std::unique_ptr<const SnippetRepetitor> &Repetitor : Repetitors) {
		// Assemble at least kMinInstructionsForSnippet instructions by repeating
		// the snippet for debug/analysis. This is so that the user clearly
		// understands that the inside instructions are repeated.
constexpr const int kMinInstructionsForSnippet = 16;		constexpr const int kMinInstructionsForSnippet = 16;
{		{
SmallString<0> Buffer;		SmallString<0> Buffer;
raw_svector_ostream OS(Buffer);		raw_svector_ostream OS(Buffer);
if (Error E = assembleToStream(		if (Error E = assembleToStream(
State.getExegesisTarget(), State.createTargetMachine(), BC.LiveIns,		State.getExegesisTarget(), State.createTargetMachine(),
BC.Key.RegisterInitialValues,		BC.LiveIns, BC.Key.RegisterInitialValues,
Repetitor.Repeat(Instructions, kMinInstructionsForSnippet), OS)) {		Repetitor->Repeat(Instructions, kMinInstructionsForSnippet),
		OS)) {
return std::move(E);		return std::move(E);
}		}
const ExecutableFunction EF(State.createTargetMachine(),		const ExecutableFunction EF(State.createTargetMachine(),
getObjectFromBuffer(OS.str()));		getObjectFromBuffer(OS.str()));
const auto FnBytes = EF.getFunctionBytes();		const auto FnBytes = EF.getFunctionBytes();
InstrBenchmark.AssembledSnippet.assign(FnBytes.begin(), FnBytes.end());		InstrBenchmark.AssembledSnippet.insert(
		InstrBenchmark.AssembledSnippet.end(), FnBytes.begin(),
		FnBytes.end());
}		}

// Assemble NumRepetitions instructions repetitions of the snippet for		// Assemble NumRepetitions instructions repetitions of the snippet for
// measurements.		// measurements.
const auto Filler =		const auto Filler =
Repetitor.Repeat(Instructions, InstrBenchmark.NumRepetitions);		Repetitor->Repeat(Instructions, InstrBenchmark.NumRepetitions);

object::OwningBinary<object::ObjectFile> ObjectFile;		object::OwningBinary<object::ObjectFile> ObjectFile;
if (DumpObjectToDisk) {		if (DumpObjectToDisk) {
auto ObjectFilePath = writeObjectFile(BC, Filler);		auto ObjectFilePath = writeObjectFile(BC, Filler);
if (Error E = ObjectFilePath.takeError()) {		if (Error E = ObjectFilePath.takeError()) {
InstrBenchmark.Error = toString(std::move(E));		InstrBenchmark.Error = toString(std::move(E));
return InstrBenchmark;		return InstrBenchmark;
}		}
outs() << "Check generated assembly with: /usr/bin/objdump -d "		outs() << "Check generated assembly with: /usr/bin/objdump -d "
<< *ObjectFilePath << "\n";		<< *ObjectFilePath << "\n";
ObjectFile = getObjectFromFile(*ObjectFilePath);		ObjectFile = getObjectFromFile(*ObjectFilePath);
} else {		} else {
SmallString<0> Buffer;		SmallString<0> Buffer;
raw_svector_ostream OS(Buffer);		raw_svector_ostream OS(Buffer);
if (Error E = assembleToStream(State.getExegesisTarget(),		if (Error E = assembleToStream(
State.createTargetMachine(), BC.LiveIns,		State.getExegesisTarget(), State.createTargetMachine(),
BC.Key.RegisterInitialValues, Filler, OS)) {		BC.LiveIns, BC.Key.RegisterInitialValues, Filler, OS)) {
return std::move(E);		return std::move(E);
}		}
ObjectFile = getObjectFromBuffer(OS.str());		ObjectFile = getObjectFromBuffer(OS.str());
}		}

const FunctionExecutorImpl Executor(State, std::move(ObjectFile),		const FunctionExecutorImpl Executor(State, std::move(ObjectFile),
Scratch.get());		Scratch.get());
auto Measurements = runMeasurements(Executor);		auto NewMeasurements = runMeasurements(Executor);
if (Error E = Measurements.takeError()) {		if (Error E = NewMeasurements.takeError()) {
if (!E.isA<SnippetCrash>())		if (!E.isA<SnippetCrash>())
return std::move(E);		return std::move(E);
InstrBenchmark.Error = toString(std::move(E));		InstrBenchmark.Error = toString(std::move(E));
return InstrBenchmark;		return InstrBenchmark;
}		}
InstrBenchmark.Measurements = std::move(*Measurements);
assert(InstrBenchmark.NumRepetitions > 0 && "invalid NumRepetitions");		assert(InstrBenchmark.NumRepetitions > 0 && "invalid NumRepetitions");
for (BenchmarkMeasure &BM : InstrBenchmark.Measurements) {		for (BenchmarkMeasure &BM : *NewMeasurements) {
// Scale the measurements by instruction.		// Scale the measurements by instruction.
BM.PerInstructionValue /= InstrBenchmark.NumRepetitions;		BM.PerInstructionValue /= InstrBenchmark.NumRepetitions;
// Scale the measurements by snippet.		// Scale the measurements by snippet.
BM.PerSnippetValue *= static_cast<double>(Instructions.size()) /		BM.PerSnippetValue *= static_cast<double>(Instructions.size()) /
InstrBenchmark.NumRepetitions;		InstrBenchmark.NumRepetitions;
}		}
		if (InstrBenchmark.Measurements.empty()) {
		InstrBenchmark.Measurements = std::move(*NewMeasurements);
		continue;
		}

		assert(Repetitors.size() > 1 && !InstrBenchmark.Measurements.empty() &&
		"We're in an 'min' repetition mode, and need to aggregate new "
		"result to the existing result.");
		assert(InstrBenchmark.Measurements.size() == NewMeasurements->size() &&
		"Expected to have identical number of measurements.");
		for (auto I : zip(InstrBenchmark.Measurements, *NewMeasurements)) {
		BenchmarkMeasure &Measurement = std::get<0>(I);
		BenchmarkMeasure &NewMeasurement = std::get<1>(I);
		assert(Measurement.Key == NewMeasurement.Key &&
		"Expected measurements to be symmetric");

		Measurement.PerInstructionValue = std::min(
		Measurement.PerInstructionValue, NewMeasurement.PerInstructionValue);
		Measurement.PerSnippetValue =
		std::min(Measurement.PerSnippetValue, NewMeasurement.PerSnippetValue);
		}
		}

		// We successfully measured everything, so don't discard the results.
		CBOR.disarm();
return InstrBenchmark;		return InstrBenchmark;
}		}
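
The aggregation step added to `runConfiguration` above can be sketched in isolation. This is a minimal, standalone illustration and not the patch's code: `Measure` is a hypothetical stand-in for `BenchmarkMeasure` that models only the three fields the loop touches, `aggregateMin` is an invented helper name, and the keys and numbers in the usage note are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for exegesis::BenchmarkMeasure (assumption: only the
// fields used by the aggregation are modelled).
struct Measure {
  std::string Key;
  double PerInstructionValue;
  double PerSnippetValue;
};

// Folds New into Acc by taking the element-wise minimum of both values.
// On the first call (Acc empty) New is adopted unchanged, mirroring the
// early `continue` in the patch.
void aggregateMin(std::vector<Measure> &Acc, std::vector<Measure> New) {
  if (Acc.empty()) {
    Acc = std::move(New);
    return;
  }
  assert(Acc.size() == New.size() &&
         "expected identical number of measurements");
  for (std::size_t I = 0; I < Acc.size(); ++I) {
    assert(Acc[I].Key == New[I].Key && "expected measurements in same order");
    Acc[I].PerInstructionValue =
        std::min(Acc[I].PerInstructionValue, New[I].PerInstructionValue);
    Acc[I].PerSnippetValue =
        std::min(Acc[I].PerSnippetValue, New[I].PerSnippetValue);
  }
}
```

Calling `aggregateMin` once per repetitor leaves, for each key, the smaller of the values seen so far. Note that the minimum is taken independently per field, so the two fields of one aggregated entry may come from different repetitors.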

Expected<std::string>		Expected<std::string>
BenchmarkRunner::writeObjectFile(const BenchmarkCode &BC,		BenchmarkRunner::writeObjectFile(const BenchmarkCode &BC,
const FillFunction &FillFunction) const {		const FillFunction &FillFunction) const {
int ResultFD = 0;		int ResultFD = 0;
SmallString<256> ResultPath;		SmallString<256> ResultPath;
[... 16 lines not shown ...]

llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp

	[... 104 lines not shown ...]
	std::unique_ptr<const SnippetRepetitor>			std::unique_ptr<const SnippetRepetitor>
	SnippetRepetitor::Create(InstructionBenchmark::RepetitionModeE Mode,			SnippetRepetitor::Create(InstructionBenchmark::RepetitionModeE Mode,
	const LLVMState &State) {			const LLVMState &State) {
	switch (Mode) {			switch (Mode) {
	case InstructionBenchmark::Duplicate:			case InstructionBenchmark::Duplicate:
	return std::make_unique<DuplicateSnippetRepetitor>(State);			return std::make_unique<DuplicateSnippetRepetitor>(State);
	case InstructionBenchmark::Loop:			case InstructionBenchmark::Loop:
	return std::make_unique<LoopSnippetRepetitor>(State);			return std::make_unique<LoopSnippetRepetitor>(State);
				case InstructionBenchmark::AggregateMin:
				gchatelet (inline comment, Done): I'd rather have an empty case for `InstructionBenchmark::AggregateMin` and not use `default`. This will help if we ever add a new mode.
				break;
	}			}
	llvm_unreachable("Unknown RepetitionModeE enum");			llvm_unreachable("Unknown RepetitionModeE enum");
	}			}

	} // namespace exegesis			} // namespace exegesis
	} // namespace llvm			} // namespace llvm
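
The point of the inline comment can be shown with a small standalone sketch. It assumes a compiler that warns about unhandled enumerators in a `switch` with no `default` (for example `-Wswitch` in Clang and GCC). `RepetitionMode` and `repetitorName` are hypothetical names, and `std::abort` stands in for `llvm_unreachable`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical mirror of InstructionBenchmark::RepetitionModeE.
enum class RepetitionMode { Duplicate, Loop, AggregateMin };

// Every enumerator is listed and there is no `default`, so adding a new
// enumerator later makes the compiler flag this switch as incomplete.
std::string repetitorName(RepetitionMode Mode) {
  switch (Mode) {
  case RepetitionMode::Duplicate:
    return "duplicate";
  case RepetitionMode::Loop:
    return "loop";
  case RepetitionMode::AggregateMin:
    break; // Not a concrete repetitor; callers expand it before this point.
  }
  assert(false && "mode has no single repetitor");
  std::abort(); // Stand-in for llvm_unreachable.
}
```

With a `default:` label instead of the explicit empty case, a newly added mode would silently fall into the unreachable path at run time rather than being reported at compile time.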

llvm/tools/llvm-exegesis/llvm-exegesis.cpp

[... 80 lines not shown ...]
cl::values(clEnumValN(exegesis::InstructionBenchmark::Latency, "latency",
// When not asking for a specific benchmark mode,		// When not asking for a specific benchmark mode,
// we'll analyse the results.		// we'll analyse the results.
clEnumValN(exegesis::InstructionBenchmark::Unknown, "analysis",		clEnumValN(exegesis::InstructionBenchmark::Unknown, "analysis",
"Analysis")));		"Analysis")));

static cl::opt<exegesis::InstructionBenchmark::RepetitionModeE> RepetitionMode(		static cl::opt<exegesis::InstructionBenchmark::RepetitionModeE> RepetitionMode(
"repetition-mode", cl::desc("how to repeat the instruction snippet"),		"repetition-mode", cl::desc("how to repeat the instruction snippet"),
cl::cat(BenchmarkOptions),		cl::cat(BenchmarkOptions),
cl::values(clEnumValN(exegesis::InstructionBenchmark::Duplicate,		cl::values(
"duplicate", "Duplicate the snippet"),		clEnumValN(exegesis::InstructionBenchmark::Duplicate, "duplicate",
		"Duplicate the snippet"),
clEnumValN(exegesis::InstructionBenchmark::Loop, "loop",		clEnumValN(exegesis::InstructionBenchmark::Loop, "loop",
"Loop over the snippet")));		"Loop over the snippet"),
		clEnumValN(exegesis::InstructionBenchmark::AggregateMin, "min",
		"All of the above and take the minimum of measurements")),
		cl::init(exegesis::InstructionBenchmark::Duplicate));

static cl::opt<unsigned>		static cl::opt<unsigned>
NumRepetitions("num-repetitions",		NumRepetitions("num-repetitions",
cl::desc("number of time to repeat the asm snippet"),		cl::desc("number of time to repeat the asm snippet"),
cl::cat(BenchmarkOptions), cl::init(10000));		cl::cat(BenchmarkOptions), cl::init(10000));

static cl::opt<unsigned> MaxConfigsPerOpcode(		static cl::opt<unsigned> MaxConfigsPerOpcode(
"max-configs-per-opcode",		"max-configs-per-opcode",
[... 179 lines not shown ...]
#endif
const std::unique_ptr<BenchmarkRunner> Runner = ExitOnErr(		const std::unique_ptr<BenchmarkRunner> Runner = ExitOnErr(
State.getExegesisTarget().createBenchmarkRunner(BenchmarkMode, State));		State.getExegesisTarget().createBenchmarkRunner(BenchmarkMode, State));
if (!Runner) {		if (!Runner) {
ExitWithError("cannot create benchmark runner");		ExitWithError("cannot create benchmark runner");
}		}

const auto Opcodes = getOpcodesOrDie(State.getInstrInfo());		const auto Opcodes = getOpcodesOrDie(State.getInstrInfo());

const auto Repetitor = SnippetRepetitor::Create(RepetitionMode, State);		SmallVector<std::unique_ptr<const SnippetRepetitor>, 2> Repetitors;
		if (RepetitionMode != InstructionBenchmark::RepetitionModeE::AggregateMin)
		Repetitors.emplace_back(SnippetRepetitor::Create(RepetitionMode, State));
		else {
		for (InstructionBenchmark::RepetitionModeE RepMode :
		{InstructionBenchmark::RepetitionModeE::Duplicate,
		InstructionBenchmark::RepetitionModeE::Loop})
		Repetitors.emplace_back(SnippetRepetitor::Create(RepMode, State));
		}

		BitVector AllReservedRegs;
		llvm::for_each(Repetitors,
		[&AllReservedRegs](
		const std::unique_ptr<const SnippetRepetitor> &Repetitor) {
		AllReservedRegs |= Repetitor->getReservedRegs();
		});

std::vector<BenchmarkCode> Configurations;		std::vector<BenchmarkCode> Configurations;
if (!Opcodes.empty()) {		if (!Opcodes.empty()) {
for (const unsigned Opcode : Opcodes) {		for (const unsigned Opcode : Opcodes) {
// Ignore instructions without a sched class if		// Ignore instructions without a sched class if
// -ignore-invalid-sched-class is passed.		// -ignore-invalid-sched-class is passed.
if (IgnoreInvalidSchedClass &&		if (IgnoreInvalidSchedClass &&
State.getInstrInfo().get(Opcode).getSchedClass() == 0) {		State.getInstrInfo().get(Opcode).getSchedClass() == 0) {
errs() << State.getInstrInfo().getName(Opcode)		errs() << State.getInstrInfo().getName(Opcode)
<< ": ignoring instruction without sched class\n";		<< ": ignoring instruction without sched class\n";
continue;		continue;
}		}
auto ConfigsForInstr =
generateSnippets(State, Opcode, Repetitor->getReservedRegs());		auto ConfigsForInstr = generateSnippets(State, Opcode, AllReservedRegs);
if (!ConfigsForInstr) {		if (!ConfigsForInstr) {
logAllUnhandledErrors(		logAllUnhandledErrors(
ConfigsForInstr.takeError(), errs(),		ConfigsForInstr.takeError(), errs(),
Twine(State.getInstrInfo().getName(Opcode)).concat(": "));		Twine(State.getInstrInfo().getName(Opcode)).concat(": "));
continue;		continue;
}		}
std::move(ConfigsForInstr->begin(), ConfigsForInstr->end(),		std::move(ConfigsForInstr->begin(), ConfigsForInstr->end(),
std::back_inserter(Configurations));		std::back_inserter(Configurations));
}		}
} else {		} else {
Configurations = ExitOnErr(readSnippets(State, SnippetsFile));		Configurations = ExitOnErr(readSnippets(State, SnippetsFile));
}		}

if (NumRepetitions == 0) {		if (NumRepetitions == 0) {
ExitOnErr.setBanner("llvm-exegesis: ");		ExitOnErr.setBanner("llvm-exegesis: ");
ExitWithError("--num-repetitions must be greater than zero");		ExitWithError("--num-repetitions must be greater than zero");
}		}

// Write to standard output if file is not set.		// Write to standard output if file is not set.
if (BenchmarkFile.empty())		if (BenchmarkFile.empty())
BenchmarkFile = "-";		BenchmarkFile = "-";

for (const BenchmarkCode &Conf : Configurations) {		for (const BenchmarkCode &Conf : Configurations) {
InstructionBenchmark Result = ExitOnErr(Runner->runConfiguration(		InstructionBenchmark Result = ExitOnErr(Runner->runConfiguration(
Conf, NumRepetitions, *Repetitor, DumpObjectToDisk));		Conf, NumRepetitions, Repetitors, DumpObjectToDisk));
ExitOnFileError(BenchmarkFile, Result.writeYaml(State, BenchmarkFile));		ExitOnFileError(BenchmarkFile, Result.writeYaml(State, BenchmarkFile));
}		}
exegesis::pfm::pfmTerminate();		exegesis::pfm::pfmTerminate();
}		}
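
The repetitor setup in the hunk above does two things: it expands `min` into the two concrete modes, and it unions their reserved registers so that one generated snippet is valid under every repetitor. A rough standalone sketch follows, with loud assumptions: `std::bitset` stands in for `llvm::BitVector`, the register count and the register index reserved by loop mode are made up, and duplicate mode is assumed to reserve nothing.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

enum class RepetitionMode { Duplicate, Loop, AggregateMin };

constexpr std::size_t NumRegs = 8; // Hypothetical register file size.
using RegSet = std::bitset<NumRegs>;

// 'min' expands to both concrete modes; any other mode stands for itself.
std::vector<RepetitionMode> expandModes(RepetitionMode Mode) {
  if (Mode != RepetitionMode::AggregateMin)
    return {Mode};
  return {RepetitionMode::Duplicate, RepetitionMode::Loop};
}

// Assumption for illustration: loop mode reserves register 3 as its
// iteration counter, duplicate mode reserves nothing.
RegSet reservedRegs(RepetitionMode Mode) {
  RegSet R;
  if (Mode == RepetitionMode::Loop)
    R.set(3);
  return R;
}

// Union of the reserved registers of all selected modes.
RegSet allReservedRegs(const std::vector<RepetitionMode> &Modes) {
  RegSet All;
  for (RepetitionMode M : Modes)
    All |= reservedRegs(M);
  return All;
}
```

One consequence of taking the union is that in `min` mode the snippet generator avoids the loop counter register even for the duplicate run, so both runs measure the same instruction sequence.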

// Prints the results of running analysis pass `Pass` to file `OutputFilename`		// Prints the results of running analysis pass `Pass` to file `OutputFilename`
// if OutputFilename is non-empty.		// if OutputFilename is non-empty.
template <typename Pass>		template <typename Pass>
[... 91 lines not shown ...]

Revision Contents

Diff 254436

llvm/docs/CommandGuide/llvm-exegesis.rst

llvm/tools/llvm-exegesis/lib/BenchmarkResult.h

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h

llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp

llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp

llvm/tools/llvm-exegesis/llvm-exegesis.cpp
