This is an archive of the discontinued LLVM Phabricator instance.

Differential D126394

[trace][intelpt] Support system-wide tracing [14] - Decode per cpu
ClosedPublic

Authored by wallace on May 25 2022, 10:03 AM.

Details

Reviewers

jj10306

Commits

rGa19fcc2bec81: [trace][intelpt] Support system-wide tracing [14] - Decode per cpu

Summary

This is the final functional patch to support Intel PT decoding per CPU.
It works by doing the following:

First, all context switches are split by tid and sorted in order. This produces a list of continuous executions per thread per core.
Then, all Intel PT subtraces are split by PSB boundaries and assigned to individual thread continuous executions on the same core by doing simple TSC-based comparisons.
With this, we have, per thread, a sorted list of continuous executions, each one with a list of Intel PT subtraces. Up to this point, this is really fast because no instructions have actually been decoded.
Then, each thread can be decoded by traversing its continuous executions and Intel PT subtraces. An advantage of having these continuous executions is that we can identify whether a continuous execution doesn't have Intel PT data, and thus has a gap in it. We can later do more sophisticated comparisons to identify whether there are gaps within a continuous execution.

I'm adding a test as well.
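
As a rough illustration of the first step in the summary (grouping the context switches seen on one core into per-thread continuous executions), here is a minimal sketch. The `ContextSwitch` and `Execution` types and the `SplitByThread` function are hypothetical simplifications, not the types used by the patch. The sketch also drops unmatched switches, whereas the patch keeps incomplete executions and tags them with variants such as `OnlyStart`, `OnlyEnd`, `HintedStart` and `HintedEnd`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified context switch record seen on one core.
struct ContextSwitch {
  uint64_t tsc; // timestamp of the switch
  uint64_t tid; // thread being switched
  bool is_in;   // true: scheduled in, false: scheduled out
};

// Hypothetical, simplified continuous execution of a thread on a core.
struct Execution {
  uint64_t tid;
  uint32_t core_id;
  uint64_t start_tsc;
  uint64_t end_tsc;
};

// Pair every "in" with the next "out" of the same thread on this core.
// Assumes `switches` is sorted by TSC. Unmatched records are ignored here.
std::map<uint64_t, std::vector<Execution>>
SplitByThread(uint32_t core_id, const std::vector<ContextSwitch> &switches) {
  std::map<uint64_t, std::vector<Execution>> by_tid;
  std::map<uint64_t, uint64_t> pending_in; // tid -> TSC of the open "in"
  uint64_t last_tsc = 0;
  for (const ContextSwitch &cs : switches) {
    assert(cs.tsc >= last_tsc && "input must be sorted by TSC");
    last_tsc = cs.tsc;
    if (cs.is_in) {
      pending_in[cs.tid] = cs.tsc;
      continue;
    }
    auto it = pending_in.find(cs.tid);
    if (it == pending_in.end())
      continue; // "out" without a matching "in": skipped in this sketch
    by_tid[cs.tid].push_back({cs.tid, core_id, it->second, cs.tsc});
    pending_in.erase(it);
  }
  return by_tid;
}
```

Because the input is already ordered by TSC, each per-thread list comes out sorted without an extra sort, which fits the summary's point that this phase is cheap compared to instruction decoding.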

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wallace created this revision. May 25 2022, 10:03 AM

Herald added a project: Restricted Project. May 25 2022, 10:03 AM

wallace requested review of this revision. May 25 2022, 10:03 AM

Herald added a project: Restricted Project. May 25 2022, 10:03 AM

Herald added a subscriber: lldb-commits.

Harbormaster completed remote builds in B166304: Diff 432036. May 25 2022, 10:04 AM

Herald added a subscriber: JDevlieghere. May 25 2022, 10:04 AM

finish diff

Herald added subscribers: mgrang, mgorny. Jun 3 2022, 11:31 AM

Harbormaster completed remote builds in B167755: Diff 434081. Jun 3 2022, 11:31 AM

update test files

Harbormaster completed remote builds in B167763: Diff 434093. Jun 3 2022, 11:50 AM

jj10306 requested changes to this revision. Jun 12 2022, 12:22 PM

jj10306 added inline comments.

lldb/include/lldb/Utility/TraceIntelPTGDBRemotePackets.h
83–85	why get rid of chronos here?
lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
22–38	nit: Is there a reason these classes/structs are not declared in the header file?
100–102	is this what you intended to say here?
353	Why is this the case?
lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.h
21–24	docs
47–53	docs
lldb/source/Plugins/Trace/intel-pt/PerfContextSwitchDecoder.cpp
16–82	why not put these declarations in the header file?
228	what's this doing here?
240	does this need to be a lambda? iiuc this is only called once (at the end of this function), so it seems like this could just be placed inline instead of being in a lambda
lldb/source/Plugins/Trace/intel-pt/PerfContextSwitchDecoder.h
2	do the structures/logic contained in this file (and its .cpp) belong with the other Perf logic of LLDB, or do they belong with the intelpt-specific code?
125
lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp
43
64	I think naming this subtraces would help, as it makes the distinction between the subtraces and thread executions clearer while reading the code below.
84
93	perhaps I'm misunderstanding the intention of this counter, but won't this be incremented for all subtraces that don't belong to the specific ThreadContinousExecution currently being operated on instead of being incremented for the subtraces that don't belong to any ThreadContinousExecution?
121	Should this be named differently since this is doing much more than decoding the context switches (it splits the intel pt traces and correlates them with the context switch executions)?
lldb/source/Target/Trace.cpp
415

This revision now requires changes to proceed. Jun 12 2022, 12:22 PM

wallace added inline comments. Jun 14 2022, 3:26 PM

lldb/include/lldb/Utility/TraceIntelPTGDBRemotePackets.h
83–85	because chronos uses signed integers and we lose precision by using it
lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
22–38	they are not used by anyone else, so just that
100–102	hehe yes
lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.h
21–24	+1
47–53	+1
lldb/source/Plugins/Trace/intel-pt/PerfContextSwitchDecoder.cpp
16–82	they are not used externally, so I rather not expose them until we really need to
228	lol, forgive this poor man
240	I'm using this lambda trick to capture any possible error returned by the complex logic in it and then append some text to this error. This would be equivalent to moving the lambda to its own function. I'm fine with creating another function for this case, but the lambda seems fine to me
lldb/source/Plugins/Trace/intel-pt/PerfContextSwitchDecoder.h
2	the other Perf.h file is lldb-server only, so they can't be together nor merged :(
125	+1
lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp
43	owch, thank you
64	+1
84	thanks
93	not really. In this case it is any ThreadContinousExecution. You need to notice a few things:
- the intel pt subtraces (or executions) are sorted by time
- on_new_thread_execution is invoked sorted by time
- when we process a new thread execution, we look for the intel pt subtraces that we haven't processed yet that happened no later than our current execution. If we find a subtrace that happened before the current execution, then that means that we don't have any thread execution for this subtrace, regardless of which thread it was
121	+1
lldb/source/Target/Trace.cpp
415	i've gotten rid of the set
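
The "lambda trick" mentioned in the reply about line 240 of PerfContextSwitchDecoder.cpp can be sketched roughly as follows. This is only an illustration of the general pattern (run the complex logic in a lambda, then decorate whatever error comes out in one place), not the code from the patch. `ErrorSketch` is a hypothetical stand-in for `llvm::Error`, and `ParseRecords` and both error messages are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Error: an empty optional means success.
using ErrorSketch = std::optional<std::string>;

ErrorSketch ParseRecords(const std::vector<int> &records,
                         std::vector<int> &out) {
  assert(out.empty() && "output vector is expected to start empty");

  // The logic with several possible failure points lives in a lambda, so
  // every early return funnels through a single call site below.
  auto do_parse = [&]() -> ErrorSketch {
    for (std::size_t i = 0; i < records.size(); ++i) {
      if (records[i] < 0)
        return "negative record at index " + std::to_string(i);
      out.push_back(records[i]);
    }
    return std::nullopt;
  };

  // Single place where context is added to any error the lambda produced.
  if (ErrorSketch err = do_parse())
    return "Malformed context switch trace: " + *err;
  return std::nullopt;
}
```

As the reply notes, moving the lambda body into its own named function would be equivalent; the lambda mainly keeps the helper next to its only caller.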

Herald added a subscriber: Michael137. Jun 14 2022, 3:26 PM

This was improved in https://reviews.llvm.org/D127804

jj10306 accepted this revision. Jun 15 2022, 9:36 AM

This revision is now accepted and ready to land. Jun 15 2022, 9:36 AM

This revision was landed with ongoing or failed builds. Jun 16 2022, 11:23 AM

Closed by commit rGa19fcc2bec81: [trace][intelpt] Support system-wide tracing [14] - Decode per cpu (authored by Walter Erquinigo <wallace@fb.com>).

This revision was automatically updated to reflect the committed changes.

Walter Erquinigo <wallace@fb.com> added a commit: rGa19fcc2bec81: [trace][intelpt] Support system-wide tracing [14] - Decode per cpu.

Revision Contents

Path  Size

lldb/
  include/lldb/
    Target/
      Trace.h  14 lines
    Utility/
      TraceIntelPTGDBRemotePackets.h  4 lines
  source/
    Plugins/Trace/intel-pt/
      CMakeLists.txt  1 line
      DecodedThread.h  2 lines
      LibiptDecoder.h  34 lines
      LibiptDecoder.cpp  231 lines
      PerfContextSwitchDecoder.h  146 lines
      PerfContextSwitchDecoder.cpp  281 lines
      TraceIntelPTJSONStructs.cpp  6 lines
      TraceIntelPTMultiCoreDecoder.h  103 lines
      TraceIntelPTMultiCoreDecoder.cpp  347 lines
    Target/
      Trace.cpp  40 lines
    Utility/
      TraceIntelPTGDBRemotePackets.cpp  11 lines
  test/API/commands/trace/
    TestTraceLoad.py  12 lines
    intelpt-multi-core-trace/
      cores/
        45.intelpt_trace
        45.perf_context_switch_trace
        51.intelpt_trace
        51.perf_context_switch_trace
      modules/
        m.out
        multi_thread.cpp  34 lines
      trace.json  51 lines
    multiple-threads/
      TestTraceStartStopMultipleThreads.py  2 lines
  unittests/Process/Linux/
    PerfTests.cpp  8 lines

Diff 437621

lldb/include/lldb/Target/Trace.h

Show First 20 Lines • Show All 234 Lines • ▼ Show 20 Lines	public:

/// \return		/// \return
/// The stop ID of the live process being traced, or an invalid stop ID		/// The stop ID of the live process being traced, or an invalid stop ID
/// if the trace is in an error or invalid state.		/// if the trace is in an error or invalid state.
uint32_t GetStopID();		uint32_t GetStopID();

using OnBinaryDataReadCallback =		using OnBinaryDataReadCallback =
std::function<llvm::Error(llvm::ArrayRef<uint8_t> data)>;		std::function<llvm::Error(llvm::ArrayRef<uint8_t> data)>;
		using OnCoresBinaryDataReadCallback = std::function<llvm::Error(
		const llvm::DenseMap<lldb::core_id_t, llvm::ArrayRef<uint8_t>>
		&core_to_data)>;

/// Fetch binary data associated with a thread, either live or postmortem, and		/// Fetch binary data associated with a thread, either live or postmortem, and
/// pass it to the given callback. The reason of having a callback is to free		/// pass it to the given callback. The reason of having a callback is to free
/// the caller from having to manage the life cycle of the data and to hide		/// the caller from having to manage the life cycle of the data and to hide
/// the different data fetching procedures that exist for live and post mortem		/// the different data fetching procedures that exist for live and post mortem
/// threads.		/// threads.
///		///
/// The fetched data is not persisted after the callback is invoked.		/// The fetched data is not persisted after the callback is invoked.
Show All 36 Lines	public:
///		///
/// \return		/// \return
/// An \a llvm::Error if the data couldn't be fetched, or the return value		/// An \a llvm::Error if the data couldn't be fetched, or the return value
/// of the callback, otherwise.		/// of the callback, otherwise.
llvm::Error OnCoreBinaryDataRead(lldb::core_id_t core_id,		llvm::Error OnCoreBinaryDataRead(lldb::core_id_t core_id,
llvm::StringRef kind,		llvm::StringRef kind,
OnBinaryDataReadCallback callback);		OnBinaryDataReadCallback callback);

		/// Similar to \a OnCoreBinaryDataRead but this is able to fetch the same data
		/// from multiple cores at once.
		llvm::Error OnCoresBinaryDataRead(const std::set<lldb::core_id_t> core_ids,
		llvm::StringRef kind,
		OnCoresBinaryDataReadCallback callback);

/// \return		/// \return
/// All the currently traced processes.		/// All the currently traced processes.
std::vector<Process *> GetAllProcesses();		std::vector<Process *> GetAllProcesses();

/// \return		/// \return
/// The list of cores being traced. Might be empty depending on the		/// The list of cores being traced. Might be empty depending on the
/// plugin.		/// plugin.
llvm::ArrayRef<lldb::core_id_t> GetTracedCores();		llvm::ArrayRef<lldb::core_id_t> GetTracedCores();
▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	private:
/// \{		/// \{

/// tid -> data kind -> size		/// tid -> data kind -> size
llvm::DenseMap<lldb::tid_t, std::unordered_map<std::string, uint64_t>>		llvm::DenseMap<lldb::tid_t, std::unordered_map<std::string, uint64_t>>
m_live_thread_data;		m_live_thread_data;

/// core id -> data kind -> size		/// core id -> data kind -> size
llvm::DenseMap<lldb::core_id_t, std::unordered_map<std::string, uint64_t>>		llvm::DenseMap<lldb::core_id_t, std::unordered_map<std::string, uint64_t>>
		m_live_core_data_sizes;
		/// core id -> data kind -> bytes
		llvm::DenseMap<lldb::core_id_t,
		std::unordered_map<std::string, std::vector<uint8_t>>>
m_live_core_data;		m_live_core_data;

/// data kind -> size		/// data kind -> size
std::unordered_map<std::string, uint64_t> m_live_process_data;		std::unordered_map<std::string, uint64_t> m_live_process_data;
/// \}		/// \}

/// The list of cores being traced. Might be \b None depending on the plug-in.		/// The list of cores being traced. Might be \b None depending on the plug-in.
llvm::Optional<std::vector<lldb::core_id_t>> m_cores;		llvm::Optional<std::vector<lldb::core_id_t>> m_cores;

/// Postmortem traces can specify additional data files, which are		/// Postmortem traces can specify additional data files, which are
Show All 19 Lines

lldb/include/lldb/Utility/TraceIntelPTGDBRemotePackets.h

Show First 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	struct LinuxPerfZeroTscConversion {
// See 'time_zero' section of		// See 'time_zero' section of
// https://man7.org/linux/man-pages/man2/perf_event_open.2.html		// https://man7.org/linux/man-pages/man2/perf_event_open.2.html
///		///
/// \param[in] tsc		/// \param[in] tsc
/// The TSC value to be converted.		/// The TSC value to be converted.
///		///
/// \return		/// \return
/// Nanosecond wall time.		/// Nanosecond wall time.
std::chrono::nanoseconds ToNanos(uint64_t tsc) const;		uint64_t ToNanos(uint64_t tsc) const;

uint64_t ToTSC(std::chrono::nanoseconds nanos) const;		uint64_t ToTSC(uint64_t nanos) const;
		jj10306: why get rid of chronos here?
		wallace: because chronos uses signed integers and we lose precision by using it

uint32_t time_mult;		uint32_t time_mult;
uint16_t time_shift;		uint16_t time_shift;
uint64_t time_zero;		uint64_t time_zero;
};		};

struct TraceIntelPTGetStateResponse : TraceGetStateResponse {		struct TraceIntelPTGetStateResponse : TraceGetStateResponse {
/// The TSC to wall time conversion if it exists, otherwise \b nullptr.		/// The TSC to wall time conversion if it exists, otherwise \b nullptr.
Show All 17 Lines
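
To make the `uint64_t` choice discussed above concrete: `std::chrono::nanoseconds` is specified with a signed representation, so its range covers roughly half of what `uint64_t` does. The sketch below keeps the conversion entirely in unsigned 64-bit arithmetic. The struct name `TscConversionSketch` is hypothetical; the field names and types mirror `LinuxPerfZeroTscConversion` in the diff, and the arithmetic follows the `time_zero` section of the perf_event_open(2) man page referenced in the header comment. It is not copied from the patch's .cpp file, which may differ in details.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for LinuxPerfZeroTscConversion.
struct TscConversionSketch {
  uint32_t time_mult;
  uint16_t time_shift;
  uint64_t time_zero;

  // TSC -> nanoseconds, as described in perf_event_open(2).
  uint64_t ToNanos(uint64_t tsc) const {
    assert(time_shift < 64);
    uint64_t quot = tsc >> time_shift;
    uint64_t rem = tsc & ((uint64_t{1} << time_shift) - 1);
    return time_zero + quot * time_mult + ((rem * time_mult) >> time_shift);
  }

  // Nanoseconds -> TSC, the inverse given in the same man page section.
  uint64_t ToTSC(uint64_t nanos) const {
    assert(time_mult != 0 && time_shift < 64);
    uint64_t time = nanos - time_zero;
    uint64_t quot = time / time_mult;
    uint64_t rem = time % time_mult;
    return (quot << time_shift) + (rem << time_shift) / time_mult;
  }
};
```

The two directions are not exact inverses in general, since both involve integer division or shifting, so a round trip can lose a few low-order units.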

lldb/source/Plugins/Trace/intel-pt/CMakeLists.txt

Show All 12 Lines	lldb_tablegen(TraceIntelPTCommandOptions.inc -gen-lldb-option-defs
SOURCE TraceIntelPTOptions.td		SOURCE TraceIntelPTOptions.td
TARGET TraceIntelPTOptionsGen)		TARGET TraceIntelPTOptionsGen)

add_lldb_library(lldbPluginTraceIntelPT PLUGIN		add_lldb_library(lldbPluginTraceIntelPT PLUGIN
CommandObjectTraceStartIntelPT.cpp		CommandObjectTraceStartIntelPT.cpp
DecodedThread.cpp		DecodedThread.cpp
TaskTimer.cpp		TaskTimer.cpp
LibiptDecoder.cpp		LibiptDecoder.cpp
		PerfContextSwitchDecoder.cpp
ThreadDecoder.cpp		ThreadDecoder.cpp
TraceCursorIntelPT.cpp		TraceCursorIntelPT.cpp
TraceIntelPT.cpp		TraceIntelPT.cpp
TraceIntelPTJSONStructs.cpp		TraceIntelPTJSONStructs.cpp
TraceIntelPTMultiCoreDecoder.cpp		TraceIntelPTMultiCoreDecoder.cpp
TraceIntelPTSessionFileParser.cpp		TraceIntelPTSessionFileParser.cpp
TraceIntelPTSessionSaver.cpp		TraceIntelPTSessionSaver.cpp

Show All 12 Lines

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h

Show First 20 Lines • Show All 237 Lines • ▼ Show 20 Lines	public:
void RecordTscError(int libipt_error_code);		void RecordTscError(int libipt_error_code);

/// The approximate size in bytes used by this instance,		/// The approximate size in bytes used by this instance,
/// including all the already decoded instructions.		/// including all the already decoded instructions.
size_t CalculateApproximateMemoryUsage() const;		size_t CalculateApproximateMemoryUsage() const;

lldb::ThreadSP GetThread();		lldb::ThreadSP GetThread();

private:
/// Append a decoding error given an llvm::Error.		/// Append a decoding error given an llvm::Error.
void AppendError(llvm::Error &&error);		void AppendError(llvm::Error &&error);

		private:
/// Notify this class that the last added instruction or error has		/// Notify this class that the last added instruction or error has
/// an associated TSC.		/// an associated TSC.
void RecordTscForLastInstruction(uint64_t tsc);		void RecordTscForLastInstruction(uint64_t tsc);

/// When adding new members to this class, make sure		/// When adding new members to this class, make sure
/// to update \a CalculateApproximateMemoryUsage() accordingly.		/// to update \a CalculateApproximateMemoryUsage() accordingly.
lldb::ThreadSP m_thread_sp;		lldb::ThreadSP m_thread_sp;
/// The low level storage of all instruction addresses. Each instruction has		/// The low level storage of all instruction addresses. Each instruction has
Show All 37 Lines

lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.h

	//===-- LibiptDecoder.h --======---------------------------------- C++ --===//			//===-- LibiptDecoder.h --======---------------------------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLDB_SOURCE_PLUGINS_TRACE_LIBIPT_DECODER_H			#ifndef LLDB_SOURCE_PLUGINS_TRACE_LIBIPT_DECODER_H
	#define LLDB_SOURCE_PLUGINS_TRACE_LIBIPT_DECODER_H			#define LLDB_SOURCE_PLUGINS_TRACE_LIBIPT_DECODER_H

	#include "intel-pt.h"

	#include "DecodedThread.h"			#include "DecodedThread.h"
				#include "PerfContextSwitchDecoder.h"
	#include "forward-declarations.h"			#include "forward-declarations.h"

				#include "intel-pt.h"

	namespace lldb_private {			namespace lldb_private {
	namespace trace_intel_pt {			namespace trace_intel_pt {

				struct IntelPTThreadSubtrace {
				uint64_t tsc;
				uint64_t psb_offset;
				};
				jj10306: docs
				wallace: +1

				/// This struct represents a continuous execution of a thread in a core,
				/// delimited by a context switch in and out, and a list of Intel PT subtraces
				/// that belong to this execution.
				struct IntelPTThreadContinousExecution {
				ThreadContinuousExecution thread_execution;
				std::vector<IntelPTThreadSubtrace> intelpt_subtraces;

				IntelPTThreadContinousExecution(
				const ThreadContinuousExecution &thread_execution)
				: thread_execution(thread_execution) {}

				/// Comparator by time
				bool operator<(const IntelPTThreadContinousExecution &o) const;
				};

	/// Decode a raw Intel PT trace given in \p buffer and append the decoded			/// Decode a raw Intel PT trace given in \p buffer and append the decoded
	/// instructions and errors in \p decoded_thread. It uses the low level libipt			/// instructions and errors in \p decoded_thread. It uses the low level libipt
	/// library underneath.			/// library underneath.
	void DecodeTrace(DecodedThread &decoded_thread, TraceIntelPT &trace_intel_pt,			void DecodeTrace(DecodedThread &decoded_thread, TraceIntelPT &trace_intel_pt,
	llvm::ArrayRef<uint8_t> buffer);			llvm::ArrayRef<uint8_t> buffer);

				void DecodeTrace(
				DecodedThread &decoded_thread, TraceIntelPT &trace_intel_pt,
				const llvm::DenseMap<lldb::core_id_t, llvm::ArrayRef<uint8_t>> &buffers,
				const std::vector<IntelPTThreadContinousExecution> &executions);

				llvm::Expected<std::vector<IntelPTThreadSubtrace>>
				SplitTraceInContinuousExecutions(TraceIntelPT &trace_intel_pt,
				llvm::ArrayRef<uint8_t> buffer);
				jj10306: docs
				wallace: +1

	} // namespace trace_intel_pt			} // namespace trace_intel_pt
	} // namespace lldb_private			} // namespace lldb_private

	#endif // LLDB_SOURCE_PLUGINS_TRACE_LIBIPT_DECODER_H			#endif // LLDB_SOURCE_PLUGINS_TRACE_LIBIPT_DECODER_H

lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp

Show All 13 Lines

using namespace lldb;

using namespace lldb_private;

using namespace lldb_private::trace_intel_pt;

using namespace llvm;

// Simple struct used by the decoder to keep the state of the most

// recent TSC and a flag indicating whether TSCs are enabled, not enabled

// or we just don't know yet.

struct TscInfo {

uint64_t tsc = 0;

LazyBool has_tsc = eLazyBoolCalculate;

explicit operator bool() const { return has_tsc == eLazyBoolYes; }

};

/// Class that decodes a raw buffer for a single thread using the low level

/// libipt library.

///

/// Throughout this code, the status of the decoder will be used to identify

/// events needed to be processed or errors in the decoder. The values can be

/// - negative: actual errors

/// - positive or zero: not an error, but a list of bits signaling the status

/// of the decoder, e.g. whether there are events that need to be decoded or

/// not.

class LibiptDecoder {

jj10306: nit: Is there a reason these classes/structs are not declared in the header file?

wallace: they are not used by anyone else, so just that

public:

/// \param[in] decoder

/// A well configured decoder. Using the current state of that decoder,

/// decoding will start at its next valid PSB. It's not assumed that the

/// decoder is already pointing at a valid PSB.

///

/// \param[in] decoded_thread

/// A \a DecodedThread object where the decoded instructions will be

/// appended to. It might have already some instructions.

LibiptDecoder(pt_insn_decoder &decoder, DecodedThread &decoded_thread)

: m_decoder(decoder), m_decoded_thread(decoded_thread) {}

/// Decode all the instructions until the end of the trace.

/// The decoding flow is based on

/// https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#the-instruction-flow-decode-loop

/// https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#the-instruction-flow-decode-loop.

/// but with some relaxation to allow for gaps in the trace.

void DecodeUntilEndOfTrace() {

int status = pte_ok;

// Multiple loops indicate gaps in the trace, which are found by the inner

while (!IsLibiptError(status = FindNextSynchronizationPoint())) {

// call to DecodeInstructionsAndEvents.

// We have synchronized, so we can start decoding instructions and

while (true) {

// events.

int status = pt_insn_sync_forward(&m_decoder);

// Multiple loops indicate gaps in the trace.

if (IsLibiptError(status)) {

m_decoded_thread.Append(DecodedInstruction(status));

break;

}

DecodeInstructionsAndEvents(status);

}

/// Decode all the instructions that belong to the same PSB packet given its

/// offset.

void DecodePSB(uint64_t psb_offset) {

int status = pt_insn_sync_set(&m_decoder, psb_offset);

if (IsLibiptError(status)) {

m_decoded_thread.Append(DecodedInstruction(status));

return;

}

DecodeInstructionsAndEvents(status, /*stop_on_psb_change=*/true);

}

private:

/// Invoke the low level function \a pt_insn_next and store the decoded

/// instruction in the given \a DecodedInstruction.

///

/// \param[out] insn

/// The instruction builder where the pt_insn information will be stored.

///

/// \return

/// The status returned by pt_insn_next.

int DecodeNextInstruction(DecodedInstruction &insn) {

return pt_insn_next(&m_decoder, &insn.pt_insn, sizeof(insn.pt_insn));

}

/// Decode all the instructions and events until an error is found or the end

/// Decode all the instructions and events until an error is found, the end

/// of the trace is reached.

/// of the trace is reached, or optionally a new PSB is reached.

///

/// \param[in] status

/// The status that was result of synchronizing to the most recent PSB.

void DecodeInstructionsAndEvents(int status) {

///

while (DecodedInstruction insn = ProcessPTEvents(status)) {

/// \param[in] stop_on_psb_change

/// If \b true, decoding

/// An optional offset to a given PSB. Decoding stops if a different PSB is

/// reached.

jj10306 (suggested change):

/// \param[in] stop_on_psb_change

- /// If \b true, decoding

- /// An optional offset to a given PSB. Decoding stops if a different PSB is

+ /// If \b true, decoding stops if a different PSB is

/// reached.

void DecodeInstructionsAndEvents(int status,

jj10306: is this what you intended to say here?

wallace: hehe yes

void DecodeInstructionsAndEvents(int status,

bool stop_on_psb_change = false) {

uint64_t psb_offset;

pt_insn_get_sync_offset(&m_decoder,

&psb_offset); // this can't fail because we got here

while (true) {

DecodedInstruction insn = ProcessPTEvents(status);

if (!insn) {

m_decoded_thread.Append(insn);

break;

}

if (stop_on_psb_change) {

uint64_t cur_psb_offset;

pt_insn_get_sync_offset(

&m_decoder, &cur_psb_offset); // this can't fail because we got here

if (cur_psb_offset != psb_offset)

break;

}

// The status returned by DecodeNextInstruction will need to be processed

// by ProcessPTEvents in the next loop if it is not an error.

if (IsLibiptError(status = DecodeNextInstruction(insn))) {

insn.libipt_error = status;

m_decoded_thread.Append(insn);

break;

}

m_decoded_thread.Append(insn);

Show All 14 Lines

private:

int FindNextSynchronizationPoint() {

// Try to sync the decoder. If it fails, then get the decoder_offset and

// try to sync again from the next synchronization point. If the

// new_decoder_offset is same as decoder_offset then we can't move to the

// next synchronization point. Otherwise, keep resyncing until either end

// of trace stream (eos) is reached or pt_insn_sync_forward() passes.

int status = pt_insn_sync_forward(&m_decoder);

if (!IsEndOfStream(status) && IsLibiptError(status)) {

uint64_t decoder_offset = 0;

int errcode_off = pt_insn_get_offset(&m_decoder, &decoder_offset);

if (!IsLibiptError(errcode_off)) { // we could get the offset

while (true) {

status = pt_insn_sync_forward(&m_decoder);

if (!IsLibiptError(status) || IsEndOfStream(status))

break;

uint64_t new_decoder_offset = 0;

errcode_off = pt_insn_get_offset(&m_decoder, &new_decoder_offset);

if (IsLibiptError(errcode_off))

break; // We can't further synchronize.

else if (new_decoder_offset <= decoder_offset) {

// We tried resyncing the decoder and it didn't make any progress

// because the offset didn't change. We will not make any further

// progress. Hence, we stop in this situation.

break;

}

// We'll try again starting from a new offset.

decoder_offset = new_decoder_offset;

}

// We make this call to record any synchronization errors.

if (IsLibiptError(status))

m_decoded_thread.Append(DecodedInstruction(status));

return status;

}

/// Before querying instructions, we need to query the events associated with that

/// instruction e.g. timing events like ptev_tick, or paging events like

/// ptev_paging.

///

/// If an error is found, it will be appended to the trace.

///

/// \param[in] status

/// The status gotten from the previous instruction decoding or PSB

/// synchronization.

///

/// \return

/// A \a DecodedInstruction with event, tsc and error information.

DecodedInstruction ProcessPTEvents(int status) {

DecodedInstruction insn;

Show All 33 Lines

DecodedInstruction ProcessPTEvents(int status) {

}

// We refresh the TSC that might have changed after processing the events.

// See

// https://github.com/intel/libipt/blob/master/doc/man/pt_evt_next.3.md

RefreshTscInfo();

if (m_tsc_info)

insn.tsc = m_tsc_info.tsc;

if (!insn)

m_decoded_thread.Append(insn);

return insn;

}

/// Query the decoder for the most recent TSC timestamp and update

/// the inner tsc information accordingly.

void RefreshTscInfo() {

if (m_tsc_info.has_tsc == eLazyBoolNo)

return;

▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines

auto DecoderDeleter = [](pt_insn_decoder *decoder) {

pt_insn_free_decoder(decoder);

};

using PtInsnDecoderUP =

std::unique_ptr<pt_insn_decoder, decltype(DecoderDeleter)>;

static Expected<PtInsnDecoderUP>

CreateInstructionDecoder(DecodedThread &decoded_thread,

CreateInstructionDecoder(TraceIntelPT &trace_intel_pt,

TraceIntelPT &trace_intel_pt,

ArrayRef<uint8_t> buffer) {

Expected<pt_cpu> cpu_info = trace_intel_pt.GetCPUInfo();

if (!cpu_info)

return cpu_info.takeError();

pt_config config;

pt_config_init(&config);

config.cpu = *cpu_info;

int status = pte_ok;

if (IsLibiptError(status = pt_cpu_errata(&config.errata, &config.cpu)))

return make_error<IntelPTError>(status);

// The libipt library does not modify the trace buffer, hence the

// following casts are safe.

config.begin = const_cast<uint8_t *>(buffer.data());

config.end = const_cast<uint8_t *>(buffer.data() + buffer.size());

pt_insn_decoder *decoder_ptr = pt_insn_alloc_decoder(&config);

if (!decoder_ptr)

return make_error<IntelPTError>(-pte_nomem);

PtInsnDecoderUP decoder_up(decoder_ptr, DecoderDeleter);

pt_image *image = pt_insn_get_image(decoder_ptr);

return PtInsnDecoderUP(decoder_ptr, DecoderDeleter);

Process *process = decoded_thread.GetThread()->GetProcess().get();

}

static Error SetupMemoryImage(PtInsnDecoderUP &decoder_up, Process &process) {

pt_image *image = pt_insn_get_image(decoder_up.get());

int status = pte_ok;

if (IsLibiptError(

status = pt_image_set_callback(image, ReadProcessMemory, process)))

status = pt_image_set_callback(image, ReadProcessMemory, &process)))

return make_error<IntelPTError>(status);

return decoder_up;

return Error::success();

}

void lldb_private::trace_intel_pt::DecodeTrace(DecodedThread &decoded_thread,

TraceIntelPT &trace_intel_pt,

ArrayRef<uint8_t> buffer) {

Expected<PtInsnDecoderUP> decoder_up =

CreateInstructionDecoder(decoded_thread, trace_intel_pt, buffer);

CreateInstructionDecoder(trace_intel_pt, buffer);

if (!decoder_up)

return decoded_thread.SetAsFailed(decoder_up.takeError());

if (Error err = SetupMemoryImage(*decoder_up,

*decoded_thread.GetThread()->GetProcess()))

return decoded_thread.SetAsFailed(std::move(err));

LibiptDecoder libipt_decoder(*decoder_up.get(), decoded_thread);

libipt_decoder.DecodeUntilEndOfTrace();

}

void lldb_private::trace_intel_pt::DecodeTrace(

DecodedThread &decoded_thread, TraceIntelPT &trace_intel_pt,

const DenseMap<lldb::core_id_t, llvm::ArrayRef<uint8_t>> &buffers,

const std::vector<IntelPTThreadContinousExecution> &executions) {

DenseMap<lldb::core_id_t, LibiptDecoder> decoders;

for (auto &core_id_buffer : buffers) {

Expected<PtInsnDecoderUP> decoder_up =

CreateInstructionDecoder(trace_intel_pt, core_id_buffer.second);

if (!decoder_up)

return decoded_thread.SetAsFailed(decoder_up.takeError());

if (Error err = SetupMemoryImage(*decoder_up,

*decoded_thread.GetThread()->GetProcess()))

return decoded_thread.SetAsFailed(std::move(err));

decoders.try_emplace(core_id_buffer.first,

LibiptDecoder(*decoder_up->release(), decoded_thread));

}

bool has_seen_psbs = false;

for (size_t i = 0; i < executions.size(); i++) {

const IntelPTThreadContinousExecution &execution = executions[i];

auto variant = execution.thread_execution.variant;

// If we haven't seen a PSB yet, then it's fine not to show errors

jj10306: Why is this the case?

if (has_seen_psbs) {

if (execution.intelpt_subtraces.empty()) {

decoded_thread.AppendError(createStringError(

inconvertibleErrorCode(),

formatv("Unable to find intel pt data for thread execution with "

"tsc = {0} on core id = {1}",

execution.thread_execution.GetLowestKnownTSC(),

execution.thread_execution.core_id)));

}

// If the first execution is incomplete because it doesn't have a previous

// context switch in its cpu, all good.

if (variant == ThreadContinuousExecution::Variant::OnlyEnd ||

variant == ThreadContinuousExecution::Variant::HintedStart) {

decoded_thread.AppendError(createStringError(

inconvertibleErrorCode(),

formatv("Thread execution starting at tsc = {0} on core id = {1} "

"doesn't have a matching context switch in.",

execution.thread_execution.GetLowestKnownTSC(),

execution.thread_execution.core_id)));

}

LibiptDecoder &decoder =

decoders.find(execution.thread_execution.core_id)->second;

for (const IntelPTThreadSubtrace &intel_pt_execution :

execution.intelpt_subtraces) {

has_seen_psbs = true;

decoder.DecodePSB(intel_pt_execution.psb_offset);

}

// If we haven't seen a PSB yet, then it's fine not to show errors

if (has_seen_psbs) {

// If the last execution is incomplete because it doesn't have a following

// context switch in its cpu, all good.

if ((variant == ThreadContinuousExecution::Variant::OnlyStart &&

i + 1 != executions.size()) ||

variant == ThreadContinuousExecution::Variant::HintedEnd) {

decoded_thread.AppendError(createStringError(

inconvertibleErrorCode(),

formatv("Thread execution starting at tsc = {0} on core id = {1} "

"doesn't have a matching context switch out",

execution.thread_execution.GetLowestKnownTSC(),

execution.thread_execution.core_id)));

}

bool IntelPTThreadContinousExecution::operator<(

const IntelPTThreadContinousExecution &o) const {

// As the context switch might be incomplete, we look first for the first real

// PSB packet, which is a valid TSC. Otherwise, we query the thread execution

// itself for some tsc.

auto get_tsc = [](const IntelPTThreadContinousExecution &exec) {

return exec.intelpt_subtraces.empty()

? exec.thread_execution.GetLowestKnownTSC()

: exec.intelpt_subtraces.front().tsc;

};

return get_tsc(*this) < get_tsc(o);

}
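The ordering above can be illustrated in isolation. The following is a minimal sketch, not the patch's code: `Subtrace`, `Execution`, `SortKey` and `SortedKeys` are hypothetical, simplified stand-ins, and `lowest_known_tsc` plays the role of `GetLowestKnownTSC()`. It only shows the idea of preferring the TSC of the first PSB and falling back to the context switch TSC when an execution has no intel pt data.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for IntelPTThreadSubtrace.
struct Subtrace {
  uint64_t tsc;
  uint64_t psb_offset;
};

// Hypothetical stand-in for IntelPTThreadContinousExecution.
struct Execution {
  uint64_t lowest_known_tsc;       // stand-in for GetLowestKnownTSC()
  std::vector<Subtrace> subtraces; // stand-in for intelpt_subtraces

  // Prefer the TSC of the first PSB; fall back to the context switch TSC.
  uint64_t SortKey() const {
    return subtraces.empty() ? lowest_known_tsc : subtraces.front().tsc;
  }

  bool operator<(const Execution &o) const { return SortKey() < o.SortKey(); }
};

// Sorts a copy of the executions and returns the keys in sorted order.
std::vector<uint64_t> SortedKeys(std::vector<Execution> execs) {
  std::sort(execs.begin(), execs.end());
  std::vector<uint64_t> keys;
  for (const Execution &e : execs)
    keys.push_back(e.SortKey());
  return keys;
}
```

In this sketch, an execution without subtraces is still placed in the sequence by its context switch TSC, so it stays visible as a potential gap.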

Expected<std::vector<IntelPTThreadSubtrace>>

lldb_private::trace_intel_pt::SplitTraceInContinuousExecutions(

TraceIntelPT &trace_intel_pt, llvm::ArrayRef<uint8_t> buffer) {

Expected<PtInsnDecoderUP> decoder_up =

CreateInstructionDecoder(trace_intel_pt, buffer);

if (!decoder_up)

return decoder_up.takeError();

pt_insn_decoder *decoder = decoder_up.get().get();

std::vector<IntelPTThreadSubtrace> executions;

int status = pte_ok;

while (!IsLibiptError(status = pt_insn_sync_forward(decoder))) {

uint64_t tsc;

if (IsLibiptError(pt_insn_time(decoder, &tsc, nullptr, nullptr)))

return createStringError(inconvertibleErrorCode(),

"intel pt trace doesn't have TSC timestamps");

uint64_t psb_offset;

pt_insn_get_sync_offset(decoder,

&psb_offset); // this can't fail because we got here

executions.push_back({

tsc,

psb_offset,

});

}

return executions;

}
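The summary says that the subtraces produced by this split are then assigned to thread continuous executions on the same core with simple TSC-based comparisons. A rough sketch of that assignment follows. It assumes both inputs are sorted by TSC and come from the same core, and all names (`Subtrace`, `Execution`, `AssignSubtraces`) are hypothetical. The inclusive bounds are also an assumption, and the actual patch may compare differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for IntelPTThreadSubtrace.
struct Subtrace {
  uint64_t tsc;
  uint64_t psb_offset;
};

// Hypothetical stand-in for a thread continuous execution on one core.
struct Execution {
  uint64_t start_tsc;
  uint64_t end_tsc;
  std::vector<Subtrace> subtraces;
};

// Assigns each subtrace to the execution whose [start_tsc, end_tsc] interval
// contains its TSC. Both vectors must be sorted by TSC. Subtraces that fall
// outside every interval are skipped.
void AssignSubtraces(std::vector<Execution> &executions,
                     const std::vector<Subtrace> &subtraces) {
  size_t next = 0;
  for (Execution &exec : executions) {
    // Skip subtraces that started before this execution.
    while (next < subtraces.size() && subtraces[next].tsc < exec.start_tsc)
      ++next;
    // Collect subtraces that started within this execution.
    while (next < subtraces.size() && subtraces[next].tsc <= exec.end_tsc) {
      exec.subtraces.push_back(subtraces[next]);
      ++next;
    }
  }
}
```

An execution that ends up with no subtraces corresponds to the "Unable to find intel pt data" case reported during decoding. No instructions are decoded in this pass, which matches the summary's remark that the correlation step is fast.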

lldb/source/Plugins/Trace/intel-pt/PerfContextSwitchDecoder.h

This file was added.

//===-- PerfContextSwitchDecoder.h --======----------------------*- C++ -*-===//

jj10306 (Unsubmitted)

Not Done

Do the structures/logic contained in this file (and its .cpp) belong with the other Perf logic of LLDB, or do they belong with the intel pt specific code?


wallace (Author, Unsubmitted)

Done

The other Perf.h file is lldb-server only, so the two can't live together or be merged :(


// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

#ifndef LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_PERFCONTEXTSWITCHDECODER_H

#define LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_PERFCONTEXTSWITCHDECODER_H

#include "lldb/Utility/TraceIntelPTGDBRemotePackets.h"

#include "lldb/lldb-types.h"

#include "llvm/Support/Error.h"

#include <vector>

namespace lldb_private {

namespace trace_intel_pt {

/// This class indicates the time interval in which a thread was running

/// continuously on a cpu core.

///

/// Note: we use the terms CPU and core interchangeably.

struct ThreadContinuousExecution {

/// In most cases both the start and end of a continuous execution can be

/// accurately recovered from the context switch trace, but in some cases one

/// of these endpoints might be guessed or not known at all, due to contention

/// problems in the trace or because tracing was interrupted, e.g. with ioctl

/// calls, which causes gaps in the trace. Because of that, we identify which

/// situation we fall into with the following variants.

enum class Variant {

/// Both endpoints are known.

Complete,

/// The end is known and we have a lower bound for the start, i.e. the

/// previous execution in the same core happens strictly before the hinted

/// start.

HintedStart,

/// The start is known and we have an upper bound for the end, i.e. the next

/// execution in the same core happens strictly after the hinted end.

HintedEnd,

/// We only know the start. This might be the last entry of a core trace.

OnlyStart,

/// We only know the end. This might be the first entry of a core trace.

OnlyEnd,

} variant;

/// \return

/// The lowest tsc that we are sure of, i.e. not hinted.

uint64_t GetLowestKnownTSC() const;

/// \return

/// The known or hinted start tsc, or 0 if the variant is \a OnlyEnd.

uint64_t GetStartTSC() const;

/// \return

/// The known or hinted end tsc, or max \a uint64_t if the variant is \a

/// OnlyStart.

uint64_t GetEndTSC() const;

/// Constructors for the different variants of this object

///

/// \{