This is an archive of the discontinued LLVM Phabricator instance.

Paths

lldb/source/Plugins/Trace/intel-pt/
- DecodedThread.h
- DecodedThread.cpp
- IntelPTDecoder.cpp
- TraceCursorIntelPT.h
- TraceCursorIntelPT.cpp

Differential D123375

[lldb][intelpt] Reduce trace memory usage by grouping instructions
AbandonedPublic

Authored by zrthxn on Apr 8 2022, 3:18 AM.


Details

Reviewers

wallace
jj10306

Summary

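Group instructions that share the same high 48 bits into blocks, storing the prefix once per block and a 16-bit suffix per instruction. The simple linear lookup is also replaced by a binary search over the cumulative block sizes, so looking up an instruction address by index takes logarithmic rather than linear time in the number of blocks.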
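The grouping idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the diff: `Block`, `AppendAddress` and `FullAddress` are hypothetical names, and instruction sizes and classes are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the block type in the diff.
struct Block {
  uint64_t prefix;                // high 48 bits of the address, shifted down
  std::vector<uint16_t> suffixes; // low 16 bits of each instruction address
};

// Append an address, opening a new block whenever the high 48 bits change.
void AppendAddress(std::vector<Block> &blocks, uint64_t addr) {
  const uint64_t prefix = addr >> 16;
  const uint16_t suffix = static_cast<uint16_t>(addr & 0xFFFF);
  if (blocks.empty() || blocks.back().prefix != prefix)
    blocks.push_back(Block{prefix, {}});
  blocks.back().suffixes.push_back(suffix);
}

// Rebuild the full address of the i-th instruction within a block.
uint64_t FullAddress(const Block &block, size_t i) {
  return (block.prefix << 16) | block.suffixes[i];
}
```

Each instruction then costs 2 bytes for its suffix, and the 8-byte prefix is paid only when the high 48 bits change.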

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

zrthxn created this revision. Apr 8 2022, 3:18 AM

Herald added a project: Restricted Project. Apr 8 2022, 3:18 AM

zrthxn requested review of this revision. Apr 8 2022, 3:18 AM

Herald added a subscriber: lldb-commits.

zrthxn edited the summary of this revision. Apr 8 2022, 3:19 AM

Herald added a subscriber: JDevlieghere. Apr 8 2022, 3:19 AM

Harbormaster completed remote builds in B158659: Diff 421472. Apr 8 2022, 3:20 AM

Example of improvement in memory usage

thread #1: tid = 42805
  Raw trace size: 4096 KiB
  Total number of instructions: 900004
  Total approximate memory usage: 4394.58 KiB
  Average memory usage per instruction: 5.00 bytes

where previously the same trace took ~12 MB, at 13 bytes per instruction
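The figures can be cross-checked with a little arithmetic. The helpers below are hypothetical and only restate the numbers reported above: 4394.58 KiB over 900004 instructions is about 5 bytes each, while 13 bytes each over the same instruction count comes to roughly 11,426 KiB.

```cpp
#include <cstdint>

// Approximate bytes used per instruction, given a total in KiB.
double BytesPerInstruction(double total_kib, uint64_t instruction_count) {
  return total_kib * 1024.0 / static_cast<double>(instruction_count);
}

// Approximate total in KiB, given a per-instruction cost in bytes.
double TotalKiB(double bytes_per_instruction, uint64_t instruction_count) {
  return bytes_per_instruction * static_cast<double>(instruction_count) /
         1024.0;
}
```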

I did a first pass on this diff. I'm asking you to refactor the InstructionBlock classes a bit so that they are smarter. Besides that, if you use IDs more ubiquitously and stop using instruction indices everywhere, everything becomes much simpler.

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
112–135	Let's rename it to InstructionsBlock or another similar name. An instruction pointer is actually another way to refer to an instruction address. Let's also try to remove numbers from the comments, so that if we modify them, we don't need to update the comment. It's hard to update comments because the compiler will never complain if the comment doesn't make sense anymore. Let's also make the size of the suffix dynamic, so that we can easily experiment with it later. We can use templates asking for a suffix type that must be unsigned. If we embed the type of the suffix in the template, then we can do the splitting inside this class instead of outside. If you use my proposal for the constructor, please move its implementation to the cpp file. Also, as this struct is not just a simple bag of data, let's make it a proper class. I renamed a few things to create a nicer API. First of all, I'm renaming the Append method to TryAppend, which receives the full load address and returns true if the append could happen, and false otherwise. Besides that, I'm asking for a new class called Instructions that contains all the instruction blocks and receives instruction ids, and then it's able to defer the job to the actual instruction block.
173–201	all of these become simpler if you work directly with the id instead of the instruction index
260–263	you can move these two to the new InstructionsBlock class
271	if you convert this from `instruction index -> TSC` to `instruction id -> TSC`, then everything becomes simpler
274–276	i have the impression that you don't need this anymore
279	you can also make it `instruction id -> error message`
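The relationship between instruction IDs and instruction indices that these comments refer to can be sketched as below. This is an illustration only: the helper names are hypothetical, the 32/32 bit split mirrors the shifts used in the diff, and `cumulative[i]` is assumed to hold the number of instructions in blocks 0 through i inclusive.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers; an ID packs (block index, offset within the block).
constexpr unsigned kOffsetBits = 32;

uint64_t MakeId(uint64_t block_index, uint64_t offset) {
  return (block_index << kOffsetBits) | (offset & UINT32_MAX);
}
uint64_t BlockOf(uint64_t id) { return id >> kOffsetBits; }
uint64_t OffsetOf(uint64_t id) { return id & UINT32_MAX; }

// ID -> index: instructions in all earlier blocks plus the offset.
size_t ToIndex(const std::vector<size_t> &cumulative, uint64_t id) {
  const uint64_t block = BlockOf(id);
  const size_t before = block == 0 ? 0 : cumulative[block - 1];
  return before + OffsetOf(id);
}

// Index -> ID: binary search for the first block whose cumulative count
// exceeds `index`. Precondition: index < cumulative.back().
uint64_t ToId(const std::vector<size_t> &cumulative, size_t index) {
  auto it = std::upper_bound(cumulative.begin(), cumulative.end(), index);
  const size_t block = static_cast<size_t>(it - cumulative.begin());
  const size_t before = block == 0 ? 0 : cumulative[block - 1];
  return MakeId(block, index - before);
}
```

Code that keeps IDs throughout only needs `BlockOf` and `OffsetOf`, which are constant-time; the cumulative table and the search are needed only when an index is involved.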

This revision now requires changes to proceed. Apr 11 2022, 5:11 PM

I've tried most of these changes to see the effect it has, but in my opinion this adds quite a lot of code complexity for not enough benefit in terms of memory usage which was our goal. I think this will make the DecodedThread even more of a monolith class with single-use subclasses... are you sure we should continue with this?
We can still use instruction ID more widely though without having to turn instruction block into a class.

let me think about it :)

zrthxn abandoned this revision. Jun 17 2022, 11:38 AM

Herald added a subscriber: Michael137. Jun 17 2022, 11:38 AM

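Revision Contents

Path	Size
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h	59 lines
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp	218 lines
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp	2 lines
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h	2 lines
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp	25 lines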

Diff 421472

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h

Show First 20 Lines • Show All 103 Lines • ▼ Show 20 Lines public:

struct LibiptErrors { struct LibiptErrors {

// libipt error -> count // libipt error -> count

llvm::DenseMap<const char *, int> libipt_errors; llvm::DenseMap<const char *, int> libipt_errors;

int total_count = 0; int total_count = 0;

void RecordError(int libipt_error_code); void RecordError(int libipt_error_code);

}; };

/// \struct InstructionPointer

/// Structure that stores a 6 byte prefix a list of 2 byte suffixes for each

/// instruction that is appended to the trace, in order. Concatenating the

/// prefix and suffix for any instruction ID (which is the concatenation of

/// two indices) will give the instruction load address.

///

/// Generally most consecutive instructions will be close. Unless you are

/// doing a function call between different shared libraries, all your

/// instructions should be close in memory. This means that the addresses of

/// most consecutive instructions will have the same prefix. For we'll divide

/// the instruction addresses by the first 48 bits (6 bytes).

/// Each new instruction will now occupy 2 bytes. And each time you see a

/// change in the first 6 bytes, you'll pay the cost of storing that prefix,

/// which takes 8 bytes.

struct InstructionPointer {

uint64_t insn_prefix;

std::vector<uint16_t> suffixes;

InstructionPointer(uint64_t prefix, uint16_t suffix)

: insn_prefix(prefix), suffixes(std::vector<uint16_t>({suffix})){};

size_t AppendInstructionSuffix(uint16_t suffix);

uint64_t GetFullInstructionAddress(size_t suffix_index) const;

};

wallace (Unsubmitted, Not Done)

void RecordError(int libipt_error_code);

};

- /// \struct InstructionPointer

- /// Structure that stores a 6 byte prefix a list of 2 byte suffixes for each

- /// instruction that is appended to the trace, in order. Concatenating the

- /// prefix and suffix for any instruction ID (which is the concatenation of

- /// two indices) will give the instruction load address.

+ /// \class InstructionsBlock

+ /// Class that stores a sequence of instructions with a common prefix,

+ /// so that each individual instruction is stored using its suffix. The size of the suffix

+ /// can be set with the template parameter.

///

- /// Generally most consecutive instructions will be close. Unless you are

+ /// Generally most consecutive instructions will be "close". Unless you are

/// doing a function call between different shared libraries, all your

/// instructions should be close in memory. This means that the addresses of

- /// most consecutive instructions will have the same prefix. For we'll divide

- /// the instruction addresses by the first 48 bits (6 bytes).

- /// Each new instruction will now occupy 2 bytes. And each time you see a

- /// change in the first 6 bytes, you'll pay the cost of storing that prefix,

- /// which takes 8 bytes.

- struct InstructionPointer {

- uint64_t insn_prefix;

- std::vector<uint16_t> suffixes;

- InstructionPointer(uint64_t prefix, uint16_t suffix)

- : insn_prefix(prefix), suffixes(std::vector<uint16_t>({suffix})){};

+ /// most consecutive instructions will have the same prefix, and that's a way to

+ /// optimize the total memory consumption.

+ template<typename S, typename std::enable_if<std::is_unsigned<S>::value>::type* = nullptr>

+ class InstructionsBlock {

+ public:

- size_t AppendInstructionSuffix(uint16_t suffix);

- uint64_t GetFullInstructionAddress(size_t suffix_index) const;

+ InstructionsBlock(lldb::addr_t load_address, uint8_t byte_size, pt_insn_class insn_class) {

+ lldb::addr_t mask = std::numeric_limits<S>::max();

+ m_insn_prefix = load_address & ~mask;

+ m_suffixes.push_back(load_address & mask);

+ m_byte_sizes.push_back(byte_size);

+ m_insn_classes.push_back(insn_class);

+ }

+ bool SharesSamePrefix(lldb::addr_t load_address) const;

+ void Append(lldb::addr_t load_address, uint8_t byte_size, pt_insn_class insn_class);

+ lldb::addr_t GetLoadAddress(lldb::user_id_t id) const;

+ size_t GetSize() const;

+ private:

+ lldb::addr_t m_insn_prefix;

+ std::vector<S> m_suffixes;

+ std::vector<uint8_t> m_byte_sizes;

+ std::vector<pt_insn_class> m_insn_classes;

};

+ class Instructions {

+ public:

+ lldb::user_id_t Append(lldb::addr_t load_address, uint8_t byte_size, pt_insn_class insn_class) {

+ if (!m_blocks.empty() && m_blocks.back().GetSize() < kMaxBlockSize && m_blocks.back().SharesSamePrefix(load_address))

+ m_blocks.back().Append(load_address, byte_size, insn_class);

+ else

+ m_blocks.emplace_back(load_address, byte_size, insn_class);

+ return ((m_blocks.size() - 1) << kMaxBlockSizePower) | (m_blocks.back().GetSize() - 1);

+ }

+ lldb::addr_t GetLoadAddress(lldb::user_id_t insn_id) const {

+ return m_blocks[insn_id >> kMaxBlockSizePower].GetLoadAddress(insn_id & std::numeric_limits<uint32_t>::max());

+ }

+ private:

+ static const int kMaxBlockSizePower = 32;

+ /// The encoding used for instruction IDs requires having a maximum size for each block.

+ static const uint64_t kMaxBlockSize = 1ULL << kMaxBlockSizePower;

+ std::vector<InstructionsBlock<uint16_t>> m_blocks;

+ };

DecodedThread(lldb::ThreadSP thread_sp);

Let's rename it to InstructionsBlock or another similar name. An instruction pointer is actually another way to refer to an instruction address.
Let's also try to remove numbers from the comments, so that if we modify them, we don't need to update the comment. It's hard to update comments because the compiler will never complain if the comment doesn't make sense anymore.

Let's also make the size of the suffix dynamic, so that we can easily experiment with it later. We can use templates asking for a suffix type that must be unsigned.

If we embed the type of the suffix in the template, then we can do the splitting inside this class instead of outside. If you use my proposal for the constructor, please move its implementation to the cpp file.

Also, as this struct is not just a simple bag of data, let's make it a proper class.

I renamed a few things to create a nicer API.

First of all, I'm renaming the Append method to TryAppend, which receives the full load address and returns true if the append could happen, and false otherwise.

Besides that, I'm asking for a new class called Instructions that contains all the instruction blocks and receives instruction ids, and then it's able to defer the job to the actual instruction block.
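A minimal sketch of the API described in this comment, under stated assumptions: it stores addresses only (sizes and instruction classes are omitted), uses plain `uint64_t` instead of the lldb typedefs, and checks the unsigned-suffix requirement with a `static_assert`. It is one possible shape, not the reviewer's exact proposal.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical sketch of a block whose suffix width is a template parameter.
template <typename S> class InstructionsBlock {
  static_assert(std::is_unsigned<S>::value, "suffix type must be unsigned");

public:
  explicit InstructionsBlock(uint64_t load_address)
      : m_prefix(load_address & ~kMask) {
    m_suffixes.push_back(static_cast<S>(load_address & kMask));
  }

  // Returns true if the address shares this block's prefix and was stored.
  bool TryAppend(uint64_t load_address) {
    if ((load_address & ~kMask) != m_prefix)
      return false;
    m_suffixes.push_back(static_cast<S>(load_address & kMask));
    return true;
  }

  uint64_t GetLoadAddress(size_t offset) const {
    return m_prefix | m_suffixes[offset];
  }
  size_t GetSize() const { return m_suffixes.size(); }

private:
  static constexpr uint64_t kMask = std::numeric_limits<S>::max();
  uint64_t m_prefix;
  std::vector<S> m_suffixes;
};

// Owns all blocks and hands out IDs of the form (block index, offset).
class Instructions {
public:
  uint64_t Append(uint64_t load_address) {
    // Open a new block if there is none, the last one is full, or the
    // address does not share the last block's prefix.
    if (m_blocks.empty() || m_blocks.back().GetSize() >= kMaxBlockSize ||
        !m_blocks.back().TryAppend(load_address))
      m_blocks.emplace_back(load_address);
    return (static_cast<uint64_t>(m_blocks.size() - 1) << kOffsetBits) |
           (m_blocks.back().GetSize() - 1);
  }

  uint64_t GetLoadAddress(uint64_t id) const {
    return m_blocks[id >> kOffsetBits].GetLoadAddress(id & UINT32_MAX);
  }

private:
  static constexpr unsigned kOffsetBits = 32;
  static constexpr uint64_t kMaxBlockSize = 1ULL << kOffsetBits;
  std::vector<InstructionsBlock<uint16_t>> m_blocks;
};
```

With this shape the caller never splits addresses itself: it appends full load addresses and keeps the returned IDs.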


DecodedThread(lldb::ThreadSP thread_sp); DecodedThread(lldb::ThreadSP thread_sp);

/// Utility constructor that initializes the trace with a provided error. /// Utility constructor that initializes the trace with a provided error.

DecodedThread(lldb::ThreadSP thread_sp, llvm::Error &&err); DecodedThread(lldb::ThreadSP thread_sp, llvm::Error &&err);

/// Append a successfully decoded instruction. /// Append a successfully decoded instruction.

void AppendInstruction(const pt_insn &instruction); void AppendInstruction(const pt_insn &instruction);

/// Append a successfully decoded instruction with an associated TSC timestamp. /// Append a successfully decoded instruction with an associated TSC timestamp.

void AppendInstruction(const pt_insn &instruction, uint64_t tsc); void AppendInstruction(const pt_insn &instruction, uint64_t tsc);

/// Append a decoding error (i.e. an instruction that failed to be decoded). /// Append a decoding error (i.e. an instruction that failed to be decoded).

void AppendError(llvm::Error &&error); void AppendError(llvm::Error &&error);

/// Append a decoding error with a corresponding TSC. /// Append a decoding error with a corresponding TSC.

void AppendError(llvm::Error &&error, uint64_t tsc); void AppendError(llvm::Error &&error, uint64_t tsc);

/// Get the total number of instruction pointers from the decoded trace. /// Get the total number of instruction pointers from the decoded trace.

/// This will include instructions that indicate errors (or gaps) in the /// This will include instructions that indicate errors (or gaps) in the

/// trace. For an instruction error, you can access its underlying error /// trace. For an instruction error, you can access its underlying error

/// message with the \a GetErrorByInstructionIndex() method. /// message with the \a GetErrorByInstructionIndex() method.

size_t GetInstructionsCount() const; size_t GetInstructionsCount() const;

/// For each \a InstructionPointer block, calculate the net number of

/// instructions up to that block. See also the \a m_ipblock_sizes member.

void CalcIPBlockSizes();

/// Convert instruction ID to index

size_t ToIndex(lldb::user_id_t id) const;

/// Convert instruction index to ID

lldb::user_id_t ToID(size_t index) const;

/// Calculate next instruction ID

lldb::user_id_t NextID(lldb::user_id_t id) const;

/// Calculate previous instruction ID

lldb::user_id_t PrevID(lldb::user_id_t id) const;

/// \return /// \return

/// The load address of the instruction at the given index, or \a /// The load address of the instruction at the given index, or \a

/// LLDB_INVALID_ADDRESS if it is an error. /// LLDB_INVALID_ADDRESS if it is an error.

lldb::addr_t GetInstructionLoadAddress(size_t insn_index) const; lldb::addr_t GetInstructionLoadAddress(size_t insn_id) const;

/// Get the \a lldb::TraceInstructionControlFlowType categories of the /// Get the \a lldb::TraceInstructionControlFlowType categories of the

/// instruction. /// instruction.

/// ///

/// \return /// \return

/// The control flow categories, or \b 0 if the instruction is an error. /// The control flow categories, or \b 0 if the instruction is an error.

lldb::TraceInstructionControlFlowType lldb::TraceInstructionControlFlowType

GetInstructionControlFlowType(size_t insn_index) const; GetInstructionControlFlowType(lldb::user_id_t insn_id) const;

/// Construct the TSC range that covers the given instruction index. /// Construct the TSC range that covers the given instruction index.

/// This operation is O(logn) and should be used sparingly. /// This operation is O(logn) and should be used sparingly.

/// If the trace was collected with TSC support, all the instructions of /// If the trace was collected with TSC support, all the instructions of

/// the trace will have associated TSCs. This means that this method will /// the trace will have associated TSCs. This means that this method will

/// only return \b llvm::None if there are no TSCs whatsoever in the trace. /// only return \b llvm::None if there are no TSCs whatsoever in the trace.

/// ///

/// \param[in] insn_index /// \param[in] insn_index

/// The instruction index in question. /// The instruction index in question.

/// ///

/// \param[in] hint_range /// \param[in] hint_range

/// An optional range that might include the given index or might be a /// An optional range that might include the given index or might be a

/// neighbor of it. It might help speed up traversals of the trace with /// neighbor of it. It might help speed up traversals of the trace with

/// short jumps. /// short jumps.

llvm::Optional<TscRange> CalculateTscRange( llvm::Optional<TscRange> CalculateTscRangeByIndex(

size_t insn_index, size_t insn_index,

const llvm::Optional<DecodedThread::TscRange> &hint_range) const; const llvm::Optional<DecodedThread::TscRange> &hint_range) const;

wallace (Unsubmitted, Not Done)

all of these become simpler if you work directly with the id instead of the instruction index

/// Check if an instruction given by its index is an error. /// Check if an instruction given by its index is an error.

bool IsInstructionAnError(size_t insn_idx) const; bool IsInstructionAnError(size_t insn_idx) const;

/// Get the error associated with a given instruction index. /// Get the error associated with a given instruction index.

/// ///

/// \return /// \return

/// The error message of \b nullptr if the given index /// The error message of \b nullptr if the given index

Show All 37 Lines

private: private:

/// Notify this class that the last added instruction or error has /// Notify this class that the last added instruction or error has

/// an associated TSC. /// an associated TSC.

void RecordTscForLastInstruction(uint64_t tsc); void RecordTscForLastInstruction(uint64_t tsc);

/// When adding new members to this class, make sure /// When adding new members to this class, make sure

/// to update \a CalculateApproximateMemoryUsage() accordingly. /// to update \a CalculateApproximateMemoryUsage() accordingly.

lldb::ThreadSP m_thread_sp; lldb::ThreadSP m_thread_sp;

/// The low level storage of all instruction addresses. Each instruction has /// The low level storage of all instruction addresses. This stores

/// an index in this vector and it will be used in other parts of the code. /// instructions in blocks such that instructions with the same high 48 bits

std::vector<lldb::addr_t> m_instruction_ips; /// are stored together. Each instruction has an ID in this vector and it

/// will be used in other parts of the code.

std::vector<InstructionPointer> m_instruction_blocks;

/// The size in bytes of each instruction. /// The size in bytes of each instruction.

std::vector<uint8_t> m_instruction_sizes; std::vector<uint8_t> m_instruction_sizes;

/// The libipt instruction class for each instruction. /// The libipt instruction class for each instruction.

std::vector<pt_insn_class> m_instruction_classes; std::vector<pt_insn_class> m_instruction_classes;

wallace (Unsubmitted, Not Done)

you can move these two to the new InstructionsBlock class

/// This map contains the TSCs of the decoded instructions. It maps /// This map contains the TSCs of the decoded instructions. It maps

/// `instruction index -> TSC`, where `instruction index` is the first index /// `instruction index -> TSC`, where `instruction index` is the first index

/// at which the mapped TSC appears. We use this representation because TSCs /// at which the mapped TSC appears. We use this representation because TSCs

/// are sporadic and we can think of them as ranges. If TSCs are present in /// are sporadic and we can think of them as ranges. If TSCs are present in

/// the trace, all instructions will have an associated TSC, including the /// the trace, all instructions will have an associated TSC, including the

/// first one. Otherwise, this map will be empty. /// first one. Otherwise, this map will be empty.

std::map<size_t, uint64_t> m_instruction_timestamps; std::map<size_t, uint64_t> m_instruction_timestamps;

wallace (Unsubmitted, Not Done)

if you convert this from `instruction index -> TSC` to `instruction id -> TSC`, then everything becomes simpler
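A minimal sketch of what an `instruction id -> TSC` map could look like, assuming IDs of the form (block index << 32 | offset). Such IDs increase in trace order, so an ordered map keyed by ID supports the same range lookup as one keyed by index. `TscMap` and `FindTsc` are hypothetical names.

```cpp
#include <cstdint>
#include <map>

// Hypothetical: maps the first instruction ID at which a TSC appears to
// that TSC value.
using TscMap = std::map<uint64_t, uint64_t>;

// Returns the entry whose range covers `id`, or end() if `id` precedes the
// first recorded TSC.
TscMap::const_iterator FindTsc(const TscMap &tscs, uint64_t id) {
  auto it = tscs.upper_bound(id); // first entry that starts after `id`
  if (it == tscs.begin())
    return tscs.end();
  return --it;
}
```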

/// This is the chronologically last TSC that has been added. /// This is the chronologically last TSC that has been added.

llvm::Optional<uint64_t> m_last_tsc = llvm::None; llvm::Optional<uint64_t> m_last_tsc = llvm::None;

// This variable stores the messages of all the error instructions in the /// For each \a InstructionPointer block, this contains the net number of

// trace. It maps `instruction index -> error message`. /// instructions up to that block

std::vector<size_t> m_ipblock_sizes;

wallace (Unsubmitted, Not Done)

i have the impression that you don't need this anymore

/// This variable stores the messages of all the error instructions in the

/// trace. It maps `instruction index -> error message`.

llvm::DenseMap<uint64_t, std::string> m_errors; llvm::DenseMap<uint64_t, std::string> m_errors;

wallace (Unsubmitted, Not Done)

you can also make it `instruction id -> error message`

/// The size in bytes of the raw buffer before decoding. It might be None if /// The size in bytes of the raw buffer before decoding. It might be None if

/// the decoding failed. /// the decoding failed.

llvm::Optional<size_t> m_raw_trace_size; llvm::Optional<size_t> m_raw_trace_size;

/// All occurrences of libipt errors when decoding TSCs. /// All occurrences of libipt errors when decoding TSCs.

LibiptErrors m_tsc_errors; LibiptErrors m_tsc_errors;

}; };

using DecodedThreadSP = std::shared_ptr<DecodedThread>; using DecodedThreadSP = std::shared_ptr<DecodedThread>;

} // namespace trace_intel_pt } // namespace trace_intel_pt

} // namespace lldb_private } // namespace lldb_private

#endif // LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_DECODEDTHREAD_H #endif // LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_DECODEDTHREAD_H

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp

//===-- DecodedThread.cpp -------------------------------------------------===//		//===-- DecodedThread.cpp -------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "DecodedThread.h"		#include "DecodedThread.h"

#include <intel-pt.h>		#include <intel-pt.h>
#include <memory>		#include <memory>
		#include <vector>

#include "TraceCursorIntelPT.h"		#include "TraceCursorIntelPT.h"
#include "lldb/Utility/StreamString.h"		#include "lldb/Utility/StreamString.h"

using namespace lldb;		using namespace lldb;
using namespace lldb_private;		using namespace lldb_private;
using namespace lldb_private::trace_intel_pt;		using namespace lldb_private::trace_intel_pt;
using namespace llvm;		using namespace llvm;

char IntelPTError::ID;		char IntelPTError::ID;

IntelPTError::IntelPTError(int libipt_error_code, lldb::addr_t address)		IntelPTError::IntelPTError(int libipt_error_code, addr_t address)
: m_libipt_error_code(libipt_error_code), m_address(address) {		: m_libipt_error_code(libipt_error_code), m_address(address) {
assert(libipt_error_code < 0);		assert(libipt_error_code < 0);
}		}

void IntelPTError::log(llvm::raw_ostream &OS) const {		void IntelPTError::log(llvm::raw_ostream &OS) const {
const char *libipt_error_message = pt_errstr(pt_errcode(m_libipt_error_code));		const char *libipt_error_message = pt_errstr(pt_errcode(m_libipt_error_code));
if (m_address != LLDB_INVALID_ADDRESS && m_address > 0) {		if (m_address != LLDB_INVALID_ADDRESS && m_address > 0) {
write_hex(OS, m_address, HexPrintStyle::PrefixLower, 18);		write_hex(OS, m_address, HexPrintStyle::PrefixLower, 18);
OS << " ";		OS << " ";
}		}
OS << "error: " << libipt_error_message;		OS << "error: " << libipt_error_message;
}		}

		DecodedThread::DecodedThread(ThreadSP thread_sp) : m_thread_sp(thread_sp) {}

		DecodedThread::DecodedThread(ThreadSP thread_sp, Error &&error)
		: m_thread_sp(thread_sp) {
		AppendError(std::move(error));
		}

		void DecodedThread::SetRawTraceSize(size_t size) { m_raw_trace_size = size; }

		TraceCursorUP DecodedThread::GetCursor() {
		// We insert a fake error signaling an empty trace if needed because the
		// TraceCursor requires non-empty traces.
		if (m_instruction_blocks.empty())
		AppendError(createStringError(inconvertibleErrorCode(), "empty trace"));
		return std::make_unique<TraceCursorIntelPT>(m_thread_sp, shared_from_this());
		}

Optional<size_t> DecodedThread::GetRawTraceSize() const {		Optional<size_t> DecodedThread::GetRawTraceSize() const {
return m_raw_trace_size;		return m_raw_trace_size;
}		}

size_t DecodedThread::GetInstructionsCount() const {		size_t DecodedThread::GetInstructionsCount() const {
return m_instruction_ips.size();		size_t count = 0;
		for (const InstructionPointer &ip : m_instruction_blocks)
		count += ip.suffixes.size();
		return count;
		}

		uint64_t DecodedThread::InstructionPointer::GetFullInstructionAddress(
		size_t suffix_index) const {
		uint16_t suffix = suffixes[suffix_index];
		uint64_t address = (insn_prefix << 16) | (uint64_t)suffix;
		return address;
		}

		void DecodedThread::CalcIPBlockSizes() {
		size_t acc_insn_count = 0;
		for (const InstructionPointer &ip : m_instruction_blocks) {
		acc_insn_count += ip.suffixes.size();
		m_ipblock_sizes.push_back(acc_insn_count);
		}
}		}

lldb::addr_t DecodedThread::GetInstructionLoadAddress(size_t insn_index) const {		size_t DecodedThread::ToIndex(user_id_t id) const {
return m_instruction_ips[insn_index];		return m_ipblock_sizes[id >> 32] + (UINT32_MAX & id);
		}

		user_id_t DecodedThread::ToID(size_t index) const {
		if (m_ipblock_sizes.size() == 0) {
		// Simple slow search if block sizes were not set up
		size_t prefix_index = 0;
		for (const InstructionPointer &ip : m_instruction_blocks) {
		if (index >= 0 && index < ip.suffixes.size())
		return (prefix_index << 32 | UINT32_MAX & index);

		index -= ip.suffixes.size();
		prefix_index++;
		}

		return LLDB_INVALID_ADDRESS;
		}

		// Binary fast search if block sizes are available
		size_t lower = 0, upper = m_ipblock_sizes.size() - 1;
		while (upper != lower && upper - lower != 1) {
		size_t mid = (upper + lower) / 2;
		if (index > m_ipblock_sizes[mid])
		upper = mid;
		else
		lower = mid;
		}

		if (lower > 0)
		index -= m_ipblock_sizes[lower];
		return (upper << 32 | UINT32_MAX & index);
		}

		user_id_t DecodedThread::NextID(user_id_t id) const {
		size_t prefix_idx = id >> 32;
		size_t suffix_idx = UINT32_MAX & id;
		if (m_instruction_blocks[prefix_idx].suffixes.size() - 1 > suffix_idx &&
		suffix_idx < UINT32_MAX)
		return id + 1;
		else if (m_instruction_blocks.size() - 1 > prefix_idx &&
		prefix_idx < UINT32_MAX)
		return (prefix_idx + 1) << 32;
		else
		return id; // Sorry cant help you, already at MAX,MAX
		}

		user_id_t DecodedThread::PrevID(user_id_t id) const {
		size_t prefix_idx = id >> 32;
		size_t suffix_idx = UINT32_MAX & id;
		// size_t is unsigned, so these must be `> 0` checks; `< 0` is never true.
		if (suffix_idx > 0)
		return id - 1;
		else if (prefix_idx > 0)
		return (prefix_idx - 1) << 32 |
		(UINT32_MAX &
		(m_instruction_blocks[prefix_idx - 1].suffixes.size() - 1));
		else
		return id; // Sorry, can't help you: already at 0,0
		}

		addr_t DecodedThread::GetInstructionLoadAddress(user_id_t insn_id) const {
		return m_instruction_blocks[insn_id >> 32].GetFullInstructionAddress(
		UINT32_MAX & insn_id);
}		}

TraceInstructionControlFlowType		TraceInstructionControlFlowType
DecodedThread::GetInstructionControlFlowType(size_t insn_index) const {		DecodedThread::GetInstructionControlFlowType(user_id_t insn_id) const {
if (IsInstructionAnError(insn_index))		if (IsInstructionAnError(insn_id))
return (TraceInstructionControlFlowType)0;		return (TraceInstructionControlFlowType)0;

TraceInstructionControlFlowType mask =		TraceInstructionControlFlowType mask =
eTraceInstructionControlFlowTypeInstruction;		eTraceInstructionControlFlowTypeInstruction;

lldb::addr_t load_address = m_instruction_ips[insn_index];		addr_t load_address = GetInstructionLoadAddress(insn_id);
		size_t insn_index = ToIndex(insn_id);
uint8_t insn_byte_size = m_instruction_sizes[insn_index];		uint8_t insn_byte_size = m_instruction_sizes[insn_index];
pt_insn_class iclass = m_instruction_classes[insn_index];		pt_insn_class iclass = m_instruction_classes[insn_index];

switch (iclass) {		switch (iclass) {
case ptic_cond_jump:		case ptic_cond_jump:
case ptic_jump:		case ptic_jump:
case ptic_far_jump:		case ptic_far_jump:
mask |= eTraceInstructionControlFlowTypeBranch;		mask |= eTraceInstructionControlFlowTypeBranch;
if (insn_index + 1 < m_instruction_ips.size() &&		if (insn_index + 1 < m_instruction_blocks.size() &&
load_address + insn_byte_size != m_instruction_ips[insn_index + 1])		load_address + insn_byte_size !=
		GetInstructionLoadAddress(NextID(insn_id)))
mask |= eTraceInstructionControlFlowTypeTakenBranch;		mask |= eTraceInstructionControlFlowTypeTakenBranch;
break;		break;
case ptic_return:		case ptic_return:
case ptic_far_return:		case ptic_far_return:
mask |= eTraceInstructionControlFlowTypeReturn;		mask |= eTraceInstructionControlFlowTypeReturn;
break;		break;
case ptic_call:		case ptic_call:
case ptic_far_call:		case ptic_far_call:
mask |= eTraceInstructionControlFlowTypeCall;		mask |= eTraceInstructionControlFlowTypeCall;
break;		break;
default:		default:
break;		break;
}		}

return mask;		return mask;
}		}

ThreadSP DecodedThread::GetThread() { return m_thread_sp; }		ThreadSP DecodedThread::GetThread() { return m_thread_sp; }

void DecodedThread::RecordTscForLastInstruction(uint64_t tsc) {		size_t
if (!m_last_tsc || *m_last_tsc != tsc) {		DecodedThread::InstructionPointer::AppendInstructionSuffix(uint16_t suffix) {
// In case the first instructions are errors or did not have a TSC, we'll		suffixes.push_back(suffix);
// get a first valid TSC not in position 0. We can safely force these error		return suffixes.size() - 1;
// instructions to use the first valid TSC, so that all the trace has TSCs.
size_t start_index =
m_instruction_timestamps.empty() ? 0 : m_instruction_ips.size() - 1;
m_instruction_timestamps.emplace(start_index, tsc);
m_last_tsc = tsc;
}
}		}

void DecodedThread::AppendInstruction(const pt_insn &insn) {		void DecodedThread::AppendInstruction(const pt_insn &insn) {
m_instruction_ips.emplace_back(insn.ip);		uint16_t suffix = UINT16_MAX & insn.ip;
		uint64_t prefix = insn.ip >> 16;

		// Check for emptiness before touching the last block; decrementing
		// end() of an empty vector is undefined behavior.
		if (!m_instruction_blocks.empty() &&
		m_instruction_blocks.back().insn_prefix == prefix &&
		m_instruction_blocks.back().suffixes.size() < UINT32_MAX)
		m_instruction_blocks.back().AppendInstructionSuffix(suffix);
		else
		m_instruction_blocks.emplace_back(prefix, suffix);

m_instruction_sizes.emplace_back(insn.size);		m_instruction_sizes.emplace_back(insn.size);
m_instruction_classes.emplace_back(insn.iclass);		m_instruction_classes.emplace_back(insn.iclass);
}		}

void DecodedThread::AppendInstruction(const pt_insn &insn, uint64_t tsc) {		void DecodedThread::AppendInstruction(const pt_insn &insn, uint64_t tsc) {
AppendInstruction(insn);		AppendInstruction(insn);
RecordTscForLastInstruction(tsc);		RecordTscForLastInstruction(tsc);
}		}

void DecodedThread::AppendError(llvm::Error &&error) {		void DecodedThread::AppendError(llvm::Error &&error) {
m_errors.try_emplace(m_instruction_ips.size(), toString(std::move(error)));		m_errors.try_emplace(m_instruction_blocks.size(), toString(std::move(error)));
m_instruction_ips.emplace_back(LLDB_INVALID_ADDRESS);		AppendInstruction(pt_insn{.ip = LLDB_INVALID_ADDRESS,
m_instruction_sizes.emplace_back(0);		.iclass = pt_insn_class::ptic_error,
m_instruction_classes.emplace_back(pt_insn_class::ptic_error);		.size = 0});
}		}

void DecodedThread::AppendError(llvm::Error &&error, uint64_t tsc) {		void DecodedThread::AppendError(llvm::Error &&error, uint64_t tsc) {
AppendError(std::move(error));		AppendError(std::move(error));
RecordTscForLastInstruction(tsc);		RecordTscForLastInstruction(tsc);
}		}

void DecodedThread::LibiptErrors::RecordError(int libipt_error_code) {		void DecodedThread::LibiptErrors::RecordError(int libipt_error_code) {
libipt_errors[pt_errstr(pt_errcode(libipt_error_code))]++;		libipt_errors[pt_errstr(pt_errcode(libipt_error_code))]++;
total_count++;		total_count++;
}		}

		bool DecodedThread::IsInstructionAnError(user_id_t insn_id) const {
		return GetInstructionLoadAddress(insn_id) == LLDB_INVALID_ADDRESS;
		}

		const char *DecodedThread::GetErrorByInstructionIndex(size_t insn_idx) {
		auto it = m_errors.find(insn_idx);
		if (it == m_errors.end())
		return nullptr;

		return it->second.c_str();
		}

void DecodedThread::RecordTscError(int libipt_error_code) {		void DecodedThread::RecordTscError(int libipt_error_code) {
m_tsc_errors.RecordError(libipt_error_code);		m_tsc_errors.RecordError(libipt_error_code);
}		}

const DecodedThread::LibiptErrors &DecodedThread::GetTscErrors() const {		const DecodedThread::LibiptErrors &DecodedThread::GetTscErrors() const {
return m_tsc_errors;		return m_tsc_errors;
}		}

Optional<DecodedThread::TscRange> DecodedThread::CalculateTscRange(		Optional<DecodedThread::TscRange> DecodedThread::CalculateTscRangeByIndex(
size_t insn_index,		size_t insn_index,
const Optional<DecodedThread::TscRange> &hint_range) const {		const Optional<DecodedThread::TscRange> &hint_range) const {
// We first try to check the given hint range in case we are traversing the		// We first try to check the given hint range in case we are traversing the
// trace in short jumps. If that fails, then we do the more expensive		// trace in short jumps. If that fails, then we do the more expensive
// arbitrary lookup.		// arbitrary lookup.
if (hint_range) {		if (hint_range) {
Optional<TscRange> candidate_range;		Optional<TscRange> candidate_range;
if (insn_index < hint_range->GetStartInstructionIndex())		if (insn_index < hint_range->GetStartInstructionIndex())
Show All 9 Lines	Optional<DecodedThread::TscRange> DecodedThread::CalculateTscRangeByIndex(
// Now we do a more expensive lookup		// Now we do a more expensive lookup
auto it = m_instruction_timestamps.upper_bound(insn_index);		auto it = m_instruction_timestamps.upper_bound(insn_index);
if (it == m_instruction_timestamps.begin())		if (it == m_instruction_timestamps.begin())
return None;		return None;

return TscRange(--it, *this);		return TscRange(--it, *this);
}		}

bool DecodedThread::IsInstructionAnError(size_t insn_idx) const {		void DecodedThread::RecordTscForLastInstruction(uint64_t tsc) {
return m_instruction_ips[insn_idx] == LLDB_INVALID_ADDRESS;		if (!m_last_tsc || *m_last_tsc != tsc) {
}		// In case the first instructions are errors or did not have a TSC, we'll
		// get a first valid TSC not in position 0. We can safely force these error
const char *DecodedThread::GetErrorByInstructionIndex(size_t insn_idx) {		// instructions to use the first valid TSC, so that all the trace has TSCs.
auto it = m_errors.find(insn_idx);		size_t start_index =
if (it == m_errors.end())		m_instruction_timestamps.empty() ? 0 : m_instruction_blocks.size() - 1;
return nullptr;		m_instruction_timestamps.emplace(start_index, tsc);
		m_last_tsc = tsc;
return it->second.c_str();
}

DecodedThread::DecodedThread(ThreadSP thread_sp) : m_thread_sp(thread_sp) {}

DecodedThread::DecodedThread(ThreadSP thread_sp, Error &&error)
: m_thread_sp(thread_sp) {
AppendError(std::move(error));
}

void DecodedThread::SetRawTraceSize(size_t size) { m_raw_trace_size = size; }

lldb::TraceCursorUP DecodedThread::GetCursor() {
// We insert a fake error signaling an empty trace if needed because the
// TraceCursor requires non-empty traces.
if (m_instruction_ips.empty())
AppendError(createStringError(inconvertibleErrorCode(), "empty trace"));
return std::make_unique<TraceCursorIntelPT>(m_thread_sp, shared_from_this());
}		}

size_t DecodedThread::CalculateApproximateMemoryUsage() const {
return sizeof(pt_insn::ip) * m_instruction_ips.size() +
sizeof(pt_insn::size) * m_instruction_sizes.size() +
sizeof(pt_insn::iclass) * m_instruction_classes.size() +
(sizeof(size_t) + sizeof(uint64_t)) * m_instruction_timestamps.size() +
m_errors.getMemorySize();
}		}

DecodedThread::TscRange::TscRange(std::map<size_t, uint64_t>::const_iterator it,		DecodedThread::TscRange::TscRange(std::map<size_t, uint64_t>::const_iterator it,
const DecodedThread &decoded_thread)		const DecodedThread &decoded_thread)
: m_it(it), m_decoded_thread(&decoded_thread) {		: m_it(it), m_decoded_thread(&decoded_thread) {
auto next_it = m_it;		auto next_it = m_it;
++next_it;		++next_it;
m_end_index = (next_it == m_decoded_thread->m_instruction_timestamps.end())		m_end_index = (next_it == m_decoded_thread->m_instruction_timestamps.end())
Show All 26 Lines

Optional<DecodedThread::TscRange> DecodedThread::TscRange::Prev() const {		Optional<DecodedThread::TscRange> DecodedThread::TscRange::Prev() const {
if (m_it == m_decoded_thread->m_instruction_timestamps.begin())		if (m_it == m_decoded_thread->m_instruction_timestamps.begin())
return None;		return None;
auto prev_it = m_it;		auto prev_it = m_it;
--prev_it;		--prev_it;
return TscRange(prev_it, *m_decoded_thread);		return TscRange(prev_it, *m_decoded_thread);
}		}

		size_t DecodedThread::CalculateApproximateMemoryUsage() const {
		return sizeof(InstructionPointer) * m_instruction_blocks.size() +
		sizeof(pt_insn::size) * m_instruction_sizes.size() +
		sizeof(pt_insn::iclass) * m_instruction_classes.size() +
		(sizeof(size_t) + sizeof(uint64_t)) * m_instruction_timestamps.size() +
		m_errors.getMemorySize();
		}
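
For readers following the DecodedThread.cpp changes above, the grouping idea (store the shared high 48 bits once per block and keep only the 16-bit suffixes per instruction) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the patch's implementation: `AddressBlock`, `GroupedAddresses`, `Append`, `At`, and the `first_index` field are hypothetical names introduced here. The sketch checks for an empty container before looking at the last block, and it uses a binary search over each block's starting index for lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; they are not the classes in the diff.
struct AddressBlock {
  uint64_t prefix;                // high 48 bits shared by every entry in the block
  size_t first_index;             // global index of the block's first instruction
  std::vector<uint16_t> suffixes; // low 16 bits of each address, in trace order
};

class GroupedAddresses {
public:
  // Appends an address, starting a new block whenever the prefix changes.
  void Append(uint64_t addr) {
    const uint64_t prefix = addr >> 16;
    const uint16_t suffix = static_cast<uint16_t>(addr & 0xFFFF);
    // Test for emptiness first so that back() is never called on an empty vector.
    if (m_blocks.empty() || m_blocks.back().prefix != prefix)
      m_blocks.push_back({prefix, m_count, {}});
    m_blocks.back().suffixes.push_back(suffix);
    ++m_count;
  }

  // Reconstructs the full address stored at a global instruction index.
  uint64_t At(size_t index) const {
    assert(index < m_count && "index out of range");
    // Binary search for the last block whose first_index <= index.
    size_t lo = 0, hi = m_blocks.size();
    while (hi - lo > 1) {
      const size_t mid = lo + (hi - lo) / 2;
      if (m_blocks[mid].first_index <= index)
        lo = mid;
      else
        hi = mid;
    }
    const AddressBlock &block = m_blocks[lo];
    return (block.prefix << 16) | block.suffixes[index - block.first_index];
  }

  size_t Size() const { return m_count; }
  size_t BlockCount() const { return m_blocks.size(); }

private:
  std::vector<AddressBlock> m_blocks;
  size_t m_count = 0;
};
```

With this layout a lookup costs one binary search over the blocks plus one array access. An all-ones address such as `LLDB_INVALID_ADDRESS` round-trips unchanged, because its prefix and suffix are both all ones.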

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp

Show First 20 Lines • Show All 191 Lines • ▼ Show 20 Lines	while (true) {
if (errcode != -pte_eos)		if (errcode != -pte_eos)
AppendError(decoded_thread,		AppendError(decoded_thread,
make_error<IntelPTError>(errcode, insn.ip), tsc_info);		make_error<IntelPTError>(errcode, insn.ip), tsc_info);
break;		break;
}		}
AppendInstruction(decoded_thread, insn, tsc_info);		AppendInstruction(decoded_thread, insn, tsc_info);
}		}
}		}

		decoded_thread.CalcIPBlockSizes();
}		}

/// Callback used by libipt for reading the process memory.		/// Callback used by libipt for reading the process memory.
///		///
/// More information can be found in		/// More information can be found in
/// https://github.com/intel/libipt/blob/master/doc/man/pt_image_set_callback.3.md		/// https://github.com/intel/libipt/blob/master/doc/man/pt_image_set_callback.3.md
static int ReadProcessMemory(uint8_t *buffer, size_t size,		static int ReadProcessMemory(uint8_t *buffer, size_t size,
const pt_asid * /* unused */, uint64_t pc,		const pt_asid * /* unused */, uint64_t pc,
▲ Show 20 Lines • Show All 101 Lines • Show Last 20 Lines

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h

	Show All 40 Lines

	private:			private:
	size_t GetInternalInstructionSize();			size_t GetInternalInstructionSize();

	/// Storage of the actual instructions			/// Storage of the actual instructions
	DecodedThreadSP m_decoded_thread_sp;			DecodedThreadSP m_decoded_thread_sp;
	/// Internal instruction index currently pointing at.			/// Internal instruction index currently pointing at.
	size_t m_pos;			size_t m_pos;
				/// Internal instruction ID currently pointing at.
				lldb::user_id_t m_id;
	/// Tsc range covering the current instruction.			/// Tsc range covering the current instruction.
	llvm::Optional<DecodedThread::TscRange> m_tsc_range;			llvm::Optional<DecodedThread::TscRange> m_tsc_range;
	};			};

	} // namespace trace_intel_pt			} // namespace trace_intel_pt
	} // namespace lldb_private			} // namespace lldb_private

	#endif // LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_TRACECURSORINTELPT_H			#endif // LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_TRACECURSORINTELPT_H

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp

Show All 17 Lines
using namespace llvm;		using namespace llvm;

TraceCursorIntelPT::TraceCursorIntelPT(ThreadSP thread_sp,		TraceCursorIntelPT::TraceCursorIntelPT(ThreadSP thread_sp,
DecodedThreadSP decoded_thread_sp)		DecodedThreadSP decoded_thread_sp)
: TraceCursor(thread_sp), m_decoded_thread_sp(decoded_thread_sp) {		: TraceCursor(thread_sp), m_decoded_thread_sp(decoded_thread_sp) {
assert(m_decoded_thread_sp->GetInstructionsCount() > 0 &&		assert(m_decoded_thread_sp->GetInstructionsCount() > 0 &&
"a trace should have at least one instruction or error");		"a trace should have at least one instruction or error");
m_pos = m_decoded_thread_sp->GetInstructionsCount() - 1;		m_pos = m_decoded_thread_sp->GetInstructionsCount() - 1;
		m_id = m_decoded_thread_sp->ToID(m_pos);
m_tsc_range =		m_tsc_range =
m_decoded_thread_sp->CalculateTscRange(m_pos, /*hint_range*/ None);		m_decoded_thread_sp->CalculateTscRangeByIndex(m_pos, /*hint_range*/ None);
}		}

size_t TraceCursorIntelPT::GetInternalInstructionSize() {		size_t TraceCursorIntelPT::GetInternalInstructionSize() {
return m_decoded_thread_sp->GetInstructionsCount();		return m_decoded_thread_sp->GetInstructionsCount();
}		}

bool TraceCursorIntelPT::Next() {		bool TraceCursorIntelPT::Next() {
auto canMoveOne = [&]() {		auto canMoveOne = [&]() {
if (IsForwards())		if (IsForwards())
return m_pos + 1 < GetInternalInstructionSize();		return m_pos + 1 < GetInternalInstructionSize();
return m_pos > 0;		return m_pos > 0;
};		};

size_t initial_pos = m_pos;		size_t initial_pos = m_pos;

while (canMoveOne()) {		while (canMoveOne()) {
m_pos += IsForwards() ? 1 : -1;		m_pos += IsForwards() ? 1 : -1;
		m_id = m_decoded_thread_sp->ToID(m_pos);

if (m_tsc_range && !m_tsc_range->InRange(m_pos))		if (m_tsc_range && !m_tsc_range->InRange(m_pos))
m_tsc_range = IsForwards() ? m_tsc_range->Next() : m_tsc_range->Prev();		m_tsc_range = IsForwards() ? m_tsc_range->Next() : m_tsc_range->Prev();

if (!m_ignore_errors && IsError())		if (!m_ignore_errors && IsError())
return true;		return true;
if (GetInstructionControlFlowType() & m_granularity)		if (GetInstructionControlFlowType() & m_granularity)
return true;		return true;
Show All 23 Lines	case TraceCursor::SeekType::Current:
int64_t new_pos = fitPosToBounds(offset + m_pos);		int64_t new_pos = fitPosToBounds(offset + m_pos);
int64_t dist = m_pos - new_pos;		int64_t dist = m_pos - new_pos;
m_pos = new_pos;		m_pos = new_pos;
return std::abs(dist);		return std::abs(dist);
}		}
};		};

int64_t dist = FindDistanceAndSetPos();		int64_t dist = FindDistanceAndSetPos();
m_tsc_range = m_decoded_thread_sp->CalculateTscRange(m_pos, m_tsc_range);		m_id = m_decoded_thread_sp->ToID(m_pos);
		m_tsc_range =
		m_decoded_thread_sp->CalculateTscRangeByIndex(m_pos, m_tsc_range);
return dist;		return dist;
}		}

bool TraceCursorIntelPT::IsError() {		bool TraceCursorIntelPT::IsError() {
return m_decoded_thread_sp->IsInstructionAnError(m_pos);		return m_decoded_thread_sp->IsInstructionAnError(m_id);
}		}

const char *TraceCursorIntelPT::GetError() {		const char *TraceCursorIntelPT::GetError() {
return m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos);		return m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos);
}		}

lldb::addr_t TraceCursorIntelPT::GetLoadAddress() {		lldb::addr_t TraceCursorIntelPT::GetLoadAddress() {
return m_decoded_thread_sp->GetInstructionLoadAddress(m_pos);		return m_decoded_thread_sp->GetInstructionLoadAddress(m_id);
}		}

Optional<uint64_t>		Optional<uint64_t>
TraceCursorIntelPT::GetCounter(lldb::TraceCounter counter_type) {		TraceCursorIntelPT::GetCounter(lldb::TraceCounter counter_type) {
switch (counter_type) {		switch (counter_type) {
case lldb::eTraceCounterTSC:		case lldb::eTraceCounterTSC:
if (m_tsc_range)		if (m_tsc_range)
return m_tsc_range->GetTsc();		return m_tsc_range->GetTsc();
else		else
return llvm::None;		return llvm::None;
}		}
}		}

TraceInstructionControlFlowType		TraceInstructionControlFlowType
TraceCursorIntelPT::GetInstructionControlFlowType() {		TraceCursorIntelPT::GetInstructionControlFlowType() {
return m_decoded_thread_sp->GetInstructionControlFlowType(m_pos);		return m_decoded_thread_sp->GetInstructionControlFlowType(m_id);
}		}

bool TraceCursorIntelPT::GoToId(user_id_t id) {		bool TraceCursorIntelPT::GoToId(user_id_t id) {
if (m_decoded_thread_sp->GetInstructionsCount() <= id)		size_t idx = m_decoded_thread_sp->ToIndex(id);
		if (m_decoded_thread_sp->GetInstructionsCount() <= idx)
return false;		return false;
m_pos = id;		m_pos = idx;
m_tsc_range = m_decoded_thread_sp->CalculateTscRange(m_pos, m_tsc_range);		m_id = id;
		m_tsc_range =
		m_decoded_thread_sp->CalculateTscRangeByIndex(m_pos, m_tsc_range);

return true;		return true;
}		}

user_id_t TraceCursorIntelPT::GetId() const { return m_pos; }		user_id_t TraceCursorIntelPT::GetId() const { return m_id; }
