This is an archive of the discontinued LLVM Phabricator instance.

Differential D122603

[intelpt] Refactor timestamps out of IntelPTInstruction
ClosedPublic

Authored by zrthxn on Mar 28 2022, 11:20 AM.

Details

Reviewers

wallace
jj10306

Commits

rGca922a3559d7: [intelpt] Refactor timestamps out of `IntelPTInstruction`

Summary

This change stores timestamps (TSCs) in a more efficient map at the decoded thread level, which speeds up TSC lookup and reduces the amount of memory taken up by each instruction. It also introduces a TSC range, which keeps the current timestamp valid for all subsequent instructions until the next timestamp is emitted.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

zrthxn created this revision. Mar 28 2022, 11:20 AM

Herald added a project: Restricted Project. Mar 28 2022, 11:20 AM

zrthxn requested review of this revision. Mar 28 2022, 11:20 AM

Herald added a project: Restricted Project. Mar 28 2022, 11:20 AM

Herald added a subscriber: lldb-commits.

Harbormaster completed remote builds in B156610: Diff 418655. Mar 28 2022, 11:23 AM

Herald added a subscriber: JDevlieghere. Mar 28 2022, 11:23 AM

Update cursor timestamp when we have a new one

Harbormaster completed remote builds in B156612: Diff 418660. Mar 28 2022, 11:34 AM

let's better use the word TSC instead of timestamps, which is more accurate

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
164	receive `const &pt_insn` to avoid copies
166–167	Let's just use an overload void AppendInstruction(const pt_insn &instruction, uint64_t tsc);
181–182	This has to be an optional because it might not be present

wallace requested changes to this revision. Mar 28 2022, 12:16 PM

wallace added inline comments.

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
94–96	doing [] is O(logn), and we want to be faster than this. You can do the following, which is O(1):

    auto it = m_instruction_timestamps.end();
    if (it != m_instruction_timestamps.begin()) {
      it--;
      if (it->second != tsc) {
        // this tsc is not the same!
        m_instruction_timestamps.insert({insn_idx, tsc});
      } else {
        // this tsc is the same, do nothing
      }
    }

You can further optimize this by storing the last tsc that has been appended; that way you don't even need to create iterators.
115–118	this is wrong, you need to use upper_bound - 1, like this:

    if (m_instruction_timestamps.empty())
      return None;
    auto it = m_instruction_timestamps.upper_bound(insn_idx);
    if (it == m_instruction_timestamps.begin())
      return None;
    --it;
    return it->second;

This will allow you to go to the largest index that is <= insn_idx.
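
The two DecodedThread.cpp suggestions above (skip the map entirely when the TSC has not changed, and look up with upper_bound followed by one step back) can be sketched together as follows. This is a minimal standalone illustration, not the LLDB code: `SparseTscIndex`, `Append`, `Lookup` and `EntryCount` are hypothetical names, and `std::optional` stands in for `llvm::Optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical helper (not an LLDB class): stores only the first instruction
// index at which each TSC value appears.
class SparseTscIndex {
public:
  // Record the TSC observed at insn_idx. Indices are assumed to arrive in
  // increasing order, as they would while decoding a trace.
  void Append(std::size_t insn_idx, std::uint64_t tsc) {
    // Comparing against the cached value avoids touching the map at all
    // when the TSC is unchanged.
    if (m_last_tsc && *m_last_tsc == tsc)
      return;
    assert(m_first_index_to_tsc.empty() ||
           insn_idx > m_first_index_to_tsc.rbegin()->first);
    // Hinting at end() keeps insertion amortized O(1) for increasing keys.
    m_first_index_to_tsc.emplace_hint(m_first_index_to_tsc.end(), insn_idx,
                                      tsc);
    m_last_tsc = tsc;
  }

  // Return the TSC that covers insn_idx, i.e. the entry with the largest
  // key that is <= insn_idx. This is O(log n).
  std::optional<std::uint64_t> Lookup(std::size_t insn_idx) const {
    auto it = m_first_index_to_tsc.upper_bound(insn_idx);
    if (it == m_first_index_to_tsc.begin())
      return std::nullopt; // empty map, or no key <= insn_idx
    --it;
    return it->second;
  }

  std::size_t EntryCount() const { return m_first_index_to_tsc.size(); }

private:
  std::map<std::size_t, std::uint64_t> m_first_index_to_tsc;
  std::optional<std::uint64_t> m_last_tsc;
};
```

Because only changes of TSC create map entries, the map grows with the number of distinct TSC packets rather than with the number of instructions.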
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
181–182	better use insn_idx instead of index for all these variables
184–188	I actually like this, let's improve it
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
153–154 ↗	(On Diff #418660)
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
25–26	here you need to set the correct value of m_current_tsc
44–45	this will be O(logn). We can do better if m_current_tsc is the following little structure:

    class DecodedThread {
      struct TscRange {
        size_t start_index;
        size_t end_index;
        size_t tsc;
        std::map<size_t, uint64_t>::iterator prev;
        std::map<size_t, uint64_t>::iterator next;
      };
    };
    Optional<TscRange> m_current_tsc;

Then you can ask the new method `Optional<TscRange> DecodedThread::GetTSCRange(size_t insn_index)`, which will give you the entire range of the tsc that covers insn_index. With these numbers, you can do a comparison in this line to very quickly move from TSC to TSC only when needed. You can also have the method `DecodedThread::GetNextTscRange(const TscRange& range)` that computes the next range in O(1), and you can similarly have `GetPrevTscRange()`. The iterators will help you do that in O(1), without using lower_bound.
66–84	you need to calculate the new tsc_range after moving m_pos
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
45–46	Optional<uint64_t> m_current_tsc;
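
As a rough sketch of the cursor-side idea in the TraceCursorIntelPT comments above: the cursor pays one O(logn) lookup when it is created, then keeps an iterator into the TSC map and only advances it when the position crosses into the next TSC's range. Everything here is an assumption made for illustration: `ForwardTscCursor` is a hypothetical, forward-only simplification, and it works on a bare `std::map` instead of a `DecodedThread`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical forward-only cursor over a map of
// `first instruction index -> TSC`.
class ForwardTscCursor {
public:
  using TscMap = std::map<std::size_t, std::uint64_t>;

  ForwardTscCursor(const TscMap &first_index_to_tsc, std::size_t insn_count)
      : m_map(first_index_to_tsc), m_insn_count(insn_count),
        m_it(first_index_to_tsc.end()) {
    // Single O(log n) lookup for the starting position.
    auto it = m_map.upper_bound(m_pos);
    if (it != m_map.begin())
      m_it = std::prev(it);
  }

  // Move one instruction forward; returns false at the end of the trace.
  bool Next() {
    if (m_pos + 1 >= m_insn_count)
      return false;
    ++m_pos;
    // The only entry that can start covering m_pos is the one right after
    // the current entry (or the first entry if no TSC has been seen yet).
    auto candidate = (m_it == m_map.end()) ? m_map.begin() : std::next(m_it);
    if (candidate != m_map.end() && candidate->first <= m_pos)
      m_it = candidate;
    assert(m_it == m_map.end() || m_it->first <= m_pos);
    return true;
  }

  std::size_t GetPosition() const { return m_pos; }

  // TSC covering the current position, if any TSC has been emitted so far.
  std::optional<std::uint64_t> GetTsc() const {
    if (m_it == m_map.end())
      return std::nullopt;
    return m_it->second;
  }

private:
  const TscMap &m_map;
  std::size_t m_insn_count;
  std::size_t m_pos = 0;
  TscMap::const_iterator m_it;
};
```

Note that the sketch also picks up a TSC that first appears after the cursor was created without one, because the "no TSC yet" state still checks the first map entry on every step.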

This revision now requires changes to proceed. Mar 28 2022, 12:16 PM

Introduced TscRange for quicker operation

Harbormaster completed remote builds in B156809: Diff 418937. Mar 29 2022, 11:26 AM

I'm proposing a new interface for the TscRange. Let me know if you have questions

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
95	we need to update it because of the optional
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
166–167	Let's improve the documentation and let's use the word TSC more ubiquitously
184
185–193	Let's add more logic to this object so that it handles as much as it can and we reduce the logic that was added to DecodedThread. We also don't need two iterators, just one is enough. Don't forget to add documentation to these new methods. Given this new definition for TscRange. The only method we need to add in DecodedThread is CalculateTscRange(size_t insn_index), and mention the documentation that this operation is O(logn) and should be used sparingly.
186–188
196–203	Let's improve the documentation and also get rid of the `struct` keyword in return types. That's old style C
219–238	don't start variable names with __, as many people think that those variables should be discarded. Let's just give it a proper name. Let's also use an Optional and let's add documentation to all the variables.

This revision now requires changes to proceed. Mar 29 2022, 12:07 PM

Change TscRange to class

Update memory calc function

Harbormaster completed remote builds in B156835: Diff 418975. Mar 29 2022, 2:11 PM

Harbormaster completed remote builds in B156837: Diff 418977. Mar 29 2022, 2:14 PM

Prevent crash on printing info when we have 0 instructions

Harbormaster completed remote builds in B156840: Diff 418981. Mar 29 2022, 2:26 PM

Some calculations are wrong, but overall this is good. We are very close!

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
117–118	now that I think of this, you can delete this, because if the map is empty, this function will return in line 117
137–140	undo these two lines
159	decoded_thread instead of ref
161–162	delete
164–168	seeing ++ and -- is very hard to read. I also prefer not to modify the `it` variable for cleanness. Also doing end->first might crash the program. I'm writing here a correct version
173
178	The comparison is not right. let's use <= in a specific order to make it easier to read
186	As m_it is valid, doing the comparison `m_it == m_decoded_thread.m_instruction_timestamps.end()` will always return false. Remember that .end() will return a fake iterator that points to no value. Besides that, don't modify m_it. Let's better create a new iterator
190–192	Similarly, this has to be improved. I also like to put `--it` statements in their own line to make it easier to read.
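
The iterator pitfalls raised in the DecodedThread.cpp comments above (never dereference `end()`, never mutate the stored iterator, compare with `<=` in a fixed order) can be illustrated with the sketch below. It is a simplified stand-in and not the class from this diff: `TscRangeSketch` is a hypothetical name, it takes the map and the instruction count directly instead of a `DecodedThread`, and it uses `std::optional` in place of `llvm::Optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

using TscMap = std::map<std::size_t, std::uint64_t>;

// Hypothetical stand-in for a TSC range built on a single map iterator.
class TscRangeSketch {
public:
  // `it` must point at a real entry, never at map.end().
  TscRangeSketch(TscMap::const_iterator it, const TscMap &map,
                 std::size_t insn_count)
      : m_it(it), m_map(&map), m_insn_count(insn_count) {
    assert(it != map.end() && "a range needs a dereferenceable iterator");
    // Work on a new iterator; m_it itself is never modified.
    auto next_it = std::next(m_it);
    // Only read next_it->first after comparing against end().
    m_end_index = (next_it == m_map->end()) ? m_insn_count - 1
                                            : next_it->first - 1;
  }

  std::uint64_t GetTsc() const { return m_it->second; }
  std::size_t GetStartInstructionIndex() const { return m_it->first; }
  std::size_t GetEndInstructionIndex() const { return m_end_index; }

  bool InRange(std::size_t insn_index) const {
    return GetStartInstructionIndex() <= insn_index &&
           insn_index <= GetEndInstructionIndex();
  }

  // Next range chronologically, or nullopt if this is the last one.
  std::optional<TscRangeSketch> Next() const {
    auto next_it = std::next(m_it);
    if (next_it == m_map->end())
      return std::nullopt;
    return TscRangeSketch(next_it, *m_map, m_insn_count);
  }

  // Previous range chronologically, or nullopt if this is the first one.
  std::optional<TscRangeSketch> Prev() const {
    if (m_it == m_map->begin())
      return std::nullopt;
    auto prev_it = std::prev(m_it);
    return TscRangeSketch(prev_it, *m_map, m_insn_count);
  }

private:
  TscMap::const_iterator m_it;
  const TscMap *m_map;
  std::size_t m_insn_count;
  std::size_t m_end_index;
};
```

Using `std::next` and `std::prev` on a copy keeps each step on its own line and leaves the stored iterator valid for later calls.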
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
182
187–188
189	Move this class to the beginning of the public section of DecodedThread for easier discoverability
191
207–213	let's better delete this. It adds some maintenance cost with little benefits
218	This comment is hard to follow. Let's just delete it because it's a private constructor
223–227	let's add another piece of information
225–228	Let's just delete this, as we can get them directly from m_it without doing any operations
227
230	Setting to llvm::None here is equivalent to doing it from all the constructors
233
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
45–52	No need to do `m_current_tsc = m_decoded_thread_sp->CalculateTscRange(m_pos);` because its value has already been calculated in the constructor. We can simplify this as well
99–101	are you using git clang-format? I'm curious why this line changed
101–107
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
45	/// Tsc range covering the current instruction.
lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
124	Instead of doing `*raw_size`, better remove this number from CalculateApproximateMemoryUsage()

This revision now requires changes to proceed. Mar 30 2022, 6:56 PM

Included requested changes, removed extra members

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
45–52	It is possible that m_current_tsc is None when TraceCursorIntelPT is created, for example when the trace has just started and we try to dump instructions. If a tsc is emitted later, m_current_tsc would then remain None, since we don't re-calculate it when it was initially None.
99–101	Yes I am. I think it's because it's longer than 80 chars.
101–107	m_current_tsc is already checked at the beginning of this function

Harbormaster completed remote builds in B157111: Diff 419347. Mar 31 2022, 12:34 AM

Change tsc check anyway

Harbormaster completed remote builds in B157112: Diff 419348. Mar 31 2022, 12:40 AM

Fixed issue with TSC becoming invalid midway through trace

Harbormaster completed remote builds in B157228: Diff 419517. Mar 31 2022, 10:57 AM

Updated tests according to new memory usage calculation

Harbormaster completed remote builds in B157230: Diff 419519. Mar 31 2022, 11:08 AM

zrthxn marked an inline comment as done. Mar 31 2022, 11:09 AM

almost there! Mostly cosmetic changes needed

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
94–98	We need to handle a special case. It might happen that the first instruction is an error, which won't have a TSC, and the second instruction is an actual instruction, and from that point on you'll always have TSCs. For this case, we can assume that the TSC of the error instruction at the beginning of the trace is the same as the first valid TSC.
117–118	delete these two lines. The rest of the code will work well without it
158–160
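
The special case described above for a trace that starts with an error can be sketched like this: when the very first TSC arrives, it is recorded at index 0 instead of at the current index, so any leading items without a TSC inherit it. This is a standalone illustration under assumed names (`TscRecorder`, `AppendWithoutTsc`, `AppendWithTsc`, `GetMap` are hypothetical), not the code from this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical recorder of `first item index -> TSC`.
class TscRecorder {
public:
  // An item (for example a decoding error) arrives with no TSC available.
  void AppendWithoutTsc() { ++m_item_count; }

  // An item arrives together with the TSC that was current when it ran.
  void AppendWithTsc(std::uint64_t tsc) {
    ++m_item_count;
    if (m_last_tsc && *m_last_tsc == tsc)
      return; // same TSC as before: the existing range simply grows
    // The first TSC ever seen is backdated to index 0, so that leading
    // items without a TSC are assumed to share it.
    std::size_t start_index =
        m_first_index_to_tsc.empty() ? 0 : m_item_count - 1;
    assert(m_first_index_to_tsc.empty() ||
           start_index > m_first_index_to_tsc.rbegin()->first);
    m_first_index_to_tsc.emplace(start_index, tsc);
    m_last_tsc = tsc;
  }

  const std::map<std::size_t, std::uint64_t> &GetMap() const {
    return m_first_index_to_tsc;
  }

private:
  std::map<std::size_t, std::uint64_t> m_first_index_to_tsc;
  std::optional<std::uint64_t> m_last_tsc;
  std::size_t m_item_count = 0;
};
```

With this rule, a lookup on any index either finds a TSC or the trace has no TSCs at all.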
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
104	+1
140	Let's be more verbose with the names to improve readability
142
148	let's receive a reference here and then convert it to pointer, so that we minimize the number of places with pointers. Also, if later we decide to use a shared_ptr instead of a pointer, we can do it inside of the constructor without changing this line
150
181–182
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
65–84	we can simplify this so that we only invoke CalculateTscRange once
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
46	rename it to `m_tsc_range`. The word current is very redundant in this case
lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
123–124	Use doubles, as the average might not be a whole number

This revision now requires changes to proceed. Mar 31 2022, 11:43 AM

Incorporate feedback and update tests

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
65–84	This is incorrect. The converted code always returns 0. I've refactored it to call CalculateTscRange only once, but it's a side-effect-y function and will need some future attention.

Harbormaster completed remote builds in B157262: Diff 419560. Mar 31 2022, 1:32 PM

one last nit and good to go

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
82	don't use auto for simple types

jj10306 added inline comments. Mar 31 2022, 3:24 PM

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
145	nit: No need to friend the enclosing class since C++11 - https://en.cppreference.com/w/cpp/language/nested_types

wallace added inline comments. Mar 31 2022, 3:35 PM

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
145	TIL!

Don't use auto for simple types

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
145	We need the friend because we are using a private constructor from outside, in DecodedThread::CalculateTscRange and a couple other places. The idea is to let only DecodedThread create TscRange.
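
The access rules behind this exchange can be shown with a small standalone example. Since C++11 a nested class can reach the private members of its enclosing class without a friend declaration, but the reverse does not hold, so the enclosing class still needs `friend` to call a private constructor of the nested class. `Outer`, `Range`, `MakeRange`, `ReadOuterSecret` and `ExampleUsage` are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

class Outer {
public:
  class Range {
  public:
    std::uint64_t GetValue() const { return m_value; }

    // A nested class may read private members of the enclosing class
    // without any friend declaration (C++11 and later).
    static int ReadOuterSecret(const Outer &outer) { return outer.m_secret; }

  private:
    // The enclosing class gets no special access to the nested class, so
    // without this line Outer::MakeRange() below would not compile.
    friend class Outer;

    explicit Range(std::uint64_t value) : m_value(value) {}

    std::uint64_t m_value;
  };

  Range MakeRange(std::uint64_t value) const { return Range(value); }

private:
  int m_secret = 42;
};

// Code outside Outer cannot construct a Range directly.
static_assert(!std::is_constructible<Outer::Range, std::uint64_t>::value,
              "only Outer is allowed to build a Range");

inline void ExampleUsage() {
  Outer outer;
  Outer::Range range = outer.MakeRange(7);
  assert(range.GetValue() == 7);
}
```

This matches the stated intent of letting only the enclosing class create range objects while keeping the constructor private.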

Harbormaster completed remote builds in B157310: Diff 419619. Mar 31 2022, 9:37 PM

lgtm

This revision is now accepted and ready to land. Mar 31 2022, 9:50 PM

Don't forget to update the description of this diff and of the commit before pushing (you need to do both). Include the avg instruction size for a trace of at least 10k instructions as well :)

The difference in memory usage is appreciable with a large number of instructions, as shown below

# Before (with current metrics, total memory does not include raw trace size)
  Raw trace size: 2048 KiB
  Total number of instructions: 900004
  Total approximate memory usage: 56143.10 KiB
  Average memory usage per instruction: 63.87 bytes

# After
  Raw trace size: 2048 KiB
  Total number of instructions: 900004
  Total approximate memory usage: 42187.69 KiB
  Average memory usage per instruction: 48.00 bytes
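
As a quick cross-check of the figures above, the per-instruction average is the total in KiB times 1024, divided by the instruction count, computed in floating point. The helper names below are hypothetical and the snippet is only an arithmetic sketch, not the code that produced the dump.

```cpp
#include <cassert>
#include <cstddef>

// Average bytes per instruction from a total expressed in KiB. Returns 0
// when there are no instructions, to avoid dividing by zero.
double AverageBytesPerInstruction(double total_kib, std::size_t insn_count) {
  assert(total_kib >= 0.0 && "memory usage cannot be negative");
  if (insn_count == 0)
    return 0.0;
  return total_kib * 1024.0 / static_cast<double>(insn_count);
}

// Fraction of memory saved when going from `before_kib` to `after_kib`.
double RelativeSaving(double before_kib, double after_kib) {
  if (before_kib <= 0.0)
    return 0.0;
  return (before_kib - after_kib) / before_kib;
}
```

Applied to the totals reported above for 900004 instructions, this gives roughly 64 bytes per instruction before and 48 bytes after, a saving of a little under 25% of the approximate total.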

Harbormaster completed remote builds in B157338: Diff 419656. Apr 1 2022, 12:50 AM

Closed by commit rGca922a3559d7: [intelpt] Refactor timestamps out of `IntelPTInstruction` (authored by zrthxn). Apr 1 2022, 9:22 AM

This revision was automatically updated to reflect the committed changes.

zrthxn added a commit: rGca922a3559d7: [intelpt] Refactor timestamps out of `IntelPTInstruction`.

Revision Contents

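Path	Size
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h	97 lines
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp	82 lines
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h	2 lines
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp	49 lines
lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp	5 lines
lldb/test/API/commands/trace/TestTraceDumpInfo.py	4 lines
lldb/test/API/commands/trace/TestTraceLoad.py	4 lines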

Diff 419783

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h

/// ///

/// Gaps in the trace can come in a few flavors: /// Gaps in the trace can come in a few flavors:

/// - tracing gaps (e.g. tracing was paused and then resumed) /// - tracing gaps (e.g. tracing was paused and then resumed)

/// - tracing errors (e.g. buffer overflow) /// - tracing errors (e.g. buffer overflow)

/// - decoding errors (e.g. some memory region couldn't be decoded) /// - decoding errors (e.g. some memory region couldn't be decoded)

/// As mentioned, any gap is represented as an error in this class. /// As mentioned, any gap is represented as an error in this class.

class IntelPTInstruction { class IntelPTInstruction {

public: public:

IntelPTInstruction(const pt_insn &pt_insn, uint64_t timestamp)

: m_pt_insn(pt_insn), m_timestamp(timestamp), m_is_error(false) {}

IntelPTInstruction(const pt_insn &pt_insn) IntelPTInstruction(const pt_insn &pt_insn)

: m_pt_insn(pt_insn), m_is_error(false) {} : m_pt_insn(pt_insn), m_is_error(false) {}

/// Error constructor /// Error constructor

IntelPTInstruction(); IntelPTInstruction();

/// Check if this object represents an error (i.e. a gap). /// Check if this object represents an error (i.e. a gap).

/// ///

/// \return /// \return

/// Whether this object represents an error. /// Whether this object represents an error.

bool IsError() const; bool IsError() const;

/// \return /// \return

/// The instruction pointer address, or \a LLDB_INVALID_ADDRESS if it is /// The instruction pointer address, or \a LLDB_INVALID_ADDRESS if it is

/// an error. /// an error.

lldb::addr_t GetLoadAddress() const; lldb::addr_t GetLoadAddress() const;

/// Get the size in bytes of an instance of this class /// Get the size in bytes of an instance of this class

static size_t GetMemoryUsage(); static size_t GetMemoryUsage();

/// Get the timestamp associated with the current instruction. The timestamp

/// is similar to what a rdtsc instruction would return.

///

/// \return

/// The timestamp or \b llvm::None if not available.

llvm::Optional<uint64_t> GetTimestampCounter() const;

/// Get the \a lldb::TraceInstructionControlFlowType categories of the /// Get the \a lldb::TraceInstructionControlFlowType categories of the

/// instruction. /// instruction.

/// ///

/// \param[in] next_load_address /// \param[in] next_load_address

/// The address of the next instruction in the trace or \b /// The address of the next instruction in the trace or \b

/// LLDB_INVALID_ADDRESS if not available. /// LLDB_INVALID_ADDRESS if not available.

/// ///

/// \return /// \return

/// The control flow categories, or \b 0 if the instruction is an error. /// The control flow categories, or \b 0 if the instruction is an error.

lldb::TraceInstructionControlFlowType lldb::TraceInstructionControlFlowType

GetControlFlowType(lldb::addr_t next_load_address) const; GetControlFlowType(lldb::addr_t next_load_address) const;

IntelPTInstruction(IntelPTInstruction &&other) = default; IntelPTInstruction(IntelPTInstruction &&other) = default;

private: private:

IntelPTInstruction(const IntelPTInstruction &other) = delete; IntelPTInstruction(const IntelPTInstruction &other) = delete;

const IntelPTInstruction &operator=(const IntelPTInstruction &other) = delete; const IntelPTInstruction &operator=(const IntelPTInstruction &other) = delete;

// When adding new members to this class, make sure to update // When adding new members to this class, make sure to update

// IntelPTInstruction::GetNonErrorMemoryUsage() if needed. // IntelPTInstruction::GetMemoryUsage() if needed.

wallaceUnsubmitted

Done

wallace: +1

pt_insn m_pt_insn; pt_insn m_pt_insn;

llvm::Optional<uint64_t> m_timestamp;

bool m_is_error; bool m_is_error;

}; };

/// \class DecodedThread /// \class DecodedThread

/// Class holding the instructions and function call hierarchy obtained from /// Class holding the instructions and function call hierarchy obtained from

/// decoding a trace, as well as a position cursor used when reverse debugging /// decoding a trace, as well as a position cursor used when reverse debugging

/// the trace. /// the trace.

/// ///

/// Each decoded thread contains a cursor to the current position the user is /// Each decoded thread contains a cursor to the current position the user is

/// stopped at. See \a Trace::GetCursorPosition for more information. /// stopped at. See \a Trace::GetCursorPosition for more information.

class DecodedThread : public std::enable_shared_from_this<DecodedThread> { class DecodedThread : public std::enable_shared_from_this<DecodedThread> {

public: public:

/// \class TscRange

/// Class that represents the trace range associated with a given TSC.

/// It provides efficient iteration to the previous or next TSC range in the

/// decoded trace.

///

/// TSC timestamps are emitted by the decoder infrequently, which means

/// that each TSC covers a range of instruction indices, which can be used to

/// speed up TSC lookups.

class TscRange {

public:

/// Check if this TSC range includes the given instruction index.

bool InRange(size_t insn_index);

/// Get the next range chronologically.

llvm::Optional<TscRange> Next();

/// Get the previous range chronologically.

llvm::Optional<TscRange> Prev();

/// Get the TSC value.

size_t GetTsc() const;

/// Get the smallest instruction index that has this TSC.

size_t GetStartInstructionIndex() const;

wallaceUnsubmitted

Done

/// Get the smallest instruction index that has this TSC.

- size_t GetStart() const;

+ size_t GetStartInstructionIndex() const;

/// Get the largest instruction index that has this TSC.

Let's be more verbose with the names to improve readability

/// Get the largest instruction index that has this TSC.

size_t GetEndInstructionIndex() const;

wallaceUnsubmitted

Done

/// Get the largest instruction index that has this TSC.

- size_t GetEnd() const;

+ size_t GetEndInstructionIndex() const;

private:

private:

friend class DecodedThread;

jj10306Unsubmitted

Done

nit: No need to friend the enclosing class since C++11 - https://en.cppreference.com/w/cpp/language/nested_types

wallaceUnsubmitted

Done

TIL!

zrthxnAuthorUnsubmitted

Done

We need the friend because we are using a private constructor from outside, in DecodedThread::CalculateTscRange and a couple other places. The idea is to let only DecodedThread create TscRange.

TscRange(std::map<size_t, uint64_t>::const_iterator it,

const DecodedThread &decoded_thread);

wallaceUnsubmitted

Done

TscRange(std::map<size_t, uint64_t>::const_iterator it,

- const DecodedThread *decoded_thread);

+ const DecodedThread &decoded_thread);

/// The current range

let's receive a reference here and then convert it to pointer, so that we minimize the number of places with pointers. Also, if later we decide to use a shared_ptr instead of a pointer, we can do it inside of the constructor without changing this line

/// The iterator pointing to the beginning of the range.

wallaceUnsubmitted

Done

const DecodedThread *decoded_thread);

- /// The current range

- std::map<size_t, uint64_t>::const_iterator m_it;

+ /// The iterator pointing to the beginning of the range std::map<size_t, uint64_t>::const_iterator m_it;

std::map<size_t, uint64_t>::const_iterator m_it;

/// The largest instruction index that has this TSC.

size_t m_end_index;

const DecodedThread *m_decoded_thread;

};

DecodedThread(lldb::ThreadSP thread_sp); DecodedThread(lldb::ThreadSP thread_sp);

/// Utility constructor that initializes the trace with a provided error. /// Utility constructor that initializes the trace with a provided error.

DecodedThread(lldb::ThreadSP thread_sp, llvm::Error &&err); DecodedThread(lldb::ThreadSP thread_sp, llvm::Error &&err);

/// Append a successfully decoded instruction.

void AppendInstruction(const pt_insn &instruction);

wallaceUnsubmitted

Done

receive const &pt_insn to avoid copies

/// Append a sucessfully decoded instruction with an associated TSC timestamp.

void AppendInstruction(const pt_insn &instruction, uint64_t tsc);

wallaceUnsubmitted

Done

Let's just use an overload

void AppendInstruction(const pt_insn &instruction, uint64_t tsc);

wallaceUnsubmitted

Done

void AppendInstruction(const pt_insn& instruction);

- /// Append a timestamp at the index of the last instruction.

- void AppendInstruction(const pt_insn& instruction, uint64_t timestamp);

+ /// Append a sucessfully decoded instruction along with an associated TSC timestamp.

+ void AppendInstruction(const pt_insn& instruction, uint64_t tsc);

/// Append a decoding error (i.e. an instruction that failed to be decoded).

Let's improve the documentation and let's use the word TSC more ubiquitously

/// Append a decoding error (i.e. an instruction that failed to be decoded).

void AppendError(llvm::Error &&error);

/// Get the instructions from the decoded trace. Some of them might indicate /// Get the instructions from the decoded trace. Some of them might indicate

/// errors (i.e. gaps) in the trace. For an instruction error, you can access /// errors (i.e. gaps) in the trace. For an instruction error, you can access

/// its underlying error message with the \a GetErrorByInstructionIndex() /// its underlying error message with the \a GetErrorByInstructionIndex()

/// method. /// method.

/// ///

/// \return /// \return

/// The instructions of the trace. /// The instructions of the trace.

llvm::ArrayRef<IntelPTInstruction> GetInstructions() const; llvm::ArrayRef<IntelPTInstruction> GetInstructions() const;

/// Construct the TSC range that covers the given instruction index.

/// This operation is O(logn) and should be used sparingly.

wallaceUnsubmitted

Done

llvm::ArrayRef<IntelPTInstruction> GetInstructions() const;

- /// Get timestamp of an instruction by its index.

- uint64_t GetInstructionTimestamp(size_t index) const;

+ /// Get the chronologically most recent TSC of an instruction by its index.

+ llvm::Optional<uint64_t> GetInstructionTSC(size_t index) const;

/// Check if the instruction at a given index was an error.

This has to be an optional because it might not be present

wallaceUnsubmitted

Done

better use insn_idx instead of index for all these variables

wallaceUnsubmitted

Done

/// \class TscRange

- /// Class that represents the instruction range associated with a given TSC.

+ /// Class that represents the trace range associated with a given TSC.

/// It provides efficient iteration to the previous or next TSC range in the

wallaceUnsubmitted

Done

llvm::ArrayRef<IntelPTInstruction> GetInstructions() const;

/// Construct the TSC range that covers the given instruction index.

/// This operation is O(logn) and should be used sparingly.

+ /// If the trace was collected with TSC support, all the instructions of

+ /// the trace will have associated TSCs. This means that this method will

+ /// only return \b llvm::None if there are no TSCs whatsoever in the trace.

llvm::Optional<TscRange> CalculateTscRange(size_t insn_index) const;

/// If the trace was collected with TSC support, all the instructions of

/// the trace will have associated TSCs. This means that this method will

wallaceUnsubmitted

Done

llvm::Optional<std::map<size_t, uint64_t>::const_iterator> GetTSCIterator(size_t index) const;

- // Range of timestamp iterators for a given index

+ /// TSC timestamps are emitted by the decoder not very frequently, which means that each TSC covers a range of

+ /// instruction indices, which we can use to speed up TSC lookups.

struct TscRange {

/// only return \b llvm::None if there are no TSCs whatsoever in the trace.

llvm::Optional<TscRange> CalculateTscRange(size_t insn_index) const;

/// Check if an instruction given by its index is an error.

wallaceUnsubmitted

Done

uint64_t GetInstructionTimestamp(size_t index) const;

- /// Check if the instruction at a given index was an error.

- /// -- Reasoning (this will be removed before committing)

- /// This is faster and less wasteful of memory than creating an ArrayRef

- /// every time that you need to check this, with GetInstructions()[i].IsError()

- bool GetInstructionIsError(size_t insn_idx) const;

+ /// Check if an instruction given by its index is an error.

+ bool IsInstructionAnError(size_t insn_idx) const;

/// Get the error associated with a given instruction index.

I actually like this, let's improve it

wallaceUnsubmitted

Done

struct TscRange {

+ /// The TSC value

uint64_t tsc;

+ /// The smallest instruction index that has this TSC.

size_t start_index;

+ /// The larges instruction index that has this TSC.

size_t end_index;

std::map<size_t, uint64_t>::const_iterator next;

wallaceUnsubmitted

Done

/// TSC timestamps are emitted by the decoder infrequently, which means

- /// that each TSC covers a range of instruction indices, which we can use to

+ /// that each TSC covers a range of instruction indices, which we can be used to

/// speed up TSC lookups.

class TscRange {

bool IsInstructionAnError(size_t insn_idx) const;

wallaceUnsubmitted

Done

Move this class to the beginning of the public section of DecodedThread for easier discoverability

/// Get the error associated with a given instruction index. /// Get the error associated with a given instruction index.

wallaceUnsubmitted

Done

public:

- // Check if current TSC range covers given instruction index.

+ // Check if this TSC range includes the given instruction index.

bool InRange(size_t insn_index);

/// ///

/// \return /// \return

wallaceUnsubmitted

Done

// Range of timestamp iterators for a given index

- struct TscRange {

- uint64_t tsc;

- size_t start_index;

- size_t end_index;

- std::map<size_t, uint64_t>::const_iterator next;

- std::map<size_t, uint64_t>::const_iterator prev;

+ /// Class that represents the instruction range associated with a given TSC. It provides efficient iteration

+ /// to the previous or next TSC range in the decoded trace.

+ class TscRange {

+ public:

+ bool InRange(size_t insn_index);

- // Construct TscRange respecting bounds of timestamp map in thread

- TscRange(std::map<size_t, uint64_t>::const_iterator it, const DecodedThread& ref);

+ Optional<TscRange> Next();

+ Optional<TscRange> Prev();

+ private:

+ friend class DecodedThread;

+ TscRange(std::map<size_t, uint64_t>::const_iterator it, const &decoded_thread);

+ std::map<size_t, uint64_t>::const_iterator m_it;

+ const DecodedThread &m_decoded_thread;

+ uint64_t m_tsc;

+ size_t m_start_index;

+ size_t m_end_index;

};

/// Get timestamp of an instruction by its index.

Let's add more logic to this object so that it handles as much as it can and we reduce the logic that was added to DecodedThread. We also don't need two iterators, just one is enough. Don't forget to add documentation to these new methods.

Given this new definition for TscRange. The only method we need to add in DecodedThread is CalculateTscRange(size_t insn_index), and mention the documentation that this operation is O(logn) and should be used sparingly.

/// The error message of \b nullptr if the given index /// The error message of \b nullptr if the given index

/// points to a valid instruction. /// points to a valid instruction.

const char *GetErrorByInstructionIndex(uint64_t ins_idx); const char *GetErrorByInstructionIndex(size_t ins_idx);

/// Append a successfully decoded instruction.

template <typename... Ts> void AppendInstruction(Ts... instruction_args) {

m_instructions.emplace_back(instruction_args...);

}

/// Append a decoding error (i.e. an instruction that failed to be decoded).

void AppendError(llvm::Error &&error);

/// Get a new cursor for the decoded thread. /// Get a new cursor for the decoded thread.

lldb::TraceCursorUP GetCursor(); lldb::TraceCursorUP GetCursor();

/// Set the size in bytes of the corresponding Intel PT raw trace. /// Set the size in bytes of the corresponding Intel PT raw trace.

void SetRawTraceSize(size_t size); void SetRawTraceSize(size_t size);

wallaceUnsubmitted

Done

TscRange(std::map<size_t, uint64_t>::const_iterator it, const DecodedThread& ref);

};

- /// Get timestamp of an instruction by its index.

- llvm::Optional<struct TscRange> GetTSCRange(size_t insn_index) const;

+ /// Get the TSC range of an instruction by its index.

+ llvm::Optional<TscRange> GetTSCRange(size_t insn_index) const;

- /// Get timestamp of an instruction by its index.

- llvm::Optional<struct TscRange> GetNextTSCRange(const TscRange& range) const;

+ /// Given a TSC range, get the next range chronologically.

+ llvm::Optional<TscRange> GetNextTSCRange(const TscRange& range) const;

- /// Get timestamp of an instruction by its index.

- llvm::Optional<struct TscRange> GetPrevTSCRange(const TscRange& range) const;

+ /// Given a TSC range, get the previous range chronologically.

+ llvm::Optional<TscRange> GetPrevTSCRange(const TscRange& range) const;

/// Check if an instruction given by its index is an error.

Let's improve the documentation and also get rid of the struct keyword in return types. That's old style C

/// Get the size in bytes of the corresponding Intel PT raw trace. /// Get the size in bytes of the corresponding Intel PT raw trace.

/// ///

/// \return /// \return

/// The size of the trace, or \b llvm::None if not available. /// The size of the trace, or \b llvm::None if not available.

llvm::Optional<size_t> GetRawTraceSize() const; llvm::Optional<size_t> GetRawTraceSize() const;

/// The approximate size in bytes used by this instance, /// The approximate size in bytes used by this instance,

/// including all the already decoded instructions. /// including all the already decoded instructions.

size_t CalculateApproximateMemoryUsage() const; size_t CalculateApproximateMemoryUsage() const;

wallaceUnsubmitted

Done

wallace: Let's delete this. It adds maintenance cost with little benefit.

lldb::ThreadSP GetThread(); lldb::ThreadSP GetThread();

private: private:

/// When adding new members to this class, make sure /// When adding new members to this class, make sure

/// to update \a CalculateApproximateMemoryUsage() accordingly. /// to update \a CalculateApproximateMemoryUsage() accordingly.

wallaceUnsubmitted

Done

wallace: This comment is hard to follow. Let's just delete it because it's a private constructor.

lldb::ThreadSP m_thread_sp; lldb::ThreadSP m_thread_sp;

/// The low level storage of all instruction addresses. Each instruction has

/// an index in this vector and it will be used in other parts of the code.

std::vector<IntelPTInstruction> m_instructions; std::vector<IntelPTInstruction> m_instructions;

/// This map contains the TSCs of the decoded instructions. It maps

/// `instruction index -> TSC`, where `instruction index` is the first index

/// at which the mapped TSC appears. We use this representation because TSCs

/// are sporadic and we can think of them as ranges. If TSCs are present in

/// the trace, all instructions will have an associated TSC, including the

wallaceUnsubmitted

Done

/// appears. We use this representation because TSCs are sporadic and we can

- /// think of it as ranges.

+ /// think of them as ranges.

std::map<size_t, uint64_t> m_instruction_timestamps;

wallaceUnsubmitted

Done

std::vector<IntelPTInstruction> m_instructions;

- /// This map contains the TSCs of the decoded instructions. It might be empty

- /// if the trace doesn't contain TSCs. It maps `instruction index -> TSC`,

+ /// This map contains the TSCs of the decoded instructions. It maps `instruction index -> TSC`,

/// where `instruction index` is the first index at which the mapped TSC

/// appears. We use this representation because TSCs are sporadic and we can

/// think of it as ranges.

+ /// If TSCs are present in the trace, all instructions will have an associated TSC, including the first one. Otherwise, this map will be empty.

std::map<size_t, uint64_t> m_instruction_timestamps;

wallace: Let's add another piece of information.

/// first one. Otherwise, this map will be empty.

wallaceUnsubmitted

Done

wallace: Let's just delete this, as we can get them directly from m_it without doing any operations.

std::map<size_t, uint64_t> m_instruction_timestamps;

/// This is the chronologically last TSC that has been added.

wallaceUnsubmitted

Done

/// This is the chronologically last TSC that has been added.

- llvm::Optional<uint64_t> m_last_tsc;

+ llvm::Optional<uint64_t> m_last_tsc = llvm::None;

// This variables stores the messages of all the error instructions in the

wallace: Setting to llvm::None here is equivalent to doing it from all the constructors.

llvm::Optional<uint64_t> m_last_tsc = llvm::None;

// This variables stores the messages of all the error instructions in the

// trace. It maps `instruction index -> error message`.

wallaceUnsubmitted

Done

size_t m_end_index;

};

- /// Construct a TSC range of an instruction by its index.

+ /// Construct the TSC range that covers the given instruction index.

/// This operation is O(logn) and should be used sparingly.

llvm::DenseMap<uint64_t, std::string> m_errors; llvm::DenseMap<uint64_t, std::string> m_errors;

/// The size in bytes of the raw buffer before decoding. It might be None if

/// the decoding failed.

llvm::Optional<size_t> m_raw_trace_size; llvm::Optional<size_t> m_raw_trace_size;

}; };

wallaceUnsubmitted

Done

/// to update \a CalculateApproximateMemoryUsage() accordingly.

lldb::ThreadSP m_thread_sp;

+ /// This is the low level storage of all instruction addresses. Each instruction has an index in this vector

+ /// and it will be used in other parts of the code.

std::vector<IntelPTInstruction> m_instructions;

+ /// This map contains the TSCs of the decoded instructions. It might be empty if the trace doesn't contain TSCs.

+ /// It maps `instruction index -> TSC`, where `instruction index` is the first index at which the mapped TSC appears. We use this representation because TSCs are sporadic and we can think of it as ranges.

std::map<size_t, uint64_t> m_instruction_timestamps;

+ /// This is the chronologically last TSC that has been added.

+ llvm::Optional<uint64_t> m_last_tsc;

+ /// This variables stores the messages of all the error instructions in the trace. It maps `instruction index -> error message`.

llvm::DenseMap<uint64_t, std::string> m_errors;

+ /// The size in bytes of the raw buffer before decoding. It might be None if the decoding failed.

llvm::Optional<size_t> m_raw_trace_size;

- uint64_t __last_tsc;

};

using DecodedThreadSP = std::shared_ptr<DecodedThread>;

wallace: Don't start variable names with __, as many people think that those variables should be discarded. Let's just give it a proper name. Let's also use an Optional and add documentation to all the variables.

using DecodedThreadSP = std::shared_ptr<DecodedThread>; using DecodedThreadSP = std::shared_ptr<DecodedThread>;

} // namespace trace_intel_pt } // namespace trace_intel_pt

} // namespace lldb_private } // namespace lldb_private

#endif // LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_DECODEDTHREAD_H #endif // LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_DECODEDTHREAD_H

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp

bool IntelPTInstruction::IsError() const { return m_is_error; } bool IntelPTInstruction::IsError() const { return m_is_error; }

lldb::addr_t IntelPTInstruction::GetLoadAddress() const { return m_pt_insn.ip; } lldb::addr_t IntelPTInstruction::GetLoadAddress() const { return m_pt_insn.ip; }

size_t IntelPTInstruction::GetMemoryUsage() { size_t IntelPTInstruction::GetMemoryUsage() {

return sizeof(IntelPTInstruction); return sizeof(IntelPTInstruction);

} }

Optional<uint64_t> IntelPTInstruction::GetTimestampCounter() const {

return m_timestamp;

}

Optional<size_t> DecodedThread::GetRawTraceSize() const { Optional<size_t> DecodedThread::GetRawTraceSize() const {

return m_raw_trace_size; return m_raw_trace_size;

} }

TraceInstructionControlFlowType TraceInstructionControlFlowType

IntelPTInstruction::GetControlFlowType(lldb::addr_t next_load_address) const { IntelPTInstruction::GetControlFlowType(lldb::addr_t next_load_address) const {

if (IsError()) if (IsError())

return (TraceInstructionControlFlowType)0; return (TraceInstructionControlFlowType)0;

Show All 21 Lines default:

break; break;

} }

return mask; return mask;

} }

ThreadSP DecodedThread::GetThread() { return m_thread_sp; } ThreadSP DecodedThread::GetThread() { return m_thread_sp; }

void DecodedThread::AppendInstruction(const pt_insn &insn) {

m_instructions.emplace_back(insn);

}

void DecodedThread::AppendInstruction(const pt_insn &insn, uint64_t tsc) {

m_instructions.emplace_back(insn);

if (!m_last_tsc || *m_last_tsc != tsc) {

wallaceUnsubmitted

Done

m_instructions.emplace_back(insn);

- if (__last_tsc != tsc) {

+ if (!m_last_tsc || *m_last_tsc != tsc) {

m_instruction_timestamps.emplace(m_instructions.size() - 1, tsc);

wallace: We need to update it because of the optional.

// In case the first instructions are errors or did not have a TSC, we'll

wallaceUnsubmitted

Done

wallace: Doing [] is O(logn), and we want to be faster than this.

You can do the following, which is O(1):

auto it = m_instruction_timestamps.end();
if (it != m_instruction_timestamps.begin()) {
  --it;
  if (it->second != tsc) {
    // this tsc is not the same!
    m_instruction_timestamps.emplace(insn_idx, tsc);
  } else {
    // this tsc is the same, do nothing
  }
}

You can further optimize this by storing the last tsc that has been appended; that way you don't even need to create iterators.
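
As a minimal sketch of the "store the last appended TSC" idea from the comment above (not the code that landed in this revision): the map is touched only when the TSC changes. `TscIndex`, `Append`, and `RangeCount` are hypothetical names, and `std::optional` stands in for `llvm::Optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical stand-in for the TSC bookkeeping inside DecodedThread.
class TscIndex {
public:
  // Record that instruction `insn_idx` was decoded with timestamp `tsc`.
  // A map entry is created only when the TSC differs from the last one.
  void Append(std::size_t insn_idx, std::uint64_t tsc) {
    if (m_last_tsc && *m_last_tsc == tsc)
      return; // same TSC as before: the current range simply grows
    // Keys arrive in increasing order, so hinting at end() makes the
    // insertion amortized constant time.
    m_starts.emplace_hint(m_starts.end(), insn_idx, tsc);
    m_last_tsc = tsc;
  }

  std::size_t RangeCount() const { return m_starts.size(); }

private:
  std::map<std::size_t, std::uint64_t> m_starts; // first index -> TSC
  std::optional<std::uint64_t> m_last_tsc;       // last appended TSC
};
```

Because consecutive instructions usually share a TSC, most appends in this sketch return after a single comparison.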

// get a first valid TSC not in position 0. We can safely force these error

// instructions to use the first valid TSC, so that all the trace has TSCs.

wallaceUnsubmitted

Done

void DecodedThread::AppendInstruction(const pt_insn &insn, uint64_t tsc) {

m_instructions.emplace_back(insn);

if (!m_last_tsc || *m_last_tsc != tsc) {

- m_instruction_timestamps.emplace(m_instructions.size() - 1, tsc);

+ // In case the first instructions are errors, we'll get a first valid TSC not in position 0. We can

+ // safely force these error instructions to use the first valid TSC, so that all the trace has TSCs.

+ size_t start_index = m_instruction_timestamps.empty() ? 0 : m_instructions.size() - 1;

+ m_instruction_timestamps.emplace(start_index, tsc);

m_last_tsc = tsc;

}

void DecodedThread::AppendError(llvm::Error &&error) {

wallace: We need to handle a special case. It might happen that the first instruction is an error, which won't have a TSC, and the second instruction is an actual instruction, and from that point on you'll always have TSCs. For this case, we can assume that the TSC of the error instruction at the beginning of the trace is the same as the first valid TSC.
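
The special case described above can be sketched in isolation as follows. The helper name `StartIndexForNewTsc` is hypothetical; the logic mirrors the `start_index` expression shown in the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical helper: choose the start index for a newly seen TSC.
// `insn_idx` is the index of the instruction that carries the new TSC.
std::size_t StartIndexForNewTsc(
    const std::map<std::size_t, std::uint64_t> &tsc_starts,
    std::size_t insn_idx) {
  // The very first TSC is back-dated to index 0 so that leading errors
  // (which carry no TSC) fall inside the first range.
  return tsc_starts.empty() ? 0 : insn_idx;
}
```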

size_t start_index =

m_instruction_timestamps.empty() ? 0 : m_instructions.size() - 1;

m_instruction_timestamps.emplace(start_index, tsc);

m_last_tsc = tsc;

}

void DecodedThread::AppendError(llvm::Error &&error) { void DecodedThread::AppendError(llvm::Error &&error) {

m_errors.try_emplace(m_instructions.size(), toString(std::move(error))); m_errors.try_emplace(m_instructions.size(), toString(std::move(error)));

m_instructions.emplace_back(); m_instructions.emplace_back();

} }

ArrayRef<IntelPTInstruction> DecodedThread::GetInstructions() const { ArrayRef<IntelPTInstruction> DecodedThread::GetInstructions() const {

return makeArrayRef(m_instructions); return makeArrayRef(m_instructions);

} }

const char *DecodedThread::GetErrorByInstructionIndex(uint64_t idx) { Optional<DecodedThread::TscRange>

auto it = m_errors.find(idx); DecodedThread::CalculateTscRange(size_t insn_index) const {

auto it = m_instruction_timestamps.upper_bound(insn_index);

if (it == m_instruction_timestamps.begin())

wallaceUnsubmitted

Done

wallace: This is wrong, you need to use upper_bound - 1, like this:

if (m_instruction_timestamps.empty())
  return None;

auto it = m_instruction_timestamps.upper_bound(insn_idx);
if (it == m_instruction_timestamps.begin())
  return None;

--it;
return it->second;

This will allow you to go to the largest index that is <= insn_idx.
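
For illustration, here is a self-contained version of the "upper_bound minus one" lookup on a plain `std::map`. `LookupTsc` is a hypothetical free function and `std::optional` replaces `llvm::Optional`; the behavior is meant to match the snippet in the comment above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical free function; the map stores `first index -> TSC`.
std::optional<std::uint64_t>
LookupTsc(const std::map<std::size_t, std::uint64_t> &tsc_starts,
          std::size_t insn_idx) {
  // upper_bound returns the first entry whose key is > insn_idx.
  auto it = tsc_starts.upper_bound(insn_idx);
  // begin() here means no key is <= insn_idx (this also covers an empty map).
  if (it == tsc_starts.begin())
    return std::nullopt;
  --it; // step back to the largest key that is <= insn_idx
  return it->second;
}
```

Note that the `begin()` check already handles an empty map, since `upper_bound` then returns `end()`, which equals `begin()`.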

wallaceUnsubmitted

Done

wallace: Now that I think of this, you can delete this, because if the map is empty, this function will return in line 117.

wallaceUnsubmitted

Done

wallace: Delete these two lines. The rest of the code will work well without them.

return None;

return TscRange(--it, *this);

}

bool DecodedThread::IsInstructionAnError(size_t insn_idx) const {

return m_instructions[insn_idx].IsError();

}

const char *DecodedThread::GetErrorByInstructionIndex(size_t insn_idx) {

auto it = m_errors.find(insn_idx);

if (it == m_errors.end()) if (it == m_errors.end())

return nullptr; return nullptr;

return it->second.c_str(); return it->second.c_str();

} }

DecodedThread::DecodedThread(ThreadSP thread_sp) : m_thread_sp(thread_sp) {} DecodedThread::DecodedThread(ThreadSP thread_sp) : m_thread_sp(thread_sp) {}

DecodedThread::DecodedThread(ThreadSP thread_sp, Error &&error) DecodedThread::DecodedThread(ThreadSP thread_sp, Error &&error)

: m_thread_sp(thread_sp) { : m_thread_sp(thread_sp) {

AppendError(std::move(error)); AppendError(std::move(error));

wallaceUnsubmitted

Done

wallace: Undo these two lines.

} }

void DecodedThread::SetRawTraceSize(size_t size) { m_raw_trace_size = size; } void DecodedThread::SetRawTraceSize(size_t size) { m_raw_trace_size = size; }

lldb::TraceCursorUP DecodedThread::GetCursor() { lldb::TraceCursorUP DecodedThread::GetCursor() {

// We insert a fake error signaling an empty trace if needed becasue the // We insert a fake error signaling an empty trace if needed becasue the

// TraceCursor requires non-empty traces. // TraceCursor requires non-empty traces.

if (m_instructions.empty()) if (m_instructions.empty())

AppendError(createStringError(inconvertibleErrorCode(), "empty trace")); AppendError(createStringError(inconvertibleErrorCode(), "empty trace"));

return std::make_unique<TraceCursorIntelPT>(m_thread_sp, shared_from_this()); return std::make_unique<TraceCursorIntelPT>(m_thread_sp, shared_from_this());

} }

size_t DecodedThread::CalculateApproximateMemoryUsage() const { size_t DecodedThread::CalculateApproximateMemoryUsage() const {

return m_raw_trace_size.getValueOr(0) + return IntelPTInstruction::GetMemoryUsage() * m_instructions.size() +

IntelPTInstruction::GetMemoryUsage() * m_instructions.size() +

m_errors.getMemorySize(); m_errors.getMemorySize();

} }

DecodedThread::TscRange::TscRange(std::map<size_t, uint64_t>::const_iterator it,

const DecodedThread &decoded_thread)

wallaceUnsubmitted

Done

wallace: decoded_thread instead of ref

: m_it(it), m_decoded_thread(&decoded_thread) {

wallaceUnsubmitted

Done

m_errors.getMemorySize();

}

DecodedThread::TscRange::TscRange(std::map<size_t, uint64_t>::const_iterator it,

- const DecodedThread *decoded_thread)

- : m_it(it), m_decoded_thread(decoded_thread) {

+ const DecodedThread &decoded_thread)

+ : m_it(it), m_decoded_thread(&decoded_thread) {

auto next_it = m_it;

auto next_it = m_it;

++next_it;

wallaceUnsubmitted

Done

wallace: delete

m_end_index = (next_it == m_decoded_thread->m_instruction_timestamps.end())

? m_decoded_thread->GetInstructions().size() - 1

: next_it->first - 1;

}

size_t DecodedThread::TscRange::GetTsc() const { return m_it->second; }

wallaceUnsubmitted

Done

m_tsc = it->second;

- auto end = m_decoded_thread.m_instruction_timestamps.end();

- if (it != end)

- m_end_index = (++it--)->first - 1;

- else

- m_end_index = end->first;

+ auto next_it = m_it;

+ ++next_it;

+ m_end_index = (next_it == m_decoded_thread.m_instruction_timestamps.end())

+ ? m_decoded_thread.GetInstructions().size() - 1

+ : next_it->first - 1;

}

size_t DecodedThread::TscRange::GetTsc() const { return m_tsc; }

wallace: Seeing ++ and -- is very hard to read. I also prefer not to modify the `it` variable for cleanness. Also, doing end->first might crash the program. I'm writing here a correct version.
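
A small sketch of the end-index computation discussed above, written against a plain `std::map`. `RangeEndIndex` and `TscMap` are hypothetical names. The point is that the iterator is copied before stepping, and `end()` is compared but never dereferenced.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

using TscMap = std::map<std::size_t, std::uint64_t>;

// Hypothetical helper: last instruction index covered by the range at `it`.
// `it` must be dereferenceable, and `insn_count` must be > it->first.
std::size_t RangeEndIndex(const TscMap &tsc_starts, TscMap::const_iterator it,
                          std::size_t insn_count) {
  auto next_it = std::next(it); // a copy; `it` itself is left untouched
  if (next_it == tsc_starts.end())
    return insn_count - 1;   // the last range runs to the final instruction
  return next_it->first - 1; // otherwise it stops just before the next start
}
```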

size_t DecodedThread::TscRange::GetStartInstructionIndex() const {

return m_it->first;

}

wallaceUnsubmitted

Done

size_t DecodedThread::TscRange::GetTsc() const { return m_tsc; }

- size_t DecodedThread::TscRange::GetStart() const { return m_start_index; }

+ size_t DecodedThread::TscRange::GetStart() const { return m_it->first; }

size_t DecodedThread::TscRange::GetEnd() const { return m_end_index; }

size_t DecodedThread::TscRange::GetEndInstructionIndex() const {

return m_end_index;

}

bool DecodedThread::TscRange::InRange(size_t insn_index) {

wallaceUnsubmitted

Done

bool DecodedThread::TscRange::InRange(size_t insn_index) {

- if (insn_index < m_end_index && insn_index > m_start_index)

+ return m_start_index <= insn_index && insn_index <= m_end_index;

return true;

wallace: The comparison is not right. Let's use <= in a specific order to make it easier to read.
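
To make the difference concrete: the original check used strict `<` and `>`, which excludes both endpoints of the range. A minimal sketch of the inclusive form, using a hypothetical `IndexRange` struct rather than the real `TscRange`:

```cpp
#include <cstddef>

// Hypothetical plain struct mirroring the start/end indices of a TSC range.
struct IndexRange {
  std::size_t start; // first instruction index in the range
  std::size_t end;   // last instruction index in the range (inclusive)

  // Written as start <= x <= end so it reads like the interval it tests.
  bool InRange(std::size_t insn_index) const {
    return start <= insn_index && insn_index <= end;
  }
};
```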

return GetStartInstructionIndex() <= insn_index &&

insn_index <= GetEndInstructionIndex();

}

Optional<DecodedThread::TscRange> DecodedThread::TscRange::Next() {

auto next_it = m_it;

++next_it;

if (next_it == m_decoded_thread->m_instruction_timestamps.end())

wallaceUnsubmitted

Done

return None;

- return TscRange(++m_it, m_decoded_thread);

+ auto next_it = m_it;

+ ++next_it;

+ if (next_it == m_decoded_thread.m_instruction_timestamps.end())

+ return None;

+ return TscRange(next_it, m_decoded_thread);

}

Optional<DecodedThread::TscRange> DecodedThread::TscRange::Prev() {

wallace: As m_it is valid, doing the comparison `m_it == m_decoded_thread.m_instruction_timestamps.end()` will always return false. Remember that .end() will return a fake iterator that points to no value.

Besides that, don't modify m_it. Let's better create a new iterator.

return None;

return TscRange(next_it, *m_decoded_thread);

}

Optional<DecodedThread::TscRange> DecodedThread::TscRange::Prev() {

if (m_it == m_decoded_thread->m_instruction_timestamps.begin())

wallaceUnsubmitted

Done

Optional<DecodedThread::TscRange> DecodedThread::TscRange::Prev() {

- if (m_it == m_decoded_thread.m_instruction_timestamps.end())

+ if (m_it == m_decoded_thread.m_instruction_timestamps.begin())

return None;

- return TscRange(--m_it, m_decoded_thread);

+ auto prev_it = m_it;

+ --prev_it;

+ return TscRange(prev_it, m_decoded_thread);

}

wallace: Similarly, this has to be improved. I also like to put `--it` statements in their own line to make it easier to read.
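
The two iterator comments above (for `Next()` and `Prev()`) can be sketched together on a plain `std::map`. `NextEntry` and `PrevEntry` are hypothetical helpers that return iterators instead of `TscRange` objects. Neither modifies the iterator it is given.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

using TscMap = std::map<std::size_t, std::uint64_t>;

// Hypothetical helpers; `it` must point at a real entry of `m`.
std::optional<TscMap::const_iterator> NextEntry(const TscMap &m,
                                                TscMap::const_iterator it) {
  auto next_it = std::next(it);
  if (next_it == m.end()) // test the copy, not `it`, which is never end()
    return std::nullopt;
  return next_it;
}

std::optional<TscMap::const_iterator> PrevEntry(const TscMap &m,
                                                TscMap::const_iterator it) {
  if (it == m.begin()) // nothing precedes the first entry
    return std::nullopt;
  return std::prev(it);
}
```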

return None;

auto prev_it = m_it;

--prev_it;

return TscRange(prev_it, *m_decoded_thread);

}

No newline at end of file

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h

private: private:

size_t GetInternalInstructionSize(); size_t GetInternalInstructionSize();

/// Storage of the actual instructions /// Storage of the actual instructions

DecodedThreadSP m_decoded_thread_sp; DecodedThreadSP m_decoded_thread_sp;

/// Internal instruction index currently pointing at. /// Internal instruction index currently pointing at.

size_t m_pos; size_t m_pos;

/// Tsc range covering the current instruction.

wallaceUnsubmitted

Done

wallace: /// Tsc range covering the current instruction.

llvm::Optional<DecodedThread::TscRange> m_tsc_range;

wallaceUnsubmitted

Done

wallace: Optional<uint64_t> m_current_tsc;

wallaceUnsubmitted

Done

/// Tsc range covering the current instruction.

- llvm::Optional<DecodedThread::TscRange> m_current_tsc;

+ llvm::Optional<DecodedThread::TscRange> m_tsc_range;

};

} // namespace trace_intel_pt

wallace: Rename it to `m_tsc_range`. The word "current" is very redundant in this case.

}; };

} // namespace trace_intel_pt } // namespace trace_intel_pt

} // namespace lldb_private } // namespace lldb_private

#endif // LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_TRACECURSORINTELPT_H #endif // LLDB_SOURCE_PLUGINS_TRACE_INTEL_PT_TRACECURSORINTELPT_H

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp

using namespace lldb_private::trace_intel_pt; using namespace lldb_private::trace_intel_pt;

using namespace llvm; using namespace llvm;

TraceCursorIntelPT::TraceCursorIntelPT(ThreadSP thread_sp, TraceCursorIntelPT::TraceCursorIntelPT(ThreadSP thread_sp,

DecodedThreadSP decoded_thread_sp) DecodedThreadSP decoded_thread_sp)

: TraceCursor(thread_sp), m_decoded_thread_sp(decoded_thread_sp) { : TraceCursor(thread_sp), m_decoded_thread_sp(decoded_thread_sp) {

assert(!m_decoded_thread_sp->GetInstructions().empty() && assert(!m_decoded_thread_sp->GetInstructions().empty() &&

"a trace should have at least one instruction or error"); "a trace should have at least one instruction or error");

m_pos = m_decoded_thread_sp->GetInstructions().size() - 1; m_pos = m_decoded_thread_sp->GetInstructions().size() - 1;

m_tsc_range = m_decoded_thread_sp->CalculateTscRange(m_pos);

wallaceUnsubmitted

Done

wallace: Here you need to set the correct value of m_current_tsc.

} }

size_t TraceCursorIntelPT::GetInternalInstructionSize() { size_t TraceCursorIntelPT::GetInternalInstructionSize() {

return m_decoded_thread_sp->GetInstructions().size(); return m_decoded_thread_sp->GetInstructions().size();

} }

bool TraceCursorIntelPT::Next() { bool TraceCursorIntelPT::Next() {

auto canMoveOne = [&]() { auto canMoveOne = [&]() {

if (IsForwards()) if (IsForwards())

return m_pos + 1 < GetInternalInstructionSize(); return m_pos + 1 < GetInternalInstructionSize();

return m_pos > 0; return m_pos > 0;

}; };

size_t initial_pos = m_pos; size_t initial_pos = m_pos;

while (canMoveOne()) { while (canMoveOne()) {

m_pos += IsForwards() ? 1 : -1; m_pos += IsForwards() ? 1 : -1;

if (m_tsc_range && !m_tsc_range->InRange(m_pos))

wallaceUnsubmitted

Done

wallace: This will be O(logn). We can do better if m_current_tsc is the following little structure:

class DecodedThread {
  struct TscRange {
    size_t start_index;
    size_t end_index;
    size_t tsc;
    std::map<size_t, uint64_t>::iterator prev;
    std::map<size_t, uint64_t>::iterator next;
  };
};

Optional<TscRange> m_current_tsc;

Then you can ask the new method `Optional<TscRange> DecodedThread::GetTSCRange(size_t insn_index)`, which will give you the entire range of the TSC that covers insn_index. With these numbers, you can do a comparison in this line to very quickly move from TSC to TSC only when needed.

You can also have the method `DecodedThread::GetNextTscRange(const TscRange& range)` that computes the next range in O(1), and you can similarly have `GetPrevTscRange()`. The iterators will help you do that without using lower_bound, so each step is O(1).
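
A rough sketch of the idea in the comment above: keep an iterator to the range covering the current position and switch ranges only when the position crosses a boundary, so each step costs O(1) instead of a map lookup. `TscCursor` is a hypothetical, forward-only simplification and not the `TraceCursorIntelPT` implementation. It assumes the map is non-empty and starts at index 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

using TscMap = std::map<std::size_t, std::uint64_t>;

// Hypothetical cursor over `insn_count` instructions. It assumes the map is
// non-empty and its first key is 0, so every index has a covering range.
class TscCursor {
public:
  TscCursor(const TscMap &starts, std::size_t insn_count)
      : m_starts(starts), m_count(insn_count), m_pos(0),
        m_it(starts.begin()) {}

  // Move one instruction forward; returns false at the end of the trace.
  bool StepForward() {
    if (m_pos + 1 >= m_count)
      return false;
    ++m_pos;
    auto next_it = std::next(m_it);
    // Switch ranges only when the new position reaches the next start index.
    if (next_it != m_starts.end() && next_it->first <= m_pos)
      m_it = next_it;
    return true;
  }

  std::uint64_t Tsc() const { return m_it->second; }

private:
  const TscMap &m_starts;
  std::size_t m_count;
  std::size_t m_pos;
  TscMap::const_iterator m_it; // range covering m_pos
};
```

A backward step would be symmetric: move to `std::prev(m_it)` when the position drops below `m_it->first`.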

m_tsc_range = IsForwards() ? m_tsc_range->Next() : m_tsc_range->Prev();

if (!m_ignore_errors && IsError()) if (!m_ignore_errors && IsError())

return true; return true;

if (GetInstructionControlFlowType() & m_granularity) if (GetInstructionControlFlowType() & m_granularity)

return true; return true;

} }

wallaceUnsubmitted

Done

m_pos += IsForwards() ? 1 : -1;

- if (!m_current_tsc)

- m_current_tsc = m_decoded_thread_sp->CalculateTscRange(m_pos);

- else if (!m_current_tsc->InRange(m_pos)) {

- if (m_pos > m_current_tsc->GetEnd())

- m_current_tsc = m_current_tsc->Next();

- if (m_pos < m_current_tsc->GetStart())

- m_current_tsc = m_current_tsc->Prev();

- }

+ if (m_current_tsc && !m_current_tsc->InRange(m_pos))

+ m_current_tsc = IsForwards() ? m_current_tsc->Next() : m_current_tsc->Prev();

if (!m_ignore_errors && IsError())

wallace: No need to do `m_current_tsc = m_decoded_thread_sp->CalculateTscRange(m_pos);` because its value has already been calculated in the constructor. We can simplify this as well.

zrthxnAuthorUnsubmitted

Done

zrthxn: It is possible that m_current_tsc is None when TraceCursorIntelPT is created, for example when the trace has just started and we try to dump instructions. If a TSC is emitted later, m_current_tsc would then remain None, since we don't recalculate it when it was initially None.

// Didn't find any matching instructions // Didn't find any matching instructions

m_pos = initial_pos; m_pos = initial_pos;

return false; return false;

} }

size_t TraceCursorIntelPT::Seek(int64_t offset, SeekType origin) { size_t TraceCursorIntelPT::Seek(int64_t offset, SeekType origin) {

int64_t last_index = GetInternalInstructionSize() - 1; int64_t last_index = GetInternalInstructionSize() - 1;

auto fitPosToBounds = [&](int64_t raw_pos) -> int64_t { auto fitPosToBounds = [&](int64_t raw_pos) -> int64_t {

return std::min(std::max((int64_t)0, raw_pos), last_index); return std::min(std::max((int64_t)0, raw_pos), last_index);

}; };

auto FindDistanceAndSetPos = [&]() -> int64_t {

switch (origin) { switch (origin) {

case TraceCursor::SeekType::Set: case TraceCursor::SeekType::Set:

m_pos = fitPosToBounds(offset); m_pos = fitPosToBounds(offset);

return m_pos; return m_pos;

case TraceCursor::SeekType::End: case TraceCursor::SeekType::End:

m_pos = fitPosToBounds(offset + last_index); m_pos = fitPosToBounds(offset + last_index);

return last_index - m_pos; return last_index - m_pos;

case TraceCursor::SeekType::Current: case TraceCursor::SeekType::Current:

int64_t new_pos = fitPosToBounds(offset + m_pos); int64_t new_pos = fitPosToBounds(offset + m_pos);

int64_t dist = m_pos - new_pos; int64_t dist = m_pos - new_pos;

m_pos = new_pos; m_pos = new_pos;

return std::abs(dist); return std::abs(dist);

} }

};

int64_t dist = FindDistanceAndSetPos();

wallaceUnsubmitted

Done

}

};

- auto dist = FindDistanceAndSetPos();

+ size_t dist = FindDistanceAndSetPos();

m_tsc_range = m_decoded_thread_sp->CalculateTscRange(m_pos);

wallace: Don't use auto for simple types.

m_tsc_range = m_decoded_thread_sp->CalculateTscRange(m_pos);

return dist;

wallaceUnsubmitted

Done

wallace: You need to calculate the new tsc_range after moving m_pos.

wallaceUnsubmitted

Done

return std::min(std::max((int64_t)0, raw_pos), last_index);

};

- switch (origin) {

- case TraceCursor::SeekType::Set:

- m_pos = fitPosToBounds(offset);

- m_current_tsc = m_decoded_thread_sp->CalculateTscRange(m_pos);

- return m_pos;

- case TraceCursor::SeekType::End:

- m_pos = fitPosToBounds(offset + last_index);

- m_current_tsc = m_decoded_thread_sp->CalculateTscRange(m_pos);

- return last_index - m_pos;

- case TraceCursor::SeekType::Current:

- int64_t new_pos = fitPosToBounds(offset + m_pos);

- int64_t dist = m_pos - new_pos;

- m_pos = new_pos;

- m_current_tsc = m_decoded_thread_sp->CalculateTscRange(m_pos);

- return std::abs(dist);

- }

+ auto FindNewPos = [&]() -> int64_t {

+ switch (origin) {

+ case TraceCursor::SeekType::Set:

+ return fitPosToBounds(offset);

+ case TraceCursor::SeekType::End:

+ return fitPosToBounds(offset + last_index);

+ case TraceCursor::SeekType::Current:

+ return fitPosToBounds(offset + m_pos);

+ }

+ int64_t new_pos = FindNewPos();

+ m_pos = new_pos;

+ m_current_tsc = m_decoded_thread_sp->CalculateTscRange(m_pos);

+ return std::abs(new_pos - m_pos);

}

bool TraceCursorIntelPT::IsError() {

wallace: We can simplify this so that we only invoke CalculateTscRange once.

zrthxnAuthorUnsubmitted

Done

zrthxn: This is incorrect. The converted code always returns 0. I've refactored it to call CalculateTscRange only once, but it's a side-effect-y function and will need some future attention.
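
To illustrate why the suggested version always returns 0: it assigns `m_pos = new_pos` before computing `std::abs(new_pos - m_pos)`. A minimal sketch of the ordering fix for the "seek from current position" case, using a hypothetical free function `SeekFromCurrent` instead of the cursor class:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>

// Hypothetical, simplified seek relative to the current position.
// Returns the distance moved and updates `pos` in place.
std::int64_t SeekFromCurrent(std::int64_t &pos, std::int64_t offset,
                             std::int64_t last_index) {
  // Clamp the target position to [0, last_index].
  std::int64_t new_pos =
      std::min(std::max<std::int64_t>(0, pos + offset), last_index);
  // Compute the distance while `pos` still holds the old position;
  // assigning first would make the difference always 0.
  std::int64_t dist = std::abs(new_pos - pos);
  pos = new_pos;
  return dist;
}
```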

} }

bool TraceCursorIntelPT::IsError() { bool TraceCursorIntelPT::IsError() {

return m_decoded_thread_sp->GetInstructions()[m_pos].IsError(); return m_decoded_thread_sp->IsInstructionAnError(m_pos);

} }

const char *TraceCursorIntelPT::GetError() { const char *TraceCursorIntelPT::GetError() {

return m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos); return m_decoded_thread_sp->GetErrorByInstructionIndex(m_pos);

} }

lldb::addr_t TraceCursorIntelPT::GetLoadAddress() { lldb::addr_t TraceCursorIntelPT::GetLoadAddress() {

return m_decoded_thread_sp->GetInstructions()[m_pos].GetLoadAddress(); return m_decoded_thread_sp->GetInstructions()[m_pos].GetLoadAddress();

} }

Optional<uint64_t> TraceCursorIntelPT::GetCounter(lldb::TraceCounter counter_type) { Optional<uint64_t>

TraceCursorIntelPT::GetCounter(lldb::TraceCounter counter_type) {

switch (counter_type) { switch (counter_type) {

wallaceUnsubmitted

Done

wallace: Are you using git clang-format? I'm curious why this line changed.

zrthxnAuthorUnsubmitted

Done

zrthxn: Yes I am. I think it's because it's longer than 80 chars.

case lldb::eTraceCounterTSC: case lldb::eTraceCounterTSC:

return m_decoded_thread_sp->GetInstructions()[m_pos].GetTimestampCounter(); if (m_tsc_range)

return m_tsc_range->GetTsc();

else

return llvm::None;

} }

wallaceUnsubmitted

Done

TraceCursorIntelPT::GetCounter(lldb::TraceCounter counter_type) {

if (!m_current_tsc)

return None;

switch (counter_type) {

case lldb::eTraceCounterTSC:

- return m_current_tsc->GetTsc();

+ if (m_current_tsc)

+ return m_current_tsc->GetTsc();

+ else

+ return None;

}

TraceInstructionControlFlowType

zrthxnAuthorUnsubmitted

Done

zrthxn: m_current_tsc is already checked at the beginning of this function.

} }

TraceInstructionControlFlowType TraceInstructionControlFlowType

TraceCursorIntelPT::GetInstructionControlFlowType() { TraceCursorIntelPT::GetInstructionControlFlowType() {

lldb::addr_t next_load_address = lldb::addr_t next_load_address =

m_pos + 1 < GetInternalInstructionSize() m_pos + 1 < GetInternalInstructionSize()

? m_decoded_thread_sp->GetInstructions()[m_pos + 1].GetLoadAddress() ? m_decoded_thread_sp->GetInstructions()[m_pos + 1].GetLoadAddress()

: LLDB_INVALID_ADDRESS; : LLDB_INVALID_ADDRESS;

return m_decoded_thread_sp->GetInstructions()[m_pos].GetControlFlowType( return m_decoded_thread_sp->GetInstructions()[m_pos].GetControlFlowType(

next_load_address); next_load_address);

} }

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp

void TraceIntelPT::DumpTraceInfo(Thread &thread, Stream &s, bool verbose) {

size_t insn_len = Decode(thread)->GetInstructions().size();		size_t insn_len = Decode(thread)->GetInstructions().size();
size_t mem_used = Decode(thread)->CalculateApproximateMemoryUsage();		size_t mem_used = Decode(thread)->CalculateApproximateMemoryUsage();

s.Printf(" Raw trace size: %zu KiB\n", *raw_size / 1024);		s.Printf(" Raw trace size: %zu KiB\n", *raw_size / 1024);
s.Printf(" Total number of instructions: %zu\n", insn_len);		s.Printf(" Total number of instructions: %zu\n", insn_len);
s.Printf(" Total approximate memory usage: %0.2lf KiB\n",		s.Printf(" Total approximate memory usage: %0.2lf KiB\n",
(double)mem_used / 1024);		(double)mem_used / 1024);
s.Printf(" Average memory usage per instruction: %zu bytes\n",		if (insn_len != 0)
mem_used / insn_len);		s.Printf(" Average memory usage per instruction: %0.2lf bytes\n",
		(double)mem_used / insn_len);
		wallaceUnsubmitted Done Reply Inline Actions Instead of doing `raw_size`, better remove this number from CalculateApproximateMemoryUsage() wallace:* Instead of doing ` *raw_size`, better remove this number from CalculateApproximateMemoryUsage()
		wallaceUnsubmitted Done Reply Inline Actions Use doubles, as the average might not be a whole number wallace: Use doubles, as the average might not be a whole number
return;		return;
}		}

Optional<size_t> TraceIntelPT::GetRawTraceSize(Thread &thread) {		Optional<size_t> TraceIntelPT::GetRawTraceSize(Thread &thread) {
if (IsTraced(thread.GetID()))		if (IsTraced(thread.GetID()))
return Decode(thread)->GetRawTraceSize();		return Decode(thread)->GetRawTraceSize();
else		else
return None;		return None;
▲ Show 20 Lines • Show All 219 Lines • Show Last 20 Lines
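
The two points raised on the average line (guard against an empty trace, and divide as doubles) can be sketched in isolation. This is an illustrative helper, not code from the patch: `FormatAverageBytesPerInstruction` is a hypothetical name, and returning an empty string for an empty trace is an assumption made here so the example has something to return.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of LLDB: formats the average number of bytes
// per instruction with two decimals, or returns an empty string when there
// are no instructions (avoiding a division by zero).
std::string FormatAverageBytesPerInstruction(size_t mem_used, size_t insn_len) {
  if (insn_len == 0)
    return "";
  char buf[64];
  // Cast before dividing so the quotient keeps its fractional part.
  std::snprintf(buf, sizeof(buf), "%0.2lf",
                static_cast<double>(mem_used) / insn_len);
  return buf;
}
```

For instance, 1000 bytes over 21 instructions gives 47 under integer division but formats as 47.62 here, and 1008 bytes over 21 instructions formats as 48.00.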

lldb/test/API/commands/trace/TestTraceDumpInfo.py

  def testDumpRawTraceSize(self):
      ...
          substrs=["intel-pt"])

      self.expect("thread trace dump info",
          substrs=['''Trace technology: intel-pt

  thread #1: tid = 3842849
  Raw trace size: 4 KiB
  Total number of instructions: 21
- Total approximate memory usage: 5.31 KiB
- Average memory usage per instruction: 259 bytes'''])
+ Total approximate memory usage: 0.98 KiB
+ Average memory usage per instruction: 48.00 bytes'''])

lldb/test/API/commands/trace/TestTraceLoad.py

  def testLoadTrace(self):
      ...
      # check that the Process and Thread objects were created correctly
      self.expect("thread info", substrs=["tid = 3842849"])
      self.expect("thread list", substrs=["Process 1234 stopped", "tid = 3842849"])
      self.expect("thread trace dump info", substrs=['''Trace technology: intel-pt

  thread #1: tid = 3842849
  Raw trace size: 4 KiB
  Total number of instructions: 21
- Total approximate memory usage: 5.31 KiB
- Average memory usage per instruction: 259 bytes'''])
+ Total approximate memory usage: 0.98 KiB
+ Average memory usage per instruction: 48.00 bytes'''])

  def testLoadInvalidTraces(self):
      src_dir = self.getSourceDir()
      # We test first an invalid type
      self.expect("trace load -v " + os.path.join(src_dir, "intelpt-trace", "trace_bad.json"), error=True,
          substrs=['''error: expected object at traceSession.processes[0]

  Context:
      ...

Diff 419783