This is an archive of the discontinued LLVM Phabricator instance.

[intelpt] Refactor timestamps out of IntelPTInstruction
ClosedPublic

Authored by zrthxn on Mar 28 2022, 11:20 AM.

Details

Summary

Storing timestamps (TSCs) in a more efficient map at the decoded thread level to speed up TSC lookup, as well as reduce the amount of memory taken up by each instruction. Also introduced TSC range which keeps the current timestamp valid for all subsequent instructions until the next timestamp is emitted.

Diff Detail

Event Timeline

zrthxn created this revision.Mar 28 2022, 11:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2022, 11:20 AM
zrthxn requested review of this revision.Mar 28 2022, 11:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2022, 11:20 AM
zrthxn updated this revision to Diff 418660.Mar 28 2022, 11:31 AM

Update cursor timestamp when we have a new one

let's better use the word TSC instead of timestamps, which is more accurate

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
164

receive const &pt_insn to avoid copies

166–167

Let's just use an overload

void AppendInstruction(const pt_insn &instruction, uint64_t tsc);
181–182

This has to be an optional because it might not be present

wallace requested changes to this revision.Mar 28 2022, 12:16 PM
wallace added inline comments.
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
94–96

doing [] is O(logn), and we want to be faster than this.

You can do the following which is O(1)

auto it = m_instruction_timestamps.end()
if (it != m_instruction_timestamps.begin()) {
  it--;
  if (it->second != tsc) {
    // this tsc is not the same!
   m_instruction_timestamps.insert(insn_idx, tsc);
  } else {
    // this tsc is the same, do nothing
  }
}

you can further optimize this by storing the last tsc that has been appended, that way you don't even need to create iterators

115–118

this is wrong, you need to use upper_bound - 1, like this:

if (m_instruction_timestamps.empty())
  return None;

auto it = m_instruction_timestamps.upper_bound(insn_idx);
if (it == m_instruction_timestamps.begin())
  return None;

--it;
return it->second;

this will allow you to go to the largest index that is <= than insn_idx

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
181–182

better use insn_idx instead of index for all these variables

184–188

I actually like this, let's improve it

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
153–154 ↗(On Diff #418660)
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
25–26

here you need to set the correct value of m_current_tsc

44–45

this will be O(logn). We can do better if m_current_tsc is the following little structure

class DecodedThread {

struct TscRange {
  size_t start_index;
  size_t end_index;
  size_t tsc;
  std::map<size_t, uint64_t>::iterator prev;
  std::map<size_t, uint64_t>::iterator next;
};

}

Optional<TscRange> m_current_tsc;

Then you can ask the new method Optional<TscRange> DecodedThread::GetTSCRange(size_t insn_index) which will give you the entire range of the tsc that covers insn_index. With these numbers, you can do a comparison in this line to very quickly move from TSC to TSC only when needed.

You can also have the method DecodedThread::GetNextTscRange(const TscRange& range) that computes in O(1) the next range, and you can similarly have GetPrevTscRange()`. The iterators will help you do that withing using lower_bound, which is O(1)

66–84

you need to calculate the new tsc_range after moving m_pos

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
45–46

Optional<uint64_t> m_current_tsc;

This revision now requires changes to proceed.Mar 28 2022, 12:16 PM
zrthxn updated this revision to Diff 418937.Mar 29 2022, 11:23 AM
zrthxn marked 12 inline comments as done.

Introduced TscRange for quicker operation

wallace requested changes to this revision.Mar 29 2022, 12:07 PM

I'm proposing a new interface for the TscRange. Let me know if you have questions

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
95

we need to update it because of the optional

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
166–167

Let's improve the documentation and let's use the word TSC more ubiquitously

184
185–193

Let's add more logic to this object so that it handles as much as it can and we reduce the logic that was added to DecodedThread. We also don't need two iterators, just one is enough. Don't forget to add documentation to these new methods.

Given this new definition for TscRange. The only method we need to add in DecodedThread is CalculateTscRange(size_t insn_index), and mention the documentation that this operation is O(logn) and should be used sparingly.

186–188
196–203

Let's improve the documentation and also get rid of the struct keyword in return types. That's old style C

219–238

don't start variable names with __, as may people think that those variables should be discarded. Let's just give it a proper name. Let's also use an Optional and let's add documentation to all the variables.

This revision now requires changes to proceed.Mar 29 2022, 12:07 PM
zrthxn updated this revision to Diff 418975.Mar 29 2022, 2:08 PM
zrthxn marked 7 inline comments as done.
zrthxn retitled this revision from [wip][intelpt] Refactor timestamps out of `IntelPTInstruction` to [intelpt] Refactor timestamps out of IntelPTInstruction.

Change TscRange to class

zrthxn updated this revision to Diff 418977.Mar 29 2022, 2:11 PM

Update memory calc function

zrthxn updated this revision to Diff 418981.Mar 29 2022, 2:23 PM

Prevent crash on printing info when we have 0 instructions

wallace requested changes to this revision.Mar 30 2022, 6:56 PM

Some calculations are wrong, but overall this is good. We are very close!

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
117–118

now that I think of this, you can delete this, because if the map is empty, this function will return in line 117

137–140

undo these two lines

159

decoded_thread instead of ref

161–162

delete

164–168

seeing ++ and -- is very hard to read. I also prefer not to modify the it variable for cleanness. Also doing end->first might crash the program. I'm writing here a correct version

173
178

The comparison is not right. let's use <= in a specific order to make it easier to read

186

As m_it is valid, doing the comparison m_it == m_decoded_thread.m_instruction_timestamps.end() will always return false. Remember that .end() will return a fake iterator that points to no value.

Besides that, don't modify m_it. Let's better create a new iterator

190–192

Similarly, this has to be improved. I also like to put --it statements in their own line to make it easier to read.

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
182
187–188
189

Move this class to the beginning of the public section of DecodedThread for easier discoverability

191
207–213

let's better delete this. It adds some maintenance cost with little benefits

218

This comment is hard to follow. Let's just delete it because it's a private constructor

223–227

let's add another piece of information

225–228

Let's just delete this, as we can get them directly from m_it without doing any operations

227
230

Setting to llvm::None here is equivalent to doing it from all the constructors

233
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
45–52

No need to do m_current_tsc = m_decoded_thread_sp->CalculateTscRange(m_pos); because its value has already been calculated in the constructor. We can simplify this as well

99–101

are you using git clang-format? I'm curious why this line changed

101–107
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
45

/// Tsc range covering the current instruction.

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
124

Instead of doing *raw_size, better remove this number from CalculateApproximateMemoryUsage()

This revision now requires changes to proceed.Mar 30 2022, 6:56 PM
zrthxn updated this revision to Diff 419347.Mar 31 2022, 12:31 AM
zrthxn marked 23 inline comments as done.

Included requested changes, removed extra members

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
45–52

It is possible that when TraceCursorIntelPT is created the m_current_tsc is None, for example when just started the trace and tried to dump instructions... But then if a tsc is emitted later, this would cause it to remain None since we don't re-calculate it if it was initially None

99–101

Yes I am. I think its because its longer than 80 chars.

101–107

m_current_tsc is already checked at the beginning of this function

zrthxn updated this revision to Diff 419348.Mar 31 2022, 12:35 AM

Change tsc check anyway

zrthxn updated this revision to Diff 419517.Mar 31 2022, 10:54 AM
zrthxn marked an inline comment as done.

Fixed issue with TSC becoming invalid midway through trace

zrthxn updated this revision to Diff 419519.Mar 31 2022, 11:06 AM

Updated tests according to new memory usage calculation

zrthxn marked an inline comment as done.Mar 31 2022, 11:09 AM
wallace requested changes to this revision.Mar 31 2022, 11:43 AM

almost there! Mostly cosmetic changes needed

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
94–98

We need to handle a special case. It might happen that the first instruction is an error, which won't have a TSC, and the second instruction is an actual instruction, and from that point on you'll always have TSCs. For this case, we can assume that the TSC of the error instruction at the beginning of the trace is the same as the first valid TSC.

117–118

delete these two lines. The rest of the code will work well without it

158–160
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
104

+1

140

Let's be more verbose with the names to improve readability

142
148

let's receive a reference here and then convert it to pointer, so that we minimize the number of places with pointers. Also, if later we decide to use a shared_ptr instead of a pointer, we can do it inside of the constructor without changing this line

150
181–182
lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
65–84

we can simplify this so that we only invoke CalculateTscRange once

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
46

rename it to m_tsc_range. The word current is very redundant in this case

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
123–124

Use doubles, as the average might not be a whole number

This revision now requires changes to proceed.Mar 31 2022, 11:43 AM
zrthxn updated this revision to Diff 419560.Mar 31 2022, 1:29 PM
zrthxn marked 12 inline comments as done.

Incorporate feedback and update tests

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
65–84

This is incorrect The converted code always returns 0. I've refactored it to have CalculateTscRange once but its a side-effect-y function and will need some future attention.

one last nit and good to go

lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
82

don't use auto for simple types

jj10306 added inline comments.Mar 31 2022, 3:24 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
145

nit: No need to friend the enclosing class since C++11 - https://en.cppreference.com/w/cpp/language/nested_types

wallace added inline comments.Mar 31 2022, 3:35 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
145

TIL!

zrthxn updated this revision to Diff 419619.Mar 31 2022, 9:35 PM
zrthxn marked 3 inline comments as done.

Dont use auto for simple types

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
145

We need the friend because we are using a private constructor from outside, in DecodedThread::CalculateTscRange and a couple other places. The idea is to let only DecodedThread create TscRange.

wallace accepted this revision.Mar 31 2022, 9:50 PM

lgtm

This revision is now accepted and ready to land.Mar 31 2022, 9:50 PM

Don't forget to update the description of this diff and of the commit before pushing (you need to do both). Include the avg instruction size for a trace of at least 10k instructions as well :)

zrthxn updated this revision to Diff 419656.Apr 1 2022, 12:48 AM
zrthxn edited the summary of this revision. (Show Details)

The difference in memory usage is appreciable with large number of instructions, as shown below

# Before (with current metrics, total memory does not include raw trace size)
  Raw trace size: 2048 KiB
  Total number of instructions: 900004
  Total approximate memory usage: 56143.10 KiB
  Average memory usage per instruction: 63.87 bytes
# After
  Raw trace size: 2048 KiB
  Total number of instructions: 900004
  Total approximate memory usage: 42187.69 KiB
  Average memory usage per instruction: 48.00 bytes
This revision was automatically updated to reflect the committed changes.