This is an archive of the discontinued LLVM Phabricator instance.

[trace][intel-pt] Implement the basic decoding functionality
ClosedPublic

Authored by wallace on Oct 12 2020, 4:52 PM.

Details

Summary

Depends on D89408.

This diff finally implements trace decoding!

The current interface is

$ trace load /path/to/trace/session/file.json
$ thread trace dump instructions

thread #1: tid = 3842849, total instructions = 22
  [ 0] 0x40052d
  [ 1] 0x40052f  
  ...
  [19] 0x400521

$ # simply enter, which is a repeat command
  [20] 0x40052d
  [21] 0x400529
  ...

This doesn't do any disassembly, which will be done in the next diff.

Changes:

  • Added an IntelPTDecoder class, that is a wrapper for libipt, which is the actual library that performs the decoding.
  • Added TraceThreadDecoder class that decodes traces and memoizes the result to avoid repeating the decoding step.
  • Added a DecodedThread class, which represents the output from decoding and that for the time being only stores the list of reconstructed instructions. Later it'll contain the function call hierarchy, which will enable reconstructing backtraces.
  • Added basic APIs for accessing the trace in Trace.h:
    • GetInstructionCount, which counts the number of instructions traced for a given thread
    • IsTraceFailed, which returns an Error if decoding a thread failed
    • ForEachInstruction, which iterates on the instructions traced for a given thread, concealing the internal storage of threads, as plug-ins can decide to generate the instructions on the fly or to store them all in a vector, like I do.
  • DumpTraceInstructions was updated to print the instructions or show an error message if decoding was impossible.
  • Tests included

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
labath added inline comments.Oct 13 2020, 7:52 AM
lldb/include/lldb/Target/Trace.h
167

It sounds like this could just be llvm::function_ref<bool(size_t index, Expected<addr_t> load_addr> and the Instruction class does not even need to exist. At least not here -- it may still be useful for the PT plugin to store the instructions in some sort of an error-or-load-addr union, but there's no need to impose that organization on anyone else.

191

I'd expect a method called IsXXX to return bool.

lldb/include/lldb/Target/TraceThread.h
23 โ†—(On Diff #297736)

I'm very confused by TraceThread residing in the Target library and ProcessTrace in Plugin/Process/Trace. I think both of them should be in the same place (I could make a case for either location).

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
78

std::move(instructions)

85

ArrayRef<IntelPTInstruction> maybe?

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
25โ€“32

llvm::MemoryBuffer::getFile(filename)

34

and? That's what the code seems to be doing.

36

Please put static first (I'm surprised this even compiles).

69

static std::vector<IntelPTInstruction> DecodeInstructions

178

Expected<vector<Insns>>

198

Use the GetPath overload returing a std::string

200

So will this make the library load the file into memory once again? Is there no way to make it use the copy already loaded by lldb?

lldb/source/Target/TraceSessionFileParser.cpp
125

This is not a dependency in the strictest sense but it still means that this code would explode if the ProcessTrace "plugin" is plugged "out". It sounds like that, in this design, the ProcessTrace class should just be a part of lldb core.

wallace updated this revision to Diff 298052.EditedOct 13 2020, 11:59 PM
wallace marked 11 inline comments as done.

Changes:

  • Changed the callback signature of ForEachInstruction to receive an Expected<load_addr>
  • Use Expected more ubiquitously in IntelPTDecoder
  • Use MemoryBuffer to read the trace file
  • Implemented ProcessTrace::DoReadMemory, which now is used by libipt instead of loading each object file inside libipt
  • Made the decoding function a little bit more readable
  • Small fixes here and there

Notes:

  • I added a small test where a instruction can't be decoded due to missing memory mapping. I'll add later more robust tests.
  • I'm also not fond of the ProcessTrace class being implemented in a plugin, however, it seems that all processes are implemented that way and the main method for creating processes Target::CreateProcess relies on plugins for creating the right instance. I could create an overload of that class that receives a callback that creates a specific process class, although it might create an way to unintendendly bypass the existing Target::CreateProcess. Another option is to move the class to lldb core and register the plugin from there, even though it'd be the only class doing that in the codebase. @labath , what do you think?
wallace updated this revision to Diff 298253.Oct 14 2020, 3:41 PM
  • Moved the non intel-pt changes to D89408
  • Added some more complex tests:
    • A multifile case in which even the dynamic linker code is decoded
    • A variation of that case that doesn't declare the dynamic linker in the json file, which results in a trace with a missing block of instructions, but still is able to decode main.cpp
  • Some general cleanup
wallace edited the summary of this revision. (Show Details)Oct 14 2020, 3:42 PM
wallace edited the summary of this revision. (Show Details)Oct 15 2020, 3:50 PM
clayborg requested changes to this revision.Oct 16 2020, 11:38 AM

Great start to this. Many comments inlined.

When you are dumping instructions we are only showing one hex value. Is this the instruction address or the opcode itself?

lldb/include/lldb/Target/Trace.h
143โ€“144

I am going to comment here on what this function should look like and how it will be used by all of the APIs. This function can probably be used to implement the forward and reverse stepping/continue commands eventually. So I would propose that this function should be able to start from a given index and be able to go forward or backward in the instructions (for forward/reverse step/continue).

So with this in mind how about:

void TraverseInstructions(const Thread &thread, size_t position, bool forward, std::function<...> callback) = 0;
167

This should probably be passed by value as it could contain an error. If there is an error the error must be consumed. If we pass by reference then it is unclear who must consume the error.

178

Will we always know instruction count? Could this to very expensive to calculate? Can we add this to the generic trace API and expect all trace formats to implement this?

192

Can we just use the ForEachInstruction and get the error during that call? Is this call redundant? If there is an error, it might be better to get it via the ForEachInstruction function and know where the problem is. If there is no data, you will get the error on the first access to the first instruction. Knowing where the error happened might help the user have more context.

lldb/source/Target/ProcessTrace.cpp
124โ€“125

You should be able to just call:

size_t Target::ReadMemoryFromFileCache(const Address &addr, void *dst, size_t dst_len, Status &error);

It already does what you are doing here if all that is happening here is reading from loaded object file section data.

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
42โ€“65

What is the default count here? 19? Seems like an odd number to choose as a default?

100

Do we have a test for when the offset is invalid? Another test for the count being too large and the output would get truncated?

105โ€“106

We should be showing addresses here. It doesn't matter if they are mapped or not. This will happen for JIT'ed code.

126

Why aren't we showing the address here? We will run into cases, for possibly JIT'ed code where we won't have a section for an address, so we should still show the address

This revision now requires changes to proceed.Oct 16 2020, 11:38 AM
wallace marked 8 inline comments as done.Oct 16 2020, 1:56 PM

When you are dumping instructions we are only showing one hex value. Is this the instruction address or the opcode itself?

That's the instruction load address. In a later diff I'll implement pretty-printing similar to the "disassembly" command, but for now this is a good start.

lldb/source/Target/ProcessTrace.cpp
124โ€“125

This is exactly what I needed! The name is just not very precise =P

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
42โ€“65

The default is 20, but I'm adding the 20th-element here for clarity

105โ€“106

I'll do this in a later diff. Currently libipt doesn't report the addresses that it fails to decode, but I'm planning on making a patch on libipt to support that.

126

Repeating my message from above:

I'll do this in a later diff. Currently libipt doesn't report the addresses that it fails to decode, but I'm planning on making a patch on libipt to support that.

wallace updated this revision to Diff 298751.Oct 16 2020, 1:57 PM
wallace marked 2 inline comments as done.
  • C hanged the instruction iterator to the proposed signature
  • Added SupportsInstructionsCount to determine whether GetInstructionCount is supported by the plug-in. This count can be useful for debugging purposes, and at least for a while we wonโ€™t support lazy decoding of instructions.
  • Removed GetTraceErrorStatus following Gregโ€™s idea
  • Added the requested test cases
  • Some small fixes here and there
Herald added a project: Restricted Project. ยท View Herald TranscriptOct 16 2020, 1:57 PM
Herald added a subscriber: llvm-commits. ยท View Herald Transcript
This comment was removed by wallace.
wallace updated this revision to Diff 298762.Oct 16 2020, 2:34 PM

Figured out how to show the addressed that failed to decode. Now the output in those cases is something like

[ 4] no memory mapped at this address: 0x7ffff7df1950

This should address all of Greg's comments

wallace marked 2 inline comments as done.Oct 16 2020, 2:35 PM
wallace added inline comments.
lldb/test/API/commands/trace/TestTraceDumpInstructions.py
105โ€“106

Already fixed, disregard this comment

126

Already fixed, disregard this comment

vsk added a subscriber: vsk.Oct 16 2020, 2:41 PM

I'm not sure if the current revision of the patch reflects the long-term testing strategy. But if so, I'm quite concerned about the proliferation of large binary files in the repo (like ld.so, or the raw trace binary itself). These are opaque blobs that are hard to understand. Also, each time we add one, we're imposing a sizeable tax on everyone working with the llvm-project monorepo.

One possible alternative:

  • Design a textual description for the raw trace contents, and possibly a way to convert an existing trace file into this textual format
  • Check in assembly, and use llvm-mc/clang to generate executables during testing
lldb/tools/intel-features/intel-pt/Decoder.cpp
251 โ†—(On Diff #298751)

Please group your includes.

260 โ†—(On Diff #298751)

unintentional whitespace diff?

301 โ†—(On Diff #298751)

Not sure what this is in aid of?

llvm/include/llvm/Support/Error.h
1016 โ†—(On Diff #298751)

Probably best to not add another escape-hatch to permit fast and loose error handling. This seems to be used in a lambda passed to TraverseInstructions. There might be a way to avoid invoking the callback in the case where the expected value is thrown away.

wallace updated this revision to Diff 298768.Oct 16 2020, 2:48 PM

removing unwanted changes in lldb/tools/intel-features/intel-pt/Decoder.cpp

lldb/tools/intel-features/intel-pt/Decoder.cpp
301 โ†—(On Diff #298751)

I erroneously included this, I'm reverting the changes to this file.

@vsk, I agree with you regarding the files. At the moment our implementation of intel-pt tracing doesn't support collecting a trace, but soon we'll do so. Then, we'll be able to generate these trace files on the fly as the tests run, so I imagine I'll be deleting these binary files. For the time being I doubt I'll include any new binary, as what is included is more than enough to test the basic decoding functionalities.

wallace added inline comments.Oct 16 2020, 3:04 PM
llvm/include/llvm/Support/Error.h
1016 โ†—(On Diff #298751)

Sure, will remove this

wallace updated this revision to Diff 298771.Oct 16 2020, 3:10 PM
  • Remove the llvm::consumeExpected function, as @vsk suggested.
clayborg requested changes to this revision.Oct 16 2020, 4:25 PM

Instead of just seeing the address, we should disassemble the instruction at the address in this patch for clarity if it is available. If we can't read the opcode from the object files, we need to display an appropriate message after the address in the output of "thread trace dump instructions".

This revision now requires changes to proceed.Oct 16 2020, 4:25 PM

As I stated before we should be printing the disassembly for each instruction on the output lines. We should probably also normalize the address hex value to it doesn't change widths. Something like:

[ 0] 0x000000000040065f: pushq  %rbp
[ 1] 0x000000000040065a: movq   %rsp, %rbp
[ 2] 0x0000000000400657: movl   $0x0, -0x1c(%rbp)
[ 3] 0x0000000000400654: callq  0x100000f82               ; symbol stub for: printf
[ 4] 0x00007ffff7df1950: movl   %edi, -0x20(%rbp)

And possibly in this patch or in another patch, we should print out when the source file and line changes

a.out`main @ main.cpp:12
  [ 0] 0x000000000040065f: pushq  %rbp
  [ 1] 0x000000000040065a: movq   %rsp, %rbp
  [ 2] 0x0000000000400657: movl   $0x0, -0x1c(%rbp)
a.out`main @ main.cpp:13
  [ 3] 0x0000000000400654: callq  0x00007ffff7df1950               ; symbol stub for: printf
libc.so`printf
  [ 4] 0x00007ffff7df1950: movl   %edi, -0x20(%rbp)
lldb/test/API/commands/trace/TestTraceDumpInstructions.py
121โ€“122

These lines should start with the address like all other lines. Then the question is what the output should look like. Do we really need to tell the user that there is no memory mapped here? Can we just print "<???>" or nothing if we have no information like:

[0] 0x400518: <???>
[1] 0x400511: <???>
142
[ 4] 0x7ffff7df1950 <???>
vsk added a comment.Oct 16 2020, 4:42 PM

@vsk, I agree with you regarding the files. At the moment our implementation of intel-pt tracing doesn't support collecting a trace, but soon we'll do so. Then, we'll be able to generate these trace files on the fly as the tests run, so I imagine I'll be deleting these binary files. For the time being I doubt I'll include any new binary, as what is included is more than enough to test the basic decoding functionalities.

That seems promising. Deleting those binary files after the fact doesn't address the issue, though, as they'd be part of the history. I have a question about that ld-2.17.so file in particular: is there no way to decoder/traverse a trace of a process that loads a dylib, without copying all of ld.so into the source tree? That seems very surprising -- I'd expect the decoder API to allow you to skip right over PC ranges that have nothing to do with the binary you want to debug.

lldb/source/Target/Trace.cpp
85

Just 'assert(num); return ceill(log10(num));'?

In D89283#2336280, @vsk wrote:

@vsk, I agree with you regarding the files. At the moment our implementation of intel-pt tracing doesn't support collecting a trace, but soon we'll do so. Then, we'll be able to generate these trace files on the fly as the tests run, so I imagine I'll be deleting these binary files. For the time being I doubt I'll include any new binary, as what is included is more than enough to test the basic decoding functionalities.

That seems promising. Deleting those binary files after the fact doesn't address the issue, though, as they'd be part of the history. I have a question about that ld-2.17.so file in particular: is there no way to decoder/traverse a trace of a process that loads a dylib, without copying all of ld.so into the source tree? That seems very surprising -- I'd expect the decoder API to allow you to skip right over PC ranges that have nothing to do with the binary you want to debug.

I would very much like to keep some tests which use pre-baked traces (binaries are a different matter). I have a couple of reasons for that:

  • tests exercising the tracing code path require appropriate hardware. Checked in traces can run everywhere (assuming one can build the pt library there)
  • due to their systemic nature, it will be hard to test various edge conditions (missing binaries, corrupt traces, ...) with a end-to-end test
  • we had some issues with end-to-end tests being flaky due to the fact that cpu writes the traces asynchronously -- I am not sure we ever fully figured that out

Now, we're definitely going to need tests which check the tracing functionality, but I think that having these kinds of traces is definitely good. As for ld.so, while it's not the end of the world (we have much larger binaries), it would definitely be nice to avoid it. What functionality is that test exercising? If you want traces that cross multiple modules, maybe you could capture a trace from the middle of an application, after ld.so is done, and show how the application is ping-ponging between some functions in different shared libraries.

In D89283#2336090, @vsk wrote:

One possible alternative:

  • Design a textual description for the raw trace contents, and possibly a way to convert an existing trace file into this textual format

I think that converting the textual trace description would essentially mean reimplementing the intel-pt library. That might be nice (llvm is definitely fond of reimplementing things), but I'm not sure if that's what these guys have signed up for.

  • Check in assembly, and use llvm-mc/clang to generate executables during testing

This is definitely doable, though I'd probably go for yaml2obj, as llvm-mc&clang cannot guarantee the exact placement of instructions in memory. It looks like obj2yaml has grown program header support since last time I checked this (previously you had to write them by hand), so it may be that a plain obj2yaml | yaml2obj would just work now. (It probably won't produce a fully functional binary, but it might be close enough.)

lldb/include/lldb/Target/Trace.h
176โ€“195

How about Optional<size_t> GetInstructionCount? That makes it less likely to develop an accidental dependancy on this interface (though I fear that might still happen without an subclass which actually returns None here).

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
32

raw_string_ostream would be more llvm-y (the std::hex part in particular is very non-idiomatic)

42

Do you want anyone to modify the vector? Return ArrayRef<IntelPTInstruction>

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
33

static

81

static

159

static_cast is enough here

179

I presume that the pt library does not actually modify this data. Maybe a short note saying that.

labath added inline comments.Oct 19 2020, 6:29 AM
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
189

It looks like this can only fail if the image argument is null (which can only happen if the decoder is null, which is checked). An assert would be enough for that. (For a proper error handling you should have also freed the decoder object on the error path, which is how i came to thing about this).

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
69

drop std::make_pair, it's cleaner.

94

this makes a copy, which you probably did not want.

98

i>=0 is always true. You'll have to do this trick with signed numbers (ssize_t?)

109

I'm having doubts about this "I have an thread but was not able to decode _anything_ about it" state is worth it. Having many different ways to report errors just increases the chance of something going wrong, and in TraverseInstructions you're already treating this state as a pseudo-instruction.

Maybe that representation should be used all the way down? Or (and this may be even better) we avoid creating such Threads in the first place (print an error/warning when the target is created).

lldb/source/Target/Trace.cpp
85

that would have to be log10(num+1), though I'm not sure what to thing of the floating point arithmetic...

222

The cast to int64_t won't change the actual value of the result (though it may invoke UB due to signed wraparound). What exactly are you trying to achieve here?

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
44โ€“49

Are you sure that printing this backwards is the best way to display this? The resulting disassembly is going to look quite weird. I think that printing this in the "normal" direction would make it easier to figure out what the program was doing. For people who are only interested in the final PC value it should not be a problem to skip to the last line of the output (the last line is also more likely to remain visible if the dump produces lots of data).

wallace marked 3 inline comments as done.Oct 19 2020, 4:08 PM

@labath is right regarding the need of pre-baked binaries to test specific conditions. I'll remove the ld binary, as it really tests nothing useful, and i'll try to use yaml to represent the binaries in a more concise format.

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
44โ€“49

First of all, I'm thinking about adding a flag to this command to choose the direction, as there are benefits of both.

Let's say, if you are interested in reading/understanding the last instructions up to a breakpoint, then reading the trace in reverse makes sense, as you don't know where to start reading from, but you know where to end. Imagine you have 100K instructions, where do you start? It seems sometimes better to read the last instructions and then ask for a few of the earlier instructions, and keep doing that until you find what you are interested in. On the other hand, if you want to analyze forwards what happens from a certain point, this API is quite annoying and I imagine you'd prefer to read it forwards.

So I propose

thread trace dump instructions --count <> --start-position <> [--forwards | -f] [--backwards | -b]

I'd keep -b as default, as it's useful when analyzing crashes or stops on breakpoints. The default --start-position when reading forwards could be the oldest chronological instruction, and the default when reading backwards could be the earliest chronologically.

With this, I'd change the indices. I'd make index [0] to be the oldest chronologically and [|trace| -1] to be the most recent.

@labath, @clayborg, what do you think? This might be flexible enough for the different kind of usages.

121โ€“122

I think it's highly important to tell the user that this is a very important error and not make it apparently inoffensive with the formatting. Let me elaborate why this is not an inoffensive error.

First of all, the encoded trace is composed of packets, composed of two main packets:

  • PSB: synchronization packet that contains the current PC. These packets are sporadic (often one for each 4KB of data), as they are big in size.
  • TNT: taken/not taken packet that contains one bit per branch executed by the processor. These packets are probably the most frequent and they don't include any PC.

When decoding, the decoder finds first a PSB packet, gaining the knowledge of the current PC, then it starts traversing the binary instruction by instruction until it finds a branch, in which case it finds the next TNT packet and learns if that branch was taken or not, then continuing the traversal in the correct direction.

This means that when the decoder can't read a memory address, then it won't be able to decode any TNT packets until the next PSB synchronization point. In fact, in this diff, when there's an instruction decoding error, we skip to the next PSB and resume decoding from there. This problem implies that we are skipping potentially thousands of instructions. In other words, if you see

[0]: 0x400518
[1]: 0x400511
[2]: no memory mapped at this address: 0x400502
[3]: 0x400500

Then that means that between instructions [3] and [1] there were an unknown number of instructions that couldn't be decoded, the first one of them being at 0x400502. We won't be able to do anything useful with those instructions, and the user would need to provide the missing module and redecode to reconstruct the full trace.

clayborg added inline comments.Oct 19 2020, 4:44 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
32

That or "lldb_private::StreamString". Both have similar functionality. I prefer StreamString because it is simpler. With raw_string_ostream, you have to make a std::string, put it into the raw_string_ostream and then flush it prior to getting the string result.

42

yeah llvm::ArrayRef to avoid making copies is good.

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
33

Make static or add an anonymous namespace around all of these functions so you don't have to mark them all as static.

81

Make static or add an anonymous namespace around all of these functions so you don't have to mark them all as static.

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
94

returning a llvm::ArrayRef to avoid the copy

98

Yes, switch to ssize_t, your delta is already signed. Also switch "delta" to ssize_t as well.

lldb/source/Target/Trace.cpp
222

Lots os signed/unsigned match issues possible. Best to make this rock solid.

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
49

If we reverse the direction, then hitting "enter" after doing one command won't flow as nicely as it does now. That being said I agree with Pavel that we should figure out what is expected. I generally think that earlier text is older.

I would not switch the indexes so that they change with any options that are specified. We currently have --start-position, but maybe this should be just --position? Or we specify:

--from-end <offset>

<offset> would be the index offset from the end (newest) of the data?

--from-start <offset>

<offset> would be the index offset from the start (oldest) of the data?

I would be fine with:

[--forwards | -f] [--backwards | -b]

but I think it would make sense to show the indexes in a consistent way regardless of what options are displayed. Maybe it makes sense to always show the true index where zero is the oldest and N is the newest?

We do need to make sure the auto repeat command looks good though which will be hard with oldest to newest ordering.

wallace marked an inline comment as done.Oct 19 2020, 5:01 PM
wallace added inline comments.
lldb/test/API/commands/trace/TestTraceDumpInstructions.py
49

What about this:

We expose the indices in a chronologically increasing way, where [0] is the oldest instruction and [N] is the newest. Then we have the two options suggested by Greg

--from-end <offset>

Where offset is an index or the string "end", meaning the last instruction of the trace, in case the user doesn't know the index of it. Then the instructions are printed

[offset]
[offset - 1]
...
[offset - K]

And if there's a repeat command, this would be printed

[offset - K - 1]
[offset - K - 2]
...

Which would look nicely as a contiguous list of instructions if concatenated.

The other option would be

--from-start<offset>

Where offset is an index. Then the instructions are printed

[offset]
[offset + 1]
[offset + 2]
...
[offset + K]

And after a repeat command, you'd get

[offset + K + 1]
[offset + K + 2]
...

I think this would serve all purposes.

wallace marked 16 inline comments as done.Oct 19 2020, 5:30 PM
wallace added inline comments.
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
189

good catch!

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
98

TIL!

109

Maybe that representation should be used all the way down?

I'll follow that path. This will create consistency through the code

Or (and this may be even better) we avoid creating such Threads in the first place (print an error/warning when the target is created).

I wish I could do that, but decoding is very expensive and should be done lazily. According to Intel and my tests, if a thread was traced during T seconds, then decoding takes around 10T, which is a big amount of time if you were tracing 10 threads for 5 seconds, which would take 500 seconds to decode. At least for know we are not doing parallel decoding. I imagine at some point we'll have to work on that.

wallace updated this revision to Diff 299236.EditedOct 19 2020, 6:43 PM
wallace marked 3 inline comments as done.

Addressed almost all changes.

What's left to discuss:

  • I tried to use obj2yaml, but it creates files much larger than the binaries, so in terms of space, I'd rather keep the binaries. Besides, I removed ld.so, and the binaries that are left are tiny. I imagine that the binaries that will be committed in the future will also be tiny ones depicting specific edge cases.
  • Showing the disassembly of the instructions, requested by Greg, is not yet done.
  • I'm representing now all decoding errors as instructions, which in fact simplifies the API a good deal. However, I'm still not happy with the API, right now we have

    class Trace { void TraverseInstructions(Thread, callback, ...); int GetInstructionsCount(Thread...); }

Right now these methods are not doing anything (or returning 0 in the case of the instructions count) if the provided Thread is not traced by the Trace instance.

Besides, eventually we'll add methods like

vector<addr_t> GetBacktrace(Thread, position)

position ReverseNext(Thread, position)
position ReverseSingleStep(Thread, position)

We could leave the code as it is, which results in a simple API with some undefined behavior if an invalid Thread is passed.

Another option is to have

class Trace {
  TracedThreadSP GetTraceForThread(Thread ...); 
}

class TracedThread {
  void TraverseInstructions(callback, ...);
  int GetInstructionsCount()
 
  position ReverseNext(Thread, position)
  position ReverseSingleStep(Thread, position)
}

This creates a level of indirection that forces the caller to check if the returned TracedTrace is a valid pointer or not. If the pointer is invalid, the caller can show an error message or fail silently, otherwise, TraverseInstructions and GetInstructionsCount will perform valid operations.

This might become useful to avoid errors in complex scenarios as the API grows. But then, how often will someone invoke these methods with the wrong thread? And this TracedThread object would have to be passed around a lot if the developer doesn't want to if the thread is invalid in each callsite.

The first form of the API looks simpler without that level of indirection, which is cool. But the second one adds some boilerplate in each callsite, with some safety benefits.

What do you guys think?

Addressed almost all changes.

What's left to discuss:

  • I tried to use obj2yaml, but it creates files much larger than the binaries, so in terms of space, I'd rather keep the binaries. Besides, I removed ld.so, and the binaries that are left are tiny. I imagine that the binaries that will be committed in the future will also be tiny ones depicting specific edge cases.
  • Showing the disassembly of the instructions, requested by Greg, is not yet done.
  • I'm representing now all decoding errors as instructions, which in fact simplifies the API a good deal. However, I'm still not happy with the API, right now we have

    class Trace { void TraverseInstructions(Thread, callback, ...); int GetInstructionsCount(Thread...); }

Do we need to know the instruction count? Again, won't this be really expensive to calculate?

Right now these methods are not doing anything (or returning 0 in the case of the instructions count) if the provided Thread is not traced by the Trace instance.

If we get rid of

int Trace::GetInstructionsCount(Thread...);

Then the first call to TraverseInstructions can return an appropriate error to the callback in the Expected<addr_t> right?

Besides, eventually we'll add methods like

vector<addr_t> GetBacktrace(Thread, position)

We might want the thread to be able to produce real stack frames so we can re-use the current StackFrame classes. Each stack frame will probably only contain the PC. This call could be used to create the StackFrames in the trace thread class.

position ReverseNext(Thread, position)
position ReverseSingleStep(Thread, position)

We should be able to implement these in Trace.cpp and have it only use TraverseInstructions right? Do we also need ReverseContinue() to be able to backup until we hit a breakpoint or the start of the trace data?

We could leave the code as it is, which results in a simple API with some undefined behavior if an invalid Thread is passed.

I like the current API. Great if we can remove the Trace::GetInstructionsCount() API to keep things as simple as possible and allow us to lazily fetch instructions as needed as we eventually step and continue around.

If an invalid thread is passed, then TraverseInstructions can just return an error on the first callback. If we still need GetInstructionCount(), we can have it return 0 if the thread has no trace instructions.

Another option is to have

class Trace {
  TracedThreadSP GetTraceForThread(Thread ...); 
}

class TracedThread {
  void TraverseInstructions(callback, ...);
  int GetInstructionsCount()
 
  position ReverseNext(Thread, position)
  position ReverseSingleStep(Thread, position)
}

This creates a level of indirection that forces the caller to check if the returned TracedTrace is a valid pointer or not. If the pointer is invalid, the caller can show an error message or fail silently, otherwise, TraverseInstructions and GetInstructionsCount will perform valid operations.

I don't think we need this. I think we have enough API here to implement all of this stuff through the lldb_private::Thread class as it can call through to the Trace APIs with the thread class pointer.

This might become useful to avoid errors in complex scenarios as the API grows. But then, how often will someone invoke these methods with the wrong thread? And this TracedThread object would have to be passed around a lot if the developer doesn't want to if the thread is invalid in each callsite.

The first form of the API looks simpler without that level of indirection, which is cool. But the second one adds some boilerplate in each callsite, with some safety benefits.

What do you guys think?

Lets not do this extra indirection, I like the current API with or without GetInstructionsCount

wallace added a comment.EditedOct 19 2020, 10:57 PM

Do we need to know the instruction count? Again, won't this be really expensive to calculate?

It is expensive, but today I realized that it might be necessary to decode the entire trace if you want things to work nicely.

Let's take, for example, a very common future use case: "you stop at a breakpoint, and then you do reverse-next and print the backtrace". Getting the backtrace requires decoding all the instructions, as potentially the first instruction in the trace corresponds to the oldest frame of the backtrace of your breakpoint position (which would be the case if you start tracing at main). You won't know all the frames of your backtrace unless you decode it all. In general, you can't have the backtrace of the i-th position unless you decode everything up to that position. And without the backtrace you really don't know where you are, it's hard for the user to make sense of the state of the program without it.

Also, if would be difficult to provide comprehensive position offsets to the user in the "dump instructions" command unless we also show the maximum index, which is equivalent to the instruction count.

Lets not do this extra indirection, I like the current API with or without GetInstructionsCount

That's reassuring. I'm also of that opinion.

I think we have enough API here to implement all of this stuff through the lldb_private::Thread class as it can call through to the Trace APIs with the thread class pointer.

Great!

If an invalid thread is passed, then TraverseInstructions can just return an error on the first callback. If we still need GetInstructionCount(), we can have it return 0 if the thread has no trace instructions.

Sure

We should be able to implement these in Trace.cpp and have it only use TraverseInstructions right? Do we also need ReverseContinue() to be able to backup until we hit a breakpoint or the start of the trace data?

Yes, definitely. We'll be adding more stuff to it like ReverseContinue.

What's left to discuss:

  • I tried to use obj2yaml, but it creates files much larger than the binaries, so in terms of space, I'd rather keep the binaries. Besides, I removed ld.so, and the binaries that are left are tiny. I imagine that the binaries that will be committed in the future will also be tiny ones depicting specific edge cases.

Well, yaml is text so it's not surprising that it'd be larger (though sometimes it can actually be smaller, because it e.g. omits padding, or because you can reduce by deleting irrelevant stuff; and git can store text diffs efficiently). But there's the other aspect that Vedant mentioned -- their opaqueness/reviewability. With a yaml file, one can see (directly in the review window or in his text editor) what kind input is the program being fed and correlate that with the expected output. This is not perfect because a lot of the details (e.g. the disassembly, and most importantly the trace file) is still obscured, but it's better than nothing. So if it works, I'd still go for the yaml option (and I have to send a big thank you to whoever implemented the program header support).

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
12

I guess this is not needed anymore (?)

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
67โ€“69

Maybe better left for a separate patch, but if we're going to have loads of these objects, then the we'd better optimize its size.

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
33

Actually, the coding standards say anonymous namespaces should not be used for functions.

192

Can't have side-effects inside assertions.

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.h
22โ€“45

I don't see the value of this class -- it's only used once, and in a very ephemeral way. ThreadTraceDecoder::Decode could just call CreateDecoderAndDecode directly (and it'd be shorter for it).

67

please delete the copy assignment as well

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
109

If you can pull that off -- great. However, I have doubts (and judging by the other comments, you're starting to have some too) regarding how long will you be able to postpone the decoding. For example, lldb very much likes to know the PC value of each thread (can you really blame it?) -- so much that we've added a special field to the gdb-remote stop-reply packet which lists the pc values of all threads.

That leads me to believe that you'll need to decode at least the last block of each thread's trace very early on

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
49

Ah... this is tricky...

The concatenation aspect is nice, but I'm not sure it trumps the "earlier/higher text is older" intuition. Even if I'm analyzing backwards, I think I'd prefer seeing a discontinuous set of lists which go the "right way" instead of a single continuous list which goes "backwards". I.e. I think I'd find this:

6
7
8
9
10
(lldb)
1
2
3
4
5

easier to read than this:

10
9
8
7
6
(lldb)
5
4
3
2
1

However, I don't see myself using this anytime soon, so if you think the latter is the best way to represent this, then fine. The thing we choose here is not set in stone anyway, and we can re-examine this later...

121โ€“122

This does beg the question of whether we shouldn't make the distinction even more obvious by breaking the sequence numbers in some way. I don't really have an answer to that question, though...

Some two level namespacing?

sequence 1:
[0]: 0x47
[1]: 0x48
error: no memory mapped at 0x42

sequence 2:
[0]: 0x147
...
wallace marked 10 inline comments as done.Oct 20 2020, 5:37 PM
wallace added inline comments.
lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
109

Yes, I ended up understanding more of LLDB and it seems that it'll be as you describe. I'll try to limit the initial decoding to at most the PC of each thread, which would indeed be very beneficial, because we could catch early some critical errors. I'll do that early decoding in another diff.

wallace updated this revision to Diff 299524.EditedOct 20 2020, 5:44 PM

Some updates, especially regarding the indexing and dump ordering, which I discussed at lenght with some coworkers:

  • I've addressed most of the issues, except for the disassembly and the yaml object files, which I'll do tomorrow.
  • I've changed the indexing to have [0] be the oldest instruction, and [N] the most recent one. This matches with most people's expectations on the order of lines in a text file.
  • I've changed the instruction dump to be

    thread trace dump instructions --count <> --position <>

    And it prints in reverse order, for example
[10]
[11]
[12]
[13]
[14]
And after a repeat command, it prints
[5]
[6]
[7]
[8]
[9]

and so on

The common feedback I got is that this is the most intuitive way to understand the text.

  • I've added a position member to DecodedThread, which in the future we'll move as we perform reverse debugging. For now, it's default value is the end of the trace. I'm using it for the instruction dump.
  • Now when there is a sequence of errors, I'm not printing their indices and I'm adding a whitespace, like
[ 0] 0x40064f
[ 1] 0x400540
[ 2] 0x400546
[ 3] 0x40054b
[ 4] 0x400510
[ 5] 0x400516
error: no memory mapped at this address 0x7ffff7df1950
error: no memory mapped at this address 0x400516

[ 8] 0x400657
[ 9] 0x40065a
[10] 0x40065f

Also the feedback I got about this is that showing the index of the error can confuse some people, and that the whitespace makes it very clear that there's missing information and that the error is not inoffensive.

  • I also made the requested small fixes here and there.
wallace updated this revision to Diff 300134.EditedOct 22 2020, 6:37 PM

The diff is the now ready for review. There are a few updates, including some design decisions after some chats with Greg.

  • Now the dump command includes disassembly information and symbol context information whenever relevant, e.g.
$ thread trace dump instructions --count 50

thread #1: tid = 815455, total instructions = 46
  a.out`main + 15 at main.cpp:10
    [ 0] 0x000000000040066f    callq  0x400540                  ; symbol stub for: foo()
  a.out`symbol stub for: foo()
    [ 1] 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
    [ 2] 0x0000000000400546    pushq  $0x2
    [ 3] 0x000000000040054b    jmp    0x400510
  a.out`(none)
    [ 4] 0x0000000000400510    pushq  0x200af2(%rip)            ; _GLOBAL_OFFSET_TABLE_ + 8
    [ 5] 0x0000000000400516    jmpq   *0x200af4(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 16
    [ 6] 0x00007ffff7df1950    error: no memory mapped at this address
    ...instructions missing
  a.out`main + 20 at main.cpp:10
    [ 7] 0x0000000000400674    movl   %eax, -0xc(%rbp)
  a.out`main + 23 at main.cpp:12
    [ 8] 0x0000000000400677    movl   -0xc(%rbp), %eax

This disassembly and symbol dumping also works for inline functions, as described in the tests.

  • A flag for the command has been added (-r), which prints raw instruction addresses, similar to the one in the disassembly command, e.g.
$ thread trace dump instructions --raw

 thread #1: tid = 3842849, total instructions = 21
   [ 1] 0x0000000000400518    
   [ 2] 0x000000000040051f    
   [ 3] 0x0000000000400529    
   [ 4] 0x000000000040052d    
   [ 5] 0x0000000000400521    
   [ 6] 0x0000000000400525 
  • I tried to use yaml to represent the binary fails, but it doesn't work. The yaml representation doesn't have all the required information and the decoder complaints about missing memory. At this point it's better to just include the binary and have strong tests.
  • I'm leaving for a future diff restructuring the IntelPTInstruction class so that is more memory efficient. Something to keep in mind is that errors will be seldom, as they represent gaps in the trace, and most of the instructions will be correct addresses.

I'm not sure if I'll have time to go through this again, but it seems ok. One question inline, though.

The diff is the now ready for review. There are a few updates, including some design decisions after some chats with Greg.

  • Now the dump command includes disassembly information and symbol context information whenever relevant, e.g.

I should've said something earlier, but I think the decision to the actual disassembling in a separate patch was a good one. This patch was big enough as it was.

  • I tried to use yaml to represent the binary fails, but it doesn't work. The yaml representation doesn't have all the required information and the decoder complaints about missing memory. At this point it's better to just include the binary and have strong tests.

That makes me sad, but I am not going to hold this patch over that. I would encourage you to find and implement the missing bits in yaml2obj though...

lldb/packages/Python/lldbsuite/test/lldbtest.py
2116

I think that we could just drop the len(cmd)==0 check. It's pretty unlikely that anyone will do that by accident (and have all of his checks will still pass)...

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
54โ€“57

I don't see this used anywhere. And if it's not used, how is the dump command implemented?

(I'm guessing this is used to implement per-thread "last dumped instruction" positions. I'm not sure if that feature is worth it (the list command doesn't have that), but if that's the case maybe the name should also be more specific, as that position is unlikely to be useful for anything else.)

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
109

Sounds good.

I am slightly worried about the emphasis on sequential instruction numbers in this design. It seems like it'd be hard to avoid decoding the entire trace if one needs to assign a sequential id to each instruction. But let's see how it goes...

wallace marked 7 inline comments as done.Oct 26 2020, 7:53 PM

That makes me sad, but I am not going to hold this patch over that. I would encourage you to find and implement the missing bits in yaml2obj though...

I'll do that later as a way to learn yaml2obj.

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
54โ€“57

I'm not using it in this patch, but I added it along with the GetCurrentPosition. I plan to use it for the reverse debugging case, in which a reverse-next would move this value to a different position, that would get picked by any subsequent dump or reverse command. I'll improve the documentation and the function name

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
109

I think that the actual problem is we can avoid decoding the entire trace in the first place. This will be unavoidable if we want to show backtraces, as the frames are scattered throughout the trace and there's no way to know where they are unless you decode it all. I don't know of any other efficient tracing mechanism that doesn't have this problem. When I implement the backtrace reconstruction we can have a much better picture of what's possible and what not.

wallace updated this revision to Diff 300864.Oct 26 2020, 7:56 PM
wallace marked 2 inline comments as done.

Address issues.

vsk added a comment.Oct 27 2020, 10:25 AM

@wallace thanks for winnowing the test objects. I left an inline suggestion about simplifying the error-handling in IntelPTInstruction. Other than that, mechanically this is looking good. Thanks!

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
37

Does this IntelPTInstruction constructor ever get called with a non-zero libipt error code?

41

It might be a bit cleaner to just have two IntelPTInstruction constructors: one that accepts a const pt_insn & and another that accepts an llvm::Error. To do that, you'd need to introduce a PT-specific ErrorInfo (class IntelPTError : public ErrorInfo<IntelPTError> { ... }). This can wrap some sort of generic error (as a std::string, perhaps), or a libipt-specific error.

It'd be a little more up front work, but the benefit is that it simplifies error handling (there's just one type of error, only one error value to check in IsError, etc).

wallace updated this revision to Diff 301128.Oct 27 2020, 4:53 PM

Followed @vsk's suggestion. I didn't do exactly what he said, but I ended up creating my own IntelPTError to represent better the different kinds of errors and make sure we can create a correct llvm::Error when traversing the instructions.
The IntelPTInstruction code is much simpler now.

friendly ping

vsk added inline comments.Oct 30 2020, 11:16 AM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
27

Do we need a default constructor for this pure-virtual class?

37

Still wondering about this: why do we need to construct an IntelPTInstruction using a pt_insn as well as an error code?

117

Still doesn't feel like this is aligning with the design of llvm::Error - they're supposed to be unique and lightweight. Specifically:

  • Why is the error instance shareable? Do multiple threads need to handle the same error?
  • For that matter, why is IntelPTInstruction copyable?
  • How much are we gaining by storing the libipt_error_code vs. storing a bool m_is_gap field?

Imo it seems a bit too complicated to have three different ways to construct an IntelPTInstruction with an error. Would be nice to just have one?

wallace added inline comments.Oct 30 2020, 12:10 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
27

The compiler forces me to do it because I declare below a deleted copy constructor.

37

Indeed, there are cases in which the libipt decoder library fails to decode an instruction, but contains some information about it, like the address (among other info). Then you have a partially correct pt_insn object along with an error code

117

Will try to fix that now.

Definitely I shouldn't make it copyable, which will simplify the code.

Regarding the error, it's hard to receive an llvm::Error in the constructor, because every time the user iterates on each instruction, that Error will need to be returned, but Errors are not copyable, only movable. That's why I found it easier to create a custom error class that can create Error's on demand.

wallace added inline comments.Oct 30 2020, 12:44 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
117

How much are we gaining by storing the libipt_error_code vs. storing a bool m_is_gap field?

Quite a lot. If we can report to the user why decoding failed, then the user can try to fix that accordingly. For example, if an address section is missing, we can report it and the user can provide that missing library and retry decoding.

wallace updated this revision to Diff 302000.Oct 30 2020, 1:05 PM

Address changes.

  • Made IntelPTError and IntelPTInstruction non-copyable, which simplified the code.
  • I still keep IntelPTError as a lightweight way to create Error objects on demand when traversing the trace. Notice that Errors are only movable, so I can't use the Error itself as storage.
  • I removed one error constructor and now there're only two, with improved documentation.
labath added a subscriber: lhames.Nov 2 2020, 12:43 AM

A more light-weight approach to this would be to store the ErrorInfo object from the Error and copy that when needed. Something like:

class IntelPTInstruction {
  std::unique_ptr<ErrorInfoBase> m_error; // Technically, this should be a vector, but we don't really make use of the error list functionality anywhere.
  
  IntelPTInstruction(Error err) {
    handleAllErrors(std::move(err), [&](std::unique_ptr<ErrorInfoBase> info) { m_error = std::move(info); }); // Stash it for later use
  }

  Error ToError() {
    if (m_error->isA<IntelPTError>())
      return make_error<IntelPTError>(static_cast<IntelPTError&>(*m_error)); // IntelPTError is copyable
    return make_error<StringError>(m_error->message(), m_error->convertToErrorCode()); // Just copy the error message
  }
};

But maybe there are other solutions too. Adding @lhames, in case he knows about any...

wallace updated this revision to Diff 302354.Nov 2 2020, 11:26 AM

Followed the suggestion by @labath. The code definitely looks better.

labath added a comment.Nov 4 2020, 2:34 AM

I'd say this looks ok now. Another advantage of doing the decoding early is that we could then simplify the IntelPTInstruction class. If we handle the early decoding errors (file not found, etc.) early, then the only kind of errors this class could contain are errors from the pt library -- and in that case we could just store them as an int instead of this ErrorInfoBase business.

Thanks. I'm thinking about doing the early decoding, as doing "trace load" right now doesn't do anything very useful unless you do "dump instructions". However, I'll try to do it once I finish implementing decoding for live processes, because I want to have a unified process for both, and live process tracing should be lazy to avoid stopping the user flow for too long.

clayborg accepted this revision.Nov 5 2020, 3:39 PM

I am good too.

This revision is now accepted and ready to land.Nov 5 2020, 3:39 PM
This revision was landed with ongoing or failed builds.Nov 5 2020, 6:48 PM
This revision was automatically updated to reflect the committed changes.