This is an archive of the discontinued LLVM Phabricator instance.

[trace][intel-pt] Implement the basic decoding functionality
ClosedPublic

Authored by wallace on Oct 12 2020, 4:52 PM.

Details

Summary

Depends on D89408.

This diff finally implements trace decoding!

The current interface is

$ trace load /path/to/trace/session/file.json
$ thread trace dump instructions

thread #1: tid = 3842849, total instructions = 22
  [ 0] 0x40052d
  [ 1] 0x40052f  
  ...
  [19] 0x400521

$ # simply enter, which is a repeat command
  [20] 0x40052d
  [21] 0x400529
  ...

This doesn't do any disassembly, which will be done in the next diff.

Changes:

  • Added an IntelPTDecoder class, that is a wrapper for libipt, which is the actual library that performs the decoding.
  • Added TraceThreadDecoder class that decodes traces and memoizes the result to avoid repeating the decoding step.
  • Added a DecodedThread class, which represents the output from decoding and that for the time being only stores the list of reconstructed instructions. Later it'll contain the function call hierarchy, which will enable reconstructing backtraces.
  • Added basic APIs for accessing the trace in Trace.h:
    • GetInstructionCount, which counts the number of instructions traced for a given thread
    • IsTraceFailed, which returns an Error if decoding a thread failed
    • ForEachInstruction, which iterates on the instructions traced for a given thread, concealing the internal storage of threads, as plug-ins can decide to generate the instructions on the fly or to store them all in a vector, like I do.
  • DumpTraceInstructions was updated to print the instructions or show an error message if decoding was impossible.
  • Tests included

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
labath added inline comments.Oct 19 2020, 6:29 AM
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
189

It looks like this can only fail if the image argument is null (which can only happen if the decoder is null, which is checked). An assert would be enough for that. (For a proper error handling you should have also freed the decoder object on the error path, which is how i came to thing about this).

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
69

drop std::make_pair, it's cleaner.

94

this makes a copy, which you probably did not want.

98

i>=0 is always true. You'll have to do this trick with signed numbers (ssize_t?)

109

I'm having doubts about this "I have an thread but was not able to decode _anything_ about it" state is worth it. Having many different ways to report errors just increases the chance of something going wrong, and in TraverseInstructions you're already treating this state as a pseudo-instruction.

Maybe that representation should be used all the way down? Or (and this may be even better) we avoid creating such Threads in the first place (print an error/warning when the target is created).

lldb/source/Target/Trace.cpp
85

that would have to be log10(num+1), though I'm not sure what to thing of the floating point arithmetic...

222

The cast to int64_t won't change the actual value of the result (though it may invoke UB due to signed wraparound). What exactly are you trying to achieve here?

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
44–49

Are you sure that printing this backwards is the best way to display this? The resulting disassembly is going to look quite weird. I think that printing this in the "normal" direction would make it easier to figure out what the program was doing. For people who are only interested in the final PC value it should not be a problem to skip to the last line of the output (the last line is also more likely to remain visible if the dump produces lots of data).

wallace marked 3 inline comments as done.Oct 19 2020, 4:08 PM

@labath is right regarding the need of pre-baked binaries to test specific conditions. I'll remove the ld binary, as it really tests nothing useful, and i'll try to use yaml to represent the binaries in a more concise format.

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
44–49

First of all, I'm thinking about adding a flag to this command to choose the direction, as there are benefits of both.

Let's say, if you are interested in reading/understanding the last instructions up to a breakpoint, then reading the trace in reverse makes sense, as you don't know where to start reading from, but you know where to end. Imagine you have 100K instructions, where do you start? It seems sometimes better to read the last instructions and then ask for a few of the earlier instructions, and keep doing that until you find what you are interested in. On the other hand, if you want to analyze forwards what happens from a certain point, this API is quite annoying and I imagine you'd prefer to read it forwards.

So I propose

thread trace dump instructions --count <> --start-position <> [--forwards | -f] [--backwards | -b]

I'd keep -b as default, as it's useful when analyzing crashes or stops on breakpoints. The default --start-position when reading forwards could be the oldest chronological instruction, and the default when reading backwards could be the earliest chronologically.

With this, I'd change the indices. I'd make index [0] to be the oldest chronologically and [|trace| -1] to be the most recent.

@labath, @clayborg, what do you think? This might be flexible enough for the different kind of usages.

121–122

I think it's highly important to tell the user that this is a very important error and not make it apparently inoffensive with the formatting. Let me elaborate why this is not an inoffensive error.

First of all, the encoded trace is composed of packets, composed of two main packets:

  • PSB: synchronization packet that contains the current PC. These packets are sporadic (often one for each 4KB of data), as they are big in size.
  • TNT: taken/not taken packet that contains one bit per branch executed by the processor. These packets are probably the most frequent and they don't include any PC.

When decoding, the decoder finds first a PSB packet, gaining the knowledge of the current PC, then it starts traversing the binary instruction by instruction until it finds a branch, in which case it finds the next TNT packet and learns if that branch was taken or not, then continuing the traversal in the correct direction.

This means that when the decoder can't read a memory address, then it won't be able to decode any TNT packets until the next PSB synchronization point. In fact, in this diff, when there's an instruction decoding error, we skip to the next PSB and resume decoding from there. This problem implies that we are skipping potentially thousands of instructions. In other words, if you see

[0]: 0x400518
[1]: 0x400511
[2]: no memory mapped at this address: 0x400502
[3]: 0x400500

Then that means that between instructions [3] and [1] there were an unknown number of instructions that couldn't be decoded, the first one of them being at 0x400502. We won't be able to do anything useful with those instructions, and the user would need to provide the missing module and redecode to reconstruct the full trace.

clayborg added inline comments.Oct 19 2020, 4:44 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
32

That or "lldb_private::StreamString". Both have similar functionality. I prefer StreamString because it is simpler. With raw_string_ostream, you have to make a std::string, put it into the raw_string_ostream and then flush it prior to getting the string result.

42

yeah llvm::ArrayRef to avoid making copies is good.

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
33

Make static or add an anonymous namespace around all of these functions so you don't have to mark them all as static.

81

Make static or add an anonymous namespace around all of these functions so you don't have to mark them all as static.

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
94

returning a llvm::ArrayRef to avoid the copy

98

Yes, switch to ssize_t, your delta is already signed. Also switch "delta" to ssize_t as well.

lldb/source/Target/Trace.cpp
222

Lots os signed/unsigned match issues possible. Best to make this rock solid.

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
49

If we reverse the direction, then hitting "enter" after doing one command won't flow as nicely as it does now. That being said I agree with Pavel that we should figure out what is expected. I generally think that earlier text is older.

I would not switch the indexes so that they change with any options that are specified. We currently have --start-position, but maybe this should be just --position? Or we specify:

--from-end <offset>

<offset> would be the index offset from the end (newest) of the data?

--from-start <offset>

<offset> would be the index offset from the start (oldest) of the data?

I would be fine with:

[--forwards | -f] [--backwards | -b]

but I think it would make sense to show the indexes in a consistent way regardless of what options are displayed. Maybe it makes sense to always show the true index where zero is the oldest and N is the newest?

We do need to make sure the auto repeat command looks good though which will be hard with oldest to newest ordering.

wallace marked an inline comment as done.Oct 19 2020, 5:01 PM
wallace added inline comments.
lldb/test/API/commands/trace/TestTraceDumpInstructions.py
49

What about this:

We expose the indices in a chronologically increasing way, where [0] is the oldest instruction and [N] is the newest. Then we have the two options suggested by Greg

--from-end <offset>

Where offset is an index or the string "end", meaning the last instruction of the trace, in case the user doesn't know the index of it. Then the instructions are printed

[offset]
[offset - 1]
...
[offset - K]

And if there's a repeat command, this would be printed

[offset - K - 1]
[offset - K - 2]
...

Which would look nicely as a contiguous list of instructions if concatenated.

The other option would be

--from-start<offset>

Where offset is an index. Then the instructions are printed

[offset]
[offset + 1]
[offset + 2]
...
[offset + K]

And after a repeat command, you'd get

[offset + K + 1]
[offset + K + 2]
...

I think this would serve all purposes.

wallace marked 16 inline comments as done.Oct 19 2020, 5:30 PM
wallace added inline comments.
lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
189

good catch!

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
98

TIL!

109

Maybe that representation should be used all the way down?

I'll follow that path. This will create consistency through the code

Or (and this may be even better) we avoid creating such Threads in the first place (print an error/warning when the target is created).

I wish I could do that, but decoding is very expensive and should be done lazily. According to Intel and my tests, if a thread was traced during T seconds, then decoding takes around 10T, which is a big amount of time if you were tracing 10 threads for 5 seconds, which would take 500 seconds to decode. At least for know we are not doing parallel decoding. I imagine at some point we'll have to work on that.

wallace updated this revision to Diff 299236.EditedOct 19 2020, 6:43 PM
wallace marked 3 inline comments as done.

Addressed almost all changes.

What's left to discuss:

  • I tried to use obj2yaml, but it creates files much larger than the binaries, so in terms of space, I'd rather keep the binaries. Besides, I removed ld.so, and the binaries that are left are tiny. I imagine that the binaries that will be committed in the future will also be tiny ones depicting specific edge cases.
  • Showing the disassembly of the instructions, requested by Greg, is not yet done.
  • I'm representing now all decoding errors as instructions, which in fact simplifies the API a good deal. However, I'm still not happy with the API, right now we have

    class Trace { void TraverseInstructions(Thread, callback, ...); int GetInstructionsCount(Thread...); }

Right now these methods are not doing anything (or returning 0 in the case of the instructions count) if the provided Thread is not traced by the Trace instance.

Besides, eventually we'll add methods like

vector<addr_t> GetBacktrace(Thread, position)

position ReverseNext(Thread, position)
position ReverseSingleStep(Thread, position)

We could leave the code as it is, which results in a simple API with some undefined behavior if an invalid Thread is passed.

Another option is to have

class Trace {
  TracedThreadSP GetTraceForThread(Thread ...); 
}

class TracedThread {
  void TraverseInstructions(callback, ...);
  int GetInstructionsCount()
 
  position ReverseNext(Thread, position)
  position ReverseSingleStep(Thread, position)
}

This creates a level of indirection that forces the caller to check if the returned TracedTrace is a valid pointer or not. If the pointer is invalid, the caller can show an error message or fail silently, otherwise, TraverseInstructions and GetInstructionsCount will perform valid operations.

This might become useful to avoid errors in complex scenarios as the API grows. But then, how often will someone invoke these methods with the wrong thread? And this TracedThread object would have to be passed around a lot if the developer doesn't want to if the thread is invalid in each callsite.

The first form of the API looks simpler without that level of indirection, which is cool. But the second one adds some boilerplate in each callsite, with some safety benefits.

What do you guys think?

Addressed almost all changes.

What's left to discuss:

  • I tried to use obj2yaml, but it creates files much larger than the binaries, so in terms of space, I'd rather keep the binaries. Besides, I removed ld.so, and the binaries that are left are tiny. I imagine that the binaries that will be committed in the future will also be tiny ones depicting specific edge cases.
  • Showing the disassembly of the instructions, requested by Greg, is not yet done.
  • I'm representing now all decoding errors as instructions, which in fact simplifies the API a good deal. However, I'm still not happy with the API, right now we have

    class Trace { void TraverseInstructions(Thread, callback, ...); int GetInstructionsCount(Thread...); }

Do we need to know the instruction count? Again, won't this be really expensive to calculate?

Right now these methods are not doing anything (or returning 0 in the case of the instructions count) if the provided Thread is not traced by the Trace instance.

If we get rid of

int Trace::GetInstructionsCount(Thread...);

Then the first call to TraverseInstructions can return an appropriate error to the callback in the Expected<addr_t> right?

Besides, eventually we'll add methods like

vector<addr_t> GetBacktrace(Thread, position)

We might want the thread to be able to produce real stack frames so we can re-use the current StackFrame classes. Each stack frame will probably only contain the PC. This call could be used to create the StackFrames in the trace thread class.

position ReverseNext(Thread, position)
position ReverseSingleStep(Thread, position)

We should be able to implement these in Trace.cpp and have it only use TraverseInstructions right? Do we also need ReverseContinue() to be able to backup until we hit a breakpoint or the start of the trace data?

We could leave the code as it is, which results in a simple API with some undefined behavior if an invalid Thread is passed.

I like the current API. Great if we can remove the Trace::GetInstructionsCount() API to keep things as simple as possible and allow us to lazily fetch instructions as needed as we eventually step and continue around.

If an invalid thread is passed, then TraverseInstructions can just return an error on the first callback. If we still need GetInstructionCount(), we can have it return 0 if the thread has no trace instructions.

Another option is to have

class Trace {
  TracedThreadSP GetTraceForThread(Thread ...); 
}

class TracedThread {
  void TraverseInstructions(callback, ...);
  int GetInstructionsCount()
 
  position ReverseNext(Thread, position)
  position ReverseSingleStep(Thread, position)
}

This creates a level of indirection that forces the caller to check if the returned TracedTrace is a valid pointer or not. If the pointer is invalid, the caller can show an error message or fail silently, otherwise, TraverseInstructions and GetInstructionsCount will perform valid operations.

I don't think we need this. I think we have enough API here to implement all of this stuff through the lldb_private::Thread class as it can call through to the Trace APIs with the thread class pointer.

This might become useful to avoid errors in complex scenarios as the API grows. But then, how often will someone invoke these methods with the wrong thread? And this TracedThread object would have to be passed around a lot if the developer doesn't want to if the thread is invalid in each callsite.

The first form of the API looks simpler without that level of indirection, which is cool. But the second one adds some boilerplate in each callsite, with some safety benefits.

What do you guys think?

Lets not do this extra indirection, I like the current API with or without GetInstructionsCount

wallace added a comment.EditedOct 19 2020, 10:57 PM

Do we need to know the instruction count? Again, won't this be really expensive to calculate?

It is expensive, but today I realized that it might be necessary to decode the entire trace if you want things to work nicely.

Let's take, for example, a very common future use case: "you stop at a breakpoint, and then you do reverse-next and print the backtrace". Getting the backtrace requires decoding all the instructions, as potentially the first instruction in the trace corresponds to the oldest frame of the backtrace of your breakpoint position (which would be the case if you start tracing at main). You won't know all the frames of your backtrace unless you decode it all. In general, you can't have the backtrace of the i-th position unless you decode everything up to that position. And without the backtrace you really don't know where you are, it's hard for the user to make sense of the state of the program without it.

Also, if would be difficult to provide comprehensive position offsets to the user in the "dump instructions" command unless we also show the maximum index, which is equivalent to the instruction count.

Lets not do this extra indirection, I like the current API with or without GetInstructionsCount

That's reassuring. I'm also of that opinion.

I think we have enough API here to implement all of this stuff through the lldb_private::Thread class as it can call through to the Trace APIs with the thread class pointer.

Great!

If an invalid thread is passed, then TraverseInstructions can just return an error on the first callback. If we still need GetInstructionCount(), we can have it return 0 if the thread has no trace instructions.

Sure

We should be able to implement these in Trace.cpp and have it only use TraverseInstructions right? Do we also need ReverseContinue() to be able to backup until we hit a breakpoint or the start of the trace data?

Yes, definitely. We'll be adding more stuff to it like ReverseContinue.

What's left to discuss:

  • I tried to use obj2yaml, but it creates files much larger than the binaries, so in terms of space, I'd rather keep the binaries. Besides, I removed ld.so, and the binaries that are left are tiny. I imagine that the binaries that will be committed in the future will also be tiny ones depicting specific edge cases.

Well, yaml is text so it's not surprising that it'd be larger (though sometimes it can actually be smaller, because it e.g. omits padding, or because you can reduce by deleting irrelevant stuff; and git can store text diffs efficiently). But there's the other aspect that Vedant mentioned -- their opaqueness/reviewability. With a yaml file, one can see (directly in the review window or in his text editor) what kind input is the program being fed and correlate that with the expected output. This is not perfect because a lot of the details (e.g. the disassembly, and most importantly the trace file) is still obscured, but it's better than nothing. So if it works, I'd still go for the yaml option (and I have to send a big thank you to whoever implemented the program header support).

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
12

I guess this is not needed anymore (?)

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
67–69

Maybe better left for a separate patch, but if we're going to have loads of these objects, then the we'd better optimize its size.

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp
33

Actually, the coding standards say anonymous namespaces should not be used for functions.

192

Can't have side-effects inside assertions.

lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.h
22–45

I don't see the value of this class -- it's only used once, and in a very ephemeral way. ThreadTraceDecoder::Decode could just call CreateDecoderAndDecode directly (and it'd be shorter for it).

67

please delete the copy assignment as well

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
109

If you can pull that off -- great. However, I have doubts (and judging by the other comments, you're starting to have some too) regarding how long will you be able to postpone the decoding. For example, lldb very much likes to know the PC value of each thread (can you really blame it?) -- so much that we've added a special field to the gdb-remote stop-reply packet which lists the pc values of all threads.

That leads me to believe that you'll need to decode at least the last block of each thread's trace very early on

lldb/test/API/commands/trace/TestTraceDumpInstructions.py
49

Ah... this is tricky...

The concatenation aspect is nice, but I'm not sure it trumps the "earlier/higher text is older" intuition. Even if I'm analyzing backwards, I think I'd prefer seeing a discontinuous set of lists which go the "right way" instead of a single continuous list which goes "backwards". I.e. I think I'd find this:

6
7
8
9
10
(lldb)
1
2
3
4
5

easier to read than this:

10
9
8
7
6
(lldb)
5
4
3
2
1

However, I don't see myself using this anytime soon, so if you think the latter is the best way to represent this, then fine. The thing we choose here is not set in stone anyway, and we can re-examine this later...

121–122

This does beg the question of whether we shouldn't make the distinction even more obvious by breaking the sequence numbers in some way. I don't really have an answer to that question, though...

Some two level namespacing?

sequence 1:
[0]: 0x47
[1]: 0x48
error: no memory mapped at 0x42

sequence 2:
[0]: 0x147
...
wallace marked 10 inline comments as done.Oct 20 2020, 5:37 PM
wallace added inline comments.
lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
109

Yes, I ended up understanding more of LLDB and it seems that it'll be as you describe. I'll try to limit the initial decoding to at most the PC of each thread, which would indeed be very beneficial, because we could catch early some critical errors. I'll do that early decoding in another diff.

wallace updated this revision to Diff 299524.EditedOct 20 2020, 5:44 PM

Some updates, especially regarding the indexing and dump ordering, which I discussed at lenght with some coworkers:

  • I've addressed most of the issues, except for the disassembly and the yaml object files, which I'll do tomorrow.
  • I've changed the indexing to have [0] be the oldest instruction, and [N] the most recent one. This matches with most people's expectations on the order of lines in a text file.
  • I've changed the instruction dump to be

    thread trace dump instructions --count <> --position <>

    And it prints in reverse order, for example
[10]
[11]
[12]
[13]
[14]
And after a repeat command, it prints
[5]
[6]
[7]
[8]
[9]

and so on

The common feedback I got is that this is the most intuitive way to understand the text.

  • I've added a position member to DecodedThread, which in the future we'll move as we perform reverse debugging. For now, it's default value is the end of the trace. I'm using it for the instruction dump.
  • Now when there is a sequence of errors, I'm not printing their indices and I'm adding a whitespace, like
[ 0] 0x40064f
[ 1] 0x400540
[ 2] 0x400546
[ 3] 0x40054b
[ 4] 0x400510
[ 5] 0x400516
error: no memory mapped at this address 0x7ffff7df1950
error: no memory mapped at this address 0x400516

[ 8] 0x400657
[ 9] 0x40065a
[10] 0x40065f

Also the feedback I got about this is that showing the index of the error can confuse some people, and that the whitespace makes it very clear that there's missing information and that the error is not inoffensive.

  • I also made the requested small fixes here and there.
wallace updated this revision to Diff 300134.EditedOct 22 2020, 6:37 PM

The diff is the now ready for review. There are a few updates, including some design decisions after some chats with Greg.

  • Now the dump command includes disassembly information and symbol context information whenever relevant, e.g.
$ thread trace dump instructions --count 50

thread #1: tid = 815455, total instructions = 46
  a.out`main + 15 at main.cpp:10
    [ 0] 0x000000000040066f    callq  0x400540                  ; symbol stub for: foo()
  a.out`symbol stub for: foo()
    [ 1] 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
    [ 2] 0x0000000000400546    pushq  $0x2
    [ 3] 0x000000000040054b    jmp    0x400510
  a.out`(none)
    [ 4] 0x0000000000400510    pushq  0x200af2(%rip)            ; _GLOBAL_OFFSET_TABLE_ + 8
    [ 5] 0x0000000000400516    jmpq   *0x200af4(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 16
    [ 6] 0x00007ffff7df1950    error: no memory mapped at this address
    ...instructions missing
  a.out`main + 20 at main.cpp:10
    [ 7] 0x0000000000400674    movl   %eax, -0xc(%rbp)
  a.out`main + 23 at main.cpp:12
    [ 8] 0x0000000000400677    movl   -0xc(%rbp), %eax

This disassembly and symbol dumping also works for inline functions, as described in the tests.

  • A flag for the command has been added (-r), which prints raw instruction addresses, similar to the one in the disassembly command, e.g.
$ thread trace dump instructions --raw

 thread #1: tid = 3842849, total instructions = 21
   [ 1] 0x0000000000400518    
   [ 2] 0x000000000040051f    
   [ 3] 0x0000000000400529    
   [ 4] 0x000000000040052d    
   [ 5] 0x0000000000400521    
   [ 6] 0x0000000000400525 
  • I tried to use yaml to represent the binary fails, but it doesn't work. The yaml representation doesn't have all the required information and the decoder complaints about missing memory. At this point it's better to just include the binary and have strong tests.
  • I'm leaving for a future diff restructuring the IntelPTInstruction class so that is more memory efficient. Something to keep in mind is that errors will be seldom, as they represent gaps in the trace, and most of the instructions will be correct addresses.

I'm not sure if I'll have time to go through this again, but it seems ok. One question inline, though.

The diff is the now ready for review. There are a few updates, including some design decisions after some chats with Greg.

  • Now the dump command includes disassembly information and symbol context information whenever relevant, e.g.

I should've said something earlier, but I think the decision to the actual disassembling in a separate patch was a good one. This patch was big enough as it was.

  • I tried to use yaml to represent the binary fails, but it doesn't work. The yaml representation doesn't have all the required information and the decoder complaints about missing memory. At this point it's better to just include the binary and have strong tests.

That makes me sad, but I am not going to hold this patch over that. I would encourage you to find and implement the missing bits in yaml2obj though...

lldb/packages/Python/lldbsuite/test/lldbtest.py
2111

I think that we could just drop the len(cmd)==0 check. It's pretty unlikely that anyone will do that by accident (and have all of his checks will still pass)...

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
54–57

I don't see this used anywhere. And if it's not used, how is the dump command implemented?

(I'm guessing this is used to implement per-thread "last dumped instruction" positions. I'm not sure if that feature is worth it (the list command doesn't have that), but if that's the case maybe the name should also be more specific, as that position is unlikely to be useful for anything else.)

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
109

Sounds good.

I am slightly worried about the emphasis on sequential instruction numbers in this design. It seems like it'd be hard to avoid decoding the entire trace if one needs to assign a sequential id to each instruction. But let's see how it goes...

wallace marked 7 inline comments as done.Oct 26 2020, 7:53 PM

That makes me sad, but I am not going to hold this patch over that. I would encourage you to find and implement the missing bits in yaml2obj though...

I'll do that later as a way to learn yaml2obj.

lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
54–57

I'm not using it in this patch, but I added it along with the GetCurrentPosition. I plan to use it for the reverse debugging case, in which a reverse-next would move this value to a different position, that would get picked by any subsequent dump or reverse command. I'll improve the documentation and the function name

lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
109

I think that the actual problem is we can avoid decoding the entire trace in the first place. This will be unavoidable if we want to show backtraces, as the frames are scattered throughout the trace and there's no way to know where they are unless you decode it all. I don't know of any other efficient tracing mechanism that doesn't have this problem. When I implement the backtrace reconstruction we can have a much better picture of what's possible and what not.

wallace updated this revision to Diff 300864.Oct 26 2020, 7:56 PM
wallace marked 2 inline comments as done.

Address issues.

vsk added a comment.Oct 27 2020, 10:25 AM

@wallace thanks for winnowing the test objects. I left an inline suggestion about simplifying the error-handling in IntelPTInstruction. Other than that, mechanically this is looking good. Thanks!

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
37

Does this IntelPTInstruction constructor ever get called with a non-zero libipt error code?

41

It might be a bit cleaner to just have two IntelPTInstruction constructors: one that accepts a const pt_insn & and another that accepts an llvm::Error. To do that, you'd need to introduce a PT-specific ErrorInfo (class IntelPTError : public ErrorInfo<IntelPTError> { ... }). This can wrap some sort of generic error (as a std::string, perhaps), or a libipt-specific error.

It'd be a little more up front work, but the benefit is that it simplifies error handling (there's just one type of error, only one error value to check in IsError, etc).

wallace updated this revision to Diff 301128.Oct 27 2020, 4:53 PM

Followed @vsk's suggestion. I didn't do exactly what he said, but I ended up creating my own IntelPTError to represent better the different kinds of errors and make sure we can create a correct llvm::Error when traversing the instructions.
The IntelPTInstruction code is much simpler now.

friendly ping

vsk added inline comments.Oct 30 2020, 11:16 AM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
26

Do we need a default constructor for this pure-virtual class?

37

Still wondering about this: why do we need to construct an IntelPTInstruction using a pt_insn as well as an error code?

116

Still doesn't feel like this is aligning with the design of llvm::Error - they're supposed to be unique and lightweight. Specifically:

  • Why is the error instance shareable? Do multiple threads need to handle the same error?
  • For that matter, why is IntelPTInstruction copyable?
  • How much are we gaining by storing the libipt_error_code vs. storing a bool m_is_gap field?

Imo it seems a bit too complicated to have three different ways to construct an IntelPTInstruction with an error. Would be nice to just have one?

wallace added inline comments.Oct 30 2020, 12:10 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
26

The compiler forces me to do it because I declare below a deleted copy constructor.

37

Indeed, there are cases in which the libipt decoder library fails to decode an instruction, but contains some information about it, like the address (among other info). Then you have a partially correct pt_insn object along with an error code

116

Will try to fix that now.

Definitely I shouldn't make it copyable, which will simplify the code.

Regarding the error, it's hard to receive an llvm::Error in the constructor, because every time the user iterates on each instruction, that Error will need to be returned, but Errors are not copyable, only movable. That's why I found it easier to create a custom error class that can create Error's on demand.

wallace added inline comments.Oct 30 2020, 12:44 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
116

How much are we gaining by storing the libipt_error_code vs. storing a bool m_is_gap field?

Quite a lot. If we can report to the user why decoding failed, then the user can try to fix that accordingly. For example, if an address section is missing, we can report it and the user can provide that missing library and retry decoding.

wallace updated this revision to Diff 302000.Oct 30 2020, 1:05 PM

Address changes.

  • Made IntelPTError and IntelPTInstruction non-copyable, which simplified the code.
  • I still keep IntelPTError as a lightweight way to create Error objects on demand when traversing the trace. Notice that Errors are only movable, so I can't use the Error itself as storage.
  • I removed one error constructor and now there're only two, with improved documentation.
labath added a subscriber: lhames.Nov 2 2020, 12:43 AM

A more light-weight approach to this would be to store the ErrorInfo object from the Error and copy that when needed. Something like:

class IntelPTInstruction {
  std::unique_ptr<ErrorInfoBase> m_error; // Technically, this should be a vector, but we don't really make use of the error list functionality anywhere.
  
  IntelPTInstruction(Error err) {
    handleAllErrors(std::move(err), [&](std::unique_ptr<ErrorInfoBase> info) { m_error = std::move(info); }); // Stash it for later use
  }

  Error ToError() {
    if (m_error->isA<IntelPTError>())
      return make_error<IntelPTError>(static_cast<IntelPTError&>(*m_error)); // IntelPTError is copyable
    return make_error<StringError>(m_error->message(), m_error->convertToErrorCode()); // Just copy the error message
  }
};

But maybe there are other solutions too. Adding @lhames, in case he knows about any...

wallace updated this revision to Diff 302354.Nov 2 2020, 11:26 AM

Followed the suggestion by @labath. The code definitely looks better.

labath added a comment.Nov 4 2020, 2:34 AM

I'd say this looks ok now. Another advantage of doing the decoding early is that we could then simplify the IntelPTInstruction class. If we handle the early decoding errors (file not found, etc.) early, then the only kind of errors this class could contain are errors from the pt library -- and in that case we could just store them as an int instead of this ErrorInfoBase business.

Thanks. I'm thinking about doing the early decoding, as doing "trace load" right now doesn't do anything very useful unless you do "dump instructions". However, I'll try to do it once I finish implementing decoding for live processes, because I want to have a unified process for both, and live process tracing should be lazy to avoid stopping the user flow for too long.

clayborg accepted this revision.Nov 5 2020, 3:39 PM

I am good too.

This revision is now accepted and ready to land.Nov 5 2020, 3:39 PM
This revision was landed with ongoing or failed builds.Nov 5 2020, 6:48 PM
This revision was automatically updated to reflect the committed changes.