
[trace][intel-pt] Implement trace start and trace stop

Authored by wallace on Nov 17 2020, 8:43 PM.



This implements the interactive trace start and stop methods.

This diff ended up being much larger than I anticipated because, while working on it, I found that many of my earlier implementation choices were suboptimal. In any case, the code is much better now.

There's a lot of boilerplate code due to the gdb-remote protocol, but the main changes are:

  • New tracing packets: jLLDBTraceStop, jLLDBTraceStart, jLLDBTraceGetBinaryData. The gdb-remote packet definitions are quite comprehensive.
  • Implementation of the "process trace start|stop" and "thread trace start|stop" commands.
  • Implementation of an API in Trace.h to interact with live traces.
  • Created an IntelPTDecoder for live threads, which uses the debugger's stop id as a checkpoint for its internal cache.
  • Added functionality to stop the process in case "process tracing" is enabled and a new thread can't be traced.
  • Reworked how traces are managed in lldb-server, including the removal of the old Intel code.
  • Renamed a bunch of things for consistency and better naming.
  • Added tests.

I have some ideas to unify the code paths for post mortem and live threads, but I'll do that in another diff.
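To make the new packets concrete, here is a sketch of how a client might build a jLLDBTraceStart payload. This is purely illustrative: the authoritative schema lives in the gdb-remote packet docs in the patch, and the field names below ("type", "threadBufferSize", "tids") are assumptions drawn from this review thread, with omitting "tids" meaning "trace all current and future threads".

```python
import json

def build_trace_start_packet(tids=None, thread_buffer_size_kb=4):
    """Build a hypothetical jLLDBTraceStart payload.

    Field names are illustrative assumptions from this review thread;
    omitting "tids" would mean "trace all current and future threads".
    """
    payload = {"type": "intel-pt", "threadBufferSize": thread_buffer_size_kb}
    if tids is not None:
        payload["tids"] = tids
    return "jLLDBTraceStart:" + json.dumps(payload)

print(build_trace_start_packet(tids=[123, 456]))
```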

Example flow for a single threaded process:

(lldb) file /home/wallace/llvm-sand/external/llvm-project/lldb/test/API/commands/trace/intelpt-trace/a.out
Current executable set to '/home/wallace/llvm-sand/external/llvm-project/lldb/test/API/commands/trace/intelpt-trace/a.out' (x86_64).
(lldb) b main
Breakpoint 1: where = a.out`main + 4 at main.cpp:2, address = 0x0000000000400511
(lldb) r
Process 1563054 launched: '/home/wallace/llvm-sand/external/llvm-project/lldb/test/API/commands/trace/intelpt-trace/a.out' (x86_64)
Process 1563054 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000400511 a.out`main at main.cpp:2
   1    int main() {
-> 2      int ret = 0;
   4      for (int i = 0; i < 4; i++)
   5        ret ^= 1;
   7      return ret;
(lldb) thread trace start
(lldb) b 7
Breakpoint 2: where = a.out`main + 34 at main.cpp:7, address = 0x000000000040052f
(lldb) c
Process 1563054 resuming
Process 1563054 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
    frame #0: 0x000000000040052f a.out`main at main.cpp:7
   4      for (int i = 0; i < 4; i++)
   5        ret ^= 1;
-> 7      return ret;
   8    }
(lldb) thread trace dump instructions 
thread #1: tid = 1563054, total instructions = 21
  a.out`main + 11 at main.cpp:4
    [ 1] 0x0000000000400518    movl   $0x0, -0x8(%rbp)
    [ 2] 0x000000000040051f    jmp    0x400529                  ; <+28> at main.cpp:4
  a.out`main + 28 at main.cpp:4
    [ 3] 0x0000000000400529    cmpl   $0x3, -0x8(%rbp)
    [ 4] 0x000000000040052d    jle    0x400521                  ; <+20> at main.cpp:5
  a.out`main + 28 at main.cpp:4
    [19] 0x0000000000400529    cmpl   $0x3, -0x8(%rbp)
    [20] 0x000000000040052d    jle    0x400521                  ; <+20> at main.cpp:5

Example flow for a multi-threaded process with "all" tracing:

(lldb) file ~/llvm-sand/build/Release/Linux-x86_64/llvm/lldb-test-build.noindex/commands/trace/multiple-threads/TestTraceStartStopMultipleThreads.testStartMultipleLiveThreadsWithAllStops/a.out
Current executable set to '/home/wallace/llvm-sand/build/Release/Linux-x86_64/llvm/lldb-test-build.noindex/commands/trace/multiple-threads/TestTraceStartStopMultipleThreads.testStartMultipleLiveThreadsWithAllStops/a.out' (x86_64).
(lldb) b main
Breakpoint 1: where = a.out`main + 15 at main.cpp:17:15, address = 0x0000000000400a6f
(lldb) b 6
Breakpoint 2: where = a.out`f3() + 4 at main.cpp:6:5, address = 0x00000000004009f4
(lldb) b 11
Breakpoint 3: where = a.out`f2() + 8 at main.cpp:11:5, address = 0x0000000000400a08
(lldb) r
Process 4030831 launched: '/home/wallace/llvm-sand/build/Release/Linux-x86_64/llvm/lldb-test-build.noindex/commands/trace/multiple-threads/TestTraceStartStopMultipleThreads.testStartMultipleLiveThreadsWithAllStops/a.out' (x86_64)
Process 4030831 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000400a6f a.out`main at main.cpp:17:15
   14   }
   16   int main() { // main
-> 17     std::thread t2(f2);
   18     t2.join();
   19     return 0;
   20   }
(lldb) process trace start
(lldb) c
Process 4030831 resuming
Process 4030831 stopped
* thread #2, name = 'a.out', stop reason = breakpoint 3.1
    frame #0: 0x0000000000400a08 a.out`f2() at main.cpp:11:5
   9    void f2() {
   10     int n;
-> 11     n = 1; // thread 2
   12     std::thread t3(f3);
   13     t3.join();
   14   }
(lldb) thread trace dump instructions -c 5
thread #2: tid = 4032721, total instructions = 211
  a.out`void (*&&std::forward<void (*)()>(std::remove_reference<void (*)()>::type&))() + 13 at move.h:75:7
    [206] 0x0000000000400f5d    retq   
  a.out`void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) + 21 at invoke.h:60:14
    [207] 0x00000000004010b5    callq  *(%rax)
  a.out`f2() at main.cpp:9
    [208] 0x0000000000400a00    pushq  %rbp
    [209] 0x0000000000400a01    movq   %rsp, %rbp
    [210] 0x0000000000400a04    subq   $0x30, %rsp
(lldb) c
Process 4030831 resuming
Process 4030831 stopped
* thread #3, name = 'a.out', stop reason = breakpoint 2.1
    frame #0: 0x00000000004009f4 a.out`f3() at main.cpp:6:5
   4    void f3() {
   5      int m;
-> 6      m = 2; // thread 3
   7    }
   9    void f2() {
(lldb) thread trace dump instructions -c 5
thread #3: tid = 4034431, total instructions = 210
  a.out`void (*&&std::forward<void (*)()>(std::remove_reference<void (*)()>::type&))() + 12 at move.h:75:7
    [205] 0x0000000000400f5c    popq   %rbp
    [206] 0x0000000000400f5d    retq   
  a.out`void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) + 21 at invoke.h:60:14
    [207] 0x00000000004010b5    callq  *(%rax)
  a.out`f3() at main.cpp:4
    [208] 0x00000000004009f0    pushq  %rbp
    [209] 0x00000000004009f1    movq   %rsp, %rbp

Example flow for a simple crash and its trace dump:

(lldb) file /tmp/test/a.out 
Current executable set to '/tmp/test/a.out' (x86_64).
(lldb) b main
Breakpoint 1: where = a.out`main + 8 at main.cpp:6, address = 0x0000000000400845
(lldb) r
Process 4121521 launched: '/tmp/test/a.out' (x86_64)
Process 4121521 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000400845 a.out`main at main.cpp:6
   4    int main() {
   5      int x;
-> 6      cin >> x;
   7      cout << 12 / x << endl;
   8      return 0;
   9    }
(lldb) process trace start
(lldb) c
Process 4121521 resuming
Process 4121521 stopped
* thread #1, name = 'a.out', stop reason = signal SIGFPE: integer divide by zero
    frame #0: 0x000000000040085f a.out`main at main.cpp:7
   4    int main() {
   5      int x;
   6      cin >> x;
-> 7      cout << 12 / x << endl;
   8      return 0;
   9    }
(lldb) thread trace dump instructions -c 5
thread #1: tid = 4121521, total instructions = 8704
  `std::istream::operator>>(int&) + 185
    [8699] 0x00007ffff7b52ce9    popq   %rbp
    [8700] 0x00007ffff7b52cea    retq   
  a.out`main + 25 at main.cpp:7
    [8701] 0x0000000000400856    movl   -0x4(%rbp), %ecx
    [8702] 0x0000000000400859    movl   $0xc, %eax
    [8703] 0x000000000040085e    cltd

Example of a failure due to insufficient process buffer

(lldb) b main
Breakpoint 1: where = a.out`main + 15 at main.cpp:17:15, address = 0x0000000000400a6f
(lldb) r
Process 2098567 launched: '/home/wallace/llvm-sand/build/Debug/fbcode-x86_64/llvm/lldb-test-build.noindex/commands/trace/multiple-threads/TestTraceStartStopMultipleThreads.testStartMultipleLiveThreadsWithAllStops/a.out' (x86_64)
Process 2098567 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000400a6f a.out`main at main.cpp:17:15
   14   }
   16   int main() { // main
-> 17     std::thread t2(f2);
   18     t2.join();
   19     return 0;
   20   }
(lldb) process trace start -l 5000
(lldb) c
Process 2098567 resuming
Process 2098567 stopped
* thread #2, name = 'a.out', stop reason = Thread 2099442 can't be traced as the process trace size limit has been reached. Consider retracing with a higher limit.
    frame #0: 0x00007ffff70cf931`__clone + 49
`__clone:
->  0x7ffff70cf931 <+49>: testq  %rax, %rax
    0x7ffff70cf934 <+52>: jl     0x7ffff70cf975            ; <+0>
    0x7ffff70cf936 <+54>: je     0x7ffff70cf939            ; <+57>
    0x7ffff70cf938 <+56>: retq


Event Timeline

wallace edited the summary of this revision. (Show Details)Nov 17 2020, 8:49 PM
clayborg requested changes to this revision.Nov 17 2020, 9:31 PM

So I think I get the gist of what you were trying to go for with this patch. I stopped inline comments after I got the idea.

A few things:

We need to be able to enable tracing on all threads for a process and have lldb-server just "do the right thing" and enable tracing for all threads as they get created. If we have to stop the process each time a thread is created just to give LLDB the chance to enable tracing for that thread, this will be too slow and cumbersome. The ptrace implementation for unix already has to attach to each thread as it comes along by running some code in the NativeProcessProtocol subclass, so we should be able to start tracing any and all threads as needed.

NativeProcessProtocol should have generic plug-in like functions for starting and stopping tracing that each subclass will override. See inlined comments.

See inlined comments about the jLLDB packets and let me know what you think.


Do we need "type" here? Only one kind of trace is currently returned by jLLDBTraceSupportedType right now. I am ok with passing it down again in case we change to returning a list from jLLDBTraceSupportedType.


Do we want this as optional? What if we want to trace all threads? It would be a real pain if we always have to detect new threads showing up and then send a jLLDBTraceStart, as this will really slow down the process for no good reason. So it would be nice if we can make this optional and, if "tid" is not specified, have lldb-server start tracing any new threads as they come along.


Do we need "type" here just like in jLLDBTraceStart?


Is this optional? What if we want to stop tracing all threads?


Would it be better for this command to just return all the data we need? Maybe a single packet like:


And it would return JSON that includes all available data like:

  {
    "vendor": "intel" | "unknown",
    "family": <decimal>,
    "model": <decimal>,
    "stepping": <decimal>,
    "tids": [
      {
        "tid": <decimal>,
        "dataSize": <decimal> // Size of all trace data for this thread in case it is huge, so we can download parts of it?
      }
    ]
  }

Then we have another packet to fetch the data for each thread, like "jLLDBTraceGetData", whose input schema would be "tid", "offset", and "size". This would allow us to fetch partial data in case we need to interrupt the fetching, since this data can be large. The return value would be just like the "binary memory read" packet, where the bytes are encoded much more efficiently as binary rather than HEX8 format.
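The chunked-fetch idea above could look roughly like this on the client side. The packet name and "tid"/"offset"/"size" parameters come from the suggestion itself; the `send_packet` transport is a stand-in, not a real lldb API.

```python
def fetch_thread_trace(send_packet, tid, total_size, chunk_size=4096):
    """Fetch a thread's trace buffer in chunks via a hypothetical
    jLLDBTraceGetData-style packet, so large transfers can be
    interrupted partway. `send_packet` is a stand-in transport."""
    data = bytearray()
    offset = 0
    while offset < total_size:
        size = min(chunk_size, total_size - offset)
        chunk = send_packet({"tid": tid, "offset": offset, "size": size})
        if not chunk:  # remote returned nothing: stop early
            break
        data += chunk
        offset += len(chunk)
    return bytes(data)

# Toy transport backed by an in-memory buffer.
buf = bytes(range(256)) * 40  # 10240 bytes of fake trace data
fake_send = lambda req: buf[req["offset"]:req["offset"] + req["size"]]
assert fetch_thread_trace(fake_send, 123, len(buf)) == buf
```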


Why do we need these callbacks? Can we just keep the "TraceCreateInstance create_callback" and then add virtual functions to the API in Trace.h?


Shouldn't this be generic so it can work with any NativeProcessProtocol implementation? Think about doing this same stuff on ARM. We don't want to have functions in NativeProcessProtocol like "StopIntelPTrace" and "StopARMTrace" and "StopARM64Trace" that are all stubbed out. It would be better to have something generic that passes along structured data to each function where the type of the trace is specified in the structured data.


Ditto above comment about making this generic.


Should this be named "StartTracing" in case the options don't specify a thread ID?


Seems like this should be named "StopTracing" and also take "const llvm::json::Value &options" just like the StartTracing call above, so we can disable a single thread or all of them?


Shouldn't TParams just be JSON?

This revision now requires changes to proceed.Nov 17 2020, 9:31 PM
wallace updated this revision to Diff 311785.Dec 14 2020, 9:24 PM

Address comments. See the updated diff description for the updates.

wallace edited the summary of this revision. (Show Details)Dec 14 2020, 9:46 PM
wallace updated this revision to Diff 311787.Dec 14 2020, 9:47 PM
wallace edited the summary of this revision. (Show Details)


wallace edited the summary of this revision. (Show Details)Dec 14 2020, 9:50 PM
wallace edited the summary of this revision. (Show Details)Dec 14 2020, 9:58 PM
wallace updated this revision to Diff 319327.Jan 26 2021, 9:34 AM

rebase and fix compile error on Darwin

wallace updated this revision to Diff 319329.Jan 26 2021, 9:46 AM


The start and stop operations have 3 variants for now:

  • specificThreads: operate on specific threads (e.g. thread trace start 2 3 9)
  • currentAndFutureThreads: operate on all threads including future ones (e.g. thread trace start all)
  • process: operate on a process as a single trace instance, instead of operating on each thread. Not supported, but left there to make the code more generic. Eventually I'll have to implement this anyway. (e.g. maybe? process trace start)

So how does "currentAndFutureThreads" differ from "process"?

It might be better to have "thread trace start" be able to specify one or more threads (but not all), and use "process trace start" to enable tracing for all threads in this patch?


Seems like we don't need the "variant" here. If we have "tids" then we are enabling tracing for only that list of threads, and if it is missing, then we enable it for all threads.

See my questions in the non-inlined comment section about the "process" variant.


Still unclear what this means. If we don't have it implemented, then we should probably leave it out for now.


Is this the trace buffer size that will be used for each thread? For all threads? Is this size divided up between the threads? What happens when new threads get created? Can we run out of memory in the kernel if a process creates too many threads if this buffer is for each thread? Need clarification on what this number means.


Any more details on what this is? Number? string? Why is it needed and what does it control/do?


Same comment as above. Why do we need the variant? If there is a "tids" then we do only those threads, else we do all threads in a process?


Are these custom for each plug-in? Any details on what intel-pt can specify here? You supplied intel PT specific info for other settings above in the docs.


Why was "const" removed from the "Thread" here?


Why no const?


Move to the TraceIntelPT.h/.cpp files. We don't want each plug-in adding its definitions to this global TraceOptions.h file.


Move to the TraceIntelPT.h/.cpp files. We don't want each plug-in adding its definitions to this global TraceOptions.h file.


We really shouldn't have any Intel PT stuff in this header file. It should be in the TraceIntelPT.h/.cpp files. We don't want each plug-in adding its definitions to this global TraceOptions.h file.


This is the public API header file. Move to lldb/include/lldb/lldb-private-enumerations.h


It would be nice to remove the "all" scenario here and just implement "process trace start".


remove "all" and create "process trace start"?


Having a fixed size of memory for each thread can be dangerous. What if the user specifies a really large value, and then creates thousands of threads? What will happen to the system? Will it fail gracefully? Can you deadlock your computer?

wallace added inline comments.Jan 26 2021, 6:33 PM

After thinking a lot about it, I think that providing whole-process tracing instead of per-thread tracing is going to be impractical for lldb, which means that only two variants are necessary and your suggestion is correct.

I'll elaborate more: whole-process tracing means using a single buffer per logical core, having each of these buffers contain the instructions of all the threads that run on its core. As Intel PT doesn't trace thread switches, it's necessary to add a secondary context switch trace to be able to produce per-thread traces, which is a requirement for analyzing the instructions of a thread. This splitting is done by correlating the timestamps of the context switches with the timestamps of the packets in the Intel PT trace. There are two problems here:

  • Tracing context switches requires the kernel to have certain capabilities that are not ubiquitous on all systems, and some of these are more accurate than others. One option is to use PERF_RECORD_SWITCH, but it's not available everywhere.
  • Even if you were able to trace context switches, now you depend on the granularity of the timestamps in the intel pt packets. Old Intel hardware provides a timestamp for every few KBs of packets, and newer hardware allows you to have a timestamp for each cpu cycle. In any case, you can end up with invalid splits, as you need to interpolate the timestamps for the packets whose timestamp is not available, and you could assign a packet to the wrong thread.
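The splitting problem described above — assigning trace packets to threads using context-switch records — can be sketched as follows. This is purely illustrative toy code, not anything from the patch; in reality PT packets only carry sparse timestamps that must be interpolated, which is exactly the accuracy problem being discussed.

```python
import bisect

def split_by_context_switches(packets, switches):
    """Assign each (timestamp, packet) to the thread that a sorted list
    of (timestamp, tid) context-switch records says was running then.

    With sparse or interpolated timestamps, packets near a switch
    boundary can land on the wrong thread -- the inaccuracy discussed
    above."""
    times = [t for t, _ in switches]
    per_thread = {}
    for ts, packet in packets:
        idx = bisect.bisect_right(times, ts) - 1
        if idx < 0:
            continue  # before the first switch record: owner unknown
        tid = switches[idx][1]
        per_thread.setdefault(tid, []).append(packet)
    return per_thread

switches = [(0, 10), (100, 20), (200, 10)]   # core runs tid 10, 20, 10
packets = [(5, "a"), (150, "b"), (250, "c")]
print(split_by_context_switches(packets, switches))
```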

I was chatting about that with some infra folks at fb, and they think that they will only be able to support this kind of tracing by implementing a special collector that has access to a special feature in the kernel they are working on, and would only work on the most recent Intel hardware with high timestamp accuracy.

That being said, it seems that it's not worth working on that for LLDB. Each company that wants that level of tracing should work on their own way to do accurate thread splitting, and it would be misleading to implement something in LLDB that can produce wrong values. For LLDB we can use per-thread tracing, which will ensure 100% accuracy.


This is KB per thread. See my comment below about adding a limit on the total buffer memory used.


Yes, it's custom for each plug-in. I added an entry for intel pt in line 411.


AFAIK, the buffer is stored in a region in memory that is fixed and shouldn't be swapped, as the CPU needs to write directly there without interrupting the traced process. So I imagine that if you have too many threads, then you might end up consuming all your RAM and killing your computer.

In any case, your point is valid. Given that we don't know beforehand how many threads will end up being created, what about creating a setting for the maximum total amount of memory to use? E.g. 500MB. The user could override it if needed. At least with this, the first threads, which are commonly the long-lived ones, will be traced, and future short-lived threads won't be, but those don't matter that much.
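The proposed process-wide limit could be enforced with a simple budget check when each new thread appears. This is a sketch of the policy, not the actual lldb-server code:

```python
class TraceBudget:
    """Admit per-thread trace buffers only while a process-wide memory
    limit holds; threads created after the budget is exhausted (often
    short-lived ones) simply go untraced."""
    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def try_trace_thread(self, buffer_bytes):
        if self.used + buffer_bytes > self.limit:
            return False  # would exceed the process limit: skip tracing
        self.used += buffer_bytes
        return True

budget = TraceBudget(limit_bytes=16 * 1024)
assert budget.try_trace_thread(8 * 1024)       # first thread traced
assert budget.try_trace_thread(8 * 1024)       # second thread traced
assert not budget.try_trace_thread(8 * 1024)   # limit reached, skipped
```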

See my inlined comments, but it would be great to trace just a few threads if any "tids" are specified in the "jLLDBTraceStart" packet, and to trace all threads, each with a per-thread buffer, when none are, and call that "process" tracing. We would still use a per-thread buffer for "process" tracing; basically "currentAndFutureThreads" becomes the equivalent of "process" tracing. Just have the Intel PT trace plug-in take care of it. We can avoid the single-buffer-for-the-entire-process issues you mentioned that way, and it effectively does what the user wants. Adding a max memory setting to the custom settings in "jLLDBTraceStart" for Intel PT sounds like a good idea as well.


I would be happy to just treat "currentAndFutureThreads" as the same thing as "process" here. No one cares how it is done, and this ends up tracing all threads anyway, right? Can't we just make it happen? A user doesn't care whether it uses the same trace buffer for all threads, just as long as all threads can be traced.


Maybe better named "threadBufferSize"? Does it have to be in KB? Does it actually have to be a multiple of the kernel page size at all? The description isn't clear that this is a per-thread buffer, so a better description that explains this would be great.


more description on this please?


A max memory setting would be nice, and should maybe be specified at trace start?

wallace abandoned this revision.Mar 18 2021, 11:17 AM
wallace marked 7 inline comments as done.

I'll redo this diff in smaller diffs

wallace reclaimed this revision.Mar 18 2021, 12:21 PM

Changed my mind :)

wallace updated this revision to Diff 333498.Mar 25 2021, 9:53 PM

Updates in the diff description

wallace edited the summary of this revision. (Show Details)Mar 25 2021, 9:55 PM
wallace edited the summary of this revision. (Show Details)
wallace marked 23 inline comments as done.Mar 25 2021, 10:05 PM
clayborg requested changes to this revision.Mar 29 2021, 3:00 PM
clayborg added inline comments.

Do we want to return a list of types here in case we have more than one option?


Is this comment out of date now? If we are doing thread tracing, it should be just the specific threads now, right? Not "This is a best effort request, which tries to trace as many threads as possible".


Should we be rounding down or up? If someone specifies "1" for "threadBufferSize" it would round down to zero. Or maybe we specify that this should be a multiple of 4K or we will fail?


Should this "cpuInfo" be a process data type and sent as binary that we fetch with jLLDBTraceGetBinaryData? The JSON would look something like:

  {
    "tracedThreads": [
      { "tid": 123, "data": [ { "kind": "thread-data", "size": 1024 } ] },
      { "tid": 234, "data": [ { "kind": "thread-data", "size": 1024 } ] }
    ],
    "process-data": [
      { "pid": 345, "data": [ { "kind": "cpuInfo", "size": 16 } ] }
    ]
  }

We should have a processor trace specific stop reason so we should rename this to something like "eStopReasonProcessorTrace". We will use this for many things:

  • max memory for process is being exceeded (what this is being used for in this patch)
  • trace buffers are full and need to be emptied (for future use with fixed buffers that stop the process when they are full)
  • anything else trace related

Then you can encode things into the StopInfo class. See StopInfo::CreateStopReasonWithSignal(...) and other functions that encode additional data for stop reasons.


"process trace"?


This will crash if there is no process, so maybe above if statement should be:

if (!m_trace_sp && m_process_sp)

"processor trace"?

This revision now requires changes to proceed.Mar 29 2021, 3:00 PM
wallace updated this revision to Diff 334275.Mar 30 2021, 2:50 PM

Addresses all issues, following our offline discussion.

Looking really good.

See inline comments about the need for a ProcessorTraceStopReason.

Since we have the eStopReasonProcessorTrace stop reason now, we should probably deliver these stops to the Trace plug-in so that it can respond maybe? Fine if we can to do this in another patch.


Do we need an extra enumeration for eStopReasonProcessorTrace that we would add as a parameter? I could think of:

enum ProcessorTraceStopReason {
  ///< Max memory for process has been exceeded (current patch and the only reason we stop with eStopReasonProcessorTrace, right?)
  eProcessorTraceProcessBufferFull = 1,
  ///< For using a buffer that fills and then stops the process (future patch)
  ///< For when tracing specific threads and all threads that were traced have exited (future patch)
};

Then this data could be return from SBThread APIs via:

size_t SBThread::GetStopReasonDataCount();
uint64_t SBThread::GetStopReasonDataAtIndex(uint32_t idx);

Where SBThread::GetStopReasonDataCount() would return 1, and SBThread::GetStopReasonDataAtIndex(0) would return this enumeration. We would also use this enumeration in the Trace plug-ins when responding to stops that were eStopReasonProcessorTrace

I like that idea, but I'd rather do it in a different patch once we have a second stop event, so that I can make sure the entire thing makes sense.

clayborg accepted this revision.Mar 30 2021, 5:07 PM

Sounds good. Let's get this in then and start iterating on future patches!

This revision is now accepted and ready to land.Mar 30 2021, 5:07 PM
This revision was landed with ongoing or failed builds.Mar 30 2021, 5:31 PM
This revision was automatically updated to reflect the committed changes.