This is an archive of the discontinued LLVM Phabricator instance.

[trace][intelpt] Support system-wide tracing [13] - Add context switch decoding
ClosedPublic

Authored by wallace on May 23 2022, 8:18 PM.

Details

Summary
  • Add the logic that parses all cpu context switch traces and produces blocks of continuous executions, which will be later used to assign intel pt subtraces to threads and to identify gaps. This logic can also identify if the context switch trace is malformed.
  • The continuous executions blocks are able to indicate when there were some contention issues when producing the context switch trace. See the inline comments for more information.
  • Update the 'dump info' command to show information and stats related to the multicore decoding flow, including timing about context switch decoding.
  • Add the logic to conver nanoseconds to TSCs.
  • Fix a bug when returning the context switches. Now they data returned makes sense and even empty traces can be returned from lldb-server.
  • Finish the necessary bits for loading and saving a multi-core trace bundle from disk.
  • Change some size_t to uint64_t for compatibility with 32 bit systems.

Tested by saving a trace session of a program that sleeps 100 times, it was able to produce the following 'dump info' text:

(lldb) trace load /tmp/trace3/trace.json                                                                   
(lldb) thread trace dump info                                                                              

Trace technology: intel-pt

thread #1: tid = 4192415
  Total number of instructions: 1

  Memory usage:
    Total approximate memory usage (excluding raw trace): 2.51 KiB
    Average memory usage per instruction (excluding raw trace): 2573.00 bytes

  Timing for this thread:

  Timing for global tasks:
    Context switch trace decoding: 0.00s

  Events:
    Number of instructions with events: 0
    Number of individual events: 0

  Multi-core decoding:
    Total number of continuous executions found: 2499
    Number of continuous executions for this thread: 102

  Errors:
    Number of TSC decoding errors: 0

As you can see, it had 101 context switches, which makes sense given the 100 sleeps, and the total number of continuous executions for all threads is 2499. All the context switches were correctly decoded without a severe failure.

Diff Detail

Event Timeline

wallace created this revision.May 23 2022, 8:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 8:18 PM
Herald added subscribers: mgrang, mgorny. · View Herald Transcript
wallace requested review of this revision.May 23 2022, 8:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 8:18 PM
wallace edited the summary of this revision. (Show Details)May 23 2022, 8:19 PM
wallace edited the summary of this revision. (Show Details)May 23 2022, 8:23 PM
jj10306 requested changes to this revision.May 27 2022, 12:36 PM
jj10306 added inline comments.
lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp
23–43

should these structures live in Perf.h?

60–103

nit: if you switch the union for two optional values you can remove a lot of the redundancy between these methods.

137–142

can you help me understand how these two cases could happen? they seem fundamentally impossible given the nature of context switching - shouldn't all "continuous" execution be disjoint?

My current understanding is as follows:

Expected:
i1 o1 i4 o4 i9 o9

Impossible:
and this is not possible:
i1 i4 o4 o1 9 o9

Let me know if I'm missing something 🙂

147

why the + 1?

182

Do you think any of the general perf logic related to "decoding" the records should be moved to Perf.h/cpp?

197

nit: it feels weird casting to PerfContextSwitchRecord when we don't yet know if this is actually a context switch event without first looking at the header.
casting to perf_event_header and checking that first before interpreting the record as a context switch record seems like a better approach.
Given that currently only intelpt is using the LLDB's perf "library" this isn't a big deal, but if we wanted to make it more complete/robust, we should revisit this and improve our record handling design so it could be easily extended to support any record types.

199

can you link the documentation that states this?

200–201

using sizeof on uint64_t feels weird since the typename already implies the name. I think moving this value to a constant and explaining its value would make things cleaner.

205–206

same as above, can we link the docs. alternatively, link the docs at the top of the function or in the header and then reference that link at the appropriate spots in the code

264–273

this is lambda inception 😆

lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.h
32–34

Where do these "guesses" come from?

41–62

what about just having two optionals fields to remove the "dangers" that unions introduce? Then the Variant enum can be used to guide whether the start and end values should be non-null just as it's being used to access the union currently

106–108

what is meant by "contention" here? Is this referring to the ipt aux buffer wrapping?

112

is this just the cores that the program ran on or all possible cores the process could have run on?

156

why is this needed?

This revision now requires changes to proceed.May 27 2022, 12:36 PM
wallace added inline comments.Jun 15 2022, 9:26 AM
lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp
23–43

I created a new file PerfContextSwitchDecoder. and moved this code there. We can't actually use Perf.h because that's private to lldb-server

60–103

in this case I'm willing to pay the cost of being super explicit in initialization to be able to be super safe when using the context switches. I don't want that for any reason we use context switch that we are not completely sure that is correct, so I'm adding very explicit variable names to prevent myself and others from making mistakes

137–142

this can happen if, for example, there are contention issues. You can have something like this

tid 12 in
tid 12 out
<now you start reading the context switch trace in the collector and update the tail pointer>
    during this time the following context switches happened but you lost them
    tid 13 in
    tid 13 out
    tid 14 in
<here the kernel resumes writing context switches because the tail pointer was updated>
tid 14 out

and now you have <tid 12 out> followed by <tid 14 out>. That means that you can rely on the first execution (tid 12), but you can't trust the second one (tid 14) because you don't really know when it started.

147

let's follow the example above, you have <tid 12 out: tsc A> followed by <tid 14 out: tsc B>

you don't really know when the execution of tid 14 started, but you know it was after tsc A, i.e. tsc A + 1. So that's a hinted start

182

see my first response above

197

makes sense. this is a code smell

199

sure

200–201

+1

205–206

+1

264–273

lol, i'll see how i can simplify it

lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.h
32–34

see above

41–62

see above

106–108

it means when the trace reader thread couldn't keep up with the data and some context switch records were lost

112

all cores on the hardware. We don't know which ones our program ran on

156

i've improved the documentation, but basically this variable holds any fatal decoding error we see to prevent multiple failed decoding attempts. If this is not-null, then this means that we tried to decode but it failed badly, and we don't want to try to redecode again because that's expensive.

wallace requested review of this revision.Jun 15 2022, 9:27 AM
jj10306 accepted this revision.Jun 15 2022, 9:35 AM
This revision is now accepted and ready to land.Jun 15 2022, 9:35 AM