This is an archive of the discontinued LLVM Phabricator instance.

[trace][intelpt] Support system-wide tracing [9] - Collect and return context switch traces
ClosedPublic

Authored by wallace on May 18 2022, 8:49 AM.

Details

Summary
  • Add collection of context switches per cpu grouped with the per-cpu intel pt traces.
  • Move the state handling from the interl pt trace class to the PerfEvent one.
  • Add support for stopping and enabling perf event groups.
  • Return context switch entries as part of the jLLDBTraceGetState response.
  • Move the triggers of whenever the process stopped or resumed. Now the will-resume notification is in a better location, which will ensure that we'll capture the instructions that will be executed.
  • Remove IntelPTSingleBufferTraceUP. The unique pointer was useless.
  • Add unit tests

Diff Detail

Event Timeline

wallace created this revision.May 18 2022, 8:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 8:49 AM
wallace requested review of this revision.May 18 2022, 8:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 8:49 AM
wallace updated this revision to Diff 430490.May 18 2022, 1:28 PM
wallace edited the summary of this revision. (Show Details)

nit

jj10306 requested changes to this revision.May 19 2022, 8:57 AM

Submitting my comments so far, will continue my review in a bit

lldb/include/lldb/Utility/TraceIntelPTGDBRemotePackets.h
26–29

nit: these are just pointers to a buffer of bytes right? if so, maybe we could change the types to be const uint8_t *.
Obviously it doesn't make a difference since they are the same size, but when I see const char * my brain kinda immediately thinks STRING!
maybe it's just me though 🤩, wdyt?

lldb/source/Plugins/Process/Linux/IntelPTCollector.h
41–45

Why do we need to separate these out into two separate functions?

lldb/source/Plugins/Process/Linux/IntelPTMultiCoreTrace.cpp
36

thoughts on moving this to be a "utility" method of the Perf.cpp and then call that utility method from here? Moving it there would make it more accessible to other callers.

47

what does this do?

51–54

In addition to this context switching event needing to be part of the same group, you also must ensure that some other config bits (exclude_kernel, exclude_hv) are the same.
Also, what would happen if the disabled bit is set here but then the enabled bit of the intel pt event was set?

All of these considerations related to keeping the two events "in sync", are beginning to make me lean towards what I mentioned above about using the same perf event, because that would naturally remove any opportunities for the two events to be "out of sync"

56
101–116

to reduce opportunity for things to get out of sync if this is ever touched in the future?

lldb/source/Plugins/Process/Linux/IntelPTMultiCoreTrace.h
97

perhaps you could make this it's own structure? that would make things a bit less verbose here and in the ForEachCore callback?
also, this may make it clearer that they must be part of the same perf event and potentially would allow enforcing this easier?
Instead of creating an entirely new structure you also could make the ContextSwitchTrace an optional member of IntelPTSingleBuffer trace, but not sure if that's the best place to do the coupling.
Another option would be to just use the same underlying perf_event of IntelPTSingleBufferTrace. That would naturally enforce the constraint of the events being part of the same group and wouldn't require adding this new ContextSwitchTrace notion.

lots of options lol, I think things are fine as is but wanted to share my thoughts.
wdyt?

This revision now requires changes to proceed.May 19 2022, 8:57 AM
jj10306 added inline comments.May 19 2022, 6:43 PM
lldb/source/Plugins/Process/Linux/IntelPTMultiCoreTrace.cpp
122–153

Is there a way to consolidate these two methods?

lldb/source/Plugins/Process/Linux/IntelPTSingleBufferTrace.h
43

This wording is a bit confusing - maybe provide some additional context here about this flag controlling the whether the perf event is enabled/disabled at perf_event_open time

lldb/source/Plugins/Process/Linux/Perf.cpp
161

curious what adding the PROT_READ does here, given that this code was working with just PROT_WRITE it seems like PROT_WRITE implies PROT_READ

226

It's worth noting that this function should only be called when the aux buffer is configured as a ring buffer (ie the aux buffer was mmaped with PROT_READ). if the aux buffer mmaped with PROT_WRITE then the behavior of the head is different and the consumer is required to update the aux_tail when reading.
Perhaps we should check this before doing the reading or make it very clear in the docs that this should only be called on perf events with that specific configuration

lldb/source/Plugins/Process/Linux/Perf.h
101

is an enum needed for this? Since the perf event will only ever be in two states, feels like a bool would suffice

238

nit: This is wordy. imo the name needs not mention that it's flushed out, just put that in the docs

lldb/test/API/commands/trace/multiple-threads/TestTraceStartStopMultipleThreads.py
226–229

nice test!

wallace added inline comments.Jun 9 2022, 4:41 PM
lldb/include/lldb/Utility/TraceIntelPTGDBRemotePackets.h
26–29

oh no, these are just string enums, but it happens that c++ doesn't support natively string enums

lldb/source/Plugins/Process/Linux/IntelPTCollector.h
41–45

The "state" thing is an implementation detail, the most important thing for callers is to notify the collector what's happening with the target.

So I'm also going with this approach for clarity, following the standard that LLDB uses everywhere. DidStop and WillResume appear so in the LLDB code often that it's better to keep them here for consistency.

lldb/source/Plugins/Process/Linux/IntelPTMultiCoreTrace.cpp
36

good idea

47

this is super redundant because this field is already 0 after the memset

51–54

I've moved this function to Perf.h, so given this:

  • disabled is dominated by the group perf event state, so I'll use that state instead of asking for the disabled value.
  • exclude_hv is needed all the time
  • exclude_kernel as well

so, besides the change to disabled, the only two fields that we should really synchronize are exclude_hv and exclude_kernel, but I don't see us creating more perf events, so I don't think if it's worth creating more abstractions just for these two fields. We could do more in the future when we need to trace kernel. You can leave more suggestions in my newer diff

56

+1

101–116

+1

122–153

not really :(

lldb/source/Plugins/Process/Linux/IntelPTMultiCoreTrace.h
97

This map is barely used, so I don't want to create more structs for this. The idea of having an optional context switch trace is not very nice to me because that makes the code less composable. In any case, I'm also not a far of my gigantic map, but I don't find it bad enough to change it.
Something I'm doing now is to use the alias ContextSwitchTrace more ubiquitously to make other parts of the code more readable.

lldb/source/Plugins/Process/Linux/IntelPTSingleBufferTrace.h
43

+1

lldb/source/Plugins/Process/Linux/Perf.cpp
161

it was needed for context switches. I don't remember why though

226

i'll leave a comment for when we add a way to read the buffer in a non-cyclic manner

lldb/source/Plugins/Process/Linux/Perf.h
101

+1

238

I want to be super explicit, because this means that while we read, we are pausing the state of the event, so that's non-trivial side effect that should be clear by any caller.

wallace requested review of this revision.Jun 15 2022, 9:29 AM
jj10306 accepted this revision.Jun 15 2022, 9:34 AM
This revision is now accepted and ready to land.Jun 15 2022, 9:34 AM