This is an archive of the discontinued LLVM Phabricator instance.

[trace][intel pt] Add a cgroup filter
ClosedPublic

Authored by wallace on Jul 7 2022, 12:50 AM.

Details

Summary

Thanks to Gaurav Gaur (gaur@fb.com), who made most of the investigation to make this happen.

It turns out that cgroup filtering is relatively trivial and works
really nicely. Thid diffs adds automatic cgroup filtering when in
per-cpu mode, unless a new --disable-cgroup-filtering flag is passed in
the start command. At least on Meta machines, all processes are spawned
inside a cgroup by default, which comes super handy, because per cpu
tracing is now much more precise.

A manual test gave me this result

  • Without filtering:
Total number of trace items: 36083
Total number of continuous executions found: 229
Number of continuous executions for this thread: 2
Total number of PSB blocks found: 98
Number of PSB blocks for this thread 2
Total number of unattributed PSB blocks found: 38
  • With filtering:
Total number of trace items: 87756
Total number of continuous executions found: 123
Number of continuous executions for this thread: 2
Total number of PSB blocks found: 10
Number of PSB blocks for this thread 3
Total number of unattributed PSB blocks found: 2

Filtering gives us great results. The number of instructions collected
more than double (probalby because we have less noise in the trace), and
we have much less unattributed PSBs blocks and unrelated PSBs in
general. The ones that are unrelated probably belong to other processes
in the same cgroup.

Diff Detail

Event Timeline

wallace created this revision.Jul 7 2022, 12:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2022, 12:50 AM
wallace requested review of this revision.Jul 7 2022, 12:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2022, 12:50 AM
wallace edited the summary of this revision. (Show Details)Jul 7 2022, 12:50 AM
wallace edited the summary of this revision. (Show Details)Jul 7 2022, 12:52 AM
jj10306 requested changes to this revision.Jul 13 2022, 8:15 AM

looks good overall, just a couple questions from my end

lldb/source/Plugins/Process/Linux/IntelPTCollector.cpp
87–90

isn't the cgroup_file path going to have two slashes since slice starts with a slash?


in the case of the image above, wouldn't cgroup_file be "/sys/fs/cgroup//foo.slice/bar.service" instead of "/sys/fs/cgroup/foo.slice/bar.service"

lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCpuDecoder.cpp
127

Help me understand this. To me this seems like it is counting all of the PSBs that don't belong to the current thread, whereas I would expect this to only count the PSBs that don't belong to any thread?
So based on my understanding we would be severely overcounting the number of unattributed PSB, but I think I'm just misunderstanding how this code flows.

This revision now requires changes to proceed.Jul 13 2022, 8:15 AM
wallace added inline comments.Jul 13 2022, 12:00 PM
lldb/source/Plugins/Process/Linux/IntelPTCollector.cpp
87–90

I think that's fine for system paths. You can have multiple consecutive //// and the system collapses them

lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCpuDecoder.cpp
127

the current code is correct, I'll explain:

  • on_new_thread_execution is executed on all threads of the same CPU in chronological order
  • as on_new_thread_execution is being invoked repeatedly, the it iterator is being traversed and is always moving forwards
  • when on_new_thread_execution is invoked, the it iterator will look for psb blocks that happened before the given execution. These blocks do not belong to any thread execution.

Graphically, we have

----exec 1---- ---exec 2---- ---exec 3----
PSB1 PSB2 PSB3 PSB4 PSB5 PSB6

when on_new_thread_execution is invoked for exec2, it will be pointing at PSB3, which is the first PSB after exec 1. PSB3 comes before exec 2, so it'll be unattributed, then it will move to PSB4 and so on

wallace requested review of this revision.Jul 13 2022, 12:00 PM
jj10306 accepted this revision.Jul 13 2022, 12:25 PM

thanks for answering those questions, lgtm

lldb/source/Plugins/Process/Linux/IntelPTCollector.cpp
87–90

TIL

lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCpuDecoder.cpp
127

makes sense, thanks for the explanation!

This revision is now accepted and ready to land.Jul 13 2022, 12:25 PM