This is an archive of the discontinued LLVM Phabricator instance.

[trace][intelpt] Fix multi CPU decoding TSC assertion error
ClosedPublic

Authored by jj10306 on Oct 24 2022, 8:35 AM.

Details

Summary

Occasionally the assertion that enforces increasing TSC values in DecodedThread::NotifyTsc
would get tripped during large multi CPU trace decoding.
The root cause of this issue was an assumption that all the data of a
PSB will fit within the start,end TSC of the "owning"
ThreadContinuousExecution. After investigating, this is not the case
because PSBs can have multiple TSCs.
This diff works around this issue by introducing a TSC upper bound for
each PSBBlockDecoder. This fixes the assertion failure by simply
"dropping" the remaining data of PSB whenever the TSC upper bound is
exceeded during decoding.
Future work will do a larger refactor of the multi CPU decoding to
remove the dependencies on this incorrect assumption so that PSB blocks
that span multiple ThreadContinuousExecutions are correctly handled.
correctly

Test Plan:
Run thread dump instructions 3 on a problematic trace and observe the assertion is no longer hit. The new error can be observed in the dump of events:

{
  "id": 5346,
  "error": "decoding truncated: TSC 64197386832965432 exceeds maximum TSC value 64197386633174537, will skip decoding the remaining data of the PSB (skipping 375 of 1336 bytes)"
},

Run relevant unittests

tools/lldb/unittests/Process/Linux/ProcessLinuxTests
tools/lldb/unittests/Utility/UtilityTests --gtest_filter=TraceGDBRemotePacketsTest.*
tools/lldb/unittests/Disassembler/x86/GetControlFlowKindx86Tests
./bin/lldb-dotest -p TestTrace

Diff Detail

Event Timeline

jj10306 created this revision.Oct 24 2022, 8:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2022, 8:35 AM
jj10306 requested review of this revision.Oct 24 2022, 8:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2022, 8:35 AM
jj10306 edited the summary of this revision. (Show Details)Oct 24 2022, 8:37 AM
wallace requested changes to this revision.Oct 24 2022, 10:28 AM

pretty good! I just left cosmetic requests

lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
285–312

I suggest moving all of this to another function because it's a bit long

286

use * instead of value(). value() is rarely used in LLVM code. Also use >= instead of >

294–295

add a prefix message for easier regexp search

"decoding truncated: TSC packet {0} exceeds maximum TSC value {1} for this PSB block . Will resume decoding with the next PSB."

Use a better prefix if you can, and btw, you wrote the remining the remaining here.

296

ditto

299–307

I think let's better not include this to keep the error a bit smaller. In any case, you can do thread trace dump instructions <thread> -E and then look for the error prefix when debugging.

but if you insist, the byte offset message should come after the textual description of the error

308

return -pte_internal;

345

move this to the constructor so that it's highlighted by IDEs and appears in the public documentation

Maximum allowed value of TSCs decoded from this psb block. If this value is hit, then decoding for this block is stopped and an error is appended to the trace.
This revision now requires changes to proceed.Oct 24 2022, 10:28 AM
jj10306 updated this revision to Diff 470472.Oct 25 2022, 7:04 AM
jj10306 marked 7 inline comments as done.

Address comments

jj10306 edited the summary of this revision. (Show Details)Oct 25 2022, 7:04 AM
jj10306 added inline comments.
lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
299–307

I included the byte offset information so we can get an idea of what % of data we are losing each time this occurs. I'll clean this up a bit.

wallace accepted this revision.Oct 25 2022, 10:20 AM

thanks. Your explanation of why the verbose error is useful was good :)

This revision is now accepted and ready to land.Oct 25 2022, 10:20 AM
jj10306 edited the summary of this revision. (Show Details)Oct 26 2022, 11:38 AM