This is an archive of the discontinued LLVM Phabricator instance.

[trace][intel pt] Simple detection of infinite decoding loops
ClosedPublic

Authored by wallace on Oct 23 2022, 10:44 AM.

Details

Summary

The low-level decoder might fall into an infinite decoding loop for
various reasons, the simplest being an infinite direct loop reached due
to wrong handling of self-modified code in the kernel, e.g. it might
reach

0x0A: pause
0x0C: jump to 0x0A

In this case, all the code is sequential and requires no packets to be
decoded. The low-level decoder would produce an output like the
following

0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
... infinite amount of times

These cases require stopping the decoder to avoid infinite work and signal this
at least as a trace error.

  • Add a check that breaks decoding of a single PSB once 500k instructions have been decoded since the last packet was processed.
  • Add a check that looks for infinite loops after certain amount of instructions have been decoded since the last packet was processed.
  • Add some settings properties for tweaking the thresholds of the checks above. This is also nice because it does the basic work needed for future settings.
  • Add an AnomalyDetector class that inspects the DecodedThread and the libipt decoder in search for anomalies. These anomalies are then signaled as fatal errors in the trace.
  • Add an ErrorStats class that keeps track of all the errors in a DecodedThread, with a special counter for fatal errors.
  • Add an entry for decoded thread errors in the dump info command.

Some notes are added in the code and in the documention of the settings,
so please read them.

Besides that, I haven't been unable to create a test case in LLVM style, but
I've found an anomaly in the thread #12 of the trace
72533820-3eb8-4465-b8e4-4e6bf0ccca99 at Meta. We have to figure out how to
artificially create traces with this kind of anomalies in LLVM style.

With this change, that anomalous thread now shows:

(lldb)thread trace dump instructions 12 -e -i 23101

thread #12: tid = 8
    ...missing instructions
    23101: (error) anomalous trace: possible infinite loop detected of size 2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23100: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23099: 0xffffffff81342787    jmp    0xffffffff81342785        ; <+5> [inlined] rep_nop at processor.h:13:2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23098: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23097: 0xffffffff81342787    jmp    0xffffffff81342785        ; <+5> [inlined] rep_nop at processor.h:13:2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23096: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23095: 0xffffffff81342787    jmp    0xffffffff81342785        ; <+5> [inlined] rep_nop at processor.h:13:2

It used to be in an infinite loop where the decoder never stopped.

Besides that, the dump info command shows

(lldb) thread trace dump info 12

 Errors:
    Number of individual errors: 32
      Number of fatal errors: 1
      Number of other errors: 31

and in json format

(lldb) thread trace dump info 12 -j

 "errors": {
      "totalCount": 32,
      "libiptErrors": {},
      "fatalErrors": 1,
      "otherErrors": 31
    }

Diff Detail

Event Timeline

wallace created this revision.Oct 23 2022, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2022, 10:44 AM
wallace requested review of this revision.Oct 23 2022, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2022, 10:44 AM
wallace updated this revision to Diff 470047.Oct 23 2022, 10:00 PM
wallace edited the summary of this revision. (Show Details)

.

jj10306 requested changes to this revision.Oct 24 2022, 5:02 PM

looks good overall, mainly some questions and a few nits

lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
123–129
135

nice, I was about to add this as part of my diff (:

lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
206

can you explain why we are increasing the threshold?

235–241

if you move the --item_index in prev_insn_index lambda, would that allow you to remove this duplicated eTraceItemKindInstruction check and instead unconditionally call prev_insn_index?
or would this not work because the intention of the lamda is to skip the current event even if it's already an instruction

281

help me understand this please. I thought pt_insn_get_offset would always return a new, increasing offset every time this function is called.

This revision now requires changes to proceed.Oct 24 2022, 5:02 PM
wallace added inline comments.Oct 24 2022, 5:30 PM
lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
123–129

ahh good one

135

oh nice!

lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
206

the idea is to check for infinite loops sporadically without making the total checks in O(N^2) and instead do it in O(N)

If we first do a linear check in the trace, which is O(T) after T instructions are appended and there are no loops, we might want to check again in the future. We could wait for the next T instructions and then run another check, and if we fail, wait for the next T and so on. This result in a total time spent of O(T + 2T + 3T + 4T + ... + N) which is O(N^2). Instead, we can run the check after 2T, and then after 4T and then after 8T and so on. This gives us a geometric progression of (N + N/2 + N / 4 + ... + T) which is amortized total O(N). A similar algorithm is vector::push_back (https://cs.stackexchange.com/questions/9380/why-is-push-back-in-c-vectors-constant-amortized) which is total O(N) using a similar approach.

235–241

I like your idea. I think I can simplify the code

281

Not really. pt_insn_get_offset returns the offset of the last packet that was processed, and that single could lead to many individual sequential instructions until the next packet is needed.

Let's imagine that you have this trace

PSB with starting address of 0xAAA
TNT with 4 bits
TIP with address 0xFFF

What the decoder will do is to first read the PSB and start at IP 0xAAA. It'll then decode sequential instructions until it reaches the first branch or jump. It then needs to read the next packet, which is the TNT with 4 bits, so it will help decode the next 4 branches but not the fifth one. So the decoder will change the offset and resume decoding instructions sequentially until that fifth branch (or jump) is reached. Then the decoder will read the next packet, which is a TIP and tells the decoder to jump to address 0xFFF.
So this means that with the PSB, the decoder produced, let's say, 10 instructions, and with the TNT maybe 1000 were produced,, and then the decoder moved to the offset of the TIP for the next instruction.

wallace marked 5 inline comments as done.Oct 24 2022, 5:52 PM
wallace updated this revision to Diff 470343.Oct 24 2022, 5:58 PM

address issues and comments

jj10306 accepted this revision.Oct 25 2022, 5:53 AM
This revision is now accepted and ready to land.Oct 25 2022, 5:53 AM