This is an archive of the discontinued LLVM Phabricator instance.

[MachCore] Report arm64 thread exception state
ClosedPublic

Authored by vsk on Sep 14 2021, 5:01 PM.

Details

Summary

A MachO userspace corefile may contain LC_THREAD commands which specify
thread exception state.

For arm64* only (for now), report a human-readable version of this state
as the thread stop reason, instead of 'SIGSTOP'.

As a follow-up, similar functionality can be implemented for x86 cores
by translating the trapno/err exception registers.

rdar://82898146

Diff Detail

Event Timeline

vsk created this revision.Sep 14 2021, 5:01 PM
vsk requested review of this revision.Sep 14 2021, 5:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2021, 5:01 PM
vsk updated this revision to Diff 372591.Sep 14 2021, 5:03 PM
  • Add license header to the .def file.
  • (I'm not sure whether/how this change can be tested, any pointers appreciated.)
shafik added a subscriber: shafik.Sep 14 2021, 5:45 PM
shafik added inline comments.
lldb/include/lldb/Target/AppleArm64ExceptionClass.h
15

We should use fixed sized integer types whenever possible, I suppose in this case uint32_t.

lldb/source/Plugins/Process/mach-core/ThreadMachCore.cpp
111

const

112

Does 26 have a meaning? I am guessing we are shifting to get the EC bits?

This is much more comprehensive than what I was thinking of, nicely done. The mapping of the Exception Class in the esr reg to a name is AArch64 specific, not Apple specific, isn't it? I haven't looked in the ARM ARM, but I suspect it's just the specific names you're using here which might be what we use at apple, but the exception codes are the same for any AArch64 target.

What do you think about printing the FAR register value (in particular) in the stop message? For many exception types, it contains the address that was relevant to the fault iirc, and even if we don't explicitly spell out what that relationship is (it's prob going to be different for each exception code), just having it printed may be helpful for a human reading the message.

I think the patch is fine if you prefer it this way, just giving some other ideas.

jasonmolenda added inline comments.Sep 14 2021, 8:02 PM
lldb/source/Plugins/Process/mach-core/ThreadMachCore.cpp
112

Yeah I think the comment table in AppleArm64ExceptionClass.def would be better placed here, showing the bit positions of the three fields in the esr register probably. AppleArm64ExceptionClass.def should make it clear that it is operating on the exception class field of the esr register, but the bit positions are more relevant here.

vsk updated this revision to Diff 372809.Sep 15 2021, 2:48 PM
vsk marked 3 inline comments as done.

Address review feedback:

  • Include far register in output, which now looks like e.g.:
(lldb) bt all
* thread #1, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x6261747563657860)
  * frame #0: 0x00000001aa5c2864 libsystem_platform.dylib`_platform_strlen + 4
  • Include note that the exception class list can/does contain some Apple-specific classes, but that it's largely 1:1 with the ARM spec.
lldb/include/lldb/Target/AppleArm64ExceptionClass.h
15

I don't think we should use fixed size integer types in cases where the size doesn't matter (e.g. when uint8_t would do, as it would in this case), since it's distracting.

lldb/source/Plugins/Process/mach-core/ThreadMachCore.cpp
112

Good point - actually, maybe it's cleaner to keep knowledge of the EC bit position in the header. Lmk what you think.

jasonmolenda added inline comments.Sep 15 2021, 3:06 PM
lldb/source/Plugins/Process/mach-core/ThreadMachCore.cpp
112

I was thinking the same thing, that getAppleArm64ExceptionClass might just take the $esr register value and get the necessary bits from it. I guess every caller of this is probably starting with the ESR register value, so having every caller know how to grab the EC value from ESR doesn't seem good. Some callers might want to use EC/IL/ISS to describe the nature of the error so they may need to know where these bits are in the register, but if they just want the EC name, it's unnecessary.

vsk added a comment.EditedSep 15 2021, 3:46 PM

FTR, I was able to write a test using process save-core, but even with -s dirty this is generating gigabytes of data and would presumably be super-flaky on the bots.
I also tried -s stack, but for the test program I passed in (just some 2-line .c file that dereferences NULL), process save-core -s stack ... generated an invalid .core with no sections, and lldb couldn't parse it.

Edit: This turned out to be user error, the thing I was testing was not supported with an out-of-tree (system) debugserver.

vsk updated this revision to Diff 372836.Sep 15 2021, 4:40 PM

very nice.

lldb/test/API/macosx/corefile-exception-reason/TestCorefileExceptionReason.py
19 ↗(On Diff #372836)

also not supported on armv7k.

jasonmolenda accepted this revision.Sep 15 2021, 4:44 PM
This revision is now accepted and ready to land.Sep 15 2021, 4:44 PM
vsk added inline comments.Sep 15 2021, 5:00 PM
lldb/test/API/macosx/corefile-exception-reason/TestCorefileExceptionReason.py
19 ↗(On Diff #372836)

Thanks. I'll fix this before landing the patch (eta tomorrow morning).

vsk updated this revision to Diff 373014.Sep 16 2021, 11:21 AM
  • Limit testing to arm64/arm64e only.
  • Set a stop reason for crashing threads only (and not for threads waiting on a syscall to finish).
  • Use @jasonmolenda's multi-threaded test case; make sure we only select the crashing thread.
vsk updated this revision to Diff 373016.Sep 16 2021, 11:25 AM
  • Add missing skipIf(arch=no_match(['arm64', 'arm64e'])) change.
This revision was landed with ongoing or failed builds.Sep 16 2021, 1:35 PM
This revision was automatically updated to reflect the committed changes.