This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Add a "thread extrainfo" LC_NOTE for Mach-O corefiles, to store the thread IDs of the threads
ClosedPublic

Authored by jasonmolenda on Aug 24 2023, 3:10 PM.

Details

Summary

This adds a new LC_NOTE to ObjectFileMachO to represent additional metadata about threads in the corefile. Mach-O corefiles have each thread represented by an LC_THREAD which can only contain an array of register context "flavors" (pre-defined register sets by the kernel). This LC_NOTE has a JSON dictionary with metadata to add to each thread.

struct thread_extrainfo {
   char json_dictionary[];  // must be nul '\0' terminated c-string
};

The JSON dictionary must have a threads key, and the value is an array. The array must have the same number of elements as there are LC_THREADs in the corefile, even if the entries are empty. Initially, only the thread_id key is supported. If a thread's dictionary does not specify a thread_id, lldb will make up a value for it.

An example:

{"threads":[{"thread_id":18427013},{"thread_id":18427037},{"thread_id":18427038}]}

which then looks like

(lldb) thr li
Process 0 stopped
  thread #1: tid = 0x1192c85, 0x0000000185b19ea8 libsystem_kernel.dylib`__semwait_signal + 8
* thread #2: tid = 0x1192c9d, 0x0000000100000d6c a.out`crash_worker(in=0x0000000000000000) at main.cpp:15:18, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x0)
  thread #3: tid = 0x1192c9e, 0x0000000185b19ea8 libsystem_kernel.dylib`__semwait_signal + 8

in lldb.

This patch adds support to ObjectFileMachO's process save-core to emit this metadata, and support to ObjectFileMachO, ProcessMachCore, ThreadMachCore to read & use those thread IDs when provided.

I expect to be adding additional thread attributes in the near future -- name, queue name, maybe a stop reason? -- they will be simple additions this format, but I'm hoping to get feedback on the format itself now. This is the first time we've had an LC_NOTE that did not use a strict binary representation with required format - by putting JSON in there, we do not have strict definitions of what is allowed to be included in the dictionary, but I think there are enough different use cases that this is the right choice for this one.

Diff Detail

Event Timeline

jasonmolenda created this revision.Aug 24 2023, 3:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2023, 3:10 PM
jasonmolenda requested review of this revision.Aug 24 2023, 3:10 PM

Or possibly we should just make "thread extrainfo" a JSON blob with a dictionary for each thread, which would make it easier to extend, instead of a binary format. I'm not sure I have a strong opinion.

As soon as I started thinking about thread names and queue names and mach exception data, all variable length things, a binary format doesn't seem ideal. We do already send a lot of thread information about processes in JSON in gdb remote serial protocol from debugserver today. It's the first time we'd use JSON in a Mach-O LC_NOTE though, so I'm interested to hear other people's thoughts.

jasonmolenda added a subscriber: jingham.

After thinking about this more, and talking with @jingham I rewrote the patch so LC_NOTE "thread extrainfo" is a JSON dictionary with a key threads that has an array. The number of entries in the array must match the number of LC_THREADs in the Mach-O corefile. Each array entry may have a thread_id key with a thread id for that LC_THREAD, or lldb will create a thread_id for it. I expect we will add more per-thread keys in the future.

The payload looks like

{"threads":[{"thread_id":18368681},{"thread_id":18368703},{"thread_id":18368704}]}

and JSON is stored as a c-string, requires a nul byte '\0' at the end.

I've treated the LC_NOTEs in Mach-O as strictly defined binary data until now, so producers and consumers both had a reference to work with. But I think in the case of per-thread information in a corefile, we will have different people wanting to add different fields -- some fixed width, some variable length -- and specifying a JSON format that can be extended more easily is probably the correct way to go here.

jasonmolenda edited the summary of this revision. (Show Details)Aug 25 2023, 10:47 AM

One interesting question about the basic design of this LC_NOTE. If I call it "thread extrainfo" and it is a JSON dictionary with a threads array -- how long before someone wants to stick some non-thread specific piece of data in here. Should it be a proc metadata with a threads array, and it doesn't become confusing as it is extended over years.

The idea itself looks good to me. It would be nice if we could maintain a schema for this format in documentation or elsewhere so future code spelunkers have an idea of what the parsing and saving code are doing while they read it.

lldb/test/API/macosx/corefile-exception-reason/TestCorefileExceptionReason.py
54–57

nit

JDevlieghere added inline comments.Aug 25 2023, 2:51 PM
lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
5682–5693

There's a *lot* of duplication between reading LC_NOTEs. At some point we should extract the common code instead of copy-pasting it every time.

5723–5724

StructuredData::Array::GetSize returns a size_t. No point in turning this into a uint32_t.

lldb/source/Plugins/Process/mach-core/ProcessMachCore.cpp
603

Capitalize + period.

608–619

This code is (presumably on purpose) trying to find gaps in the available tids: for example if I had [1, 2, ? ,4, ?, 6] this is going to turn that into [1, 2, (3), 4, (5), 6]. If that property matters, it probably should be part of the comment. If it doesn't, you could exploit the fact that the set is sorted and just take the last value (6 for the previous example) and return [1, 2, 4, 6, 7, 8], but again, I assume that's what you're trying to avoid here.

628–629

make_shared (avoid "naked new")

Thanks for the feedback @bulbazord & @JDevlieghere , I will address those points.

I am still trying to decide on the scope of this JSON LC_NOTE. Right now it's "thread extrainfo" and a dictionary with a "threads" key by definition. But I have a feeling it should be "process metadata", and currently it may contain a "threads" key which must be an array with the same number of elements as corefile has LC_THREADs, for future extension use.

jasonmolenda marked 6 inline comments as done.Aug 29 2023, 5:39 PM

Thanks for the feedback Alex & Jonas. Jonas' comment that we have a lot of different methods manually stepping over load commands to find their LC_NOTEs was something I'd been meaning to deal with some day, but hadn't done yet. I shouldn't complicate this patch by also doing that but ... I did. Updating the patch in a minute.

lldb/source/Plugins/Process/mach-core/ProcessMachCore.cpp
608–619

I wasn't trying to maintain the order of the created tids, you're right I shouldn't bother being so fancy, just grab the highest tid number seen and increment past it. I create the set by iterating over the vector, I'll just watch for the highest tid_t value in that loop instead.

jasonmolenda marked an inline comment as done.

Address feedback from Alex and Jonas. Most significantly, add a ObjectFileMachO::FindLC_NOTEByName method which all the LC_NOTE readers in ObjectFileMachO call. One of them specifically needs to iterate over multiple LC_NOTEs with the same name ("load binary"), so it returns a vector of offset & sizes of matching LC_NOTEs which looks a little unusual for the "there will only be one of these" LC_NOTEs. I also simplified the tid_t creation for missing tid_t values in ProcessMachCore when only some threads were given specified tid_t values.

Oh, and I did change the LC_NOTE name that I'm adding to "process metadata" and specified that it *may* contain a threads key, instead of *shall* contain a threads key.

jasonmolenda updated this revision to Diff 556327.EditedSep 8 2023, 5:22 PM

All of the feedback is addressed at this point. I had one decision I didn't like - the LC_NOTE has an explicit size in the load command, but I said that the JSON string needed to be nul byte ('\0') terminated, which is redundant. I've changed the implementation to (1) in the producer side, don't add a nul byte at the end of the string, and (2) in the consumer side, add the nul byte to the read buffer, before treating it as a c-string. To help emphasize that the size of the LC_NOTE is sufficient.

JDevlieghere accepted this revision.Sep 11 2023, 1:37 PM

I really dislike all the manual memory management that's going on in this file, but I realize that's not the result of this patch, but rather because of the code that was factored out. I'd love if someone would replace the mallo'c buffers with SmallString<0>s but that's orthogonal to this patch. LGTM.

lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
73

I don't think this is needed (anymore)?

lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.h
124

Thanks for factoring this out!

This revision is now accepted and ready to land.Sep 11 2023, 1:37 PM

Update patch to not use malloc/free for a temporary heap allocation; a std::string works fine. One caveat is that the c-strings we read in may have a nul byte terminator, and left alone the std::string will consider that part of the string; that can be preserved when using a string_view etc representation.

This revision was automatically updated to reflect the committed changes.