This is an archive of the discontinued LLVM Phabricator instance.

Set a default number of addressing bits for Darwin arm64 systems

Authored by jasonmolenda on Dec 9 2021, 12:48 AM.



For ARMv8.3 pointer auth signing, lldb needs to know how many bits are valid for addressing, and strip the high bits off of addresses before looking them up. During the initial bringup, we didn't have dynamic ways of getting these values so we hardcoded a default addressing mask in lldb, and there are still some cases where we need to support this, where we still don't have the value dynamically declared.

There are three parts to this patch, all needed to construct a test. The most important part is that when I print ptrauth values, I always print the actual uint64_t with ptrauth bits included, then I strip the bits and see if it resolves to an address that lldb knows about. If so, then I use the normal formatter. For this test case, I needed this behavior in our function pointer formatter. Our normal output might look like

(lldb) p fmain
(int (*)(...)) $0 = 0x0000000100003f90 (a.out`main at main.c:3)

and with an ARMv8.3 ABI with ptrauth bits, you might see

(lldb) p fmain
(int (*)(...)) $0 = 0xe46bff0100003f8c (0xe46bff0100003f8c)

which is not helpful. So this patch does

(lldb) p fmain
(int (*)(...)) $0 = 0xe46bff0100003f8c (actual=0x0000000100003f8c a.out`main at main.c:3)

I never want to hide the ptrauth bit reality from the user - because they may be debugging a problem with ptrauth signing itself. The three parts of the patch:

  1. ABIMacOSX_arm64::FixAddress changed so if no mask is specified, uses the default mask.
  1. ObjectFileMachO::GetAllArchSpecs when reading a Mach-O binary, we were previously clearing the Vendor from the Triple that we create for the binary (so it would be arm64e--). This loses information - realistically, any binary in a Mach-O container is most likely following the Darwin ABIs and I want to retain the vendor so we can select the correct ABI instead of grabbing any random arm64 ABI.
  1. ArchSpec::SetArchitecture - explicitly set the vendor to Apple when we have a Mach-O architecture.
  1. formatters::CXXFunctionPointerSummaryProvider the formatter change above.

The test case for this is a main.c with main() and a function pointer global to main, fmain(). Then there is a standalone program that creates this specific corefile without any metadata about the addressing bits -- lldb itself always adds this metadata, so I couldn't use lldb's built-in process save-core functionality. It's a tiny corefile that only has 8 bytes of DATA memory where the function pointer is stored. I have ideas for how I can use this test case for some upcoming changes I want to make, so I expect to get more use out of this test case.

Diff Detail

Event Timeline

jasonmolenda created this revision.Dec 9 2021, 12:48 AM
jasonmolenda requested review of this revision.Dec 9 2021, 12:48 AM
DavidSpickett added a subscriber: DavidSpickett.

Glad to see this, it will be useful on AArch64 Linux as well.

General question, is yaml2obj not flexible enough to do this? I guess the issue would be even if it does do MachO the test could only work on a precompiled file, this way allows you to build a.out each time.

I have ideas for how I can use this test case for some upcoming changes I want to make, so I expect to get more use out of this test case.

And having this program to build the corefile allows you the flexibility to do these future things you reference here?


You could do if (Process *process = ....


Remove space before the (. Unless this is a code style I'm missing.


Stray space ( here too.


Where does the closing ) come from, does the so_addr.Dump emit it?


Can you explain this logic? The "else" part I get, it's the sign extension check I'm not sure of.

I have to admit, we have this over in AArch64 Linux and it's been on my list to find out what it actually does and document it there too. My naive assumption is that removing signature bits would just be a mask but clearly I'm missing something.


You could pull that out to after the if.

if (header.filetype == MH_KEXT_BUNDLE) {

Does a.out get built here because the default testing makefile does that? Please comment that if so, otherwise it looks like you forgot to check in the prebuilt a.out.


I'd use the full commands here just for clarity.

Also given the brackets around the "actual" bit seem to be emitted by two different calls, include them in the match as well? Just in case one gets dropped at some point.


Is the name spelled like that or should it be "addressable"?


exit(1) here


Should you have some validity check here that the address of main and fmain is not still 0? Just in case you manage to read half way through an nm line and miss one of them.


What are the 4s?


Would help to have a comment here like:

// Data is written out in the order it will be in the file:
// Mach header
// Segment command
// Thread command
// Segment contents

Do I understand correctly, this points to segment_contents?


Why /4 not /8 here? Assuming 64 bit registers.

Is count in 32 bit words? (I found some sources on google but they didn't include the arm64 version)


And here you've manually put some bits in the position of the signing bits? Please comment with that.

I assume it doesn't matter if they're generated or not, lldb isn't going to try and auth it.

General question, is yaml2obj not flexible enough to do this? I guess the issue would be even if it does do MachO the test could only work on a precompiled file, this way allows you to build a.out each time.

Plus you get all the Mach specific constant values for free instead of writing umpteen magic numbers in a YAML file. So I think I just answered my own question here.

jasonmolenda marked 12 inline comments as done.Dec 9 2021, 10:19 AM

Thanks so much for the thorough review Davide, I really appreciate the help and time you spent doing this. I addressed the comments and will update the patch momentarily.

I have another small change I want to make soon where I add a new "load binary" LC_NOTE which will be a way for a corefile to tell lldb to load additional binaries for a corefile, and I'm planning to add this load binary command to the corefile and confirm that lldb loads the a.out correctly via that.

I didn't look at obj2yaml too hard, but the first thing I tried was running it on my created corefile, and it didn't include the memory contents (LC_SEGMENT_64), so I didn't pursue it much further. I knew I was going to add a custom LC_NOTE to this test corefile soon too and I don't think those are handled r.n. It was pretty straightforward to create a simple corefile.


This method (and Address::Dump) are concatenating text into the Stream sstr which is a symbolic description of the address; we want to print

fmain = 0xPTRAUTH (actual=0xACTUAL <name of function, source line, etc>)

so this is prepending the "actual=0xACTUAL " part, replacing the Address we're looking up with the ptrauth-stripped-off Address, and relying on Address::Dump to append its normal symbolication stuff. Then just below in this method we have

if (sstr.GetSize() > 0) {
  stream.Printf("(%s)", sstr.GetData());

where the parens are added.


I intend this to handle the case of code running in high memory, where bit 55 is set, indicating that all of the high bits should be set to 1. Unlike userland memory on Darwin which runs in low memory, where bit 55 is clear and so all the high bits should be cleared to 0 when removing the PAC bits.


Good point, altho looking at this bit of code my goal was to force Vendor to apple for any mach-o binary, not just kexts. Whatever, the key part was not clearing the Vendor here.


OK. I copied a Makefile I used in another test, where I have a helper program and the test binary, and didn't think about how it reads.


The LC_NOTE names are only 16 characters, so unfortunately, "addrable bits" it is. :)


The LC_THREAD load command has four uint32_t's at the start, but you're right and I was even thinking of commenting in this last night & did not. I added a comment with the format of LC_THREAD.


It's a Darwin specific feature register contexts; most API accept a register state "flavor" and a size, expressed in number of 32-bit words of content. So if a register context grows, a kernel etc can distinguish between an older smaller regctx and a newer larger one that both use the same flavor enum. (in reality I don't think I've ever seen this done with a register flavor as far as I can remember, but it's a common pattern used elsewhere too). Added a comment.

jasonmolenda marked 3 inline comments as done.

Update patchset to address David's feedback.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 9 2021, 10:53 PM
This revision was automatically updated to reflect the committed changes.
DavidSpickett added inline comments.Dec 10 2021, 2:07 AM

Got it! (for Linux I'm used to userspace addresses only)