
[lldb] Strip pointer authentication codes from aarch64 pc.
Needs ReviewPublic

Authored by justincohen on Mar 12 2021, 10:30 AM.

Details

Summary

Short of adding full support for PAC, strip the expected range of authentication codes.

Diff Detail

Event Timeline

justincohen created this revision.Mar 12 2021, 10:30 AM
justincohen requested review of this revision.Mar 12 2021, 10:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2021, 10:30 AM
justincohen retitled this revision from Strip pointer authentication codes from aarch64 pc. to [lldb] Strip pointer authentication codes from aarch64 pc..Mar 12 2021, 10:47 AM
justincohen edited the summary of this revision. (Show Details)
omjavaid requested changes to this revision.Mar 15 2021, 2:57 AM
omjavaid added inline comments.
lldb/source/Plugins/ABI/AArch64/ABIAArch64.cpp
63 ↗(On Diff #330331)

How did you come up with this 36-bit mask for the PC register? I don't think this mask is appropriate for the Linux virtual address space, which is 48 or 52 bits in length.

This revision now requires changes to proceed.Mar 15 2021, 2:57 AM
DavidSpickett added inline comments.
lldb/source/Plugins/ABI/AArch64/ABIAArch64.cpp
63 ↗(On Diff #330331)

If it's any help, I looked for this too and found the aarch64/functions/pac/calcbottompacbit/CalculateBottomPACBit pseudocode in the Arm ARM. This is used by the PAC strip instructions (though it is a bit mind-bending to read).
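For reference, a rough sketch of what that pseudocode boils down to. This is a simplified approximation, not the Arm ARM definition: it ignores TBI and the special cases CalculateBottomPACBit handles, and the function name is made up.

```cpp
#include <cstdint>

// Simplified approximation of the Arm ARM's CalculateBottomPACBit:
// the bottom PAC bit is the first bit above the virtual address range.
// Bit 55 of the pointer selects which translation regime (and thus
// which TnSZ value from TCR_ELx) applies.
unsigned BottomPACBit(uint64_t ptr, unsigned t0sz, unsigned t1sz) {
  bool upper_range = (ptr >> 55) & 1; // TTBR1 (e.g. kernel) addresses
  unsigned tnsz = upper_range ? t1sz : t0sz;
  return 64 - tnsz; // lowest bit that can hold a PAC
}
```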

Move logic to ABIMacOSX_arm64.h

justincohen added inline comments.Mar 15 2021, 8:09 AM
lldb/source/Plugins/ABI/AArch64/ABIAArch64.cpp
63 ↗(On Diff #330331)

I'll move this to lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.h to not impact Linux.

This workaround seems consistent with other overrides of FixCodeAddress; my only concern is the assumption about the number of bits that need to be preserved/stripped.

I hope @jasonmolenda gets a chance to chime in.

lldb/source/Plugins/ABI/AArch64/ABIMacOSX_arm64.h
68

Is that mask correct? If I'm counting correctly, it looks like it's preserving the lower 36 bits. I thought most 64-bit CPUs are configured to handle up to 40 or 48 bits of virtual address space.

I assume that future PAC support will check the CPU configuration, but is 36 bits the right assumption for this workaround?
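To make the concern concrete, here is a hypothetical illustration of what a fixed lower-36-bit mask (as read above) does to a pointer whose address actually needs more than 36 bits. The constants are made up for illustration:

```cpp
#include <cstdint>

// Strip everything above bit 35, as a fixed 36-bit mask would.
uint64_t StripWith36BitMask(uint64_t ptr) {
  const uint64_t kMask = (1ULL << 36) - 1; // 0x0000000FFFFFFFFF
  return ptr & kMask;
}
```

The second case below shows the problem: a pointer with a valid address bit above bit 35 has that bit silently cleared, corrupting the address.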

Sorry for not replying to this earlier. Yes, the correct number of bits to mask off must be determined dynamically at runtime by looking at the CPU's TCR_EL0.T0SZ value (or TCR_EL1.T0SZ for EL1 code like a kernel), which is the inverse of the number of bits used for addressing (a T0SZ value of 25 means 39 bits are used for addressing). When I added this support to the Apple version of lldb, I hardcoded the number of bits used on our main processors and added a target.process.virtual-addressable-bits setting for different cores, because we didn't have any way of fetching these values at runtime. Since then we've added a key-value pair to qHostInfo:

send packet: $qHostInfo#9b
read packet: $cputype:16777228;cpusubtype:2;addressing_bits:47;ostype:macosx;watchpoint_exceptions_received:before;vendor:apple;os_version:11.3.0;maccatalyst_version:14.5;endian:little;ptrsize:8;#00

I need to finish this work, change this from a setting to an ivar in Process, and get all of it upstreamed. I also need to find a better name than ABI::FixCodeAddress, because we can have authenticated data addresses just as well (a vtable pointer is a popular one). An issue I didn't try to solve is that the question of *what* to strip is ABI dependent: a Linux ptrauth ABI might not add auth bits to the vtable pointer where the Darwin one does. But as long as you know a uint64_t is a pointer to a valid memory address, it is safe to strip the unaddressable bits off the top of it before dereferencing, so I imagine we'll end up with calls into ABI::FixAddress (whatever it should be called) that are a superset of what every ptrauth-style ABI is authenticating.
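The T0SZ relationship described above can be sketched as a pair of helpers (a minimal sketch; the names are made up, and this ignores the bit-55 sign question raised later in the thread):

```cpp
#include <cstdint>

// T0SZ is the number of high bits NOT used for addressing.
unsigned AddressingBitsFromT0SZ(unsigned t0sz) { return 64 - t0sz; }

// Given the number of addressing bits, build a mask that keeps only
// the addressable part of a pointer.
uint64_t AddressableMask(unsigned addressing_bits) {
  return (addressing_bits >= 64) ? ~0ULL : (1ULL << addressing_bits) - 1;
}
```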

I'm hoping to get moving on this upstreaming work early this summer, it's long overdue.

If this isn't soon enough -- and I suspect from this patch that it is not -- I can push up my patch for the not-great target.process.virtual-addressable-bits setting I came up with. It was definitely a stopgap during bringup, though, and I'd hoped not to upstream it.

How does this work for a core dump?

I added a Mach-O LC_NOTE which allows us to encode the number of bits used in addressing:

(I really need to create a docs/lc-notes.md to document these)

"addrable bits" LC_NOTE

"addrable bits" (number of bits used for addressing) data payload is a structure:
struct addressing_bit_count
{
  uint32_t version;             // currently 2
  uint32_t addressing_bits;     // # of bits used for addressing in this corefile's code, 0 if unknown
  uint32_t addressing_bits_el0; // # of bits used for addressing in EL0 code on this machine, 0 if unknown
  uint32_t addressing_bits_el1; // # of bits used for addressing in EL1 code on this machine, 0 if unknown
};
This LC_NOTE command specifies how many bits in pointers are actually used for addressing. The bits above these may be used for additional information (tagged pointers, pointer authentication bits), and the debugger may need to enforce an all-0 or all-1 clearing of these bits when computing what the pointers are pointing to.
This load command follows the convention of AArch64's TCR_EL0.T0SZ and TCR_EL1.T0SZ register values. The T0SZ values are the number of bits NOT used for addressing. For arm64, you compute the # of addressing bits with the formula 64 - TCR_ELx.T0SZ. For instance, a T0SZ value of 25 means (64 - 25 == 39) 39 addressing bits are in use. Bits 63..39 are used for pointer authentication; bits 38..0 are used for addressing.
A target device may have a different number of bits for different execution levels; for now this LC_NOTE only allows EL0 (typically user land processes on darwin systems) and EL1 (typically xnu kernel on darwin systems) values to be provided. We may add higher execution level T0SZ's in the future.
If the corefile producer knows the correct execution level for the code in the corefile, it can provide this directly in addressing_bits. Otherwise, it can provide addressing_bits_el0 and/or addressing_bits_el1 and the corefile consumer (lldb) will choose the most appropriate value to use, if available, for the corefile.
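A consumer-side sketch of how a corefile reader might pick a value from that payload, per the fallback rule just described. This is hypothetical illustration code, not the actual lldb implementation; the EL0-first preference here is an assumption based on userland processes being the common case:

```cpp
#include <cstdint>

struct addressing_bit_count {
  uint32_t version;             // currently 2
  uint32_t addressing_bits;     // exact value, 0 if unknown
  uint32_t addressing_bits_el0; // EL0 value, 0 if unknown
  uint32_t addressing_bits_el1; // EL1 value, 0 if unknown
};

// Prefer the exact value if the producer knew the execution level;
// otherwise fall back to the per-EL values.
uint32_t ChooseAddressingBits(const addressing_bit_count &note) {
  if (note.addressing_bits != 0)
    return note.addressing_bits;
  if (note.addressing_bits_el0 != 0)
    return note.addressing_bits_el0;
  return note.addressing_bits_el1; // may still be 0 == unknown
}
```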

Breakpad/Crashpad are not transporting Mach-O core files; they are using minidumps, and minidumps don't contain any indication of the number of addressing bits. Apple Xcode lldb is still able to work with these minidumps correctly, while trunk lldb is not. How is it able to do this even when the dump file doesn't contain "addrable bits" or the equivalent?

In the meantime, I'll look into adding something to the Crashpad minidump format to store an addrable bits mask, although I'm not clear how to grab this in userspace. Should sysctl machdep.virtual_address_size work on iOS? Can I grab TCR_ELx.T0SZ directly?

Minidumps should have the registers in the processor context. It seems LLDB knows about TCR_ELn for n > 0. Maybe TCR_EL0 is special?

If it's not available in the minidump, we'll need a plan for how to deal with these regardless of when Jason's implementation lands.

Before reading Jason's response, I was independently wondering whether it makes sense to temporarily introduce a variable to let the user set the mask, just until the workaround is replaced with final code. Given that there's precedent, I would support that. (I've not implemented one of these LLDB settings before, but I imagine it's pretty straightforward.)

TCR_ELx begins at 1 (see D13.2.123 TCR_EL1, Translation Control Register (EL1) in the Arm ARM) and covers EL0 and EL1. Looking at the pseudocode, access to this register is undefined at EL0.

So the OS would have to provide you some other way to read that; I know this is missing on Linux at the moment. A user setting might be good for us supporting Linux too (though I realise your immediate concern is macOS).

TCR_ELx begins at 1 (see D13.2.123 TCR_EL1, Translation Control Register (EL1) in the Arm ARM) and covers EL0 and EL1. Looking at the pseudocode, access to this register is undefined at EL0.

Ah LOL I hadn't read the ARM ARM when I wrote that, I thought this followed other control registers where EL0 and EL1 could have different settings here. My LC_NOTE only needed to record a single value.

Before reading Jason's response, I was independently wondering whether it makes sense to temporarily introduce a variable to let the user set the mask, just until the workaround is replaced with final code. Given that there's precedent, I would support that. (I've not implemented one of these LLDB settings before, but I imagine it's pretty straightforward.)

I have a similar hack for kernel debugging where the kernel has a global gT1Sz and I read that in DynamicLoaderDarwinKernel.cpp,

symbol = m_kernel.GetModule()->FindFirstSymbolWithNameAndType(
    arm64_T1Sz_value, eSymbolTypeData);
if (symbol) {
  if (symbol->GetByteSizeIsValid()) {
    addr_t sym_addr = symbol->GetLoadAddress(&m_process->GetTarget());
    uint64_t sym_value = m_process->GetTarget().ReadUnsignedIntegerFromMemory(
        sym_addr, false, bytesize, 0, error);
    // 64 - T1Sz is the highest bit used for auth.
    // The value we pass in to SetVirtualAddressableBits is
    // the number of bits used for addressing, so if
    // T1Sz is 25, then 64 - 25 == 39, bits 0..38 are used for
    // addressing, bits 39..63 are used for PAC/TBI or whatever.
    int virt_addr_bits = 64 - sym_value;
    m_process->SetVirtualAddressableBits(virt_addr_bits);
  }
}
In the meantime, I'll look into adding something to the Crashpad minidump format to store an addrable bits mask, although I'm not clear how to grab this in userspace. Should sysctl machdep.virtual_address_size work on iOS? Can I grab TCR_ELx.T0SZ directly?

fwiw debugserver's code to provide the addressing_bits: in qHostInfo is upstreamed to the llvm.org sources, and it works on iOS as well,

static bool GetAddressingBits(uint32_t &addressing_bits) {
  static uint32_t g_addressing_bits = 0;
  static bool g_tried_addressing_bits_syscall = false;
  if (g_tried_addressing_bits_syscall == false) {
    size_t len = sizeof(uint32_t);
    if (::sysctlbyname("machdep.virtual_address_size",
                       &g_addressing_bits, &len, NULL, 0) != 0) {
      g_addressing_bits = 0;
    }
    g_tried_addressing_bits_syscall = true;
  }
  addressing_bits = g_addressing_bits;
  return addressing_bits > 0;
}
pcc added a subscriber: pcc.Mar 17 2021, 11:42 AM

TCR_ELx begins at 1 (see D13.2.123 TCR_EL1, Translation Control Register (EL1) in the Arm ARM) and covers EL0 and EL1. Looking at the pseudocode, access to this register is undefined at EL0.

So the OS would have to provide you some other way to read that; I know this is missing on Linux at the moment. A user setting might be good for us supporting Linux too (though I realise your immediate concern is macOS).

On Linux you can query the NT_ARM_PAC_MASK regset and examine the insn_mask field of the result.

Note that this is a mask rather than the number of bits as this also allows reflecting whether TBI{,D}0 was set at execution time. I would suggest using a mask in any persisted storage format.
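Following pcc's suggestion, stripping with a mask of bits-to-clear looks like this. A minimal sketch: `insn_mask` stands for the field of that name from the Linux NT_ARM_PAC_MASK regset result, and a mask of 0 naturally means nothing is stripped:

```cpp
#include <cstdint>

// insn_mask has a 1 in every bit position that must be cleared from a
// code pointer before dereferencing; AND with the inverse strips them.
uint64_t StripPAC(uint64_t ptr, uint64_t insn_mask) {
  return ptr & ~insn_mask;
}
```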

uint32_t addressing_bits;
size_t len = sizeof (uint32_t);
ret = sysctlbyname("machdep.virtual_address_size", &addressing_bits, &len, NULL, 0);

returns ret == -1 on an iOS 14.4 device. I do see this work on an M1 Mac.

uint32_t addressing_bits;
size_t len = sizeof (uint32_t);
ret = sysctlbyname("machdep.virtual_address_size", &addressing_bits, &len, NULL, 0);

returns ret == -1 on an iOS 14.4 device. I do see this work on an M1 Mac.

Hmmm, let me install iOS 14.4 on a device and check. The iOS devices have the default number of addressing bits in use (unlike the M1 Macs), so it could be failing on iOS without me ever noticing.

TCR_ELx begins at 1 (see D13.2.123 TCR_EL1, Translation Control Register (EL1) in the Arm ARM) and covers EL0 and EL1. Looking at the pseudocode, access to this register is undefined at EL0.

I updated my local documentation & impl to only record the shared EL0/EL1 T0SZ value, thanks.

So the OS would have to provide you some other way to read that; I know this is missing on Linux at the moment. A user setting might be good for us supporting Linux too (though I realise your immediate concern is macOS).

OK we may need to retain the manual setting when I upstream this, instead of going with the pure Process-maintained value determined dynamically by gdb packet or corefile metadata. If this is something you need for your own FixCodeAddress prelim patch, I can upstream the target.process.virtual-addressable-bits setting (I think the name is fine, even once Process can determine this dynamically). We'll need to decide at some point what the correct behavior is when they conflict, but if only one is set the choice is straightforward.

TCR_ELx begins at 1 (see D13.2.123 TCR_EL1, Translation Control Register (EL1) in the Arm ARM) and covers EL0 and EL1. Looking at the pseudocode, access to this register is undefined at EL0.

I updated my local documentation & impl to only record the shared EL0/EL1 T0SZ value, thanks.

So the OS would have to provide you some other way to read that; I know this is missing on Linux at the moment. A user setting might be good for us supporting Linux too (though I realise your immediate concern is macOS).

OK we may need to retain the manual setting when I upstream this, instead of going with the pure Process-maintained value determined dynamically by gdb packet or corefile metadata. If this is something you need for your own FixCodeAddress prelim patch, I can upstream the target.process.virtual-addressable-bits setting (I think the name is fine, even once Process can determine this dynamically). We'll need to decide at some point what the correct behavior is when they conflict, but if only one is set the choice is straightforward.

I guess it would be important to host the FixCodeAddress function in Process, given it may be needed for cleaning up the top byte of breakpoint/watchpoint addresses and memory access addresses. I am thinking of adding FixCodeAddress to Process::CreateBreakpointSite and Process::ReadMemory, which means it would be best to host a virtual-addressable-bits member in the Process class and provide a facility to update it dynamically, otherwise set it to the architecture default. If you have your code handy, maybe you can upstream it and we'll take it forward from there.
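A sketch of the shape being described: Process owns the bit count, dynamic sources (qHostInfo, corefile metadata, a user setting) can update it, and fixing an address falls back to a default when nothing set it. This is hypothetical illustration code, not the real lldb Process class, and it ignores the all-ones kernel-pointer case discussed later in the thread:

```cpp
#include <cstdint>

class ProcessSketch {
public:
  // Called when qHostInfo, an LC_NOTE, or a setting provides the value.
  void SetVirtualAddressableBits(uint32_t bits) { m_addressable_bits = bits; }

  // Clear the non-address bits before using addr for a breakpoint site
  // or a memory read. 0 means "unknown": leave the address untouched.
  uint64_t FixAddress(uint64_t addr) const {
    uint32_t bits = m_addressable_bits ? m_addressable_bits : 64;
    if (bits >= 64)
      return addr;
    return addr & ((1ULL << bits) - 1);
  }

private:
  uint32_t m_addressable_bits = 0;
};
```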

DavidSpickett added a comment.EditedMar 23 2021, 3:33 AM

OK we may need to retain the manual setting when I upstream this, instead of going with the pure Process-maintained value determined dynamically by gdb packet or corefile metadata. If this is something you need for your own FixCodeAddress prelim patch, I can upstream the target.process.virtual-addressable-bits setting (I think the name is fine, even once Process can determine this dynamically). We'll need to decide at some point what the correct behavior is when they conflict, but if only one is set the choice is straightforward.

From what pcc said, I was mistaken. So I don't think a setting would be needed (thanks for the offer though). The combination of top byte ignore and the PAC mask registers should give us what we need for Linux.

Edit: Omair got there before me. I'll defer to him since he's working on PAC specifically.

OK we may need to retain the manual setting when I upstream this, instead of going with the pure Process-maintained value determined dynamically by gdb packet or corefile metadata. If this is something you need for your own FixCodeAddress prelim patch, I can upstream the target.process.virtual-addressable-bits setting (I think the name is fine, even once Process can determine this dynamically). We'll need to decide at some point what the correct behavior is when they conflict, but if only one is set the choice is straightforward.

Were you able to confirm if sysctlbyname "machdep.virtual_address_size" works on iOS? I'm currently hard coding this information in minidump creation, as it's failing for me.

I uploaded https://reviews.llvm.org/D98886 which reads a mask from a minidump and sets target.process.pointer-authentication-address-mask. Would you consider that (as I assume this is going to be converted into a mask regardless, and in case there's a possibility of future non-contiguous bits)?

What do you think?

Were you able to confirm if sysctlbyname "machdep.virtual_address_size" works on iOS? I'm currently hard coding this information in minidump creation, as it's failing for me.

Hey Justin, sorry I dropped this one. I'll check later tonight or tomorrow. Sometimes our internal devices are set up a little different than customer devices and things can work on the internal-configure one, so I need to play with it a bit.

I uploaded https://reviews.llvm.org/D98886 which reads a mask from a minidump and sets target.process.pointer-authentication-address-mask. Would you consider that (as I assume this is going to be converted into a mask regardless, and in case there's a possibility of future non-contiguous bits)?

What do you think?

Clearing PAC bits is a little more complicated than just clearing the bits, though. Bit 55 tells us whether the high bits are all 0's or all 1's (on Darwin, in EL0 processes they're all 0's; in EL1, all 1's). If we had a setting that provides a mask instead of the number of bits valid for addressing, that might lead someone to try to use it for a different purpose. Trying to imagine a scenario like this: maybe someone could know that a certain range of the address space isn't used for a certain type of pointer, and could reuse those bits as a Top Byte Ignore kind of thing, but the generated code would need to clear/set those bits before dereferencing, or we'd need a CPU with that kind of capability. There may be examples of this today, like the thumb bit on armv7, where the 0th bit of something with alignment restrictions can be used to carry metadata, although I can't think of anything like that on AArch64/x86_64 (the only two targets I can really remember well these days).

Clearing PAC bits is a little more complicated than just clearing the bits, though. Bit 55 tells us whether the high bits are all 0's or all 1's (on Darwin, in EL0 processes they're all 0's; in EL1, all 1's). If we had a setting that provides a mask instead of the number of bits valid for addressing, that might lead someone to try to use it for a different purpose. Trying to imagine a scenario like this: maybe someone could know that a certain range of the address space isn't used for a certain type of pointer, and could reuse those bits as a Top Byte Ignore kind of thing, but the generated code would need to clear/set those bits before dereferencing, or we'd need a CPU with that kind of capability. There may be examples of this today, like the thumb bit on armv7, where the 0th bit of something with alignment restrictions can be used to carry metadata, although I can't think of anything like that on AArch64/x86_64 (the only two targets I can really remember well these days).

Copying over a comment from pcc@ on the minidump change here: https://chromium-review.googlesource.com/c/crashpad/crashpad/+/2773358/5//COMMIT_MSG#15 :
> On Linux, the mask that you get from NT_ARM_PAC_MASK specifies which bits need to be cleared from the pointer. So to use the mask, you AND with the inverse.
> This also has the advantage that 0 is not a special case, since you can AND with its inverse and get the same pointer back.

What if we invert the mask, and do something like this instead?
(ptr & (1ULL << 55)) ? (ptr | mask) : (ptr & ~mask);

This should be very similar to the pseudocode here: https://webcache.googleusercontent.com/search?q=cache:3fCUm601caMJ:https://developer.arm.com/ja/docs/ddi0596/latest/shared-pseudocode-functions/aarch64-functionspac-pseudocode+&cd=2&hl=en&ct=clnk&gl=us
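Justin's expression can be sketched as a standalone function. This is illustration code under the convention pcc describes (the mask has a 1 in every non-address bit position), and it mirrors the replicate-bit-55 behavior of the AArch64 Strip() pseudocode; the function name is made up:

```cpp
#include <cstdint>

// If bit 55 is set (the TTBR1 / kernel half of the address space), the
// stripped bits must be set to all 1's; otherwise cleared to all 0's.
uint64_t StripWithSignSelect(uint64_t ptr, uint64_t mask) {
  return (ptr & (1ULL << 55)) ? (ptr | mask) : (ptr & ~mask);
}
```

Note that with this formulation a mask of 0 leaves a user-space pointer unchanged, so "no PAC in use" needs no special case.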

Can we move comments over to --> https://reviews.llvm.org/D98886, which has these changes implemented?

omjavaid resigned from this revision.Jul 11 2021, 9:26 PM