This is an archive of the discontinued LLVM Phabricator instance.

Add SBValue::GetValueAsAddress(), strip off ptrauth, TBI, MTE bits on AArch64 systems
Changes PlannedPublic

Authored by jasonmolenda on Jan 27 2023, 4:41 PM.

Details

Summary

This patch adds a new method to SBValue, GetValueAsAddress(), which will take the uint64_t value in the SBValue and run it through the ABI's FixAddress method to clear any TBI/MTE/ptrauth bits on AArch64 targets. Script authors may want access to both the actual uint64_t value, and the address that will be accessed, in an SBValue, so I added a new method in addition to GetValueAsUnsigned to provide this.

I currently have SBValue::GetValueAsAddress NOT perform a type check, and possibly I should have it check the type's IsPointerType() before doing this, but at the same time if the script/driver is calling this method, it's probably best to just do that.

There's also changes to methods like ValueObject::CreateValueObjectFromAddress so we can get SBValue::Dereference and such to behave correct when you have an SBValue created from a signed pointer.

I have the attached test case set to run on any AArch64 system; on Darwin we do the same pointer stripping on any process regardless if it is using pointer auth (that is, for both "arm64" and "arm64e"). On Linux, a non-ptrauth process may not have an address mask and this test may fail because the bits I mask into the top nibble in the test program are not removed by GetValueAsAddress(). I'm not sure exactly, but I can remove this test from running on Linux, or add a check for isAArch64PAuth or something. This program never actually dereferences this pointer value with bits in the high nibble set, it's only set up for lldb to manipulate.

We've used this API inside Apple for a bit and it has worked well for our API users; of course if there is consensus that it should be done differently we'll find a way to handle that internally.

Diff Detail

Event Timeline

jasonmolenda created this revision.Jan 27 2023, 4:41 PM
jasonmolenda requested review of this revision.Jan 27 2023, 4:41 PM

I figured I'd leave some nits on a Friday afternoon.

lldb/bindings/interface/SBValue.i
117–132 ↗(On Diff #492947)

Nit: no //

lldb/source/API/SBValue.cpp
932–934

Nit: no braces around single life if for consistency with lines 935 and 393.

935–940

Nit^3: why not do the same as in GetStrippedPointerValue and CreateValueObjectFromAddress:

if (ProcessSP process_sp = m_opaque_sp->GetProcessSP())
  if (ABISP abi_sp = process_sp->GetABI())
        return abi_sp->FixCodeAddress(ret_val);
938–940

Nit^2: in other places where I upstreamed FixCodeAddress calls I went with the following LLVM pattern:

if (ABISP abi_sp = process_sp->GetABI())
  return abi_sp->FixCodeAddress(ret_val);

Thanks for the nits, good points.

lldb/source/API/SBValue.cpp
935–940

This method is static so I had to duplicate the functionality here. :/

935–940

(or add an ExecutionContext parameter)

JDevlieghere added inline comments.Jan 27 2023, 4:54 PM
lldb/source/API/SBValue.cpp
935–940

I meant the code style rather than the actual code, but yeah good point maybe this should be a helper ;-)

Fix the nits Jonas pointed out, thanks Jonas.

Script authors may want access to both the actual uint64_t value, and the address that will be accessed, in an SBValue, so I added a new method in addition to GetValueAsUnsigned to provide this.

In my rough cut of API stuff for memory tagging I had:

frame  = process.GetThreadAtIndex(0).GetFrameAtIndex(0)
mte_buf = frame.FindVariable("mte_buf").GetValueAsUnsigned()

# TODO: later we need better access to ABI
err = SBError()
mte_buf = process.FixDataAddress(mte_buf, err)

What you have here is a much better way of doing it.

Memory tagging is a use case for having both. When you get the tags back you want to check if the tag for the memory matches the one in the unmodified pointer.

I have the attached test case set to run on any AArch64 system; on Darwin we do the same pointer stripping on any process regardless if it is using pointer auth (that is, for both "arm64" and "arm64e"). On Linux, a non-ptrauth process may not have an address mask and this test may fail because the bits > I mask into the top nibble in the test program are not removed by GetValueAsAddress(). I'm not sure exactly, but I can remove this test from running on Linux, or add a check for isAArch64PAuth or something. This program never actually dereferences this pointer value with bits in the high nibble set, it's > only set up for lldb to manipulate.

I ran the test on AArch64 Linux without PAuth and it passes. AArch64 Linux always has TBI enabled for userspace (https://www.kernel.org/doc/html/latest/arm64/tagged-address-abi.html) (well, on by default but I am yet to see anyone disable it).

So lldb sees:

(lldb) process status -v
<...>
Addressable code address mask: 0xff00000000000000
Addressable data address mask: 0xff00000000000000
Number of bits used in addressing (code): 56

(and if you have PAuth on that mask just has more bits set)

So it's fine to run this on Linux.

BSDs and Windows on Arm will not have an address mask. I would explicitly list Linux and Mac OS for the test.

lldb/bindings/interface/SBValue.i
124 ↗(On Diff #492951)

Just for clarity, use the full names again here.

"with the authentication/top byte ignore/memory tagging bits"

126–127 ↗(On Diff #492951)

"is incorrect", drop the "an"

lldb/source/API/SBValue.cpp
936

We've talked about this before, and for now I think the result of FixCode and FixData are the same for debug purposes. In the event that that ever changes, our path is probably to document this as assuming a code address, then adding specific ValueAsCodeAddress and ValueAsDataAddress, correct?

(which sounds good to me, just checking we've thought that through)

You could document that in the docstring but there's a good chance it would cause more confusion than it's worth when at this time, it's not something users have to care about.

lldb/source/Core/ValueObject.cpp
1469

What even is a host address, and do we care here?

/// A host address value (for memory in the process that < A is
/// using liblldb).
HostAddress

I'd have to check more but could that be an issue if I was debugging from x86 (where the non-address bits must be 1 or 0) to AArch64?

Maybe it is just included here so we cover all enum values and in fact, addresses in this path will always be from the debugee.

3012

Probably best on its own change but I wonder if it's time to add Process::Fix<...>Address to dedupe all these conditionals.

lldb/source/DataFormatters/ValueObjectPrinter.cpp
432

Is there a way to cover this with a test? I think there is one for the (actual= part already, does that cover this part too?

lldb/test/API/api/clear-sbvalue-nonadressable-bits/TestClearSBValueNonAddressableBits.py
17

I would add skipif operating system not Mac OS or Linux.

jasonmolenda added inline comments.Feb 14 2023, 3:24 PM
lldb/source/API/SBValue.cpp
936

Yeah, this is a good point that I punted on a bit. On Darwin we are only using TCR_EL1.T0SZ to tell us how many bits are used for addressing; there's no distinction between code and data for us. I guess I should worry about someone defining a FixCodeAddress method in an ABI plugin that clears lower bits when instruction alignment is required. It may be that it's safer to use FixDataAddress if I have to choose between the two without any context. In our internal implementation in the apple sources, we only had FixAddress which were used for both. What do you think?

lldb/source/Core/ValueObject.cpp
1469

Excellent point. ValueObjects can point to host memory for sure. We can have a problem with AArch64 to AArch64 debug session; macs run in a larger address space than iOS devices for instance, so masking off the target iOS bits from a mac side address could make it invalid.

3012

Yes, this is a good idea. I will do this separately.

lldb/source/DataFormatters/ValueObjectPrinter.cpp
432

Good point. And actually, this codepath was never firing because both things it compared to were stripping the non-addressable bits off. Fixed this method, added to the test case.

Updated patch to address David's feedback. Thanks David!

Updated comments and SB API doc to conform to English. Not decided on whether we should use FixCodeAddress for our strippings here; the difference between Data & Code didn't come up with the Darwin design. Fixed code in ValueObjectPrinter to correctly append actual=. Added a test for actual= from the SBValue description to the test case. Limited the test case to Darwin & Linux systems.

In further testing, my formatter in ValueObject.cpp is not limited to pointers now. e.g. I have an intptr_t scratch variable and it formats as

(lldb) v scratch
(intptr_t) scratch = 0x3000000100008000 (actual=0x100008000)

this actual= text should only be appended for a ValueObject that is a pointer type; I don't want this shown for non-pointer types like a uint64_t, that's a bug. I need to fix this & update the patch once more.

Looking good, one question left on the test file.

lldb/source/API/SBValue.cpp
936

There is FixAnyAddress which is intended to be the least destructive of the 2, but not necessarily the correct one. If we truly have to handle code and data addresses differently I think we'll have a whole ABI/platform/architecture break to worry about on top of that.

Keep it as is. Just flagging it in case you'd had any concerns there, if you've got experience using it and it's fine that's good enough for me.

lldb/test/API/api/clear-sbvalue-nonadressable-bits/TestClearSBValueNonAddressableBits.py
30

Can you expand the alias here, all the vs are making my head spin.

lldb/test/API/api/clear-sbvalue-nonadressable-bits/main.c
17

Do you need the dlsym here for a specific reason? I'm not familiar with what that does.

Clean up the test case - the C file and python file - to address David's feedback and test it more thoroughly. Fix ValueObjectPrinter::PrintValueAndSummaryIfNeeded() to only add the actual=hex if this is a pointer type. I think we're all done with this one.

The test file now creates three variables, a pointer to a stack, a pointer to a global, and a function pointer, with TBI bits and without. e.g.

runCmd: frame variable
output: (int) count = 5
(int *) count_p = 0x000000016fdff258
(intptr_t) scratch = 3458764518115524608
(int *) count_invalid_p = 0x300000016fdff258 (actual=0x16fdff258)
(int (*)(...)) main_p = 0x0000000100003f28 (a.out`main at main.c:5)
(int (*)(...)) main_invalid_p = 0x3000000100003f28 (actual=0x100003f28) (a.out`main at main.c:5)
(int *) global_p = 0x0000000100004000
(int *) global_invalid_p = 0x3000000100004000 (actual=0x100004000)

We're not printing the symbol name for the global_p and global_invalid_p - this is unrelated to TBI, I might look into that later, whether it should print the name of the symbol here in the description.

DavidSpickett accepted this revision.Mar 2 2023, 1:41 AM

2 minor things otherwise LGTM.

lldb/test/API/api/clear-sbvalue-nonadressable-bits/TestClearSBValueNonAddressableBits.py
21

Stray newline.

lldb/test/API/api/clear-sbvalue-nonadressable-bits/main.c
2

Unused include.

This revision is now accepted and ready to land.Mar 2 2023, 1:41 AM

Updating patch with the final version, barring some fix for the armv7 ubuntu bot and TestDataFormatterSmartArray.py line 229 where it has a c-string pointer that gets its 0th bit cleared,

Got output:
(char *) strptr = 0x00400817 (actual=0x400816) ptr = [{ },{H}]

so we printed the actual= decorator.

Need to figure out something for the armv7 case before I try re-landing.

Maybe the right answer is to use FixDataAddress. It's possible someone could make a AArch64 FixCodeAddress that clears the low 2 bits.

Hm, I've also got some problems with some ObjC tagged pointers in the testsuite, regardless of the FixCodeAddress/FixDataAddress issue. Need to look at this a bit more.

jasonmolenda reopened this revision.Mar 2 2023, 3:31 PM
This revision is now accepted and ready to land.Mar 2 2023, 3:31 PM
jasonmolenda planned changes to this revision.Mar 9 2023, 6:41 PM

OK, I understand the objc testsuite failures. ObjC has tagged pointers, where the address of an object is either a virtual address, or a big old bag of bits that can contain a small value, like instead of a pointer to an NSObject object, you'll have a uint64_t that can be decoded to represent a small integer value without existing in memory. In ValueObject::GetPointerValue(), I clear non-addressable bits, which destroys the tagged pointer values. In the apple internal version of this, we have some additional code that checks the clang type of the ValueObject to see if pointer auth is used for this type before we clear them. But with TBI/MTE values, we need to clear it regardless of type. The correct fix is to audit the objc type formatters for uses of ValueObject::GetPointerValue and have them fetch an unsigned value if there's any possibility of a tagged pointer being used, this is probably going to be a little bit of work.

This is specifically causing fails in TestDataFormatterObjCCF.py, TestDataFormatterObjCNSDate.py, TestDataFormatterObjCNSNumber.py, TestDataFormatterNSIndexPath.py, TestNSArraySynthetic.py, TestPrintObjectArray.py.