This is an archive of the discontinued LLVM Phabricator instance.

lldb WIP/RFC: Adding support for address fixing on AArch64 with high and low memory addresses
Closed · Public

Authored by jasonmolenda on May 23 2023, 10:47 PM.

Details

Summary

The number of bits used for addressing on AArch64 is a bit more complicated than what lldb represents today. The same values apply to both EL0 and EL1 execution, but the number of bits used in addressing may differ between low memory (0x000...) and high memory (0xfff...) addresses. The Darwin kernel always has the same page table setup for low and high memory, but I'm supporting some teams that need different settings for each, and they manage to capture the virtual addresses for both high and low memory in a single process, through a single corefile or JTAG connection. A single number of addressing bits cannot handle this situation today. Internally we use the Linux model of a code address mask and a data address mask, but that doesn't encompass this concept either.

This patch adds a high memory code mask and data mask, mirroring our existing code/data masks. By default the high memory masks will either have the default value 0 or the same values as the existing masks. I'll need three ways of receiving the correct address bit numbers: a setting, an LC_NOTE to get them from a corefile, and a qProcessInfo key to get them from a gdb-remote connection to a JTAG or similar device.

To start, I changed target.process.virtual-addressable-bits from taking a uint to taking an array of strings (it should be an array of uints, but I wasn't getting that to work correctly; I'll play around with it more later). The array can have zero elements (no user override), a single element (used for all addresses, like today), or two elements (the first number is for low memory, the second for high memory). The alternative is to have an additional setting for those environments that need to specify a different address mask for high memory vs. low memory.

I also changed Process::GetDataAddressMask and Process::GetCodeAddressMask to have a clear order of precedence for values. If the user specifies a number of addressing bits in that setting, this overrides what the system might tell lldb. The user's specified values do not overwrite or set the Process address mask.

The current behavior is that the mask is overwritten by the setting value if the mask is unset, but once the mask is set, the user setting is ignored. In practice this means you can set the setting ONCE in a Process lifetime, and further changes to the setting are ignored. That made it a little annoying to experiment with this environment when I first started working on it. :)
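The precedence described above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: ProcessState, MaskFromAddressableBits and GetEffectiveCodeAddressMask are hypothetical names, and the sketch assumes a mask has its set bits in the non-addressable high bits and that a setting value of 0 means "no user override".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the state lldb keeps per process.
struct ProcessState {
  // Mask reported by the system (stub or corefile); 0 means "unknown".
  uint64_t code_address_mask = 0;
};

// Build a mask whose set bits are the NON-addressable high bits.
// Assumption: addressable_bits is in the range 1..64.
uint64_t MaskFromAddressableBits(uint32_t addressable_bits) {
  assert(addressable_bits >= 1 && addressable_bits <= 64);
  if (addressable_bits == 64)
    return 0; // every bit is an address bit
  return ~((uint64_t{1} << addressable_bits) - 1);
}

// user_bits is the value of the user setting; 0 means "not set".
// The user's value wins when present, and the process mask is never
// modified, so later changes to the setting take effect immediately.
uint64_t GetEffectiveCodeAddressMask(const ProcessState &process,
                                     uint32_t user_bits) {
  if (user_bits != 0)
    return MaskFromAddressableBits(user_bits);
  return process.code_address_mask;
}
```

Because the stored mask is read but never written here, changing the setting from 39 to 41 and back behaves the same at any point in the process lifetime.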

None of this should change behavior on Linux, but if folks from the Linux world have a comment or reaction to this change, I would be interested to hear opinions. I haven't done much testing beyond the one test corefile, and I still need to work out how the two values are communicated in corefiles and live debug, but those are trivial details on top of the core idea.

For the record, these are the AArch64 TCR_EL1.T0SZ and TCR_EL1.T1SZ fields. The values of this control register apply to both EL0 and EL1 execution, but the T0SZ value applies to TTBR0_EL1 for low memory addresses and the T1SZ value applies to TTBR1_EL1 for high memory addresses.
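To make the relationship concrete, here is a small sketch of how the two field values map to a number of addressable bits. It assumes the architectural layout of TCR_EL1 (T0SZ in bits [5:0], T1SZ in bits [21:16]) and the usual rule that a region spans 64 - TxSZ bits of virtual address. The AddressableBits struct and AddressableBitsFromTCR function are hypothetical names for illustration; nothing here implies that lldb reads the register itself.

```cpp
#include <cstdint>

// Hypothetical pair of values: one for low memory (TTBR0_EL1),
// one for high memory (TTBR1_EL1).
struct AddressableBits {
  uint32_t low;
  uint32_t high;
};

// Decode T0SZ and T1SZ from a raw TCR_EL1 value and convert each
// to the number of virtual address bits used by its region.
AddressableBits AddressableBitsFromTCR(uint64_t tcr_el1) {
  const uint32_t t0sz = static_cast<uint32_t>(tcr_el1 & 0x3f);
  const uint32_t t1sz = static_cast<uint32_t>((tcr_el1 >> 16) & 0x3f);
  return {64 - t0sz, 64 - t1sz};
}
```

For example, a register value with T0SZ = 16 and T1SZ = 25 describes 48 addressable bits for low memory and 39 for high memory, which is exactly the kind of split a single setting cannot express.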

Event Timeline

jasonmolenda created this revision. May 23 2023, 10:47 PM
Herald added a project: Restricted Project. May 23 2023, 10:47 PM
jasonmolenda requested review of this revision. May 23 2023, 10:47 PM
jingham accepted this revision. May 24 2023, 4:52 PM

LGTM

The help string for the setting seems clear. There's also some logic to handle the setting vs. the values we find from the stub, which you describe in the comment to the review, but it would be nice to see that in a comment in the code somewhere to help out future generations.

This revision is now accepted and ready to land. May 24 2023, 4:52 PM

The array approach is cool but makes it hard to be backwards compatible: an old lldb is going to error out when presented with more than one value. If you made this two separate options, a client can use settings set -e to set the setting if it exists and still have valid low memory addresses if it doesn't.

bulbazord added inline comments.
lldb/source/Plugins/DynamicLoader/Darwin-Kernel/DynamicLoaderDarwinKernel.cpp
1099–1103

I think this comment needs to be updated? It doesn't look like you're calling SetVirtualAddressableBits here anymore.

Updated patch with Alex and Jonas' feedback incorporated. Most significantly, instead of making target.process.virtual-addressable-bits an array of uint values (between zero and two of them), I am leaving virtual-addressable-bits as-is and adding a new target.process.highmem-virtual-addressable-bits setting. When this is set, its value will be used for setting the high bits of high-memory signed addresses on AArch64 using the Apple ABI plugin, and the value in virtual-addressable-bits will be used for clearing the high bits of low-memory signed addresses on AArch64 Apple. If the new highmem-virtual-addressable-bits is not set (the 99.9% most common case), virtual-addressable-bits applies to both address ranges.
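A rough sketch of the two-setting behavior described above, under stated assumptions: the function names are hypothetical, a value of 0 stands for "setting not set", and bit 55 is used to tell high memory from low memory (the usual AArch64 convention for selecting between TTBR1 and TTBR0). It is not the ABI plugin's actual code.

```cpp
#include <cstdint>

// Mask whose set bits are the non-addressable high bits.
// Assumption: bits is nonzero; 64 or more means no bits to strip.
uint64_t NonAddressableMask(uint32_t bits) {
  if (bits >= 64)
    return 0;
  return ~((uint64_t{1} << bits) - 1);
}

// lowmem_bits models virtual-addressable-bits and highmem_bits models
// highmem-virtual-addressable-bits; 0 means "not set" for either.
uint64_t FixAddress(uint64_t addr, uint32_t lowmem_bits,
                    uint32_t highmem_bits) {
  if (lowmem_bits == 0)
    return addr; // nothing known about addressing; leave untouched
  if (highmem_bits == 0)
    highmem_bits = lowmem_bits; // common case: one value for both ranges

  const bool is_high_memory = (addr & (uint64_t{1} << 55)) != 0;
  if (is_high_memory)
    return addr | NonAddressableMask(highmem_bits); // set the high bits
  return addr & ~NonAddressableMask(lowmem_bits);   // clear the high bits
}
```

With 47 low-memory bits and 39 high-memory bits, a signed low address such as 0x0012000100004000 becomes 0x0000000100004000, while a signed high address such as 0x80FFFFF007004000 becomes 0xFFFFFFF007004000.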

This revision was automatically updated to reflect the committed changes.

None of this should change behavior on Linux, but if folks from the Linux world have a comment or reaction to this change, I would be interested to hear opinions. I haven't done much testing beyond the one test corefile, and I still need to work out how the two values are communicated in corefiles and live debug, but those are trivial details on top of the core idea.

I primarily work with lldb in Linux userspace, so this isn't an issue for me. It also seems unlikely that the Linux kernel would decide to change this, but if it did, the scheme implemented here looks fine.