This is an archive of the discontinued LLVM Phabricator instance.

Relative VTables ABI on Fuchsia
AbandonedPublic

Authored by leonardchan on Jan 17 2020, 2:15 PM.

Details

Reviewers
phosek
mcgrathr
jakehehrlich
pcc
rsmith
rjmccall
ldionne
jdoerfert
Group Reviewers
Restricted Project
Summary

This contains the updated patch for enabling Peter Collingbourne's relative vtable ABI (described in https://bugs.llvm.org/show_bug.cgi?id=26723) on Fuchsia. The goal is to reduce memory usage by making vtables readonly.

Key differences:

  • LTO is not required
  • No white listing of classes. Fuchsia does not guarantee C++ ABI stability, so anything running on Fuchsia should either by compiled with this ABI or not depend on the C++ ABI at all.
  • Instead of taking the offset between the vtable and virtual function directly, we emit a DSO-local stub that just contains a tail call to the original virtual function, and take the offset between that and the vtable. We do this because there are some cases where the original function that would've been inserted into the vtable is not local or hidden and may require some kind of dynamic relocation which prevents the vtable from being readonly. On x86_64, taking the offset between the function and the vtable gets lowered to the offset between the PLT entry for the function and the vtable which gives us a PLT32 reloc. On Aarch64, right now only CALL26 and JUMP26 instructions generate PLT relocations, so we manifest them with stubs that are just jumps to the original function.
  • Apply the same technique with the RTTI component in the vtable and emit a proxy that results in a GOTPCREL reloc.
  • Update libcxxabi to use different arithmetic for finding the typeinfo pointer given the vtable pointer for Fuchsia
  • Hide the ABI behind a flag -fexperimental-relative-c++-abi-vtables that is only available through CCI (-Xclang will need to be passed before it)

Diff Detail

Event Timeline

leonardchan created this revision.Jan 17 2020, 2:15 PM

On Aarch64, right now only CALL26 and JUMP26 instructions generate PLT relocations, so we manifest them with stubs that are just jumps to the original function.

I think it would be worth considering defining a new relocation type for this. I think we should make the new reloc type represent the relative address shifted right by 2, which would allow it to be used freely in the aarch64 small code model (which permits programs up to 4GB), but would require the target to be 4-byte aligned, which seems like a reasonable restriction. On aarch64, decoding this would not require any additional instructions either (we can fold the lsl into the add). We could use the same technique for a new GOTPCREL relocation to be used for the RTTI component. @peter.smith what do you think?

In D72959#1832011, @pcc wrote:

On Aarch64, right now only CALL26 and JUMP26 instructions generate PLT relocations, so we manifest them with stubs that are just jumps to the original function.

I think it would be worth considering defining a new relocation type for this. I think we should make the new reloc type represent the relative address shifted right by 2, which would allow it to be used freely in the aarch64 small code model (which permits programs up to 4GB), but would require the target to be 4-byte aligned, which seems like a reasonable restriction. On aarch64, decoding this would not require any additional instructions either (we can fold the lsl into the add). We could use the same technique for a new GOTPCREL relocation to be used for the RTTI component. @peter.smith what do you think?

The idea of a new relocation type has come up before, as I recall it was something like the equivalent of R_X86_PLT32

| R_X86_64_PLT32 | 4 | word32 | L + A - P |
Where L is defined as: L Represents the place (section offset or address) of the Procedure Linkage Table entry for a symbol.

For Fuchsia there are two options:
1.) Ask for an ABI relocation type to be defined. I've raised an ABI issue with Arm, for it to progress I think it needs a "yes we really would like one, here is the definition we want to use, this is the use case and it could be used outside of Fuchsia at some point." For example I can see position independent C++ being useful in bare-metal embedded C++. The external facing email for feedback on the abi is arm.eabi@arm.com (address is on the first page of the doc below)
2.) There are two ranges of relocation types reserved for private and platform relocations: https://developer.arm.com/docs/ihi0056/f/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q2-documentation

Private and platform-specific relocations
Private relocations for vendor experiments:

0xE000 to 0xEFFF for ELF64
0xE0 to 0xEF for ELF32
Platform ABI defined relocations:

0xF000 to 0xFFFF for ELF64
0xF0 to 0xFF for ELF32
Platform ABI relocations can only be interpreted when the EI_OSABI field is set to indicate the Platform ABI governing the definition.

All of the above codes will not be assigned by any future version of this standard.

If this could be generally useful outside of Fuchsia, then an official relocation would be preferable.

jdoerfert resigned from this revision.Jan 22 2020, 8:25 AM

There's two independently-useful things here for the relocation, right? First, it's useful to be able to express a relocation to a symbol that has the semantics of a particular function but doesn't necessarily have its address, and that concept could be used across many "physical" relocations; and second, it's potentially useful to have a shifted-by-two relative address relocation, at least on AArch64 where instructions (and v-table entries under this ABI) are always four-byte-aligned anyway. Is it possible to separate these concerns in ELF, e.g. by having a "symbol" that can be the target of any other relocation but which actually represents a function of unspecified address with the semantics of another function?

There's two independently-useful things here for the relocation, right? First, it's useful to be able to express a relocation to a symbol that has the semantics of a particular function but doesn't necessarily have its address, and that concept could be used across many "physical" relocations; and second, it's potentially useful to have a shifted-by-two relative address relocation, at least on AArch64 where instructions (and v-table entries under this ABI) are always four-byte-aligned anyway. Is it possible to separate these concerns in ELF, e.g. by having a "symbol" that can be the target of any other relocation but which actually represents a function of unspecified address with the semantics of another function?

It would be possible to design relocations like that, but I think it would be difficult to fit into existing multi-target designs, which assume that the relocation code alone encodes all the actions a linker needs to take (smart-format, dumb-linker). My understanding of the reasons behind this are:

  • The linker can have millions of relocations to resolve and having all the actions explicit in the code simplifies its job.
  • There is a large space of available relocation codes, partitioned per target, but a much more limited availability of target specific symbol types and flags.
  • The concerns can be separated at implementation time, for example the symbol lookup, encoding and calculation stages are independent and shared between the codes.

It is possible that there is a better trade-off in complexity, but anything radically different might need quite a bit of work to fit into an existing linker. Hope I've understood you correctly and apologies if the above isn't relevant.

Okay. In that case, I would ask that if you're considering going through the work of adding relocations, please consider adding both scaled and unscaled relative-offset-to-equivalent-symbol.

leonardchan edited the summary of this revision. (Show Details)

I updated the patch such that this everything here allows us to work with RTTI also. Since this is very large and has a lot of refactoring of the ItaniumCXXABI class and llvm changes independent from the FuchsiaCXXABI, I'll do my best to split them into smaller patches then formally ask for reviews on them. Keeping this around for reference as the point of reference I'll keep up to date.

Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2020, 2:54 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
leonardchan planned changes to this revision.Apr 1 2020, 2:55 PM
ldionne requested changes to this revision.Apr 1 2020, 10:15 PM
ldionne added a subscriber: ldionne.
ldionne added inline comments.
libcxxabi/src/private_typeinfo.cpp
617

Please introduce a macro that generically expresses that relative vtables are enabled, and explain what it is. You can then enable that macro only on __Fuchsia__. I'd like to avoid freely adding platform-specific #ifdefs in the code when it's easy to avoid.

mcgrathr added inline comments.Apr 2 2020, 11:36 AM
libcxxabi/src/private_typeinfo.cpp
617

I think the libcxxabi change should be separated from the compiler change entirely.
It will need more configuration issues resolved before it usefully lands.
Let's keep this change just for the pure compiler feature.

leonardchan edited the summary of this revision. (Show Details)

Updated such that this ABI is only usable on Fuchsia and if -fexperimental-fuchsia-relative-c++-abi-vtables is passed.

phosek added inline comments.Apr 15 2020, 2:22 PM
clang/include/clang/Basic/LangOptions.def
353

I think we should omit the Fuchsia bit and make it a generic, this feature should be usable by any target.

clang/include/clang/Driver/Options.td
1328

Ditto, I'd drop the fuchsia bit from these flags.

leonardchan marked 2 inline comments as done.
leonardchan edited the summary of this revision. (Show Details)
  • Remove fuchsia pair from the argument and LangOpt.
  • Make the virtual base offset and offset to top 32-bits to make the whole vtable 32-bit aligned.
  • Introduce and check against the ItaniumVTableContext::VTableComponentLayout enum which will indicate that vtable components should be relative instead of pointers. Use this instead of making overridable virtual methods in ABI classes since this shouldn't be customized between ABIs.
  • Update tests.
rjmccall added inline comments.Apr 24 2020, 10:23 AM
clang/lib/CodeGen/CGVTables.cpp
621

There's already an addRelativeOffset on ConstantArrayBuilder; is that insufficient for some reason? I think that, if v-table building were refactored so that the places that build components also add them to the v-table, we'd end up with a lot more flexibility for the ABIs. We needed a similar sort of change for pointer authentication, which we haven't upstreamed to LLVM yet, but which you can see here:

https://github.com/apple/llvm-project/blob/apple/master/clang/lib/CodeGen/CGVTables.cpp

leonardchan marked an inline comment as done.Apr 24 2020, 11:08 AM
leonardchan added inline comments.
clang/lib/CodeGen/CGVTables.cpp
621

I actually did not know about this method, but it does seem to boil down to the same arithmetic used here. Will update to see if I can use the existing builders instead.

leonardchan marked an inline comment as done.Apr 27 2020, 7:28 PM
leonardchan added inline comments.
clang/lib/CodeGen/CGVTables.cpp
621

Ah, so it seems addRelativeOffset operates differently than I thought. Initially I thought it just took the offset relative to the start of the array being built, but it actually seems to be relative to the component that would be added to the builder. This is slightly different from the current implementation where the offset is instead taken relative to the address point of the current vtable.

We could technically still achieve the desired effect of offsets in the vtable if we were to use addTaggedRelativeOffset and subtract a constantaly increasing offset as more virtual function components are added, although I'm not sure how much more benefit that would offer.

An alternative approach could also be to just use addRelativeOffset for offsets relative to the component slot and update instances we access the vtable to adjust for this new arithmetic. I think this would just transform instances of llvm.load.relative(%vtable, %func_offset) to llvm.load.relative(%vtable + %func_offset , 0), which seems differerent from how it was intended to be used, but technically might work.

rjmccall added inline comments.Apr 27 2020, 10:16 PM
clang/lib/CodeGen/CGVTables.cpp
621

Interesting. Yeah, it should be relatively straightforward to add a method that adds a pointer subtraction starting from a different base, and there's already a function for getting a constant that represents the address of the current position in the struct.

I take it you've taught LLVM that it can emit that subtraction with a normal relative-offset relocation by using an addend? That must've been exciting. Or, yeah, we can add in that addend in the frontend.

I think I see the expected benefit: it should generally save an instruction from the call sequence, at least on targets like x86 that lack pre-incrementing addressing modes. It does mean that the v-table entries can't be independently described — e.g. there's really no way you could write a C++ type that behaves like a v-table entry — but that's not too much of a loss.

Updated to use builders for constructing the vtable in "leaf" functions instead of returning a value and propagating it back up to a builder. Also extended ConstantAggregateBuilderBase with a method to add a relative offset to a specific position in the aggregate being constructed.

rjmccall added inline comments.May 8 2020, 12:14 PM
clang/include/clang/Basic/LangOptions.def
355

"Use an ABI-incompatible v-table layout that uses relative references"

clang/include/clang/CodeGen/ConstantInitBuilder.h
230 ↗(On Diff #261894)

"relative to an element in this aggregate, identified by its index".

317 ↗(On Diff #261894)

Hmm, you've borrowed the last sentence from getAddrOfCurrentPosition, but it actually doesn't make any sense (in either place). I think it would be better to say something like "The returned pointer will have type T*, where T is the given type. This type can differ from the type of the actual element."

clang/lib/CodeGen/CGVTables.cpp
636

Lots of typos

644

That's not what this block is doing.

I think it would make sense to build this stub in a helper function.

Can you just avoid making the stub entirely if the function is known to be defined in this translation unit (which will include all the internal-linkage cases, but also things like hidden linkonce_odr)?

647

private is a specific linkage that you're not actually using.

651

You're propagating *all* the attributes (which is good, but the comment should be clearer).

724

lastAddrPoint is a confusing name for this.

739

Relying on a capitalization difference for this is not a good idea for readability.

791

It's not a "destructor" equivalent, it's when the virtual function is defined as deleted.

797

Prefer llvm::ConstantPointerNull::get.

832

Can we make a helper function for this?

875

Please use the same local-variable capitalization conventions as the existing code, and please don't recompute getNumVTables() each time through the loop.

I agree that vtableIndex is a clearer name than i, but if you're going to rename locals for readability, please also rename thisIndex and nextIndex.

879

"Relative offsets should be relative to the v-table address point, which is the first entry after the RTTI pointer."

Can we just do this scan ahead of time to find the component index of the address point when we're using the relative ABI? It typically won't be a large scan, and it'll be a lot more obviously correct than what you're doing here. Also, I think what you're doing here is broken for the RTTI pointer.

You could also add a method on VTableLayout to get the address point by table index.

leonardchan marked 15 inline comments as done.

Evidently I left a lot of spelling mistakes and leftover comments. Cleaned them up!

This should address most of @rjmccall's comments except for calculating the vtable address points beforehand which I will add soon. Just posting what I have so far.

clang/lib/CodeGen/CGVTables.cpp
644

Done and added a test for it. Was planning to add this later as a followup but it turned out to be pretty simple to implement.

875

Updated naming in addVTableComponent and addRelativeComponent also.

Calculate the address points once beforehand such that we can get an address point from a vtable index.

Also updated to emit a stub still for complete destructor components in the vtable. The reason for this is because, with optimizations enabled, it's possible for dso_local complete destructors of a child class to be replaced by destructors of a parent class in the child's vtable, even if that parent destructor is not guaranteed to be dso_local. Updated the no-stub-when-dso-local.cpp test to reflect this.

leonardchan marked an inline comment as done.May 20 2020, 1:55 PM

Otherwise LGTM.

clang/lib/CodeGen/CGVTables.cpp
735

componentName seems to no longer be used meaningfully. in this function.

[Github PR transition cleanup]

I think this work has landed as multiple separate patches, right? If so, can we please close this to clear up the review queue?

Herald added a project: Restricted Project. · View Herald TranscriptSep 8 2023, 6:35 AM
leonardchan abandoned this revision.Sep 8 2023, 10:02 AM
leonardchan marked an inline comment as done.

This was landed in various other smaller changes