This is an archive of the discontinued LLVM Phabricator instance.

[JITLink] Add support for symbol aliases.
Needs ReviewPublic

Authored by lhames on Aug 21 2022, 9:45 PM.

Details

Summary

MachO, ELF and COFF all include support for symbol aliases -- alternative names
for existing symbols. They tend to be rare (JITLink has survived without them
until now), but we need to be able to support them where they are used.

This commit introduces alias support to the LinkGraph by adding new operations
on the Symbol class (and associated helpers in the LinkGraph). By representing
aliases as Symbol objects we avoid introducing a new abstraction and maintain
the property that Edges point at Symbols.

Aliases can be added using the addAlias method. Aliases have a target Symbol
(the symbol that they alias), Name, Size, and Linkage. By default aliases take
their Scope, Liveness, and Callability from the target symbol, though these
properties can be modified independently after they're created.

New LinkGraph operations are introduced to support inspecting aliases:

  • isAlias -- Returns true for symbols that are aliases, false otherwise.
  • getAliasTarget -- If the given symbol is an alias then returns the symbol that it is an alias for, otherwise returns null.
  • getAliasRoot -- Returns the "real" symbol that the given symbol points to, through any number of intermediate aliases (returns the given symbol itself if it is not an alias).
  • aliases_empty, aliases_size, aliases_begin, aliases_end, aliases -- Standard container-like operations on the list of aliases for the given symbol.
  • alias_tree_begin, alias_tree_end, alias_tree -- Provides iteration over the tree of aliases rooted at the given symbol. E.g. give the alias relationships [ A -> B, B -> C, D -> C ], alias_tree(C) will return the sequence [C, B, A, D]. Iteration with alias_tree is useful because it allows clients to write simple loops over all aliases (immediate and transitive) of a given symbol to update them.
  • applyToAliases -- Iterates over all aliases of a given symbol according to an AliasOperationPolicy argument, which is SubTree (operate on all aliases in the tree rooted at the given symbol), WholeTree (operate on all aliases of the tree rooted at the alias root of the given symbol, i.e. *all* aliases including parents and siblings), None (do not apply the operation to any Symbols other than the given one), or AssertNoAlias (an optimized version of None that strips any checks for alias info). The applyToAliases method is used in the updated implementation of several existing LinkGraph methods.

In addition to the new operations, several existing operations have had an
AliasMovePolicy argument added, which controls how aliases of the given Symbol
are handled when moving the given Symbol within the LinkGraph, or removing it
from the LinkGraph. The AliasesBecomeReal policy will cause any aliases to
become "real", disconnecting them from the symbol being operated on and
allowing them to keep their existing properties and location. The SubTree
policy will cause the move/removal to apply to the alias tree rooted at the
given Symbol. The WholeTree policy will cause the move/removal to apply to all
aliases in the alias tree, including parents and siblings of the given Symbol.

Together these changes aim to satisify the following:

  • Avoid substantially increasing costs for non-aliases. The new HasAliasInfo internal flag on Symbol allows us to quickly identify symbols that are neither aliases nor aliased (hopefully this will cover the majority of symbols) and skip any further alias-related work.
  • Allow clients to treat aliases like ordinary symbols for most purposes. E.g. aliases will show up during symbol iteration like any ordinary symbol, keeping "find-by-name" searches simple.
  • Allow clients to easily update groups of aliases (via alias_tree iteration).
  • Provide detailed alias information to clients who want it (via aliases(...) and getAliasTarget(...))
  • Force clients to think about aliases where they need to in order to keep them consistent. E.g. Requiring a policy choice when moving or removing symbols.

Diff Detail

Event Timeline

lhames created this revision.Aug 21 2022, 9:45 PM
lhames requested review of this revision.Aug 21 2022, 9:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2022, 9:45 PM

A couple of quick notes:

  1. This still needs some more unit tests.
  1. All the manual work to keep aliases consistently located is pretty gross. The main motivation for all this is that it would allow plugin writers to manipulate aliases separately. E.g. Even given "foo is an alias for bar", a plugin writer could choose to redirect those symbols to different places (e.g. to count entries via the "foo" and "bar" entrypoint differently). If we're willing to give that ability up we could potentially put all of the alias info (the names, sizes, linkages, scopes, etc.) into the AliasInfos side table and only keep the root Symbol in the graph, eliminating the consistency-maintenance work.

Another approach worth considering: What if we treated alias relationships like module-level metadata? I.e. JITLink wouldn't enforce the invariant that aliases point to the same location. Symbols would remain fully independent (as they are now), and we would just track the symbolic "<a> is an alias for <b>" relationship metadata on the side. Clients would be responsible for updating that if/where needed (although it would be automatically updated on symbol removal to avoid leaving stale entries).

The advantages: Simpler LinkGraph implementation. Clients can still access the metadata, and it provides an easy way to work with aliases where needed (the alias and alias-tree iterators would remain). Clients that don't want or need to care about aliases can ignore them.

The disadvantage: the "intent" of the metadata would be obscured by any passes that move alias symbols relative to one another without explicitly updating the metadata. E.g. if you have a graph containing symbol A and an alias B of A, and pass X decides to move B only to a different location, what should a later pass make of the relationship "B is an alias of A"?

That disadvantage is real, but seems likely to be rare. Given that JITLink pipelines and passes are also likely to remain relatively simple I think we could take this module-level-metadata route and then update passes to preserve metadata as needed.

Does the SubTree v. WholeTree terminology have precedent anywhere? I personally didn't find it suggestive enough to guess what it would mean, I think because I mostly think about aliases of regular symbols not aliases of aliases (ie. the "tree" part). Just a suggestion: how about SubAliases and AllAliases. If you disagree, feel free to stick with the current names, I don't feel that strongly about it.

llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
434

Is there a reason that AliasMovePolicy does not also have this option?

918

I'm worried about this type storing AliasInfo by value. You're holding references and returning pointers to the values in several places, which I worry is going to blow up if you accidentally insert into the map at the wrong time. I haven't found any actual UAF in the below code, but there are several footguns here -- AliasInfo &AI that is lexically in scope after a mutation of the map (ie. only working because nothing touches AI again) -- also you're relying on the fact DenseMap::erase never shrinks.

1459

Why is it "LLVM_LIKELY" that there are aliases? I would have expected the opposite here. Same for aliases_begin and aliases_end above.

@benlangmuir Any take on the alternative proposal -- make alias metadata available, but leave it to clients to act on?

In that case the alias infrastructure and iterators from this patch would remain, but we'd drop the MovePolicy and automated updates for aliases.

llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
434

AssertNoAliases was introduced early. I think that by the end of the patch I'd decided that it was a bad idea, but forgot to take it back out.

918

You're right -- you can definitely imagine an iteration over aliases being used to insert new aliases.

Aliases are expected to be rare -- we can probably allocate entries in the graph's BumpPtrAllocator and hold pointers to them in the map.

1459

I think the logic was "If you were writing general-case code for symbols that _might_ be aliased then you would have used alias_tree, so a call to aliases suggests knowledge that a symbol is aliased".

I don't really think this needs to be optimized though -- I'll just remove it.