This is an archive of the discontinued LLVM Phabricator instance.

[ORC] Fix in-process lookup of symbols without GlobalPrefix
Needs ReviewPublic

Authored by Hahnfeld on Jan 20 2023, 1:05 AM.

Details

Reviewers
lhames
sunho
Summary

On Windows, C++ symbols are prefixed not with the "global" prefix, but with a question mark. Their symbol names must be processed without stripping the first character.

Diff Detail

Event Timeline

Hahnfeld created this revision.Jan 20 2023, 1:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2023, 1:05 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
Hahnfeld requested review of this revision.Jan 20 2023, 1:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2023, 1:05 AM

This change won't work for other platforms at the moment. E.g. On Darwin where the global prefix is _ an ORC lookup (which uses mangled names) for [ "foo", "_foo" ] should return [ nullptr, <addr of _foo> ], but with this patch it will return [ <addr of foo>, <addr of foo> ].

GlobalPrefix was always a bit of a hack -- I think we want to make this a generalized demangle function that returns a std::optional<std::string>, then we can write:

for (auto &KV : Symbols) {
  auto &Name = KV.first;

  if ((*Name).empty())
    continue;

  if (Allow && !Allow(Name))
    continue;

  auto DemangledName = Demangle(Name);
  if (!DemangledName)
    continue;

  if (void *Addr = Dylib.getAddressOfSymbol(DemangledName->c_str())) {
    NewSymbols[Name] = JITEvaluatedSymbol(
        static_cast<JITTargetAddress>(reinterpret_cast<uintptr_t>(Addr)),
        JITSymbolFlags::Exported);
  }
}

On windows the demangler can conditionally drop the ?, and on Darwin it can require the _ prefix (returning std::noneopt` when it's not present.

Hahnfeld updated this revision to Diff 491680.Jan 24 2023, 2:37 AM

Only allow leading question mark without GlobalPrefix according to DataLayout.

This change won't work for other platforms at the moment. E.g. On Darwin where the global prefix is _ an ORC lookup (which uses mangled names) for [ "foo", "_foo" ] should return [ nullptr, <addr of _foo> ], but with this patch it will return [ <addr of foo>, <addr of foo> ].

Sure, but is this actually relevant? Passing foo is not properly mangled and an error on the client side. *If* it is required to make this robust against all possible edge cases, we have to make fairly intrusive changes as shown in the last update to pass DataLayout::doNotMangleLeadingQuestionMark() to DynamicLibrarySearchGenerator (or require a DataLayout in the first place, but that's a bigger restriction IMO).

Sure, but is this actually relevant? Passing foo is not properly mangled and an error on the client side.

It's not an error: ORC lookups are always made in terms of raw symbol names (which we often refer to as "linker mangled"). The impedance mismatch in DynamicLibrarySearchGenerator comes from the fact that dlsym does not work with raw names, it works with C names, which will have been mangled with a leading underscore.

As an example, consider a JIT'd program consisting of two JITDylibs. JITDylib A contains an object built from assembly containing the label "foo" (no underscore, which is legal -- this isn't a C symbol). JITDylib B contains a DynamicLibraryDefinitionGenerator pointing at an on-disk libB.dylib which contains an _foo (presumably defined in C). A lookup with search order [ A, B ] for symbols [ "foo", "_foo" ] should find a result for "foo" from JITDylib A, and "_foo" from JITDylib B (where the result will have come from demangling "_foo" to "foo", then calling dlsym).

*If* it is required to make this robust against all possible edge cases, we have to make fairly intrusive changes as shown in the last update to pass DataLayout::doNotMangleLeadingQuestionMark() to DynamicLibrarySearchGenerator (or require a DataLayout in the first place, but that's a bigger restriction IMO).

As mentione, a good option for now would be to replace GlobalPrefix with a unique_function that takes a SymbolStringPtr and produces an optionally demangled string suitable for the platform's dynamic-symbol-lookup API.

Alternatively, maybe it's time to just add a DataLayout (or global default mangling and demangling functions) to ExecutionSession and let everybody reference that.

Ok, understood. I think this should already be taken care of by the last update from Tuesday.

As mentione, a good option for now would be to replace GlobalPrefix with a unique_function that takes a SymbolStringPtr and produces an optionally demangled string suitable for the platform's dynamic-symbol-lookup API.

I don't like this, we don't want every user of DynamicLibrarySearchGenerator to write their own demangler.

Alternatively, maybe it's time to just add a DataLayout (or global default mangling and demangling functions) to ExecutionSession and let everybody reference that.

We could, but it seems to work fine with an additional bool. Alternatively, we could just always treat '?' special at the beginning. Let me know.