This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Suggest extern "C" when an undefined reference is mangled while the definition is not
ClosedPublic

Authored by MaskRay on Oct 29 2019, 3:40 PM.

Details

Summary

When missing an extern "C" declaration, an undefined reference may be
mangled while the definition is not. Suggest the missing
extern "C" and the base name.
This kind of errors sometimes happen for FFI.

Diff Detail

Event Timeline

MaskRay created this revision.Oct 29 2019, 3:40 PM
MaskRay retitled this revision from [ELF] Suggest unmangled names as an alternative spelling to [ELF] Suggest the unmangled name as an alternative spelling.Oct 29 2019, 3:41 PM

If there were many references in an overload set abcde(int); abcde(float); etc. then we'd get multiple references matched against the suggestion abcde. I don't think this isn't the end of the world, as a user can make sense of it, even if it might look obviously wrong to some.

Thinking about the reverse problem, I think we could identify top level non-nested mangled function names by looking for a prefix of _Z(number), nested names would start with _ZN(number). We could catch these, extract the (number) of characters for the C name, and end up with something like a mapping from C name -> mangled name. If an undefined "C" name matches in the table we can demangle the mangled name. Given that there can be multiple C++ names that match the C name it might be better doing in a separate function with a custom diagnostic to handle multiple matches. Something like:

error: undefined symbol: abcde
>>> referenced by {{.*}}
>>> did you mean one of the C++ overloads such as abcde(int)

In summary I think this is a reasonable low cost addition, so I'm happy to add it. Might not be too much effort to do a separate function that could give a more specific diagnostic though.

grimar added inline comments.Oct 30 2019, 3:03 AM
lld/ELF/Relocations.cpp
766

I wonder if we might want to check if (name.startswith("_Z")) first. It might save a bit of calculations and isolate the logic a bit.

767

auto -> size_t ?

thakis added a subscriber: thakis.Oct 30 2019, 7:29 AM
thakis added inline comments.
lld/test/ELF/undef-spell-corrector.s
72 ↗(On Diff #226984)

Hm, this output is maybe not super understandable to people not super familiar with mangling. It says "abcde" right there in "abcde(int)", what do you mean with "did you mean"? :)

Maybe 'did you mean: extern "C" abcde'?

Doing this in both directions (when referencing a C definition with a C++ mangled name, or a C++ definition with an unmangled/C name) seems important/valuable. One comes from using a C header from C++ without wrapping it in "extern "C"" (hopefully inside the header, so it can be used reliably) and the other way happens when you want to write your C API's implementation in C++ (for C interop from a C++ library) but you haven't put "extern"C"" in the declaration of those functions in the header to be used from both C and C++.

But it could be done in two patches rather than trying to cover both cases here, if that's easier.

lld/ELF/Relocations.cpp
766

Rather than demangling and stringifying the whole name, then using a heuristic to find the function name - might it be better to use LLVM's demangling APIs to retrieve the name directly? llvm::itanium_demangle::Node::getBaseName

MaskRay updated this revision to Diff 227198.Oct 30 2019, 4:01 PM
MaskRay marked 2 inline comments as done.
MaskRay retitled this revision from [ELF] Suggest the unmangled name as an alternative spelling to [ELF] Suggest the extern "C" name as an alternative spelling.
MaskRay edited the summary of this revision. (Show Details)

Use ItaniumPartialDemangler

MaskRay marked 3 inline comments as done.Oct 30 2019, 4:02 PM

Doing this in both directions (when referencing a C definition with a C++ mangled name, or a C++ definition with an unmangled/C name) seems important/valuable. One comes from using a C header from C++ without wrapping it in "extern "C"" (hopefully inside the header, so it can be used reliably) and the other way happens when you want to write your C API's implementation in C++ (for C interop from a C++ library) but you haven't put "extern"C"" in the declaration of those functions in the header to be used from both C and C++.

I am open to both directions but was just waiting for more feedback for the other direction. Thanks for providing the motivation!

lld/ELF/Relocations.cpp
766

Thanks for the suggestion. I'll just call ItaniumPartialDemangler::getFunctionName. We can check whether the node kind is KNodeType but such optimization may not be necessary.

MaskRay marked an inline comment as done.Oct 30 2019, 5:03 PM

If there were many references in an overload set abcde(int); abcde(float); etc. then we'd get multiple references matched against the suggestion abcde. I don't think this isn't the end of the world, as a user can make sense of it, even if it might look obviously wrong to some.

Thinking about the reverse problem, I think we could identify top level non-nested mangled function names by looking for a prefix of _Z(number), nested names would start with _ZN(number). We could catch these, extract the (number) of characters for the C name, and end up with something like a mapping from C name -> mangled name. If an undefined "C" name matches in the table we can demangle the mangled name. Given that there can be multiple C++ names that match the C name it might be better doing in a separate function with a custom diagnostic to handle multiple matches. Something like:

error: undefined symbol: abcde
>>> referenced by {{.*}}
>>> did you mean one of the C++ overloads such as abcde(int)

In summary I think this is a reasonable low cost addition, so I'm happy to add it. Might not be too much effort to do a separate function that could give a more specific diagnostic though.

Thanks for the suggestion! I am motivated to do the other direction now... D69650 implements the other direction (definition is mangled; reference is not).

New diagnostic looks great.

lld/ELF/Relocations.cpp
766

Is "_Z" always correct? Blocks for example start with several underscores (3 maybe?). Do we need this check here?

MaskRay marked an inline comment as done.Oct 31 2019, 2:41 PM
MaskRay added inline comments.
lld/ELF/Relocations.cpp
766

I think very few if any people will complain if the blocks feature does not get a better diagnostic on an ELF platform, so we probably don't need to try one to four underscores.

MaskRay updated this revision to Diff 227358.Oct 31 2019, 2:53 PM
MaskRay retitled this revision from [ELF] Suggest the extern "C" name as an alternative spelling to [ELF] Suggest extern "C" when an undefined reference is mangled while the definition is not.
MaskRay removed a subscriber: mgorny.
  • [ELF] Suggest an arbitrary C++ overload as an alternative spelling
ruiu accepted this revision.Nov 8 2019, 2:33 AM

LGTM

lld/ELF/Relocations.cpp
773

pre_hint -> preHint (but can be just hint?)

This revision is now accepted and ready to land.Nov 8 2019, 2:33 AM
This revision was automatically updated to reflect the committed changes.