When missing an extern "C" declaration, an undefined reference may be
mangled while the definition is not. Suggest the missing
extern "C" and the base name.
This kind of errors sometimes happen for FFI.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 40373 Build 40480: arc lint + arc unit
Event Timeline
If there were many references in an overload set abcde(int); abcde(float); etc. then we'd get multiple references matched against the suggestion abcde. I don't think this isn't the end of the world, as a user can make sense of it, even if it might look obviously wrong to some.
Thinking about the reverse problem, I think we could identify top level non-nested mangled function names by looking for a prefix of _Z(number), nested names would start with _ZN(number). We could catch these, extract the (number) of characters for the C name, and end up with something like a mapping from C name -> mangled name. If an undefined "C" name matches in the table we can demangle the mangled name. Given that there can be multiple C++ names that match the C name it might be better doing in a separate function with a custom diagnostic to handle multiple matches. Something like:
error: undefined symbol: abcde >>> referenced by {{.*}} >>> did you mean one of the C++ overloads such as abcde(int)
In summary I think this is a reasonable low cost addition, so I'm happy to add it. Might not be too much effort to do a separate function that could give a more specific diagnostic though.
lld/test/ELF/undef-spell-corrector.s | ||
---|---|---|
72 ↗ | (On Diff #226984) | Hm, this output is maybe not super understandable to people not super familiar with mangling. It says "abcde" right there in "abcde(int)", what do you mean with "did you mean"? :) Maybe 'did you mean: extern "C" abcde'? |
Doing this in both directions (when referencing a C definition with a C++ mangled name, or a C++ definition with an unmangled/C name) seems important/valuable. One comes from using a C header from C++ without wrapping it in "extern "C"" (hopefully inside the header, so it can be used reliably) and the other way happens when you want to write your C API's implementation in C++ (for C interop from a C++ library) but you haven't put "extern"C"" in the declaration of those functions in the header to be used from both C and C++.
But it could be done in two patches rather than trying to cover both cases here, if that's easier.
lld/ELF/Relocations.cpp | ||
---|---|---|
766 | Rather than demangling and stringifying the whole name, then using a heuristic to find the function name - might it be better to use LLVM's demangling APIs to retrieve the name directly? llvm::itanium_demangle::Node::getBaseName |
Doing this in both directions (when referencing a C definition with a C++ mangled name, or a C++ definition with an unmangled/C name) seems important/valuable. One comes from using a C header from C++ without wrapping it in "extern "C"" (hopefully inside the header, so it can be used reliably) and the other way happens when you want to write your C API's implementation in C++ (for C interop from a C++ library) but you haven't put "extern"C"" in the declaration of those functions in the header to be used from both C and C++.
I am open to both directions but was just waiting for more feedback for the other direction. Thanks for providing the motivation!
lld/ELF/Relocations.cpp | ||
---|---|---|
766 | Thanks for the suggestion. I'll just call ItaniumPartialDemangler::getFunctionName. We can check whether the node kind is KNodeType but such optimization may not be necessary. |
Thanks for the suggestion! I am motivated to do the other direction now... D69650 implements the other direction (definition is mangled; reference is not).
New diagnostic looks great.
lld/ELF/Relocations.cpp | ||
---|---|---|
766 | Is "_Z" always correct? Blocks for example start with several underscores (3 maybe?). Do we need this check here? |
lld/ELF/Relocations.cpp | ||
---|---|---|
766 | I think very few if any people will complain if the blocks feature does not get a better diagnostic on an ELF platform, so we probably don't need to try one to four underscores. |
I wonder if we might want to check if (name.startswith("_Z")) first. It might save a bit of calculations and isolate the logic a bit.