This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Suggest extern "C" when the definition is mangled while an undefined reference is not
ClosedPublic

Authored by MaskRay on Oct 30 2019, 5:01 PM.

Details

Summary

The definition may be mangled while an undefined reference is not.
This may come up when the reference is from a C file or the definition
misses an extern "C".

Suggest an arbitrary <source-name> that matches the undefined reference,
if such a definition exists.

ld.lld: error: undefined symbol: foo
>>> referenced by a.o:(.text+0x1)
>>> did you mean to declare foo(int) as extern "C"?
>>> defined in: a1.o

Event Timeline

MaskRay created this revision.Oct 30 2019, 5:01 PM
dblaikie added inline comments.Oct 30 2019, 5:14 PM
lld/ELF/Relocations.cpp
705–706

I'm not super comfortable with this hand-written demangling (maintenance burden/code duplication, etc) - is there something in the itanium demangling library that already does this/can be reused (or refactored so it can be reused)?

792–795

If we're only going to make one suggestion, is there any way to short-circuit the forEachSymbol to stop on the first candidate found, rather than searching for a lot of candidates that'll be ignored?

(equally, should we be printing more candidates, and maybe printing the source line/file they were defined on in the error message to help people find the right one?)

lld/test/ELF/undef-suggest-extern-cxx.s
15 ↗(On Diff #227203)

This situation might be worth a few more words for the user.

This current phrasing makes it sounds like the person who wrote the call to foo, might've intended to call foo(int) - when they might've actually called foo(int), but with C mangling & there's a good chance the /call/ is correct (it'd be rare that someone wrote a header file intending to only use it for C++ and manage to just by accident write perfectly valid C in that header file without the intent that that file ever be included from C code) - and that it's the definition in C++ that needs to be fixed.

"did you mean to define foo(int) (or some other overload) as extern "C"?"

might be the phrasing to go for?

MaskRay marked 2 inline comments as done.Oct 30 2019, 5:37 PM
MaskRay added inline comments.
lld/ELF/Relocations.cpp
705–706

One way that uses the LLVMDemangle library is to call ItaniumPartialDemangler::partialDemangle, then check whether the root node is of type KNodeType, then check its string.

The concern is that demangling every symbol table name may be slow. Similarly, in --version-script and --dynamic-list handling, we demangle all symbol table names on demand (only when extern "C++" is seen) because doing it unconditionally can be slow.

792–795

Unfortunately, forEachSymbol cannot be short-circuited (D62381). It was intended for parallel execution but it is not implemented.


(equally, should we be printing more candidates, and maybe printing the source line/file they were defined on in the error message to help people find the right one?)

We suggest the object file name: >>> defined in:
it should have provided some hint...

dblaikie added inline comments.Oct 30 2019, 5:48 PM
lld/ELF/Relocations.cpp
705–706

This is only in the error case, right? We're already going to print an error to the user - so it seems we're already doing this lazily enough to not be doing it wastefully/needlessly (especially in a "Good" codepath).

792–795

the use of forEachSymbol here isn't thread safe, since it'd racily access 's' - so perhaps forEachSymbol could be changed to allow short-circuiting (since its callers likely depend on it not being parallel for now) & a separate parallel version could be added at some point & callers who want that behavior could be migrated to it

Also, at least as a partial alternative to reduce the cost of demangling, the callback itself could short-circuit:

for (auto &it : map)
  if (!s && canSuggestExternCXX(name, it.first))
    s = it.second;
if (!s)
  symtab->forEachSymbol([&](Symbol *sym) {
    if (!s && canSuggestExternCXX(name, sym->getName()))
      s = sym;
  });

(unless there's a reason you would like to prefer the last result, or the result from "forEachSymbol" over the result from the map? In which case you can put the forEachSymbol before the for loop over the map (& skip the for loop over the map if you already found a result in the forEachSymbol))

ruiu added inline comments.Oct 31 2019, 1:40 AM
lld/test/ELF/undef-suggest-extern-cxx.s
15 ↗(On Diff #227203)

I like that error message style. But if the user's intention was to link foo as a C function, showing that function as foo(int) is a bit confusing? How about something like this?

we found a C++ symbol that has a similar name -- did you forget to surround a declaration with 'extern "C"'?

Thanks for the update.

lld/ELF/Relocations.cpp
707

I think it will be worth defining how long is too long for a better error message, then testing some large programs with the full demangling to see if it makes a difference that anyone would be able to perceive and get frustrated with. I'd be happy with even a few seconds for a better error message.

MaskRay marked 2 inline comments as done.Oct 31 2019, 2:22 PM
MaskRay added inline comments.
lld/ELF/Relocations.cpp
705–706

I agree there is a leaky abstraction here - it require some knowledge about the mangling scheme. The closest I can come up with is:

static bool canSuggestExternCXX(StringRef ref, StringRef def) {
  llvm::ItaniumPartialDemangler d;
  if (d.partialDemangle(def.str().c_str()))
    return false;
  char *buf = d.getFunctionName(nullptr, nullptr);
  if (!buf)
    return false;
  bool ret = ref == buf;
  free(buf);
  return ret;
}

I am not sure it is necessarily better.

The comment def is in the form of _Z <length> <ref> <ignored> is here to clarify the approach we take.

792–795

forEachSymbol is defined as:

void forEachSymbol(llvm::function_ref<void(Symbol *)> fn) {
  for (Symbol *sym : symVector)
    if (!sym->isPlaceholder())
      fn(sym);
}

It cannot be short-circuited, but there is no thread safety issue. If you or @ruiu think https://reviews.llvm.org/D62381#1518668 looks good, I can refactor the existing forEachSymbol call sites and send a patch for review.

MaskRay updated this revision to Diff 227360.Oct 31 2019, 2:55 PM
MaskRay retitled this revision from [ELF] Suggest an arbitrary C++ overload as an alternative spelling to [ELF] Suggest extern "C" when the definition is mangled while an undefined reference is not.
MaskRay edited the summary of this revision. (Show Details)

Improve suggestion

MaskRay marked an inline comment as done.Nov 5 2019, 9:06 AM
MaskRay added inline comments.
lld/ELF/Relocations.cpp
705–706

Opinions on the demangling method to choose? 1. as-is 2. ItaniumPartialDemangler

dblaikie added inline comments.Nov 5 2019, 9:19 AM
lld/ELF/Relocations.cpp
705–706

I'd still favor the ItaniumPartialDemangler, personally - though improvements to that API wouldn't hurt either (eg: could there be a non-dynamic "getFunctionName" that only gives an answer on a simple function name & otherwise returns null? (& thus it wouldn't have dynamic allocation - it'd just return a StringRef into the existing mangled name))

MaskRay updated this revision to Diff 227912.Nov 5 2019, 10:09 AM
  • Rename canSuggestExternCXX to canSuggestExternCForCXX
  • Use ItaniumPartialDemangler.

For the record, the original approach I used was:

static bool canSuggestExternCXX(StringRef ref, StringRef def) {
  unsigned len;
  return def.consume_front("_Z") && !def.consumeInteger(10, len) &&
         len == ref.size() && def.take_front(len) == ref;
}
MaskRay marked 5 inline comments as done.Nov 5 2019, 10:09 AM
dblaikie added inline comments.Nov 5 2019, 10:21 AM
lld/ELF/Relocations.cpp
792–795

Not quite following the D62381 connection here - oh, a range-based rather than callback-based iteration over non-placeholder symbols? Sure. I'll leave that up to Rui & can be refactored here (before or after this is submitted)

But the map for loop before the forEachSymbol looks like it could be refactored already - and could avoid doing more suggestion work once it finds the first suggestion, since the first is likely to be as good as the last? (so no point continuing to search once you find the first)

MaskRay updated this revision to Diff 227922.Nov 5 2019, 10:47 AM
MaskRay marked 2 inline comments as done.

Address review comments

dblaikie accepted this revision.Nov 5 2019, 10:52 AM

This looks good to me - but probably wouldn't hurt to get @ruiu to sign off on it as well.

This revision is now accepted and ready to land.Nov 5 2019, 10:52 AM

I'm happy with this aswell. Thanks for the updates.

ruiu accepted this revision.Nov 8 2019, 2:34 AM

LGTM

lld/ELF/Relocations.cpp
709

nit: I'd add parentheses to the right-hand side.

800–801

preHint and postHint

This revision was automatically updated to reflect the committed changes.