This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Group undefined symbol diagnostics by symbol
ClosedPublic

Authored by BertalanD on Jun 14 2022, 8:32 AM.

Details

Reviewers
int3
thakis
Group Reviewers
Restricted Project
Commits
rGd61341768cf0: [lld-macho] Group undefined symbol diagnostics by symbol
Summary

ld64.lld used to print the "undefined symbol" line for each reference to
an undefined symbol previously:

ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x0)

ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _quux+0x1)

Now they are deduplicated:

ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x0)
>>> referenced by /path/to/bar.o:(symbol _quux+0x1)

As with the other lld ports, only the first 3 references are printed.

Please commit this diff with the following author info:
Daniel Bertalan <dani@danielbertalan.dev>

Diff Detail

Event Timeline

BertalanD created this revision.Jun 14 2022, 8:32 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 14 2022, 8:32 AM
BertalanD requested review of this revision.Jun 14 2022, 8:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 8:32 AM
BertalanD edited the summary of this revision. (Show Details)Jun 14 2022, 8:40 AM
int3 added inline comments.Jun 14 2022, 8:59 AM
lld/MachO/SymbolTable.cpp
349–355

Something like -u isn't really a location. It makes more sense to me to use Loc to mean section+offset, like LLD-ELF is doing.

How about calling this something like namedSources, given that treatUndefinedSymbol already calls the same value source?

360
371

codebase convention is to use explicit types instead of auto unless the type is already mentioned on the same line, or if it's some complicated e.g. iterator type

BertalanD updated this revision to Diff 436827.Jun 14 2022, 9:41 AM

Spelled out type names, changed code to use just the macho namespace and tweaked namedLocs's name.

BertalanD marked an inline comment as done.Jun 14 2022, 9:45 AM

Thank you for the review. I hope I fixed everything you pointed out.

lld/MachO/SymbolTable.cpp
349–355

That's a much better name, thanks for the suggestion!

Is relocLocs fine as is, or do you have a better idea for it? I know it doesn't have the best name.

360

Changed the other two functions that had the incorrect namespace. I'm touching them anyways, so I hope it's fine to include that change in this patch.

thakis accepted this revision.Jun 14 2022, 10:37 AM

Looks great!

Note to self, for reference: ELF was D63344, COFF was D63646.

This revision is now accepted and ready to land.Jun 14 2022, 10:37 AM
thakis added inline comments.Jun 14 2022, 10:38 AM
lld/MachO/SymbolTable.cpp
349–355

Maybe codeReferences for the relocs and otherReferences for the named ones? But the current names work for me personally too. Let's see if Jez has a preference.

int3 accepted this revision.Jun 14 2022, 10:52 AM
int3 added inline comments.
lld/MachO/SymbolTable.cpp
349–355

I like @thakis' idea :)

360

looks good, thanks!

Applied naming suggestions.

This revision was automatically updated to reflect the committed changes.

This might be breaking sanitizer builds: https://lab.llvm.org/buildbot/#/builders/5/builds/24787/steps/19/logs/stdio

Could you please take a look?
Thanks!

FAIL: lld :: MachO/arch-multiple.s (66137 of 66440)
******************** TEST 'lld :: MachO/arch-multiple.s' FAILED ********************
Script:
--
: 'RUN: at line 3';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/llvm-mc -filetype=obj -triple=x86_64-apple-macos -o /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/tools/lld/test/MachO/Output/arch-multiple.s.tmp.o /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/arch-multiple.s
: 'RUN: at line 4';   not ld64.lld -arch x86_64 -platform_version macos 11.0 11.0 -syslibroot /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/Inputs/MacOSX.sdk -fatal_warnings -o /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/tools/lld/test/MachO/Output/arch-multiple.s.tmp.out -arch_multiple /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/tools/lld/test/MachO/Output/arch-multiple.s.tmp.o 2>&1 | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/arch-multiple.s
--
Exit Code: 1
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/arch-multiple.s:6:10: error: CHECK: expected string not found in input
# CHECK: error: undefined symbol for arch x86_64: _foo
BertalanD added a comment.EditedJun 15 2022, 2:13 PM

@kstoimenov

I had a quick look at it, but I couldn't yet figure out what went wrong just by looking at the source code.

I'm going to try to set up a similar ASan bootstrapping build overnight, and thakis (who's mentoring me for GSoC) is aware of it as well and said that he will investigate too.

I don't have commit access, so feel free to make the revert commit for me in the meantime.

edit: d'oh, I meant to say UBSan

I can reproduce the failures with use_ubsan=true (…after D127906). (I can't repro them with use_asan=true fwiw.) Investigating.

(Even with this reverted locally, ELF/version-script-complex-wildcards.s still fails for me under ubsan. That's due to a different change. Or maybe my local setup is somehow weird.)

This is due to an ODR violation: Both lld/ELF/Relocations.cpp and lld/MachO/SymbolTable.cpp now define a ::UndefinedDiag type, and so they both have the same name for the implicit move ctor etc. Since those are inline, the linker just picks one of the two UndefinedDiag::UndefinedDiag() move ctors at random and uses it to move both types, even though they are distinct.

This fixes it, for example:

diff --git a/lld/MachO/SymbolTable.cpp b/lld/MachO/SymbolTable.cpp
index 7705fbd5f667..5bc622126071 100644
--- a/lld/MachO/SymbolTable.cpp
+++ b/lld/MachO/SymbolTable.cpp
@@ -345,6 +345,7 @@ static bool recoverFromUndefinedSymbol(const Undefined &sym) {
   return false;
 }
 
+namespace {
 struct UndefinedDiag {
   struct SectionAndOffset {
     const InputSection *isec;
@@ -355,7 +356,8 @@ struct UndefinedDiag {
   std::vector<std::string> otherReferences;
 };
 
-static MapVector<const Undefined *, UndefinedDiag> undefs;
+MapVector<const Undefined *, UndefinedDiag> undefs;
+}
 
 void macho::reportPendingUndefinedSymbols() {
   for (const auto &undef : undefs) {

I'll reland with that fix.