Page MenuHomePhabricator

lld/elf: Deduplicate undefined symbol diagnostics
ClosedPublic

Authored by thakis on Jun 14 2019, 9:55 AM.

Details

Summary

Before:

ld.lld: error: undefined symbol: f()
>>> referenced by test.cc:3
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test-9c0808.o:(g())

ld.lld: error: undefined symbol: f()
>>> referenced by test.cc:4
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test-9c0808.o:(h())

ld.lld: error: undefined symbol: f()
>>> referenced by test.cc:5
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test-9c0808.o:(j())

ld.lld: error: undefined symbol: k()
>>> referenced by test.cc:5
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test-9c0808.o:(j())

ld.lld: error: undefined symbol: f()
>>> referenced by test2.cc:2
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test2-07b391.o:(asdf())
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Now:

ld.lld: error: undefined symbol: f()
>>> referenced by test.cc:3
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test-0e07ba.o:(g())
>>> referenced by test.cc:4
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test-0e07ba.o:(h())
>>> referenced by test.cc:5
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test-0e07ba.o:(j())
>>> referenced by test2.cc:2
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test2-6bdb24.o:(asdf())

ld.lld: error: undefined symbol: k()
>>> referenced by test.cc:5
>>>               /var/folders/c5/8d7sdn1x2mg92mj0rndghhdr0000gn/T/test-0e07ba.o:(j())
clang: error: linker command failed with exit code 1 (use -v to see invocation)

If there are more than 10 references to an undefined symbol, only the
first 10 are printed.

Fixes PR42260.

Diff Detail

Repository
rL LLVM

Event Timeline

thakis created this revision.Jun 14 2019, 9:55 AM
Herald added a project: Restricted Project. · View Herald Transcript
ruiu added a comment.Jun 17 2019, 4:36 AM

I think that the three patches make sense only when they are combined, so could you merge them as a single patch?

lld/Common/ErrorHandler.cpp
137 ↗(On Diff #204800)

To me, reporting a single undefined symbol that is used by multiple object files is still counted as a single error. That kind of error is actually likely to be caused by a single error to forget defining a symbol or something.

I don't feel strongly about that, though. But in that case, I'd choose a simpler one. Handle all error() calls as a single erorr is simpler.

squash commits, and remove error limit logic

thakis marked an inline comment as done.Jun 17 2019, 5:30 AM
thakis added inline comments.
lld/Common/ErrorHandler.cpp
137 ↗(On Diff #204800)

I was unsure about this too. I think the motivation for -error-limit is probably to prevent unbounded error output by default, and if a single undef symbol can now produce an unboundedly long list of references, we lose that property. But the implementation was not very elegant. Another idea would be to only omit the first (say) 3 references and then say ">> and 532 more references". If we wanted we could then add a "-undef-limit" flag for people who want to see more than 3. WDYT?

ruiu added a comment.Jun 17 2019, 5:44 AM

I agree that that's perhaps better to omit an error message as >> and 532 more references if it is too large, but I don't know how large it can be. Can this be hundreds of lines?

lld/ELF/Relocations.cpp
720 ↗(On Diff #205045)

nit: add a blank line

In lld we honestly don't care too much about eliminating global initializers, so it can probably be just a global variable, but you need to erase the contents before the first use so that multiple runs of lld::elf::main in the same process works as expected.

784 ↗(On Diff #205045)

Replace auto with a concrete type.

814–815 ↗(On Diff #205045)

You can define bool IsWarning instead of constructing an entire Undef here

818 ↗(On Diff #205045)

Is this really called by a multithreaded code?

821 ↗(On Diff #205045)

Then you can construct an UndefinedDiag as an unnamed struct

Undefs().push_back({&Sym, {{&Sec, Offset}}, IsWarning});
thakis marked 7 inline comments as done.

comments

thakis edited the summary of this revision. (Show Details)Jun 17 2019, 8:05 AM

I made it so that it prints at most 10 references. Most of the time when I get this diagnostic, it's because the symbol doesn't exist, and seeing many references to it isn't useful.

lld/ELF/Relocations.cpp
720 ↗(On Diff #205045)

Made it a global. (You might want to reconsider it at some point, though. The main cost of static initializers, other than that initialization order is undefined, is that the code for the initializer needs to be paged in. For example, the coff linker needs to load all the bits of the elf linker that is for static initializers, even though they'll never be used. See also https://web.archive.org/web/20160811161720/https://blog.mozilla.org/tglek/2010/05/27/startup-backward-constructors/ / https://glandium.org/blog/?cat=13&paged=3)

818 ↗(On Diff #205045)

Hm, I hadn't really checked, I had believed the comment on the bug. It's called from here http://llvm-cs.pcc.me.uk/tools/lld/ELF/Writer.cpp#1741 , but it looks like forEachRelSec() doesn't do any parallel for loops, so probably not. Removed the lock.

ruiu added a comment.Jun 17 2019, 8:16 PM

Generally looking good, but I have one question.

Before this patch, undefined symbols are reported as we find them. Now, we buffer error messages and print them out after visiting all relocations. So the delay to get a first error message if there's an undefined symbol will increase with this patch. How long is the increase? If it is only a few hundred milliseconds, it is okay, but if it the second time scale, we may have to do this in a different way.

lld/ELF/Relocations.cpp
744 ↗(On Diff #205075)

k -> K

or just remove k

745 ↗(On Diff #205075)

i -> I

779 ↗(On Diff #205075)

I

780 ↗(On Diff #205075)

auto -> UndefinedDiag

grimar added a subscriber: grimar.Jun 18 2019, 1:08 AM
grimar added inline comments.
lld/ELF/Relocations.cpp
718 ↗(On Diff #205075)

Seems this field can be calculated at any time, does it worth to keep the precalculated value?

778 ↗(On Diff #205075)

I'd use llvm::DenseMap. It is probably a bit more consistent with the rest LLD ELF code,
and allows then to use its lookup method below:

if (Symbol *S = FirstRef.lookup(Undef.Sym)) {
...
}
790 ↗(On Diff #205075)

It is not clang formatted.

MaskRay added inline comments.Jun 18 2019, 1:41 AM
lld/test/ELF/undef-multi.s
4 ↗(On Diff #205075)

%t.exe -> /dev/null if you don't need the output.

17 ↗(On Diff #205075)

Just a suggestion. It is now common to use ## for comments that are not RUN/CHECK lines.

MaskRay added inline comments.Jun 18 2019, 1:44 AM
lld/ELF/Relocations.cpp
718 ↗(On Diff #205075)

IsWarning is computed on a per-symbol basis:

bool CanBeExternal = !Sym.isLocal() && Sym.computeBinding() != STB_LOCAL &&
                       Sym.Visibility == STV_DEFAULT;

  bool IsWarning =
      (Config->UnresolvedSymbols == UnresolvedPolicy::Warn && CanBeExternal) ||
      Config->NoinhibitExec;
grimar added inline comments.Jun 18 2019, 1:46 AM
lld/ELF/Relocations.cpp
718 ↗(On Diff #205075)

Yes and we have Symbol * UndefinedDiag::Sym here, so can calculate it on fly I think.

thakis marked 10 inline comments as done.

comments

How long is the increase?

That's a good question. Intuitively I'd say it shouldn't be all that much: Undefined diags were emitted doing relocation scanning, which I think happens fairly late in the link, and now it happens after it. So the max latency added is the time it takes to scan relocations. I'd expect that relocation scanning time is small compared to total link time. But I'll measure and report back.

lld/ELF/Relocations.cpp
718 ↗(On Diff #205075)

It's needed as a return value; doing the computation twice seems less good to me?

778 ↗(On Diff #205075)

Done, I think. (I had to change the value type to UndefinedDiag, and do if (UndefinedDiag *... to make it work; that's my best guess at what you meant. Maybe I got it wrong.)

lld/test/ELF/undef-multi.s
17 ↗(On Diff #205075)
$ git grep -l '^# ' lld/test/ | wc -l
    1388
$ git grep -l '^## ' lld/test/ | wc -l
     201

In LLVM, it's 4766/176.

So it's an order of magnitude more common to not do this. Unless there's an effort to rewrite all of llvm's existing tests to this new style and enforce it, using ## for comments seems like it decreases llvm's self-consistency and I suggest everyone adding comments like this to stop doing it.

grimar added inline comments.Jun 18 2019, 5:10 AM
lld/ELF/Relocations.cpp
718 ↗(On Diff #205075)

I think usually we do not store things that can be computated trivially. Lets see what others think about this one though.

778 ↗(On Diff #205075)

You got it right.

ruiu added a comment.Jun 18 2019, 5:47 AM

How long is the increase?

That's a good question. Intuitively I'd say it shouldn't be all that much: Undefined diags were emitted doing relocation scanning, which I think happens fairly late in the link, and now it happens after it. So the max latency added is the time it takes to scan relocations. I'd expect that relocation scanning time is small compared to total link time. But I'll measure and report back.

Thanks. I have a feeling that the relocation scan pass is one of the heaviest pass in lld, so it's worth measuring.

How long is the increase?

That's a good question. Intuitively I'd say it shouldn't be all that much: Undefined diags were emitted doing relocation scanning, which I think happens fairly late in the link, and now it happens after it. So the max latency added is the time it takes to scan relocations. I'd expect that relocation scanning time is small compared to total link time. But I'll measure and report back.

Thanks. I have a feeling that the relocation scan pass is one of the heaviest pass in lld, so it's worth measuring.

I hooked up the ScopedTimer stuff that lld/COFF uses to lld/ELF (if there's interest, I can send a patch for that), and here's the output for linking libblink_core.so in a debug build:

  Reloc scanning:               239 ms (  4.0%)
-------------------------------------------------
Total Link Time:               5975 ms (100.0%)
ruiu added a comment.Jun 18 2019, 5:50 AM

How long is the increase?

That's a good question. Intuitively I'd say it shouldn't be all that much: Undefined diags were emitted doing relocation scanning, which I think happens fairly late in the link, and now it happens after it. So the max latency added is the time it takes to scan relocations. I'd expect that relocation scanning time is small compared to total link time. But I'll measure and report back.

Thanks. I have a feeling that the relocation scan pass is one of the heaviest pass in lld, so it's worth measuring.

I hooked up the ScopedTimer stuff that lld/COFF uses to lld/ELF (if there's interest, I can send a patch for that), and here's the output for linking libblink_core.so in a debug build:

  Reloc scanning:               239 ms (  4.0%)
-------------------------------------------------
Total Link Time:               5975 ms (100.0%)

Ah, this number seems completely acceptable. Thank you very much for measuring it.

Apologies, that number is wrong, I had a duplicate call to ScopedTimer in the main function. Here's the correct number:

  Reloc scanning:               220 ms (  7.0%)
-------------------------------------------------
Total Link Time:               3147 ms (100.0%)

I'll also do a static library build of chrome.

ruiu added inline comments.Jun 18 2019, 5:54 AM
lld/ELF/Relocations.cpp
718 ↗(On Diff #205075)

Repeating the same code in two functions may be bad, but if you factor the code out and define a predicate function to determine whether we should report an error or a warning, it might look good. It may be worth trying.

For a static debug build of chrome (since the link time is a bit long, this configuration isn't officially supported and hits an assert("don't use this config") that I commented out):

  Reloc scanning:              1304 ms (  7.0%)
-------------------------------------------------
Total Link Time:              18744 ms (100.0%)

So it's not nothing, but it's also < 10 %, and this isn't link slowdown but latency for seeing undefined symbol diagnostics. Personally, I've never started looking at undefined symbols while they're still scrolling by, and latency for being done with printing undefined symbols is unchanged. So I'd say this is fine, but it's a bit of a judgement call.

ruiu added a comment.Jun 18 2019, 6:07 AM

For a static debug build of chrome (since the link time is a bit long, this configuration isn't officially supported and hits an assert("don't use this config") that I commented out):

  Reloc scanning:              1304 ms (  7.0%)
-------------------------------------------------
Total Link Time:              18744 ms (100.0%)

So it's not nothing, but it's also < 10 %, and this isn't link slowdown but latency for seeing undefined symbol diagnostics. Personally, I've never started looking at undefined symbols while they're still scrolling by, and latency for being done with printing undefined symbols is unchanged. So I'd say this is fine, but it's a bit of a judgement call.

I'd think that what you should actually compare is the time to get an error message before this patch and after this patch, instead of the total link time. If lld spends zero time before relocation processing (which is unrealistic but just an example), users would have been able to get an error message immediately (assuming that relocations errors are everywhere) and they would have to wait 1300 ms after this patch.

But if you wait for 10 seconds for lld to start relocation processing, additional 1 second may not be a big deal.

use predicate function instead of storing IsWarning

actually use predicate function

The latest patch set uses predicate functions instead of storing IsWarning(). It looks less clear to me, but let me know what you think.

I'd think that what you should actually compare is the time to get an error message before this patch and after this patch, instead of the total link time. If lld spends zero time before relocation processing (which is unrealistic but just an example), users would have been able to get an error message immediately (assuming that relocations errors are everywhere) and they would have to wait 1300 ms after this patch.

But if you wait for 10 seconds for lld to start relocation processing, additional 1 second may not be a big deal.

Ah, I see. I added more timers:

  Before reloc scanning:      12705 ms ( 66.5%)
  Reloc scanning:              1362 ms (  7.1%)
  After reloc scanning:        5046 ms ( 26.4%)
-------------------------------------------------
Total Link Time:              19113 ms (100.0%)

So before you'd get an error 12.7s-14.0s in, now it's consistently after 14s. (But if there are multiple errors that scroll by, chances are you could only act after 14s previously as well.)

grimar, should I keep the current version with the predicate, or the version with the IsWarning flag? I don't have a strong preference.

And would there be interest in a -time-report flag that adds a summary like the above? (Not with "before reloc scanning" and "after reloc scanning", but with timers like "reading inputs" etc, like lld/COFF has.)

grimar, should I keep the current version with the predicate, or the version with the IsWarning flag? I don't have a strong preference.

My opinion is that original version was slightly simpler, so I would return to using IsWarning flag probably. Unless there is a better way to use predicates.

back to IsWarning

MaskRay added inline comments.Jun 18 2019, 6:47 AM
lld/ELF/Relocations.cpp
789 ↗(On Diff #205318)

I think this loop can be changed to range-based

thakis marked 2 inline comments as done.

use range loop

thakis marked 4 inline comments as done.Jun 18 2019, 7:02 AM

All comments addressed :) Can I get an lg?

All comments addressed :) Can I get an lg?

I think you will:)

I have no more comments on this patch.

ruiu accepted this revision.Jun 19 2019, 7:33 AM

LGTM

lld/ELF/Relocations.cpp
747 ↗(On Diff #205330)

nit: you can do I++ here.

This revision is now accepted and ready to land.Jun 19 2019, 7:33 AM
thakis marked an inline comment as done.Jun 20 2019, 11:11 AM

Thanks!

lld/ELF/Relocations.cpp
747 ↗(On Diff #205330)

It's not quite the same: If the break is taken, I is incremented then, and it isn't currently. I could adjust for that in the if below, or start I at -1 (and change the init value at -1 and the type to int and add a cat to the if below), but everything I could think of so far is more complicated than what I currently have.

This revision was automatically updated to reflect the committed changes.