This is an archive of the discontinued LLVM Phabricator instance.

[LLD] Add summary support
ClosedPublic

Authored by aganea on Feb 24 2019, 7:22 PM.

Details

Summary

I am proposing a very simple summary of the data processed by LLD, to be displayed at the end of linking.
Currently, this is enabled through a hidden (LLD-specific) COFF flag /summary, but it could be used in the other drivers too.
Here's an example output:

                                    Summary
--------------------------------------------------------------------------------
           4777 Input OBJ files (expanded from all cmd-line inputs)
             68 Dependent PDB files
             31 Dependent PCH OBJ files
          49283 Output PDB strings
        1785681 Output global symbol records

It is purposely very simple: it displays only a <u64,string> pair; there's no grouping or sorting whatsoever.
Also, the line will not be displayed if the value is zero, to avoid noise.

Diff Detail

Repository
rL LLVM

Event Timeline

aganea created this revision.Feb 24 2019, 7:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2019, 7:22 PM
aganea added a comment.EditedFeb 25 2019, 6:05 AM

I would also like an overall opinion from you all: when using /summary in production, the numbers can get pretty big.
For example, a day-to-day developement build:

                                     Summary
--------------------------------------------------------------------------------
           4785 Input OBJ files (expanded from all cmd-line inputs)
             71 Dependent PDB files
             27 Dependent PCH OBJ files
      139948902 Input type records (across all OBJ and dependencies)
     8449407544 Input type records bytes (across all OBJ and dependencies)
        9664019 Output merged type records
       40258663 Output merged symbol records
          48454 Output PDB strings

A similar target, but non-unity TUs:

                                    Summary
--------------------------------------------------------------------------------
           6031 Input OBJ files (expanded from all cmd-line inputs)
              0 Dependent PDB files
              1 Dependent PCH OBJ files
     1854813694 Input type records (across all OBJ and dependencies)
   107738367784 Input type records bytes (across all OBJ and dependencies)
       14259032 Output merged type records
       86900071 Output merged symbol records
          23641 Output PDB strings

The numbers become meaningless because they are too big. It might take you a bit to count the digits to understand it's 107 GB there for the "Input type records bytes" line.

We could use the locale, but things are not much better:

                                    Summary
--------------------------------------------------------------------------------
            6,031 Input OBJ files (expanded from all cmd-line inputs)
                0 Dependent PDB files
                1 Dependent PCH OBJ files
    1,854,813,694 Input type records (across all OBJ and dependencies)
  107,738,367,784 Input type records bytes (across all OBJ and dependencies)
       14,259,032 Output merged type records
       86,900,071 Output merged symbol records
           23,641 Output PDB strings

So maybe we could round some of these numbers, just to give a first order of magnitude information:

                                    Summary
--------------------------------------------------------------------------------
            6,031 Input OBJ files (expanded from all cmd-line inputs)
                0 Dependent PDB files
                1 Dependent PCH OBJ files
             1.8B Input type records (across all OBJ and dependencies)
          107.7GB Input type records bytes (across all OBJ and dependencies)
            14.2M Output merged type records
            86.9M Output merged symbol records
           23,641 Output PDB strings

And maybe have /summary+ if you want the precise numbers. What do you think?

ruiu added a comment.Feb 25 2019, 3:06 PM

Could you elaborate a bit about why you want this feature?

Instead of adding a flag, you might want to do that when /verbose is given.

COFF/PDB.cpp
1361 ↗(On Diff #188099)

Looks like instead of pushing stats data to a Summary class and print it out later, you could print out the stats right here.

aganea marked 2 inline comments as done.EditedFeb 25 2019, 7:07 PM

Could you elaborate a bit about why you want this feature?

Instead of adding a flag, you might want to do that when /verbose is given.

Essentially to have a better understanding of where the timings go. With many different clients and many different usages, seeing the /time values does not tell me what caused those timings. /summary gives me more hints on the "what" and also allows compares between different usages. This is primarily geared towards the PDB generation, because that's what takes the most time, and because that's where I'd like to put the most efforts on.

In contrast, /verbose is more for the day-to-day users of the linker. For example if you want to understand wrong lib paths, or missing symbols.
I suppose I could print the information I want for summary at the end of /verbose output, however that forces to wait on the excruciatingly long /verbose console output. Output which cannot be redirected by the way, for a reason I haven't investigated yet.
The following does not produce the expected result - instead the text is half-printed on the console, and the dump file only contains partial information (Discarded symbols and the cmd-line):

lld-link.exe ... /verbose >dump.txt
COFF/PDB.cpp
1361 ↗(On Diff #188099)

That would skew profiles recorded for /time. I often use /time and /summary at the same time.

rnk added a comment.Feb 27 2019, 1:53 PM
In D58599#1410086, @aganea wrote:> The following does not produce the expected result - instead the text is half-printed on the console, and the dump file only contains partial information (Discarded symbols and the cmd-line):
lld-link.exe ... /verbose >dump.txt

Because, of course, we log to stderr. =p I think it's just a holdover convention from the compiler, where everything must be sent to stderr in case the user puts -o - on the command line, which would end up mixing diagnostics with binary output. This gets set here:
https://github.com/llvm/llvm-project/blob/master/lld/include/lld/Common/ErrorHandler.h#L89

You could probably change the convention for lld-link, but maybe it's best to leave ELF alone.

aganea marked an inline comment as done.Feb 27 2019, 3:14 PM
In D58599#1412659, @rnk wrote:

Because, of course, we log to stderr. =p

Thanks Reid! :) I figured it was something related to stderr, but turns out my command-line wasn't right.
One must redirect stderr to stdout after the redirection to file:

lld-link.exe ... /verbose >dump.txt 2>&1
ruiu added a comment.Feb 27 2019, 4:29 PM

Could you elaborate a bit about why you want this feature?

Instead of adding a flag, you might want to do that when /verbose is given.

Essentially to have a better understanding of where the timings go. With many different clients and many different usages, seeing the /time values does not tell me what caused those timings. /summary gives me more hints on the "what" and also allows compares between different usages. This is primarily geared towards the PDB generation, because that's what takes the most time, and because that's where I'd like to put the most efforts on.

In contrast, /verbose is more for the day-to-day users of the linker. For example if you want to understand wrong lib paths, or missing symbols.

It sounds like /summary is more like a developer option and should be hidden from the user. By hiding it, I mean the option shouldn't be in the option list in /help.

COFF/PDB.cpp
1361 ↗(On Diff #188099)

You can stop the timer and resume it after you are done with summary.

aganea marked 2 inline comments as done.Mar 12 2019, 11:10 AM

It sounds like /summary is more like a developer option and should be hidden from the user. By hiding it, I mean the option shouldn't be in the option list in /help.

It is hidden indeed as it is.

Please let me know if you find this feature useful and if you'd use it. I don't expect you to accept all of my patches, but I thought this would be useful for the other drivers too.

COFF/PDB.cpp
1361 ↗(On Diff #188099)

The main selling point was for this to be a one-liner call. However, it's only meant for high-level information, so maybe it won't be that much of an issue for timings.

Common/Summary.cpp
33 ↗(On Diff #188099)

Alternatively, I could accumulate the stats into a static string, and print it on exit. Would you consider that a better approach?

ruiu added inline comments.Mar 12 2019, 11:17 AM
COFF/PDB.cpp
1361 ↗(On Diff #188099)

That's my feeling. Printing out a few lines doesn't seem to affect too much to the statistics of /time. I still believe you can just print it out from here.

aganea marked an inline comment as done.Mar 12 2019, 11:19 AM
aganea added inline comments.
COFF/PDB.cpp
1361 ↗(On Diff #188099)

I will do that.

aganea updated this revision to Diff 190306.Mar 12 2019, 12:10 PM

Simplified as suggested.

Given that lines are printed on-the-spot now, unfortunately they can be interleaved with warnings:

                                    Summary
--------------------------------------------------------------------------------
              3 Input OBJ files (expanded from all cmd-line inputs)
lld-link: warning: Cannot use debug info for 'precomp-invalid.obj' [LNK4099]
>>> failed to load reference 'f:\svn\lld\test\coff\precomp\precomp.obj': The signature does not match; the file(s) might be out of date.
              2 Dependent PCH OBJ files
              4 Output PDB strings
              3 Output global symbol records

Is that what we want?
It wouldn't make things more complicated if we accumulated the messages, and print them on exit?

ruiu added a comment.Mar 12 2019, 12:49 PM

I guess that you could just move the code to the end of the function?

aganea updated this revision to Diff 190498.Mar 13 2019, 1:53 PM

I guess that you could just move the code to the end of the function?

I've moved printing the statistics in its own function. Does this look more acceptable?

ruiu added inline comments.Mar 13 2019, 2:03 PM
COFF/PDB.cpp
1365 ↗(On Diff #190498)

I'm sorry to be repeat saying the same thing, but can't you just print them out in this function? This way, there's no chance that some other log message is displayed between the following lines.

aganea marked an inline comment as done.EditedMar 13 2019, 2:19 PM
This comment has been deleted.
COFF/PDB.cpp
1365 ↗(On Diff #190498)

Of course, but my original intent was to provide an interface for adding statistics for the whole LLD. Meanning not just for the PDBLinker, but allowing to log stats from anywhere. Thus the original intent for buffering the values instead of printing on the spot.
We're drifting away from that, and I can very well remove the Summary class, and print everything here. Is that what you suggest?

ruiu added a comment.Mar 13 2019, 2:27 PM

Yes, I don't think we need Suymmary class. I'd really like to avoid creating a generic infrastructure too far ahead of time. If we would like to print out stuff all at once at the end of the process in the future, we'll be able to easily do that and change the existing code to print out to that output buffer instead of out().

aganea updated this revision to Diff 190609.Mar 14 2019, 6:00 AM

Yes, I don't think we need Suymmary class. I'd really like to avoid creating a generic infrastructure too far ahead of time. If we would like to print out stuff all at once at the end of the process in the future, we'll be able to easily do that and change the existing code to print out to that output buffer instead of out().

Ok, once you've mentionned outs() it made everything clear, and I agree. I'll keep that in mind for potential upgrades, if need arises.
In the meanwhile, please see the updated code.

ruiu added a comment.Mar 14 2019, 9:41 AM

Generally looking good, thank you for updating it.

COFF/Driver.cpp
1002–1003 ↗(On Diff #190609)

nit: you could directly assign, like Config->ShowSummary = Args.hasArg(OPT_summary).

COFF/PDB.cpp
1374–1375 ↗(On Diff #190609)

Printing out all values even if they are zero makes more sense to me, because having zero records is still a piece of information.

aganea updated this revision to Diff 190657.Mar 14 2019, 10:13 AM
aganea marked 2 inline comments as done.

As requested. Thanks to you for taking the time!

ruiu accepted this revision.Mar 14 2019, 10:18 AM

LGTM

This revision is now accepted and ready to land.Mar 14 2019, 10:18 AM
This revision was automatically updated to reflect the committed changes.