This is an archive of the discontinued LLVM Phabricator instance.

[lldCOFF] Print detailed timing information with /VERBOSE
ClosedPublic

Authored by zturner on Jan 10 2018, 1:13 PM.

Details

Summary

When passing /VERBOSE, lld will now print detailed timing information about the various linking phases. For example, when linking clang with /verbose, I get this output:

Timing:
  Code Layout Time:          (  1.38%)   229 ms
  Commit Output File:        (  0.09%)    15 ms
  PDB Emission (Cumulative): ( 96.03%) 15969 ms
    Globals Stream Layout:   (  1.18%)   197 ms
    Symbol Merging:          ( 26.26%)  4367 ms
    Type Merging:            ( 36.94%)  6142 ms
    TPI Stream Layout:       (  6.95%)  1155 ms
    Commit to Disk:          ( 18.65%)  3101 ms
-----------------------------------------------
  Total Link Time:           (100.00%) 16629 ms

Diff Detail

Event Timeline

zturner created this revision.Jan 10 2018, 1:13 PM
ruiu added inline comments.Jan 10 2018, 1:26 PM
lld/COFF/Config.h
113

Why do you have to store them to Config? Config is for configuration variables and not a generic place to store miscellaneous data.

lld/COFF/PDB.cpp
1017–1018

These things should be done in the RAII style.

lld/COFF/Writer.cpp
286

I'd print out the timing information as we measure, instead of printing out all the information at end.

306–325

Please avoid using formatv in lld because we do not use it in other places in lld. We don't remember the format string format.

zturner added inline comments.Jan 10 2018, 1:30 PM
lld/COFF/PDB.cpp
1017–1018

I didn't do RAII style here because we want to start and stop in the middle of functions sometimes. It seemed inconvenient to have to make a nested scope just for that.

lld/COFF/Writer.cpp
286

If we do it this way it might not all appear together. This will make it harder to parse the output, as you might have to scroll the terminal around to find the different pieces of information you're interested in.

306–325

It's pretty ugly to do any kind of nice alignment without it. I try to avoid it if there's a better way, but in this case I think it would add a lot of additional code for no real benefit.

ruiu added inline comments.Jan 10 2018, 1:31 PM
lld/COFF/PDB.cpp
1017–1018

I guess it's because the timing information is printed out along with other debug info. How about adding a new command line just for this feature?

ruiu added inline comments.Jan 10 2018, 1:33 PM
lld/COFF/PDB.cpp
1017–1018

It's worth noting that /verbose makes the linker slower because generating and emitting debug messages is sometimes not cheap. For that reason it makes sense to separate the flag.

zturner updated this revision to Diff 129342.Jan 10 2018, 2:21 PM
  • Moved the printing and collection of this information out of /VERBOSE and into a new option, /SHOWTIMING.
  • Removed a few explicit calls to Timer.end() where RAII-style end would work (the class already calls end() in its destructor anyway). A few calls to end() are still explicit, where we need to do it in the middle of a function.
  • Moved the statistics out of the Config class and into a separate class called TimingInfo, which is created in Writer and passed through to the PDB linker.
pcc added a subscriber: pcc.Jan 10 2018, 2:39 PM

Could this also include some of the earlier linking phases in Driver.cpp (I'm thinking input file reading, GC, ICF)?

In D41915#972686, @pcc wrote:

Could this also include some of the earlier linking phases in Driver.cpp (I'm thinking input file reading, GC, ICF)?

Yea, I can do that. Stay tuned.

ruiu added inline comments.Jan 10 2018, 2:56 PM
lld/COFF/Driver.cpp
845–847

For consistency:

// Handle /showtiming
if (Args.hasArg(OPT_show_timing))
  Config->ShowTiming = true;
lld/include/lld/Common/LinkTimer.h
18 ↗(On Diff #129342)

Since everything in lld namespace is related to linking, I'd name this Timer.

21 ↗(On Diff #129342)

Why don't you use nanosecond?

30 ↗(On Diff #129342)

Please write the actual type instead of auto.

31 ↗(On Diff #129342)

It is better to use chrono::high_resolution_clock, or at least chrono::steady_clock. system_clock is not guaranteed to be monotonic (so it can rewind.)

zturner updated this revision to Diff 129375.Jan 10 2018, 5:24 PM

I went to start implementing peter's suggestion, and things started getting a little interesting. A lot of the things you might want to time are behind command line options, and it would be nice if we didn't show anything when the lines weren't relevant.

At the same time, I don't like writing code and then having someone re-invent the same thing later that is slightly different. So although nobody's using this now, I wanted to do this in a way that could benefit, for example, the ELF linker or the MachO linker at some point in the future.

So I tried to generalize this a little bit. Everything is moved into Common now, and the only thing defined in COFF is a set of enums that represent the various statistics that you might want to time. You can add phases dynamically (for example, only if an option is set on the command line), and all the printing logic is raised up into common too, and it does all the work of figuring out the right set of statistics you're timing, calculating percentages, etc, so nobody else has to deal with the formatting and alignment again.

just to illustrate what I mean, currently when I link with /OPT:REF + /OPT:ICF + /DEBUG I get this:

  Input File Reading:        (  3.54%)   576 ms
  ICF:                       (  0.66%)   107 ms
  GC:                        (  1.34%)   218 ms
  Code Layout:               (  0.79%)   128 ms
  Commit Output File:        (  0.07%)    11 ms
  PDB Emission (Cumulative): ( 93.46%) 15214 ms
    Globals Stream Layout:   (  1.09%)   177 ms
    TPI Stream Layout:       (  6.71%)  1093 ms
    Type Merging:            ( 35.43%)  5767 ms
    Symbol Merging:          ( 24.43%)  3977 ms
    Commit to Disk:          ( 17.90%)  2913 ms
------------------------------------------------
Total Link Time:             (100.00%) 16278 ms

whereas if I link with only /OPT:ICF and /OPT:REF but remove /DEBUG I get this:

  Input File Reading: ( 54.80%) 542 ms
  ICF:                ( 11.93%) 118 ms
  GC:                 ( 16.38%) 162 ms
  Code Layout:        ( 13.14%) 130 ms
  Commit Output File: (  1.31%)  13 ms
---------------------------------------
Total Link Time:      (100.00%) 989 ms

I think this is a more friendly output and more useful from a usability perspective. But if I went a little overboard let me know :)

pcc added inline comments.Jan 10 2018, 6:40 PM
lld/COFF/Driver.cpp
951

I gather that the intent is to add a "ThinLTO" phase to the timing output only if we end up doing LTO, but a user is not required to pass this flag in order to use LTO. Instead, we always perform LTO if the user passes at least one bitcode input file. You can find the check for that condition as well as the code that actually does LTO in the function SymbolTable::addCombinedLTOObject().

Also, I'd call this "LTO" and not "ThinLTO" because the linker supports both full and thin LTO.

zturner updated this revision to Diff 129475.Jan 11 2018, 10:34 AM
  • Renamed LinkTimer to Timer, as suggested by Rui.
  • Moved the LTO setup into the proper place.
  • Changed from system_clock to high_resolution_clock.

I didn't change ms to nanoseconds because I find nanoseconds hard to read. It prints more than 10 digits of numbers, and it takes some mental strain to translate that back into something I can understand. If anyone feels strongly that ms is not a fine enough granularity though, let me know. Making the milliseconds a floating point might be another alternative.

zturner updated this revision to Diff 129495.Jan 11 2018, 11:45 AM

Storing the accumulated time in nanoseconds. Turns out when you store in milliseconds you can accumulate a large amount of truncation error. So I'm now accumulating nanoseconds but still printing milliseconds.

ruiu added inline comments.Jan 11 2018, 12:45 PM
lld/COFF/Driver.cpp
715

Timer::TimerPhaseRAII is too long. I'd name this ScopedTimer and name the local variable T. As long as you use T as a name for the timer, there should be no confusion. I imagine that you could use it like this:

ScopedTimer T(TotalTimer);
lld/COFF/Timing.h
17

Instead of defining one timer which contains multiple timers in it, can you define a timer for each measurement like this?

namespace lld {
namespace coff {
Timer TotalTimer("Total");
Timer FileReadTimer("Input File Reading");
...
}
}

Then you can remove this enum.

lld/Common/Timer.cpp
14

Please do not use auto.

18

Use double unless there's a reason to use float.

34

Instead of formatv, please use llvm::format which takes printf-style format string. It is more familiar (and thus easy to read) to most people who work on lld.

103

This is a bit too overdesigned at the moment. Just use some fixed width (e.g. 20 chars) as the width of the name field. I'd reduce the amount of code as much as possible.

lld/include/lld/Common/Timer.h
56

I don't think there's a valid use case for this constructor.

zturner added inline comments.Jan 11 2018, 1:07 PM
lld/COFF/Timing.h
17

The whole idea is that a single class aggregates all of the timers so that at the end, there is one print function that handles all the formatting and alignment. Also, how would nested timers work? For example, I want to have a timer for "PDB Link Time" but then subtimers for the various phases of PDB linking, and those should be indented under the main timer to make it easier to understand the results. See some of the sample outputs I posted earlier, for example.

You could make it all work by having Timer contain a TimerManager&, and then writing something like:

namespace lld {
namespace coff {
TimerManager Timers;
Timer TotalTimer("Total", Timers);
Timer FileReadTimer("Input File Reading", Timers);
...
}
}

but this looks like a worse API to me. Finally, there is the issue of dynamic timers. If someone doesn't pass /DEBUG on the command line, there is no reason to have a bunch of output related to PDB linking phases. So this is why there is the addPhase function, which will add a timer dynamically.

I think this gives the best output.

lld/Common/Timer.cpp
34

Can you show me a printf line that prints N spaces, followed by a string, followed by M spaces, followed by a some right aligned text? I honestly don't know how to do this with printf.

That said, I don't know why we should be scared of or avoid using standard LLVM support types and classes. It's about the same as a python format string, and once you do understand it, it's much easier (not to mention safer) than printf format string. printf format strings are also the source of countless portability bugs.

The code here doesn't really use anything fancy, and it's mostly self explanatory since instead of embedding all the formatting in the format string, I used the adapters like fmt_align which spell out exactly what they're doing, so familiarity shouldn't be an issue.

103

I have a separate patch offline which adds some more timers and which goes past 20. So this would cause it to not be aligned. We can make it 40, or 60, but then it will look ugly in most cases. This is all code hidden behind an option that you have to explicitly specify, so is there any disadvantage to just doing it "correctly"?

ruiu added inline comments.Jan 11 2018, 1:53 PM
lld/COFF/Timing.h
17

We can organize timer "trees" by passing "parent" timer to child timers like this:

Timer TotalTimer("Total");
Timer FileReadTimer("Input File Reading", &TotalTimer);

And then make timer's "print" function print out its timing information as well as its children's timing information. Then all you have to do at end is to call the "print" function on TotalTimer which recursively prints out all timer info.

lld/Common/Timer.cpp
34

You may be more familiar with this format string than printf-style format string, but that's not true for many other people. And honestly I don't think people would want to spend time on understanding the new format string that occur only a few times in lld (even for making sure that that's the same as Python's format string.) lld is part of the LLVM project, but at the same time, this is a project that people who are not working on LLVM/Clang are interested in, and I don't want to use an LLVM-specific utility if the more language-standard-ish one works sufficiently.

As to how to write it in printf, how about this?

$ printf '% 20s: (% .2f%%) % 5d ms\n' 'Code layout time' 1.379 229
Code layout time: ( 1.38%)   229 ms
103

The disadvantage is the amount of code and time to understand how it works. Please keep it simple even if it looks a bit stupid. In addition to that, calculating the width doesn't create a desired output. If some timer is 80 columns long, you'll get something like

Timing:
  Code Layout Time:                                                                            (  1.38%)   229 ms
  Commit Output File:                                                                          (  0.09%)    15 ms
  Very loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong name: (  0.00%)     0 ms
  PDB Emission (Cumulative):                                                                   ( 96.03%) 15969 ms
    Globals Stream Layout:                                                                     (  1.18%)   197 ms
    Symbol Merging:                                                                            ( 26.26%)  4367 ms
    Type Merging:                                                                              ( 36.94%)  6142 ms
    TPI Stream Layout:                                                                         (  6.95%)  1155 ms
    Commit to Disk:                                                   ( 18.65%)  3101 ms
-----------------------------------------------
  Total Link Time:           (100.00%) 16629 ms

I think what we want to do is to insert a reasonable number of space between ':' and '(' so it doesn't wrap around for most lines. We should format this assuming that the terminal size is 80 columns, so variable width is not really desirable.

Timing:
  Code Layout Time:             (  1.38%)   229 ms
  Commit Output File:           (  0.09%)    15 ms
Very loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong name: (  0.00%)     0 ms
  PDB Emission (Cumulative):    ( 96.03%) 15969 ms
    Globals Stream Layout:      (  1.18%)   197 ms
    Symbol Merging:             ( 26.26%)  4367 ms
    Type Merging:               ( 36.94%)  6142 ms
    TPI Stream Layout:          (  6.95%)  1155 ms
    Commit to Disk:             ( 18.65%)  3101 ms
-----------------------------------------------
  Total Link Time:           (100.00%) 16629 ms
zturner updated this revision to Diff 129557.Jan 11 2018, 4:37 PM

Fixes suggested by rui.

ruiu added inline comments.Jan 11 2018, 9:41 PM
lld/COFF/Driver.cpp
51

Can you remove addChildTimer if you pass a parent timer to the constructor?

lld/COFF/PDB.cpp
891

I found T1, T2, ... (which you did in Writer.cpp) is better than T, U, V.

lld/Common/Timer.cpp
9

High resolution timer is very cheap on modern processors (I believe it is compiled to RSTSC instruction on x86, for example), so I think we should always enable this and just don't print out the final result if Config->ShowTiming is false.

63

I believe this can be written without repeat:

Stream << std::string('-', 49);
81–88
std::string S = (std::string(' ', Depth * 2) + Name).str();
format("% 30s (%6.2f%%) %5d ms", S, 100 * millis() / TotalDuration, (int)millis());

is perhaps a bit easier to read?

lld/include/lld/Common/Timer.h
36–42

We usually write private members after public members.

zturner added inline comments.Jan 11 2018, 10:09 PM
lld/COFF/Driver.cpp
51

No because we don't want every timer to be enabled always. Only if a certain branch of code is run (for example we only want to add PDB timers if /DEBUG is specified)

rnk added a subscriber: rnk.Jan 12 2018, 10:33 AM

I'd suggest putting this under MSVC's /TIME flag: https://cs.chromium.org/chromium/src/build/config/win/BUILD.gn?type=cs&q=win_linker_timing&sq=package:chromium&l=281

It would then naturally work in Chrome builds that set the win_linker_timing=true gn args.

zturner updated this revision to Diff 129699.Jan 12 2018, 1:27 PM

More fixes as suggested by rui

zturner added inline comments.Jan 12 2018, 1:28 PM
lld/Common/Timer.cpp
81–88

This isn't quite the same. printf right justifies %30s, I want it left justified. I also don't want it left justified to 30, because the indentation would have changed that. I think it's easier to just explicitly pad on the left and pad on the right.

ruiu added inline comments.Jan 12 2018, 2:18 PM
lld/COFF/Driver.cpp
51

But I don't think we need a separate flag. It seems you can show timing information only when a timer has non-zero time duration (which means the timer has activated and deactivated).

lld/COFF/PDB.cpp
994

T1 and T2?

lld/Common/Timer.cpp
81–88
std::string S = (std::string(' ', Depth * 2) + Name).str();

indents the string, no?

zturner added inline comments.Jan 12 2018, 2:26 PM
lld/COFF/Driver.cpp
51

That's an option, but would there be issues if we use a "Parent" that is defined in another translation unit. The child will call parent->Children.add(*this). Can we guarantee that the Parent has been constructed at that point? I always forget when it's safe for one global variable to use another. If we have to introduce ManagedStatic or something then I think it just makes the code more complicated.

lld/Common/Timer.cpp
81–88

Yes, but then the %30s is no longer correct, because some of the 30 has been consumed by the Depth*2 characters on the left. The point is that after indenting, the field width is no longer a constant. That said, I think this code is pretty simple already no?

ruiu added inline comments.Jan 12 2018, 2:38 PM
lld/COFF/Driver.cpp
51

Good point, but you could add a timer to its parent not when the constructor is executed but when it's start()'ed.

lld/Common/Timer.cpp
81–88

Aah, sorry I misunderstood the issue. Can't you use '% -30s'? It left-justifies the string.

zturner added inline comments.Jan 12 2018, 2:45 PM
lld/COFF/Driver.cpp
51

We might start and stop it many times though (we already do this in PDB.cpp with TypeMergingTimer, for example). So if we do it that way we'd need to make sure it hasn't already been added, at which point we have to use a map of some kind instead of a simple vector.

lld/Common/Timer.cpp
81–88

If we append the spaces first to construct a temporary string then that should work, but that results in some extra copies. I don't think it would result in shorter or fewer lines of code either.

ruiu added inline comments.Jan 12 2018, 2:49 PM
lld/COFF/Driver.cpp
51

I don't think we need a map. A timer can add itself to a parent timer first time start() is called -- which is when the timer duration is zero.

lld/Common/Timer.cpp
81–88

Creating extra copies is fine -- it's very small amount of data, and

std::string S = (std::string(' ', Depth * 2) + Name + ":").str();
Stream << format("% -30s (%6.2f%%) %5d ms", S, 100 * millis() / TotalDuration, (int)millis());

is apparently shorter than 8 lines of code.

zturner added inline comments.Jan 12 2018, 2:57 PM
lld/COFF/Driver.cpp
51

Currently a timer does not store a pointer to its parent. In order to do this it would have to store that. Should I make that change?

ruiu added inline comments.Jan 12 2018, 3:13 PM
lld/COFF/Driver.cpp
51

Good point, and I'm almost convinced, but I'd think we still want it. The current API is easy to misuse. I think you forgot to call addChildTimer for LTOTimer.

zturner updated this revision to Diff 129721.Jan 12 2018, 3:48 PM

Fix remaining issues.

ruiu added a comment.Jan 12 2018, 4:03 PM

Overall looking pretty good.

lld/COFF/Driver.cpp
51

Ah, this is a very nice way to force some initialization order. Thank you for doing this.

lld/include/lld/Common/Timer.h
36–42

Please move private members after public members.

38

I'd use std::vector when we do not have to optimize for speed.

42

This constructor is used only by root(), so I'd make this private to prevent misuse of this constructor.

45

It looks like ScopedTimer accesses only public functions of this class.

ruiu added a comment.Jan 12 2018, 5:08 PM

I tried to run this patch and found that printing out the timer tree as a tree like this

Total Link Time:              (100.00%)  6470 ms
  Input File Reading:         (  0.00%)     0 ms
  ICF:                        (  0.00%)     0 ms
  GC:                         (  0.00%)     0 ms
  LTO:                        (  0.00%)     0 ms
  Code Layout:                (  0.00%)     0 ms
  Commit Output File:         (  0.00%)     0 ms
  PDB Emission (Cumulative):  (  0.00%)     0 ms
    Add Objects:              (  0.00%)     0 ms
      Type Merging:           (  0.00%)     0 ms
      Symbol Merging:         (  0.00%)     0 ms
  ...

is more natural than the current format

  Input File Reading:         (  0.00%)     0 ms
  ICF:                        (  0.00%)     0 ms
  GC:                         (  0.00%)     0 ms
  LTO:                        (  0.00%)     0 ms
  Code Layout:                (  0.00%)     0 ms
  Commit Output File:         (  0.00%)     0 ms
  PDB Emission (Cumulative):  (  0.00%)     0 ms
    Add Objects:              (  0.00%)     0 ms
      Type Merging:           (  0.00%)     0 ms
      Symbol Merging:         (  0.00%)     0 ms
  ...
-------------------------------------------------
Total Link Time:              (100.00%)  6470 ms

which handles only the root timer in a special way because the former represents a tree as-is. Do you mind if I ask you to change the format?

ruiu added a comment.Jan 12 2018, 5:37 PM

I didn't notice until I actually tried to use this, but showing only the total in the reverse order actually confused me about how to interpret the "Total" after the separator "-------------". Turned out that the the last line after the hyphen line is a continuation of the previous lines, but that wasn't very intuitive to me.

Also, this order looks a bit odd to me

PDB Emission (Cumulative): ( 96.03%) 15969 ms

because parenthesized number describes not the following but the preceding number. I feel

PDB Emission (Cumulative): 15969 ms ( 96.03%)

is more natural.

The other point I noticed is that we probably need only one digit after the decimal point for the percentage number. We truncate duration to milliseconds, and compared to that, showing two digits after the decimal point is disproportionally precise for the percentage.

zturner updated this revision to Diff 129996.Jan 16 2018, 10:46 AM

Reversed the order of the time and the percentage, and formatted the percentage with only 1 decimal.

ruiu added a comment.Jan 16 2018, 1:18 PM

LGTM with these changes.

Please update the commit message as it no longer depends on /VERBOSE.

lld/include/lld/Common/Timer.h
36

Move private members after public members.

38

Please use std::vector.

42

Make this private.

45

Please remove friend. I believe this file compiles without it.

ruiu accepted this revision.Jan 16 2018, 1:19 PM
This revision is now accepted and ready to land.Jan 16 2018, 1:19 PM
This revision was automatically updated to reflect the committed changes.
ruiu added a comment.Jan 17 2018, 1:07 PM

It seems it is still showing the total timer at the end. Can you fix it?

takuto.ikuta added inline comments.
Common/Timer.cpp
1 ↗(On Diff #130227)

Do you forget adding LICENSE statement in this file?