This is an archive of the discontinued LLVM Phabricator instance.

[time-report] Add preprocessor timer
AbandonedPublic

Authored by modocache on Aug 8 2017, 3:25 PM.

Details

Reviewers
vsk
Summary

gcc -ftime-report provides a breakdown of how much time GCC spends
doing preprocessing, parsing, template instantiation, and more:

g++ -ftime-report foo.cpp
Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1414 kB ( 8%) ggc
 phase parsing           :   0.10 (100%) usr   0.08 (100%) sys   0.18 (95%) wall   15955 kB (88%) ggc
 phase opt and generate  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 5%) wall     489 kB ( 3%) ggc
 |name lookup            :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.04 (21%) wall    1054 kB ( 6%) ggc
 preprocessing           :   0.02 (20%) usr   0.01 (13%) sys   0.03 (16%) wall     896 kB ( 5%) ggc
 parser (global)         :   0.00 ( 0%) usr   0.03 (38%) sys   0.04 (21%) wall    6603 kB (37%) ggc
 parser struct body      :   0.04 (40%) usr   0.00 ( 0%) sys   0.04 (21%) wall    2791 kB (15%) ggc
 parser enumerator list  :   0.01 (10%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      44 kB ( 0%) ggc
 parser function body    :   0.00 ( 0%) usr   0.02 (25%) sys   0.02 (11%) wall    1047 kB ( 6%) ggc
 parser inl. meth. body  :   0.01 (10%) usr   0.01 (13%) sys   0.03 (16%) wall    1093 kB ( 6%) ggc
 template instantiation  :   0.02 (20%) usr   0.01 (12%) sys   0.02 (11%) wall    3280 kB (18%) ggc
 LRA non-specific        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 5%) wall       0 kB ( 0%) ggc
 TOTAL                 :   0.10             0.08             0.19              18028 kB

clang -ftime-report gives an incredibly detailed breakdown of how long
each LLVM pass takes, but when it comes to front-end times, only
provides the generic "Clang front-end timer", or "Code Generation Time".
Here's an example of its output:
https://gist.github.com/modocache/d74833818107ed50d11387a5a4e3fb72

As a result, when attempting to diagnose slow compile times with Clang,
users are forced to use external profiling tools in order to determine
whether the bottleneck is in parsing, template instantiation, or LLVM.

This diff adds the first of several timers that aim to match
gcc -ftime-report: a timer that outputs the amount of time spent in the Clang
preprocessor.

Event Timeline

modocache created this revision.Aug 8 2017, 3:25 PM

Thanks for working on this. Collecting better timing information in the frontend sgtm. It's cheap to do, and we can use the information to guide our efforts re: attacking the compile-time problem. Feel free to add me to future timing-related reviews. Regarding this specific patch:

Could you add a short test (perhaps in test/Misc?) that checks that -ftime-report prints out an entry for 'Preprocessing'?

It'd be nice to dump this timer from Preprocessor::PrintStats(), too.

lib/Lex/Preprocessor.cpp
661

I wonder whether this is too fine-grained. I think setting up a timer in Preprocessor::Lex() might capture more information. Would you mind experimenting with that?

661

Nitpick: it may be useful to add PPOpts::getTimer(), in case we find more sites where we need to either get back the PP timer or nullptr.

modocache planned changes to this revision.Aug 10 2017, 7:44 AM
modocache added a reviewer: vsk.

Thanks for the feedback, @vsk, I really appreciate it! I have some other work done for this on my local checkout, but I was going a little bonkers working on it without knowing whether people would want it merged or not. I'll update this with your feedback and upload the rest for review as well.

It'd be nice to dump this timer from Preprocessor::PrintStats(), too.

Good idea, will do!

lib/Lex/Preprocessor.cpp
661

I think setting up a timer in Preprocessor::Lex() might capture more information.

Yes, can do! As it happens, the timers in this current diff cover all of the event counters that are incremented and then printed in Preprocessor::PrintStats(), except two: NumTokenPaste and NumFastTokenPaste. Beginning the timer in the Lexer would allow me to measure the time it takes to do token pasting as well, so I think this is a great idea. Thanks!

it may be useful to add PPOpts::getTimer()

Agreed, will do!

FWIW, I strongly support the idea of adding more detailed timers into the frontend. Thanks for working on it!
I probably won't be very helpful in reviewing this code, but I'd appreciate if you CC me in the future patches.

Thanks,
Michael

modocache updated this revision to Diff 111195.Aug 15 2017, 9:54 AM
modocache retitled this revision from [RFC][time-report] Add preprocessor timer to [time-report] Add preprocessor timer.
modocache edited the summary of this revision. (Show Details)
modocache removed subscribers: vsk, mzolotukhin.

Add PreprocessorOptions::getTimer, and move the timer to the top of Preprocessor::Lex().

Oops, sorry, didn't mean to remove the subscribers. arc diff --verbatim strikes again.

modocache marked 3 inline comments as done.Aug 15 2017, 10:29 AM

Friendly ping! I think this is ready to be reviewed. It adds an additional row, Preprocessing, to the Miscellaneous Ungrouped Timers section of the clang -ftime-report output:

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0360 ( 17.8%)   0.0060 ( 35.3%)   0.0420 ( 19.2%)   0.3440 ( 50.6%)  Code Generation Time
   0.0390 ( 19.3%)   0.0050 ( 29.4%)   0.0440 ( 20.1%)   0.1734 ( 25.5%)  LLVM IR Generation Time
   0.1270 ( 62.9%)   0.0060 ( 35.3%)   0.1330 ( 60.7%)   0.1622 ( 23.9%)  Preprocessing
   0.2020 (100.0%)   0.0170 (100.0%)   0.2190 (100.0%)   0.6796 (100.0%)  Total

Grouping these timers is something I'd like to explore in future commits.

vsk accepted this revision.Aug 16 2017, 11:59 AM

LGTM.

This revision is now accepted and ready to land.Aug 16 2017, 11:59 AM

This looks really usefull, thanks for working on this!

lib/Lex/Preprocessor.cpp
746

Doesn't this just start a timer and immediately end the timer? Shouldn't we do: llvm::TimeRegion LexTime(PPOpts->getTimer()) so that the dtor gets called when this function returns and we track the time spent in this function?

Also: this is a pretty hot function, and it looks like TimeRegion does some non-trivial work if time is being tracked. Have you tried testing this on a big c++ file with and without this patch and seeing what the difference in compile time looks like?

modocache planned changes to this revision.Aug 16 2017, 2:02 PM
modocache added inline comments.
lib/Lex/Preprocessor.cpp
746

Ah, yes you're right! Sorry about that. Actually keeping the timer alive for the duration of the method also reveals that the method is called recursively, which causes an assert, because timers can't be started twice.

Another spot in Clang works around this with a "reference counted" timer: https://github.com/llvm-mirror/clang/blob/6ac9c51ede0a50cca13dd4ac03562c036f7a3f48/lib/CodeGen/CodeGenAction.cpp#L130-L134. I have a more generic version of this "reference counting timer" that I've been using for some of the other timers I've been adding; maybe I'll use it here as well.

eduardo-elizondo added inline comments.
include/clang/Lex/PreprocessorOptions.h
165

Should this be named "Lexing Time" or "Lexing" instead of "Preprocessing"?

modocache added inline comments.Aug 19 2017, 1:15 PM
include/clang/Lex/PreprocessorOptions.h
165

Good idea! Now that the timer's being started up in the Preprocessor::Lex method, it probably should be named "Lexing". Alternatively, I could move this into, say, Lexer::Lex. I guess there's not much of a distinction in Clang between "lexing" and "preprocessing."

I had originally picked this name because that's what appears in gcc -ftime-report, but I guess we don't need to keep the names the same.

MatzeB added a subscriber: MatzeB.Aug 22 2017, 1:13 PM
MatzeB added inline comments.
lib/Lex/Preprocessor.cpp
746

FWIF: I share Eriks concerns about compiletime. Timers are enabled in optimized builds, and querying them is not free. So putting one into a function that is called a lot and is time critical seems like a bad idea (do benchmarking to prove or disprove this!).

vsk added inline comments.Aug 23 2017, 10:19 AM
lib/Lex/Preprocessor.cpp
746

The timer is not started or queried unless -ftime-report is enabled. In the common case the overhead amounts to one extra null-check. And when we're collecting timing information, some performance degradation (say, within 5%) should be acceptable. I agree that we should get a sense for what the overhead is, but am not convinced that this should be a blocking issue.

avt77 added a subscriber: avt77.Feb 14 2018, 5:06 AM
avt77 added inline comments.Feb 14 2018, 5:40 AM
include/clang/Lex/PreprocessorOptions.h
165

Preprocessing does not means lexing. It includes much more, for example, file inclusion, include pathes elaboration, macros expansion, etc. We could merge all these parts or we could output the info for evey single piece. In any case I think we need a group of timers related to frontend (or maybe several groups: for pp, parser, Sema, codegen...). And maybe we need special switches to disable/enable the groups (-ftime-report could be used as some super switch for all of them). As result we need some repository of the timers and most probably this repository should be kept in PreprocessorOpts.

If there are no objections I can take this job on me and come up with new more general version of the patch. If it's OK then I'd like to collect the proposes about the required details of the time profile: what exactly we'd like to see in output? We have gcc output as an initial example but maybe we need more (or less) details?

lib/Lex/Preprocessor.cpp
746

Here we have more than "one extra null-check": we're dealing with constructor/destructor as well (and it's for every Lex()). I agree that it should be acceptable for us but we have to profile it (maybe we should not profile Lex() but something above it).

modocache abandoned this revision.Feb 14 2018, 7:40 AM
modocache added inline comments.
include/clang/Lex/PreprocessorOptions.h
165

If there are no objections I can take this job on me and come up with new more general version of the patch. If it's OK then I'd like to collect the proposes about the required details of the time profile: what exactly we'd like to see in output? We have gcc output as an initial example but maybe we need more (or less) details?

Yes, absolutely! This sounds fantastic, thank you for doing this work.

I'll abandon this diff so it doesn't clog up anyone else's review queues. Thanks again!