This is an archive of the discontinued LLVM Phabricator instance.

[Time-report ](2): Recursive timers in Clang
Needs ReviewPublic

Authored by avt77 on May 22 2018, 7:05 AM.

Details

Summary

I introduce the new infrastructure supporting recursive timers in Clang. This patch is based on D45619 [Time-report] (1) and it is not the last one in this series. The main idea is a possibilty to have "horizontal" timers instead of "vertical" ones which we have today. The patch alows to find the most critical functions (from compilation time of view) and to to analyze them.

Diff Detail

Event Timeline

avt77 created this revision.May 22 2018, 7:05 AM
mgrang added inline comments.May 22 2018, 10:51 AM
include/clang/Frontend/Utils.h
446
466

Same here.

avt77 updated this revision to Diff 148226.May 23 2018, 8:34 AM
  • switched to llvm::sort - from std::sort (tnx to mgrang)
  • added a new counter (hope it will improve the output numbers)
  • slightly changed a couple of existing counters
  • seriously improved debug/monitor logging
avt77 updated this revision to Diff 148395.May 24 2018, 5:52 AM

The sources were re-based to fit in LLVM_DEBUG rename.
One more counter was added.
Debug logging was improved again.

avt77 added a comment.May 30 2018, 2:08 AM

Hi All,
What should I do to simplify the review?
I could remove all LLVM_DEBUG related stuff; I could remove all addtional counters and leave only necessary one or two of them. As result the patch will become shorter.

Should I do all these truncations to simplify the review?
Or maybe I should add comments and/or docs in Utils.h describing the new infrastructure?
Maybe you need examples of the produced output?

avt77 updated this revision to Diff 149734.Jun 4 2018, 5:35 AM

To simplify the review I removed all LLVM_DEBUG items - now the patch is really shorter than it was before.

mgorny resigned from this revision.Jun 15 2018, 2:36 PM

Can you give an example of what the output looks like?

Please don't define multiple different classes with the same name SortClassName.

It seems like you've scattered the "start" and "stop" calls all over the place; can you put the start and stop calls in the same place, somehow?

lib/CodeGen/CodeGenAction.cpp
845

The use of getWithDecl here is weird; can you just do GlobalDecl(FD)? If not, please add a comment explaining what you're doing.

Why does it matter whether the function has a body? IIRC you should be able to mangle the name of functions without a body.

RKSimon resigned from this revision.Jun 19 2018, 2:45 AM
avt77 added a comment.Jul 10 2018, 7:21 AM

Can you give an example of what the output looks like?

First of all excuse me for such delay with my answer: I was on my vacations.
During bootstrap we could see something like here:

[ 84%] Building CXX object tools/clang/lib/StaticAnalyzer/Core/CMakeFiles/clangStaticAnalyzerCore.dir/BlockCounter.cpp.o

------------ Clang Timers: CodeGen Functions ------------

0.0080 (2) _ZN5clang7tooling20getOccurrencesOfUSRsEN4llvm8ArrayRefINSt7cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEENS1_9StringRefEPNS_4DeclE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBAL
N_115RenameLocFinderEE12TraverseAttrEPNS_4AttrE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE29TraverseOMPTaskyieldDirectiveEPNS_21OMPTaskyieldDirectiveEPN4llvm15SmallVectorImplINS7_14PointerIntPairIPNS_4StmtELj1EbNS7_21PointerLikeTypeTraitsISB_EENS7_18PointerIntPairInfoISB_Lj1ESD_EEEEEE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBAL
N_123USRLocFindingASTVisitorEEEE16TraverseBinCommaEPNS_14BinaryOperatorEPN4llvm15SmallVectorImplINS9_14PointerIntPairIPNS_4StmtELj1EbNS9_21PointerLikeTypeTraitsISD_EENS9_18PointerIntPairInfoISD_Lj1ESF_EEEEEE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBALN_123USRLocFindingASTVisitorEEEE21TraverseTypeAliasDeclEPNS_13TypeAliasDeclE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBAL
N_123USRLocFindingASTVisitorEEEE22TraverseObjCObjectTypeEPNS_14ObjCObjectTypeE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE12TraverseDeclEPNS_4DeclE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBAL
N_115RenameLocFinderEE37TraverseVarTemplateSpecializationDeclEPNS_29VarTemplateSpecializationDeclE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE13TraverseBinGTEPNS_14BinaryOperatorEPN4llvm15SmallVectorImplINS7_14PointerIntPairIPNS_4StmtELj1EbNS7_21PointerLikeTypeTraitsISB_EENS7_18PointerIntPairInfoISB_Lj1ESD_EEEEEE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBAL
N_115RenameLocFinderEE31TraverseCXXPseudoDestructorExprEPNS_23CXXPseudoDestructorExprEPN4llvm15SmallVectorImplINS7_14PointerIntPairIPNS_4StmtELj1EbNS7_21PointerLikeTypeTraitsISB_EENS7_18PointerIntPairInfoISB_Lj1ESD_EEEEEE (*)
0.0120 (2) _ZN5clang7tooling25createRenameAtomicChangesEN4llvm8ArrayRefINSt7cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEENS1_9StringRefEPNS_4DeclE (*)
0.0120 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBAL
N_123USRLocFindingASTVisitorEEEE15TraverseTypeLocENS_7TypeLocE (*)
0.0160 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE13PostVisitStmtEPNS_4StmtE (*)
0.0200 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBAL
N_123USRLocFindingASTVisitorEEEE12TraverseAttrEPNS_4AttrE (*)
0.0200 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBALN_123USRLocFindingASTVisitorEEEE13PostVisitStmtEPNS_4StmtE (*)
0.0400 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBAL
N_115RenameLocFinderEE16dataTraverseNodeEPNS_4StmtEPN4llvm15SmallVectorImplINS7_14PointerIntPairIS6_Lj1EbNS7_21PointerLikeTypeTraitsIS6_EENS7_18PointerIntPairInfoIS6_Lj1ESB_EEEEEE (*)
0.0440 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBAL__N_123USRLocFindingASTVisitorEEEE16dataTraverseNodeEPNS_4StmtEPN4llvm15SmallVectorImplINS9_14PointerIntPairIS8_Lj1EbNS9_21PointerLikeTypeTraitsIS8_EENS9_18PointerIntPairInfoIS8_Lj1ESD_EEEEEE (*)
0.0720 (1) _ZN5clang19RecursiveASTVisitor12TraverseAttrEPNS_4AttrE (*)

If we collect the full build log, grep and sort it then we could see the longest compilation times like here:

0.3240 (2) _ZN4llvm18HexagonInstPrinter16printInstructionEPKNS_6MCInstERNS_11raw_ostreamE (*)
0.4520 (2) _ZNK4llvm19AMDGPUMCCodeEmitter27verifyInstructionPredicatesERKNS_6MCInstEm (*)
0.4680 (2) _ZN12_GLOBALN_119SystemZDAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
0.5960 (2) _ZL20validateOperandClassRN4llvm18MCParsedAsmOperandEN12_GLOBAL
N_114MatchClassKindE (*)
0.6200 (2) _ZN12_GLOBALN_115ARMDAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
0.7280 (2) _ZN12_GLOBAL
N_115PPCDAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
0.7920 (2) _ZN12_GLOBALN_119AArch64DAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
1.3160 (3) _ZN4llvm19HexagonDAGToDAGISel10SelectCodeEPNS_6SDNodeE (*)
1.4760 (2) _ZN12_GLOBAL
N_118AMDGPUDAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
5.2520 (2) _ZN12_GLOBAL__N_115X86DAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)

It seems like you've scattered the "start" and "stop" calls all over the place; can you put the start and stop calls in the same place, somehow?

I use now startFrontendTimer/stopFronendTimer: you can find it in several files and inside RAII objects.

avt77 updated this revision to Diff 154802.Jul 10 2018, 7:26 AM

I fixed all issues raised by efriedma: GlobalDecl(FD), function body, class names, etc. Many tnx for your help.

Adding startFrontendTimer/stopFrontendTimer helps a little, but it's still difficult to match a given startFrontendTimer to the corresponding stopFrontendTimer because they're in completely different functions in some cases. Do they really need to be scattered like that? If they do, please add comments so someone reading the code can match them up.

Does getFrontendFunctionTimeCtx need to be a template, given it's always instantiated with const FunctionDecl *?

include/clang/Frontend/Utils.h
342

Why 0.00001? Should it be configurable?

403

This is merging together the times for functions with the same name? Why is that necessary?

This function could use a few comments describing what it's doing.

480

Commented-out code?

avt77 added a comment.Jul 17 2018, 1:16 AM

Adding startFrontendTimer/stopFrontendTimer helps a little, but it's still difficult to match a given startFrontendTimer to the corresponding stopFrontendTimer because they're in completely different functions in some cases. Do they really need to be scattered like that? If they do, please add comments so someone reading the code can match them up.

I'll do it.

Does getFrontendFunctionTimeCtx need to be a template, given it's always instantiated with const FunctionDecl *?

Initially there were 2 types of instantiations: std::string and const FunctionDecl *. Trying to simplify the patch (and to make it shorter) I removed string instantiations (hoping to return them back later). And I thought it could be useful to have some other types of instantiations in the future. Do you think it's redundant? I used std::string in places where we don't have FunctionDecl yet. And this counter infrastructure could be used in general with any other data types.

include/clang/Frontend/Utils.h
342

Yes, of course. I'll do it.

403

We merge times for the funcs with the same name because we'd like to get the total time spending for every function. The idea of this report is to collect all time pieces for separate functions and to show the longest times. As result we'll be able to see functions with longest compilation times and that's exactly our goal.
And, yes I'll add comments soon.

avt77 updated this revision to Diff 156064.Jul 18 2018, 7:09 AM

I added required comments and did the required changes.

avt77 added a comment.Jul 26 2018, 3:00 AM

Ping!

efriedma, do you have any other comments/requirements?

efriedma added inline comments.Jul 27 2018, 5:28 PM
lib/Sema/SemaLambda.cpp
1443

This seems sort of late? You're starting the timer after the body has already been parsed.

lib/Sema/TreeTransform.h
10969

What happens if we never hit ActOnFinishFunctionBody()? TransformLambdaExpr has an early return if the body doesn't typecheck.

More generally, given that we have early returns all over the place in Sema, I would be more comfortable using the RAII helper, rather than explicitly calling start/stop, even if that means you have to insert FrontendTimeRAII variables in half a dozen different places in the parser. (Note I'm specifically talking about the parser here; the explicit stopFrontendTimer in ~CodeGenFunction seems fine.)

avt77 added inline comments.Jul 30 2018, 12:47 AM
lib/Sema/SemaLambda.cpp
1443

Yes and no but if we decided to extend usage of RAII objects - it deos not matter. I'll change it.

lib/Sema/TreeTransform.h
10969

It seems you're right: usage of RAII objects is safely. OK, I'll change it.

avt77 updated this revision to Diff 158221.Jul 31 2018, 5:24 AM

Accordingly to efriedma suggestion I removed start/stopFrontendTimer where it's possible and inserted FrontendTimeRAII in several new places. As result the patch becomes bigger and bigger. And as another result I got output like here (on compiler bootstrap):
....
0.5920 (165) _ZSt7declvalv (*)
0.5960 (155) _ZSt7declvalv (*)
0.5960 (162) _ZSt7declvalv (*)
0.6000 (167) _ZSt7declvalv (*)
0.6040 (155) _ZSt7declvalv (*)
0.6040 (160) _ZSt7declvalv (*)
0.6040 (169) _ZSt7declvalv (*)
....
(the above is grep output from build log file)

It means we compile the same function many times inside many files and it should be optimised in some way.
Another thought: I should add "human" names to simplify the mangling names resolution.

I think there's something weird happening with your timers if they're showing half a second spent parsing or instantiating std::declval: it doesn't have a definition, anywhere (it's just a placeholder for template metaprogramming tricks).

lib/AST/Expr.cpp
544 ↗(On Diff #158221)

You shouldn't need a separate timer here; __func__ should only show up inside a function definition anyway.

lib/Analysis/AnalysisDeclContext.cpp
99

Not sure what you're trying to time here.

lib/Parse/Parser.cpp
1131 ↗(On Diff #158221)

CheckForFunctionRedefinition and MarkAsLateParsedTemplate are both pretty cheap; not sure it's worth separately timing them.

lib/Sema/SemaDecl.cpp
2992

MergeFunctionDecl only has one caller: CheckFunctionDeclaration. Probably makes more sense to time that, instead?

3588

MergeCompatibleFunctionDecls is only called by MergeFunctionDecl.

avt77 updated this revision to Diff 158542.Aug 1 2018, 8:11 AM

efriedma, I removed redundant RAII objects but I still have the following:

1.0800 (271) _ZSt7declvalv (*)
1.0840 (273) _ZSt7declvalv (*)
1.0880 (269) _ZSt7declvalv (*)
1.1000 (276) _ZSt7declvalv (*)
1.1200 (282) _ZSt7declvalv (*)
1.1360 (279) _ZSt7declvalv (*)
1.1440 (286) _ZSt7declvalv (*)
1.1760 (292) _ZSt7declvalv (*)
1.1760 (295) _ZSt7declvalv (*)
1.1800 (294) _ZSt7declvalv (*)
1.1880 (298) _ZSt7declvalv (*)
1.5960 (397) _ZSt7declvalv (*)

Every line corresponds to one unit and the number in parenthesis means the number of times we're dealing with the given function during the compilation of one unit. You say it's impossible but one time invocation of this function costs 0.0040 and 397 times invocations cost 1.588 (very close to the above number).

Do you have any ideas how to deal with all this stuff? Should I add/remove RAIIs somewhere?

"0.0040" is four milliseconds? You're probably crediting time incorrectly, somehow. Can you tell which FrontendTimeRAII the time is coming from?

avt77 added a comment.Aug 3 2018, 7:33 AM

"0.0040" is four milliseconds? You're probably crediting time incorrectly, somehow. Can you tell which FrontendTimeRAII the time is coming from?

Not sure I understand you. What do you like to know exactly? Please, clarify.

I mean, which of the callers of startFrontendTimer() is calling it with a pointer to std::declval()?

avt77 updated this revision to Diff 160997.Aug 16 2018, 4:27 AM

The ability to produce debug output for 'ftiming' was added. As result now it's possible to check places where timers start/stop and for what functions it's being done (see changes in Utils.h).

avt77 added a comment.Aug 16 2018, 4:50 AM

I mean, which of the callers of startFrontendTimer() is calling it with a pointer to std::declval()?

The attachment keeps debug output for

....../bin/clang -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name TransGCCalls.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -v -v -coverage-notes-file /home/atischenko/workspaces/bootstrap/Release-patch/tools/clang/lib/ARCMigrate/CMakeFiles/clangARCMigrate.dir/TransGCCalls.cpp.gcno -resource-dir /home/atischenko/workspaces/time-report-recursive-4/RelWithDebug/lib/clang/8.0.0 -D GTEST_HAS_RTTI=0 -D _DEBUG -D _GNU_SOURCE -D STDC_CONSTANT_MACROS -D STDC_FORMAT_MACROS -D __STDC_LIMIT_MACROS -I /home/atischenko/workspaces/bootstrap/Release-patch/tools/clang/lib/ARCMigrate -I /home/atischenko/workspaces/bootstrap/llvm/tools/clang/lib/ARCMigrate -I /home/atischenko/workspaces/bootstrap/llvm/tools/clang/include -I /home/atischenko/workspaces/bootstrap/Release-patch/tools/clang/include -I /usr/include/libxml2 -I /home/atischenko/workspaces/bootstrap/Release-patch/include -I /home/atischenko/workspaces/bootstrap/llvm/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/backward -internal-isystem /usr/local/include -internal-isystem /home/atischenko/workspaces/time-report-recursive-4/RelWithDebug/lib/clang/8.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -Woverloaded-virtual -Wno-nested-anon-types -pedantic -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /home/atischenko/workspaces/bootstrap/Release-patch/tools/clang/lib/ARCMigrate -ferror-limit 19 -fmessage-length 0 -fvisibility-inlines-hidden -ftime-report -fno-rtti -fobjc-runtime=gcc -fno-common -fdiagnostics-show-option -o TransGCCalls.cpp.o -x c++ /home/atischenko/workspaces/bootstrap/llvm/tools/clang/lib/ARCMigrate/TransGCCalls.cpp -faddrsig -mllvm -debug-only=ftiming

The full log is too big that's why I grepped "declval" only:

  • :start: means the timer was started
  • :stop-zero: means the timer was stopped and nothing was added to the calculated time period
  • :stop-add:<double> means the timer was stoped and the time slice was added to the time period.

There are DeclFunc* names and Tags to look for corresponding RAII objects inside sources. For example:

ActOnFunctionDeclarator:start:declval:

means

Tag:start:Name:

BTW, I see a lot of broken DeclFunc names: do you have any idea why it's possible? Example:

ActOnFunctionDeclarator:start:....._exception:

Here ...... means non-ASCII symbols.

:start: means the timer was started

In some cases, the ChildTime is already non-zero at the "start" point; what does that mean?

ActOnFunctionDeclarator:start:....._exception:

That's weird... DeclarationName::print should generally return something ASCII; do you know what kind of declaration it is?

avt77 added a comment.Aug 17 2018, 4:15 AM

:start: means the timer was started

In some cases, the ChildTime is already non-zero at the "start" point; what does that mean?

It means there was recursive call and this child time should be removed from the time of the current func (parent) otherwise it could be calculated twice.

ActOnFunctionDeclarator:start:....._exception:

That's weird... DeclarationName::print should generally return something ASCII; do you know what kind of declaration it is?

There are a lot of such names (see the attach as example.)


BTW, I use getNameAsString() (see getFirstName). I tried getIdentifier () ->getName() but it was broken rather often.

Trass3r added a subscriber: Trass3r.Dec 5 2018, 9:42 PM

Is this still in progress?

Is this still in progress?

AFAIK, this is abandoned by author now. I'm working on other implementation which is similar but with another angle of view. The implementation projected by author measures time of _compiler_ code, whereas _user_ source code compilation time looks more useful.
There are also another two differentials related to this one which may require revertion: https://reviews.llvm.org/D45619 and https://reviews.llvm.org/D43578.

russell.gallop resigned from this revision.Aug 19 2019, 6:01 AM