I introduce the new infrastructure supporting recursive timers in Clang. This patch is based on D45619 [Time-report] (1) and it is not the last one in this series. The main idea is a possibilty to have "horizontal" timers instead of "vertical" ones which we have today. The patch alows to find the most critical functions (from compilation time of view) and to to analyze them.
Diff Detail
Event Timeline
include/clang/Frontend/Utils.h | ||
---|---|---|
428 | Please use llvm::sort instead of std::sort. See https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements. | |
448 | Same here. |
- switched to llvm::sort - from std::sort (tnx to mgrang)
- added a new counter (hope it will improve the output numbers)
- slightly changed a couple of existing counters
- seriously improved debug/monitor logging
The sources were re-based to fit in LLVM_DEBUG rename.
One more counter was added.
Debug logging was improved again.
Hi All,
What should I do to simplify the review?
I could remove all LLVM_DEBUG related stuff; I could remove all addtional counters and leave only necessary one or two of them. As result the patch will become shorter.
Should I do all these truncations to simplify the review?
Or maybe I should add comments and/or docs in Utils.h describing the new infrastructure?
Maybe you need examples of the produced output?
To simplify the review I removed all LLVM_DEBUG items - now the patch is really shorter than it was before.
Can you give an example of what the output looks like?
Please don't define multiple different classes with the same name SortClassName.
It seems like you've scattered the "start" and "stop" calls all over the place; can you put the start and stop calls in the same place, somehow?
lib/CodeGen/CodeGenAction.cpp | ||
---|---|---|
860 | The use of getWithDecl here is weird; can you just do GlobalDecl(FD)? If not, please add a comment explaining what you're doing. Why does it matter whether the function has a body? IIRC you should be able to mangle the name of functions without a body. |
First of all excuse me for such delay with my answer: I was on my vacations.
During bootstrap we could see something like here:
[ 84%] Building CXX object tools/clang/lib/StaticAnalyzer/Core/CMakeFiles/clangStaticAnalyzerCore.dir/BlockCounter.cpp.o
------------ Clang Timers: CodeGen Functions ------------
0.0080 (2) _ZN5clang7tooling20getOccurrencesOfUSRsEN4llvm8ArrayRefINSt7cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEENS1_9StringRefEPNS_4DeclE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE12TraverseAttrEPNS_4AttrE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE29TraverseOMPTaskyieldDirectiveEPNS_21OMPTaskyieldDirectiveEPN4llvm15SmallVectorImplINS7_14PointerIntPairIPNS_4StmtELj1EbNS7_21PointerLikeTypeTraitsISB_EENS7_18PointerIntPairInfoISB_Lj1ESD_EEEEEE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBALN_123USRLocFindingASTVisitorEEEE16TraverseBinCommaEPNS_14BinaryOperatorEPN4llvm15SmallVectorImplINS9_14PointerIntPairIPNS_4StmtELj1EbNS9_21PointerLikeTypeTraitsISD_EENS9_18PointerIntPairInfoISD_Lj1ESF_EEEEEE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBALN_123USRLocFindingASTVisitorEEEE21TraverseTypeAliasDeclEPNS_13TypeAliasDeclE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBALN_123USRLocFindingASTVisitorEEEE22TraverseObjCObjectTypeEPNS_14ObjCObjectTypeE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE12TraverseDeclEPNS_4DeclE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE37TraverseVarTemplateSpecializationDeclEPNS_29VarTemplateSpecializationDeclE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE13TraverseBinGTEPNS_14BinaryOperatorEPN4llvm15SmallVectorImplINS7_14PointerIntPairIPNS_4StmtELj1EbNS7_21PointerLikeTypeTraitsISB_EENS7_18PointerIntPairInfoISB_Lj1ESD_EEEEEE (*)
0.0080 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE31TraverseCXXPseudoDestructorExprEPNS_23CXXPseudoDestructorExprEPN4llvm15SmallVectorImplINS7_14PointerIntPairIPNS_4StmtELj1EbNS7_21PointerLikeTypeTraitsISB_EENS7_18PointerIntPairInfoISB_Lj1ESD_EEEEEE (*)
0.0120 (2) _ZN5clang7tooling25createRenameAtomicChangesEN4llvm8ArrayRefINSt7cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEENS1_9StringRefEPNS_4DeclE (*)
0.0120 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBALN_123USRLocFindingASTVisitorEEEE15TraverseTypeLocENS_7TypeLocE (*)
0.0160 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE13PostVisitStmtEPNS_4StmtE (*)
0.0200 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBALN_123USRLocFindingASTVisitorEEEE12TraverseAttrEPNS_4AttrE (*)
0.0200 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBALN_123USRLocFindingASTVisitorEEEE13PostVisitStmtEPNS_4StmtE (*)
0.0400 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling12_GLOBALN_115RenameLocFinderEE16dataTraverseNodeEPNS_4StmtEPN4llvm15SmallVectorImplINS7_14PointerIntPairIS6_Lj1EbNS7_21PointerLikeTypeTraitsIS6_EENS7_18PointerIntPairInfoIS6_Lj1ESB_EEEEEE (*)
0.0440 (2) _ZN5clang19RecursiveASTVisitorINS_7tooling22RecursiveSymbolVisitorINS1_12_GLOBAL__N_123USRLocFindingASTVisitorEEEE16dataTraverseNodeEPNS_4StmtEPN4llvm15SmallVectorImplINS9_14PointerIntPairIS8_Lj1EbNS9_21PointerLikeTypeTraitsIS8_EENS9_18PointerIntPairInfoIS8_Lj1ESD_EEEEEE (*)
0.0720 (1) _ZN5clang19RecursiveASTVisitor12TraverseAttrEPNS_4AttrE (*)
If we collect the full build log, grep and sort it then we could see the longest compilation times like here:
0.3240 (2) _ZN4llvm18HexagonInstPrinter16printInstructionEPKNS_6MCInstERNS_11raw_ostreamE (*)
0.4520 (2) _ZNK4llvm19AMDGPUMCCodeEmitter27verifyInstructionPredicatesERKNS_6MCInstEm (*)
0.4680 (2) _ZN12_GLOBALN_119SystemZDAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
0.5960 (2) _ZL20validateOperandClassRN4llvm18MCParsedAsmOperandEN12_GLOBALN_114MatchClassKindE (*)
0.6200 (2) _ZN12_GLOBALN_115ARMDAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
0.7280 (2) _ZN12_GLOBALN_115PPCDAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
0.7920 (2) _ZN12_GLOBALN_119AArch64DAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
1.3160 (3) _ZN4llvm19HexagonDAGToDAGISel10SelectCodeEPNS_6SDNodeE (*)
1.4760 (2) _ZN12_GLOBALN_118AMDGPUDAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
5.2520 (2) _ZN12_GLOBAL__N_115X86DAGToDAGISel10SelectCodeEPN4llvm6SDNodeE (*)
It seems like you've scattered the "start" and "stop" calls all over the place; can you put the start and stop calls in the same place, somehow?
I use now startFrontendTimer/stopFronendTimer: you can find it in several files and inside RAII objects.
I fixed all issues raised by efriedma: GlobalDecl(FD), function body, class names, etc. Many tnx for your help.
Adding startFrontendTimer/stopFrontendTimer helps a little, but it's still difficult to match a given startFrontendTimer to the corresponding stopFrontendTimer because they're in completely different functions in some cases. Do they really need to be scattered like that? If they do, please add comments so someone reading the code can match them up.
Does getFrontendFunctionTimeCtx need to be a template, given it's always instantiated with const FunctionDecl *?
include/clang/Frontend/Utils.h | ||
---|---|---|
338 | Why 0.00001? Should it be configurable? | |
385 | This is merging together the times for functions with the same name? Why is that necessary? This function could use a few comments describing what it's doing. | |
462 | Commented-out code? |
I'll do it.
Does getFrontendFunctionTimeCtx need to be a template, given it's always instantiated with const FunctionDecl *?
Initially there were 2 types of instantiations: std::string and const FunctionDecl *. Trying to simplify the patch (and to make it shorter) I removed string instantiations (hoping to return them back later). And I thought it could be useful to have some other types of instantiations in the future. Do you think it's redundant? I used std::string in places where we don't have FunctionDecl yet. And this counter infrastructure could be used in general with any other data types.
include/clang/Frontend/Utils.h | ||
---|---|---|
338 | Yes, of course. I'll do it. | |
385 | We merge times for the funcs with the same name because we'd like to get the total time spending for every function. The idea of this report is to collect all time pieces for separate functions and to show the longest times. As result we'll be able to see functions with longest compilation times and that's exactly our goal. |
lib/Sema/SemaLambda.cpp | ||
---|---|---|
1440 | This seems sort of late? You're starting the timer after the body has already been parsed. | |
lib/Sema/TreeTransform.h | ||
10954 | What happens if we never hit ActOnFinishFunctionBody()? TransformLambdaExpr has an early return if the body doesn't typecheck. More generally, given that we have early returns all over the place in Sema, I would be more comfortable using the RAII helper, rather than explicitly calling start/stop, even if that means you have to insert FrontendTimeRAII variables in half a dozen different places in the parser. (Note I'm specifically talking about the parser here; the explicit stopFrontendTimer in ~CodeGenFunction seems fine.) |
Accordingly to efriedma suggestion I removed start/stopFrontendTimer where it's possible and inserted FrontendTimeRAII in several new places. As result the patch becomes bigger and bigger. And as another result I got output like here (on compiler bootstrap):
....
0.5920 (165) _ZSt7declvalv (*)
0.5960 (155) _ZSt7declvalv (*)
0.5960 (162) _ZSt7declvalv (*)
0.6000 (167) _ZSt7declvalv (*)
0.6040 (155) _ZSt7declvalv (*)
0.6040 (160) _ZSt7declvalv (*)
0.6040 (169) _ZSt7declvalv (*)
....
(the above is grep output from build log file)
It means we compile the same function many times inside many files and it should be optimised in some way.
Another thought: I should add "human" names to simplify the mangling names resolution.
I think there's something weird happening with your timers if they're showing half a second spent parsing or instantiating std::declval: it doesn't have a definition, anywhere (it's just a placeholder for template metaprogramming tricks).
lib/AST/Expr.cpp | ||
---|---|---|
544 ↗ | (On Diff #158221) | You shouldn't need a separate timer here; __func__ should only show up inside a function definition anyway. |
lib/Analysis/AnalysisDeclContext.cpp | ||
99 ↗ | (On Diff #158221) | Not sure what you're trying to time here. |
lib/Parse/Parser.cpp | ||
1131 ↗ | (On Diff #158221) | CheckForFunctionRedefinition and MarkAsLateParsedTemplate are both pretty cheap; not sure it's worth separately timing them. |
lib/Sema/SemaDecl.cpp | ||
2983 | MergeFunctionDecl only has one caller: CheckFunctionDeclaration. Probably makes more sense to time that, instead? | |
3579 | MergeCompatibleFunctionDecls is only called by MergeFunctionDecl. |
efriedma, I removed redundant RAII objects but I still have the following:
1.0800 (271) _ZSt7declvalv (*)
1.0840 (273) _ZSt7declvalv (*)
1.0880 (269) _ZSt7declvalv (*)
1.1000 (276) _ZSt7declvalv (*)
1.1200 (282) _ZSt7declvalv (*)
1.1360 (279) _ZSt7declvalv (*)
1.1440 (286) _ZSt7declvalv (*)
1.1760 (292) _ZSt7declvalv (*)
1.1760 (295) _ZSt7declvalv (*)
1.1800 (294) _ZSt7declvalv (*)
1.1880 (298) _ZSt7declvalv (*)
1.5960 (397) _ZSt7declvalv (*)
Every line corresponds to one unit and the number in parenthesis means the number of times we're dealing with the given function during the compilation of one unit. You say it's impossible but one time invocation of this function costs 0.0040 and 397 times invocations cost 1.588 (very close to the above number).
Do you have any ideas how to deal with all this stuff? Should I add/remove RAIIs somewhere?
"0.0040" is four milliseconds? You're probably crediting time incorrectly, somehow. Can you tell which FrontendTimeRAII the time is coming from?
I mean, which of the callers of startFrontendTimer() is calling it with a pointer to std::declval()?
The ability to produce debug output for 'ftiming' was added. As result now it's possible to check places where timers start/stop and for what functions it's being done (see changes in Utils.h).
The attachment keeps debug output for
....../bin/clang -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name TransGCCalls.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -v -v -coverage-notes-file /home/atischenko/workspaces/bootstrap/Release-patch/tools/clang/lib/ARCMigrate/CMakeFiles/clangARCMigrate.dir/TransGCCalls.cpp.gcno -resource-dir /home/atischenko/workspaces/time-report-recursive-4/RelWithDebug/lib/clang/8.0.0 -D GTEST_HAS_RTTI=0 -D _DEBUG -D _GNU_SOURCE -D STDC_CONSTANT_MACROS -D STDC_FORMAT_MACROS -D __STDC_LIMIT_MACROS -I /home/atischenko/workspaces/bootstrap/Release-patch/tools/clang/lib/ARCMigrate -I /home/atischenko/workspaces/bootstrap/llvm/tools/clang/lib/ARCMigrate -I /home/atischenko/workspaces/bootstrap/llvm/tools/clang/include -I /home/atischenko/workspaces/bootstrap/Release-patch/tools/clang/include -I /usr/include/libxml2 -I /home/atischenko/workspaces/bootstrap/Release-patch/include -I /home/atischenko/workspaces/bootstrap/llvm/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/backward -internal-isystem /usr/local/include -internal-isystem /home/atischenko/workspaces/time-report-recursive-4/RelWithDebug/lib/clang/8.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -Woverloaded-virtual -Wno-nested-anon-types -pedantic -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /home/atischenko/workspaces/bootstrap/Release-patch/tools/clang/lib/ARCMigrate -ferror-limit 19 -fmessage-length 0 -fvisibility-inlines-hidden -ftime-report -fno-rtti -fobjc-runtime=gcc -fno-common -fdiagnostics-show-option -o TransGCCalls.cpp.o -x c++ /home/atischenko/workspaces/bootstrap/llvm/tools/clang/lib/ARCMigrate/TransGCCalls.cpp -faddrsig -mllvm -debug-only=ftiming
The full log is too big that's why I grepped "declval" only:
- :start: means the timer was started
- :stop-zero: means the timer was stopped and nothing was added to the calculated time period
- :stop-add:<double> means the timer was stoped and the time slice was added to the time period.
There are DeclFunc* names and Tags to look for corresponding RAII objects inside sources. For example:
ActOnFunctionDeclarator:start:declval:
means
Tag:start:Name:
BTW, I see a lot of broken DeclFunc names: do you have any idea why it's possible? Example:
ActOnFunctionDeclarator:start:....._exception:
Here ...... means non-ASCII symbols.
:start: means the timer was started
In some cases, the ChildTime is already non-zero at the "start" point; what does that mean?
ActOnFunctionDeclarator:start:....._exception:
That's weird... DeclarationName::print should generally return something ASCII; do you know what kind of declaration it is?
It means there was recursive call and this child time should be removed from the time of the current func (parent) otherwise it could be calculated twice.
ActOnFunctionDeclarator:start:....._exception:
That's weird... DeclarationName::print should generally return something ASCII; do you know what kind of declaration it is?
There are a lot of such names (see the attach as example.)
BTW, I use getNameAsString() (see getFirstName). I tried getIdentifier () ->getName() but it was broken rather often.
AFAIK, this is abandoned by author now. I'm working on other implementation which is similar but with another angle of view. The implementation projected by author measures time of _compiler_ code, whereas _user_ source code compilation time looks more useful.
There are also another two differentials related to this one which may require revertion: https://reviews.llvm.org/D45619 and https://reviews.llvm.org/D43578.
Why 0.00001? Should it be configurable?