Differential D13743
Lazily sort ScopesWithImportedEntities to speed up LTO linking
Authored by krasin on Oct 14 2015, 2:02 PM.

Details
In particular, this CL speeds up the official Chrome linking with LTO by 1.8x.
See more details in https://crbug.com/542426
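
For readers unfamiliar with the patch, here is a minimal, self-contained sketch of the lazy-sort pattern the title describes. The names ScopesWithImportedEntities and findImportedEntitiesForScope come from the LLVM code quoted later in the thread; everything else (the placeholder MDNode, the Sorted flag, the wrapper class) is illustrative and not the actual patch.

    #include <algorithm>
    #include <utility>
    #include <vector>

    struct MDNode {}; // placeholder for llvm::MDNode in this sketch

    // Pairs of (scope, imported entity), as in ScopesWithImportedEntities.
    using Entry = std::pair<const MDNode *, const MDNode *>;

    class LazyImportedEntityMap {
      std::vector<Entry> ScopesWithImportedEntities;
      bool Sorted = false; // set once the deferred sort has run

    public:
      // Population: just append; no per-insertion sorting work.
      void add(const MDNode *Scope, const MDNode *Entity) {
        ScopesWithImportedEntities.emplace_back(Scope, Entity);
        Sorted = false;
      }

      // Lookup: sort lazily on first use, then binary-search by scope.
      std::pair<std::vector<Entry>::const_iterator,
                std::vector<Entry>::const_iterator>
      findImportedEntitiesForScope(const MDNode *Scope) {
        auto Less = [](const Entry &A, const Entry &B) { return A.first < B.first; };
        if (!Sorted) {
          // Stable sort keeps entities of the same scope in insertion order.
          std::stable_sort(ScopesWithImportedEntities.begin(),
                           ScopesWithImportedEntities.end(), Less);
          Sorted = true;
        }
        return std::equal_range(ScopesWithImportedEntities.cbegin(),
                                ScopesWithImportedEntities.cend(),
                                Entry(Scope, nullptr), Less);
      }
    };

If the DWARF emitter never asks for imported entities (the includeMinimalInlineScopes() case debated below), the sort never runs at all; otherwise it runs once at the first lookup instead of repeatedly while the vector is being populated, which is where the LTO-link speedup comes from according to the hypothesis later in the thread.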
Event Timeline

Comment Actions
Very cool! I'm not familiar with this code, so I shouldn't review this. Looks like dblaikie touched this not too long ago; he can probably approve this :-)
Comment Actions
Please take another look.
Comment Actions
When is not all the debug info needed? And how did it end up in the IR if it wasn't needed?

Comment Actions
See lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:342:

    // Skip imported directives in gmlt-like data.
    if (!includeMinimalInlineScopes()) {
      // There is no need to emit empty lexical block DIE.
      for (const auto &E : DD->findImportedEntitiesForScope(DS))
        Children.push_back(
            constructImportedEntityDIE(cast<DIImportedEntity>(E.second)));
    }

When the official Chrome is compiled, includeMinimalInlineScopes() returns true, and DwarfDebug::findImportedEntitiesForScope is never called.

    bool DwarfCompileUnit::includeMinimalInlineScopes() const {
      return getCUNode()->getEmissionKind() == DIBuilder::LineTablesOnly ||
             (DD->useSplitDwarf() && !Skeleton);
    }

At some point, the official Chrome was compiled with LineTablesOnly; then it was realized that stack traces in crash reports are not complete, and I conclude that Chrome uses split DWARF. See https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/8tKb9-Wr8gk

Comment Actions
Update: no, Chrome does not use split DWARF for the official builds. I was wrong.

At this time, I have no good explanation of why findImportedEntitiesForScope does not get called. I don't think it's blocking the review, since the CL does not change the logic; it only makes the sorting operation lazy. I will continue to read the code to get an answer to this question, but please take a look at the CL. It's tested: the Chrome binary is linked faster and is bytewise the same as without this change.

Comment Actions
I have got the first results from my debug run:

    bool DwarfCompileUnit::includeMinimalInlineScopes() const {
      fprintf(stderr,
              "DwarfCompileUnit::includeMinimalInlineScopes, emission kind: %d\n",
              (int)getCUNode()->getEmissionKind());
      fprintf(stderr, "useSplitDwarf: %d\n", (int)DD->useSplitDwarf());
      fprintf(stderr, "Skeleton: %p\n", (void*)Skeleton);
      _exit(1);
      ...

    DwarfCompileUnit::includeMinimalInlineScopes, emission kind: 1
    useSplitDwarf: 0
    Skeleton: (nil)

where emission kind 1 means FullDebug:
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/llvm/include/llvm/IR/DIBuilder.h&q=DIBuilder&sq=package:chromium&type=cs&l=71

Comment Actions
Now I have another hypothesis, which also explains the speedup: instead of sorting after each insertion into the vector, I sort them once.

Comment Actions
Now I have a better explanation of what happens here.

Please take another look.
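
For contrast, here is a hedged sketch of the eager strategy that hypothesis refers to, i.e. keeping the vector sorted while it is being populated. It reuses the placeholder types from the sketch above, and addEager is a hypothetical name; this is a cost-model illustration, not the pre-patch LLVM code.

    #include <algorithm>
    #include <utility>
    #include <vector>

    struct MDNode {}; // placeholder, as in the earlier sketch
    using Entry = std::pair<const MDNode *, const MDNode *>;

    // Eager variant: do a sorted insert on every append so the vector is
    // always ordered.  With N total entries this costs O(N) element moves
    // per insert, O(N^2) overall (or O(N^2 log N) comparisons if a full
    // sort is rerun after each insertion), versus a single O(N log N)
    // sort in the lazy variant.
    void addEager(std::vector<Entry> &Vec, const MDNode *Scope,
                  const MDNode *Entity) {
      auto Less = [](const Entry &A, const Entry &B) { return A.first < B.first; };
      auto It = std::upper_bound(Vec.begin(), Vec.end(), Entry(Scope, nullptr), Less);
      Vec.insert(It, Entry(Scope, Entity)); // shifts every element after It
    }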
Comment Actions
It'd be nice if we could sort immediately after populating - but the issue [...]

What about if we sunk the data structure down into the DwarfUnit (actually [...]

Comment Actions
This is correct.
I tried that, see http://reviews.llvm.org/D13918. I didn't optimize it for speed; it only loses a minute compared to the current CL anyway. Have I understood your suggestion right?

Comment Actions
I abandon this revision in favor of http://reviews.llvm.org/D13918.
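
For reference, here is a rough guess at what "sinking the data structure down into the unit" could look like: each compile unit owns and sorts only its own scope/entity pairs, so no module-wide vector has to be kept ordered across the thousands of compile units seen in an LTO link. The class and member names are hypothetical; see D13918 for the real patch.

    #include <algorithm>
    #include <utility>
    #include <vector>

    struct MDNode {}; // placeholder, as in the earlier sketches
    using Entry = std::pair<const MDNode *, const MDNode *>;

    class PerUnitImportedEntities {
      std::vector<Entry> Entities; // this compile unit's entities only

    public:
      // Populate from the unit's own imported entities, then sort once,
      // immediately after populating (as suggested above).
      explicit PerUnitImportedEntities(const std::vector<Entry> &FromCU)
          : Entities(FromCU) {
        std::stable_sort(Entities.begin(), Entities.end(),
                         [](const Entry &A, const Entry &B) {
                           return A.first < B.first;
                         });
      }

      auto findImportedEntitiesForScope(const MDNode *Scope) const {
        auto Less = [](const Entry &A, const Entry &B) { return A.first < B.first; };
        return std::equal_range(Entities.begin(), Entities.end(),
                                Entry(Scope, nullptr), Less);
      }
    };

Because each unit's list is small and built in one place, the "sort immediately after populating" concern goes away without needing a laziness flag.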
Comment Actions
You might consider llvm::DenseMap<const MDNode *, llvm::SmallVector<const MDNode *, 4>> as a better data structure for this. That way, you won't need to maintain an ordering.
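
A sketch of that alternative, keyed by scope; llvm::DenseMap and llvm::SmallVector are real LLVM ADTs, but the function names and the MDNode placeholder are illustrative and not from either revision.

    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/ADT/DenseMap.h"
    #include "llvm/ADT/SmallVector.h"

    struct MDNode {}; // placeholder, as in the earlier sketches

    // Scope -> imported entities declared in that scope.  Lookup is a hash
    // probe, so no sorted order has to be maintained or produced at all.
    static llvm::DenseMap<const MDNode *, llvm::SmallVector<const MDNode *, 4>>
        ImportedEntitiesByScope;

    static void addImportedEntity(const MDNode *Scope, const MDNode *Entity) {
      ImportedEntitiesByScope[Scope].push_back(Entity);
    }

    // Returns an empty list when the scope has no imported entities.
    static llvm::ArrayRef<const MDNode *>
    findImportedEntitiesForScope(const MDNode *Scope) {
      auto It = ImportedEntitiesByScope.find(Scope);
      if (It == ImportedEntitiesByScope.end())
        return {};
      return It->second;
    }

One trade-off worth noting: DenseMap iteration order is not deterministic, so any code that walks all scopes would need care to keep the output reproducible, whereas the per-scope SmallVector values above do preserve insertion order for the lookup path the thread cares about.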