Differential D13743
Lazily sort ScopesWithImportedEntities to speed up LTO linking
Authored by krasin on Oct 14 2015, 2:02 PM.

Details
In particular, this CL speeds up the official Chrome linking with LTO by 1.8x.
See more details in https://crbug.com/542426
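
For readers unfamiliar with the patch, here is a minimal, self-contained sketch of the lazy-sort pattern the title describes. The names ScopesWithImportedEntities and findImportedEntitiesForScope come from the LLVM code quoted later in the thread; everything else (the placeholder MDNode, the Sorted flag, the wrapper class) is illustrative and not the actual patch.

    #include <algorithm>
    #include <utility>
    #include <vector>

    struct MDNode {}; // placeholder for llvm::MDNode in this sketch

    // Pairs of (scope, imported entity), as in ScopesWithImportedEntities.
    using Entry = std::pair<const MDNode *, const MDNode *>;

    class LazyImportedEntityMap {
      std::vector<Entry> ScopesWithImportedEntities;
      bool Sorted = false; // set once the deferred sort has run

    public:
      // Population: just append; no per-insertion sorting work.
      void add(const MDNode *Scope, const MDNode *Entity) {
        ScopesWithImportedEntities.emplace_back(Scope, Entity);
        Sorted = false;
      }

      // Lookup: sort lazily on first use, then binary-search by scope.
      std::pair<std::vector<Entry>::const_iterator,
                std::vector<Entry>::const_iterator>
      findImportedEntitiesForScope(const MDNode *Scope) {
        auto Less = [](const Entry &A, const Entry &B) { return A.first < B.first; };
        if (!Sorted) {
          // Stable sort keeps entities of the same scope in insertion order.
          std::stable_sort(ScopesWithImportedEntities.begin(),
                           ScopesWithImportedEntities.end(), Less);
          Sorted = true;
        }
        return std::equal_range(ScopesWithImportedEntities.cbegin(),
                                ScopesWithImportedEntities.cend(),
                                Entry(Scope, nullptr), Less);
      }
    };

If the DWARF emitter never asks for imported entities (the includeMinimalInlineScopes() case debated below), the sort never runs at all; otherwise it runs once at the first lookup instead of repeatedly while the vector is being populated, which is where the LTO-link speedup comes from according to the hypothesis later in the thread.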
Event Timeline

Comment Actions
Very cool! I'm not familiar with this code, so I shouldn't review this. Looks like dblaikie touched this not too long ago; he can probably approve this :-)
Comment Actions
Please take another look.
Comment Actions
When is not all the debug info needed? And how did it end up in the IR if it wasn't needed?

Comment Actions
See lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:342:

    // Skip imported directives in gmlt-like data.
    if (!includeMinimalInlineScopes()) {
      // There is no need to emit empty lexical block DIE.
      for (const auto &E : DD->findImportedEntitiesForScope(DS))
        Children.push_back(
            constructImportedEntityDIE(cast<DIImportedEntity>(E.second)));
    }

When the official Chrome is compiled, includeMinimalInlineScopes() returns true, and DwarfDebug::findImportedEntitiesForScope is never called.

    bool DwarfCompileUnit::includeMinimalInlineScopes() const {
      return getCUNode()->getEmissionKind() == DIBuilder::LineTablesOnly ||
             (DD->useSplitDwarf() && !Skeleton);
    }

At some point, the official Chrome was compiled with LineTablesOnly; then it was realized that stack traces in crash reports are not complete, and I conclude that Chrome uses split DWARF. See https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/8tKb9-Wr8gk

Comment Actions
Update: no, Chrome does not use split DWARF for the official builds. I was wrong.

At this time, I have no good explanation of why findImportedEntitiesForScope does not get called. I don't think it's blocking the review, since the CL does not change the logic; it only makes the sorting operation lazy. I will continue to read the code to get an answer to this question, but please take a look at the CL. It's tested: the Chrome binary is linked faster and is bytewise the same as without this change.

Comment Actions
I have got the first results from my debug run:

    bool DwarfCompileUnit::includeMinimalInlineScopes() const {
      fprintf(stderr,
              "DwarfCompileUnit::includeMinimalInlineScopes, emission kind: %d\n",
              (int)getCUNode()->getEmissionKind());
      fprintf(stderr, "useSplitDwarf: %d\n", (int)DD->useSplitDwarf());
      fprintf(stderr, "Skeleton: %p\n", (void*)Skeleton);
      _exit(1);
      ...

    DwarfCompileUnit::includeMinimalInlineScopes, emission kind: 1
    useSplitDwarf: 0
    Skeleton: (nil)

where emission kind 1 means FullDebug:
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/llvm/include/llvm/IR/DIBuilder.h&q=DIBuilder&sq=package:chromium&type=cs&l=71

Comment Actions
Now I have another hypothesis, which also explains the speedup: instead of sorting after each insertion into the vector, I sort them once.

Comment Actions
Now I have a better explanation of what happens here.

Please take another look.
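
For contrast, here is a hedged sketch of the eager strategy that hypothesis refers to, i.e. keeping the vector sorted while it is being populated. It reuses the placeholder types from the sketch above, and addEager is a hypothetical name; this is a cost-model illustration, not the pre-patch LLVM code.

    #include <algorithm>
    #include <utility>
    #include <vector>

    struct MDNode {}; // placeholder, as in the earlier sketch
    using Entry = std::pair<const MDNode *, const MDNode *>;

    // Eager variant: do a sorted insert on every append so the vector is
    // always ordered.  With N total entries this costs O(N) element moves
    // per insert, O(N^2) overall (or O(N^2 log N) comparisons if a full
    // sort is rerun after each insertion), versus a single O(N log N)
    // sort in the lazy variant.
    void addEager(std::vector<Entry> &Vec, const MDNode *Scope,
                  const MDNode *Entity) {
      auto Less = [](const Entry &A, const Entry &B) { return A.first < B.first; };
      auto It = std::upper_bound(Vec.begin(), Vec.end(), Entry(Scope, nullptr), Less);
      Vec.insert(It, Entry(Scope, Entity)); // shifts every element after It
    }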
Comment Actions
It'd be nice if we could sort immediately after populating - but the issue [...]

What about if we sunk the data structure down into the DwarfUnit (actually [...]

Comment Actions
This is correct.
I tried that, see http://reviews.llvm.org/D13918. I didn't optimize it for speed; it only loses a minute compared to the current CL anyway. Have I understood your suggestion right?

Comment Actions
I abandon this revision in favor of http://reviews.llvm.org/D13918.
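
For reference, here is a rough guess at what "sinking the data structure down into the unit" could look like: each compile unit owns and sorts only its own scope/entity pairs, so no module-wide vector has to be kept ordered across the thousands of compile units seen in an LTO link. The class and member names are hypothetical; see D13918 for the real patch.

    #include <algorithm>
    #include <utility>
    #include <vector>

    struct MDNode {}; // placeholder, as in the earlier sketches
    using Entry = std::pair<const MDNode *, const MDNode *>;

    class PerUnitImportedEntities {
      std::vector<Entry> Entities; // this compile unit's entities only

    public:
      // Populate from the unit's own imported entities, then sort once,
      // immediately after populating (as suggested above).
      explicit PerUnitImportedEntities(const std::vector<Entry> &FromCU)
          : Entities(FromCU) {
        std::stable_sort(Entities.begin(), Entities.end(),
                         [](const Entry &A, const Entry &B) {
                           return A.first < B.first;
                         });
      }

      auto findImportedEntitiesForScope(const MDNode *Scope) const {
        auto Less = [](const Entry &A, const Entry &B) { return A.first < B.first; };
        return std::equal_range(Entities.begin(), Entities.end(),
                                Entry(Scope, nullptr), Less);
      }
    };

Because each unit's list is small and built in one place, the "sort immediately after populating" concern goes away without needing a laziness flag.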
Comment Actions
You might consider llvm::DenseMap<const MDNode *, llvm::SmallVector<const MDNode *, 4>> as a better data structure for this. That way, you won't need to maintain an ordering.
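
A sketch of that alternative, keyed by scope; llvm::DenseMap and llvm::SmallVector are real LLVM ADTs, but the function names and the MDNode placeholder are illustrative and not from either revision.

    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/ADT/DenseMap.h"
    #include "llvm/ADT/SmallVector.h"

    struct MDNode {}; // placeholder, as in the earlier sketches

    // Scope -> imported entities declared in that scope.  Lookup is a hash
    // probe, so no sorted order has to be maintained or produced at all.
    static llvm::DenseMap<const MDNode *, llvm::SmallVector<const MDNode *, 4>>
        ImportedEntitiesByScope;

    static void addImportedEntity(const MDNode *Scope, const MDNode *Entity) {
      ImportedEntitiesByScope[Scope].push_back(Entity);
    }

    // Returns an empty list when the scope has no imported entities.
    static llvm::ArrayRef<const MDNode *>
    findImportedEntitiesForScope(const MDNode *Scope) {
      auto It = ImportedEntitiesByScope.find(Scope);
      if (It == ImportedEntitiesByScope.end())
        return {};
      return It->second;
    }

One trade-off worth noting: DenseMap iteration order is not deterministic, so any code that walks all scopes would need care to keep the output reproducible, whereas the per-scope SmallVector values above do preserve insertion order for the lookup path the thread cares about.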