This is an archive of the discontinued LLVM Phabricator instance.

Differential D83236

[DWARF] Add cutoff guarding validThroughout to avoid near-quadratic behaviour
ClosedPublic

Authored by jmorse on Jul 6 2020, 9:17 AM.

Details

Reviewers

aprantl
probinson
dblaikie
Orlando
vsk

Commits

rGb9d977b0ca60: [DWARF] Add cuttoff guarding quadratic validThroughout behaviour

Summary

Occasionally we see absolutely massive basic blocks, typically in global constructors that are vulnerable to heavy inlining. When these blocks are dense with DBG_VALUE instructions, we can hit near-quadratic complexity in DwarfDebug's validThroughout function. The problem is caused by:

- validThroughout having to step through all instructions in the block to examine their lexical scope, and
- a high proportion of instructions in that block being DBG_VALUEs for a unique variable fragment,

leading to us stepping through every instruction in the block, for (nearly) each instruction in the block. In the particular sample I'm looking at, there's a block with 120K instructions and maybe two-thirds of them are DBG_VALUEs. Not running validThroughout for this block cuts time in DWARF emission in half (which is many tens of seconds).
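To make the cost concrete, here is a minimal sketch of the access pattern described above. It is a toy model, not the real validThroughout: `Instr`, `countScanSteps` and `makeDenseBlock` are hypothetical names invented for illustration, and the model assumes every DBG_VALUE in the block triggers one full walk of the block.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine instruction. All we record is whether
// it is a DBG_VALUE for a variable fragment that appears nowhere else.
struct Instr {
  bool IsUniqueDbgValue;
};

// Counts how many instructions get visited if every unique DBG_VALUE causes
// a scan over the whole block. This models the cost only; it does not
// reproduce the lexical-scope logic of the real function.
std::size_t countScanSteps(const std::vector<Instr> &Block) {
  std::size_t Steps = 0;
  for (const Instr &I : Block) {
    if (!I.IsUniqueDbgValue)
      continue;
    // One full walk of the block per candidate variable.
    Steps += Block.size();
  }
  return Steps;
}

// Builds a block of N instructions in which two out of every three are
// DBG_VALUEs, roughly the ratio reported for the problem block.
std::vector<Instr> makeDenseBlock(std::size_t N) {
  std::vector<Instr> Block;
  Block.reserve(N);
  for (std::size_t Idx = 0; Idx < N; ++Idx)
    Block.push_back(Instr{Idx % 3 != 0});
  return Block;
}
```

Under this model, doubling the block size quadruples the number of visited instructions. For a 120K-instruction block with two-thirds DBG_VALUEs, it gives 80,000 x 120,000, i.e. roughly 10^10 instruction visits.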

By adding this guard, we force variables in this block to use a location list rather than a single-location expression, as shown in the added test. This shouldn't change the meaning of the output DWARF at all: instead we use a less efficient DWARF encoding to avoid a poor-performance code path. In the long term this could be fixed by Orlando's D82129 providing enough instruction ordering information to make validThroughout's checks less complex, but we're not there yet.

The testing technique is shamelessly ripped off from D80662. I've used a set of pointers to very large blocks rather than calling size() each time, because size() isn't constant-time with ilists.

The default setting, which treats blocks over 30,000 instructions long as too large, isn't determined scientifically; rather, it solves the problem in front of me, and doesn't trigger on a stage2 clang build. Suggestions on a good mechanism to pick this number are most welcome.
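The shape of the guard can be sketched outside LLVM as follows. This is an illustration under assumptions, not the patch itself: `Block`, `BlockSet`, `findVeryLargeBlocks` and `shouldAnalyze` are hypothetical names, and `std::list` stands in for an ilist (note that `std::list::size()` is constant-time since C++11, whereas the ilist `size()` mentioned above is not, which is why the result is cached in a set).

```cpp
#include <cstddef>
#include <list>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins: a block is a linked list of instruction ids.
using Block = std::list<int>;
using BlockSet = std::unordered_set<const Block *>;

// Walk the function once and remember which blocks exceed the limit, so the
// (potentially linear) size query runs once per block rather than once per
// variable.
BlockSet findVeryLargeBlocks(const std::vector<Block> &Fn, std::size_t Limit) {
  BlockSet VeryLarge;
  for (const Block &B : Fn)
    if (B.size() > Limit)
      VeryLarge.insert(&B);
  return VeryLarge;
}

// The expensive scope analysis is only attempted for blocks that were not
// flagged. Flagged blocks fall back to the location-list encoding.
bool shouldAnalyze(const BlockSet &VeryLarge, const Block &B) {
  return VeryLarge.count(&B) == 0;
}
```

With a limit of 0, every non-empty block is flagged, which mirrors how the added test forces the location-list path with `--singlevarlocation-input-bb-limit=0`.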

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jmorse created this revision. Jul 6 2020, 9:17 AM

Herald added a project: Restricted Project. Jul 6 2020, 9:17 AM

Herald added subscribers: llvm-commits, hiraditya.

Harbormaster failed remote builds in B63032: Diff 275714! Jul 6 2020, 9:57 AM

Thanks, lgtm.

This revision is now accepted and ready to land. Jul 6 2020, 12:07 PM

Could the algorithm be changed to do validThroughout of all variable fragments in a single pass together?

aprantl added inline comments. Jul 6 2020, 4:43 PM

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
598	Not very important, but: Assuming that `VeryLargeBlocks` will only be populated in the pathological case, micro-optimizing with a SmallPtrSet seems unnecessary. Perhaps it's more memory-efficient on average to just use a DenseSet?

In D83236#2133902, @dblaikie wrote:

Could the algorithm be changed to do validThroughout of all variable fragments in a single pass together?

This is probably do-able, although I'm generally unfamiliar with the DWARF emission code. Right now we iterate over variable entities and call validThroughout for those that might be single-locations; we would need to pivot to iterate over variable entities collecting those that _might_ be single-location, then calling validThroughout once for that set. My preference is to fold this problem into the work that @Orlando is doing though -- his patch is already solving this problem (intersection of scope and variable-location range) in one context, we should be able to re-purpose it to solve this one too.

(Most of my motivation for this patch is the upcoming branch for LLVM11, I'd like to get a limit on this, then work towards doing it more efficiently)
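For illustration only, the "collect candidates, then validate the set once" idea could look something like the sketch below. It is a simplified model and not the LLVM code or Orlando's patch: it assumes each instruction carries a single scope id and each candidate covers a half-open index range, and `Candidate`, `Extent`, `scopeExtents` and `validForAll` are hypothetical names.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical model: a candidate variable lives in one lexical scope and
// claims a single location over the instruction index range [Begin, End).
struct Candidate {
  int Scope;
  std::size_t Begin;
  std::size_t End;
};

// First and last instruction index at which a scope appears in the block.
struct Extent {
  std::size_t First;
  std::size_t Last;
};

// Single pass over the block: record the extent of every scope.
std::unordered_map<int, Extent>
scopeExtents(const std::vector<int> &ScopeOfInstr) {
  std::unordered_map<int, Extent> Extents;
  for (std::size_t Idx = 0; Idx < ScopeOfInstr.size(); ++Idx) {
    auto It = Extents.find(ScopeOfInstr[Idx]);
    if (It == Extents.end())
      Extents.emplace(ScopeOfInstr[Idx], Extent{Idx, Idx});
    else
      It->second.Last = Idx;
  }
  return Extents;
}

// Validate the whole candidate set against the precomputed extents. Each
// check is a hash lookup rather than another walk of the block.
std::vector<bool> validForAll(const std::vector<int> &ScopeOfInstr,
                              const std::vector<Candidate> &Candidates) {
  const auto Extents = scopeExtents(ScopeOfInstr);
  std::vector<bool> Result;
  Result.reserve(Candidates.size());
  for (const Candidate &C : Candidates) {
    auto It = Extents.find(C.Scope);
    // A scope with no instructions in this block is trivially covered.
    bool Covered = It == Extents.end() ||
                   (C.Begin <= It->second.First && It->second.Last < C.End);
    Result.push_back(Covered);
  }
  return Result;
}
```

In this model the block is walked once regardless of how many candidates there are, so the total work is roughly the block size plus the candidate count instead of their product.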

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
598	Sounds fair, I'll fold that change in,

In D83236#2135647, @jmorse wrote:

In D83236#2133902, @dblaikie wrote:

Could the algorithm be changed to do validThroughout of all variable fragments in a single pass together?

This is probably do-able, although I'm generally unfamiliar with the DWARF emission code. Right now we iterate over variable entities and call validThroughout for those that might be single-locations; we would need to pivot to iterate over variable entities collecting those that _might_ be single-location, then calling validThroughout once for that set. My preference is to fold this problem into the work that @Orlando is doing though -- his patch is already solving this problem (intersection of scope and variable-location range) in one context, we should be able to re-purpose it to solve this one too.

(Most of my motivation for this patch is the upcoming branch for LLVM11, I'd like to get a limit on this, then work towards doing it more efficiently)

Fair enough. Perhaps a clear FIXME or the like, suggesting that this should be obsoleted by @Orlando's work (and ensuring they're aware that this is something that should be considered in the replacement design)?

Closed by commit rGb9d977b0ca60: [DWARF] Add cuttoff guarding quadratic validThroughout behaviour (authored by jmorse). Jul 8 2020, 2:31 AM

This revision was automatically updated to reflect the committed changes.

Orlando mentioned this in D86153: [DwarfDebug] Improve validThroughout performance (4/4). Aug 19 2020, 1:01 AM

Orlando added a reverting change: rGb6cca0ec05cf: Revert "[DWARF] Add cuttoff guarding quadratic validThroughout behaviour". Aug 27 2020, 4:14 AM

Revision Contents

Path	Size
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h	6 lines
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp	34 lines
llvm/test/DebugInfo/MIR/X86/singlelocation-cutoffs.mir	65 lines

Diff 276350

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h

@@ [586 lines not shown] @@ class DwarfDebug : public DebugHandlerBase {
   /// Populate LexicalScope entries with variables' info.
   void collectEntityInfo(DwarfCompileUnit &TheCU, const DISubprogram *SP,
                          DenseSet<InlinedEntity> &ProcessedVars);

   /// Build the location list for all DBG_VALUEs in the
   /// function that describe the same variable. If the resulting
   /// list has only one entry that is valid for entire variable's
   /// scope return true.
-  bool buildLocationList(SmallVectorImpl<DebugLocEntry> &DebugLoc,
-                         const DbgValueHistoryMap::Entries &Entries);
+  bool buildLocationList(
+      SmallVectorImpl<DebugLocEntry> &DebugLoc,
+      const DbgValueHistoryMap::Entries &Entries,
+      DenseSet<const MachineBasicBlock *> &VeryLargeBlocks);

   /// Collect variable information from the side table maintained by MF.
   void collectVariableInfoFromMFTable(DwarfCompileUnit &TheCU,
                                       DenseSet<InlinedEntity> &P);

   /// Emit the reference to the section.
   void emitSectionReference(const DwarfCompileUnit &CU);

@@ [185 lines not shown] @@

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp

@@ [161 lines not shown] @@ DwarfLinkageNames("dwarf-linkage-names", cl::Hidden,
                       cl::desc("Which DWARF linkage-name attributes to emit."),
                       cl::values(clEnumValN(DefaultLinkageNames, "Default",
                                             "Default for platform"),
                                  clEnumValN(AllLinkageNames, "All", "All"),
                                  clEnumValN(AbstractLinkageNames, "Abstract",
                                             "Abstract subprograms")),
                       cl::init(DefaultLinkageNames));

+static cl::opt<unsigned> LocationAnalysisSizeLimit(
+    "singlevarlocation-input-bb-limit",
+    cl::desc("Maximum block size to analyze for single-location variables"),
+    cl::init(30000), cl::Hidden);
+
 static const char *const DWARFGroupName = "dwarf";
 static const char *const DWARFGroupDescription = "DWARF Emission";
 static const char *const DbgTimerName = "writer";
 static const char *const DbgTimerDescription = "DWARF Debug Writer";
 static constexpr unsigned ULEB128PadSize = 4;

 void DebugLocDwarfExpression::emitOp(uint8_t Op, const char *Comment) {
   getActiveStreamer().EmitInt8(
@@ [1,454 lines not shown] @@
 // 4 [DbgValue, ~0, DBG_VALUE @g, [...] (fragment 0, 96)]
 //
 // Output [start, end) [Value...]:
 //
 // [0-1) [(reg0, fragment 0, 32)]
 // [1-3) [(reg0, fragment 0, 32), (reg1, fragment 32, 32)]
 // [3-4) [(reg1, fragment 32, 32), (123, fragment 64, 32)]
 // [4-) [(@g, fragment 0, 96)]
-bool DwarfDebug::buildLocationList(SmallVectorImpl<DebugLocEntry> &DebugLoc,
-                                   const DbgValueHistoryMap::Entries &Entries) {
+bool DwarfDebug::buildLocationList(
+    SmallVectorImpl<DebugLocEntry> &DebugLoc,
+    const DbgValueHistoryMap::Entries &Entries,
+    DenseSet<const MachineBasicBlock *> &VeryLargeBlocks) {
   using OpenRange =
       std::pair<DbgValueHistoryMap::EntryIndex, DbgValueLoc>;
   SmallVector<OpenRange, 4> OpenRanges;
   bool isSafeForSingleLocation = true;
   const MachineInstr *StartDebugMI = nullptr;
   const MachineInstr *EndMI = nullptr;

   for (auto EB = Entries.begin(), EI = EB, EE = Entries.end(); EI != EE; ++EI) {
@@ [79 lines not shown] @@ LLVM_DEBUG({
       dbgs() << "-----\n";
     });

     auto PrevEntry = std::next(CurEntry);
     if (PrevEntry != DebugLoc.rend() && PrevEntry->MergeRanges(*CurEntry))
       DebugLoc.pop_back();
   }

-  return DebugLoc.size() == 1 && isSafeForSingleLocation &&
-         validThroughout(LScopes, StartDebugMI, EndMI);
+  // If there's a single entry, safe for a single location, and not part of
+  // an over-sized basic block, then ask validThroughout whether this
+  // location can be represented as a single variable location.
+  if (DebugLoc.size() != 1 || !isSafeForSingleLocation)
+    return false;
+  if (VeryLargeBlocks.count(StartDebugMI->getParent()))
+    return false;
+  return validThroughout(LScopes, StartDebugMI, EndMI);
 }

 DbgEntity *DwarfDebug::createConcreteEntity(DwarfCompileUnit &TheCU,
                                             LexicalScope &Scope,
                                             const DINode *Node,
                                             const DILocation *Location,
                                             const MCSymbol *Sym) {
   ensureAbstractEntityIsCreatedIfScoped(TheCU, Node, Scope.getScopeNode());
@@ [15 lines not shown] @@

 // Find variables for each lexical scope.
 void DwarfDebug::collectEntityInfo(DwarfCompileUnit &TheCU,
                                    const DISubprogram *SP,
                                    DenseSet<InlinedEntity> &Processed) {
   // Grab the variable info that was squirreled away in the MMI side-table.
   collectVariableInfoFromMFTable(TheCU, Processed);

+  // Identify blocks that are unreasonably sized, so that we can later
+  // skip lexical scope analysis over them.
+  DenseSet<const MachineBasicBlock *> VeryLargeBlocks;
+  for (const auto &MBB : *CurFn)
+    if (MBB.size() > LocationAnalysisSizeLimit)
+      VeryLargeBlocks.insert(&MBB);
+
   for (const auto &I : DbgValues) {
     InlinedEntity IV = I.first;
     if (Processed.count(IV))
       continue;

     // Instruction ranges, specifying where IV is accessible.
     const auto &HistoryMapEntries = I.second;
     if (HistoryMapEntries.empty())
@@ [20 lines not shown] @@ for (const auto &I : DbgValues) {
     // If the history map contains a single debug value, there may be an
     // additional entry which clobbers the debug value.
     size_t HistSize = HistoryMapEntries.size();
     bool SingleValueWithClobber =
         HistSize == 2 && HistoryMapEntries[1].isClobber();
     if (HistSize == 1 || SingleValueWithClobber) {
       const auto *End =
           SingleValueWithClobber ? HistoryMapEntries[1].getInstr() : nullptr;
-      if (validThroughout(LScopes, MInsn, End)) {
+      if (VeryLargeBlocks.count(MInsn->getParent()) == 0 &&
+          validThroughout(LScopes, MInsn, End)) {
         RegVar->initializeDbgValue(MInsn);
         continue;
       }
     }

     // Do not emit location lists if .debug_loc secton is disabled.
     if (!useLocSection())
       continue;

     // Handle multiple DBG_VALUE instructions describing one variable.
     DebugLocStream::ListBuilder List(DebugLocs, TheCU, Asm, RegVar, *MInsn);

     // Build the location list for this variable.
     SmallVector<DebugLocEntry, 8> Entries;
-    bool isValidSingleLocation = buildLocationList(Entries, HistoryMapEntries);
+    bool isValidSingleLocation =
+        buildLocationList(Entries, HistoryMapEntries, VeryLargeBlocks);

     // Check whether buildLocationList managed to merge all locations to one
     // that is valid throughout the variable's scope. If so, produce single
     // value location.
     if (isValidSingleLocation) {
       RegVar->initializeDbgValue(Entries[0].getValues()[0]);
       continue;
     }
@@ [1,534 lines not shown] @@

llvm/test/DebugInfo/MIR/X86/singlelocation-cutoffs.mir

This file was added.

# Test cutoffs for single-location variable analysis.
# Disable validThroughout if the input size exceeds the specified limit

# RUN: llc %s -o - -start-after=livedebugvalues -mtriple=x86_64-unknown-unknown \
# RUN: --singlevarlocation-input-bb-limit=0 -filetype=obj\
# RUN: | llvm-dwarfdump -v -\
# RUN: | FileCheck %s -check-prefix=LIMITED

# RUN: llc %s -o - -start-after=livedebugvalues -mtriple=x86_64-unknown-unknown \
# RUN: --singlevarlocation-input-bb-limit=20 -filetype=obj | llvm-dwarfdump -v -\
# RUN: | FileCheck %s -check-prefix=UNLIMITED

# LIMITED: DW_AT_location [DW_FORM_sec_offset]

# UNLIMITED: DW_AT_location [DW_FORM_exprloc]

--- |
  target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

  declare i32 @use(i32)

  define i32 @foo(i32 %x) !dbg !6 {
  entry:
    ret i32 1, !dbg !15
  }

  declare void @llvm.dbg.value(metadata, metadata, metadata)

  !llvm.dbg.cu = !{!0}
  !llvm.debugify = !{!3, !4}
  !llvm.module.flags = !{!5}

  !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
  !1 = !DIFile(filename: "/tmp/t.ll", directory: "/")
  !2 = !{}
  !3 = !{i32 4}
  !4 = !{i32 2}
  !5 = !{i32 2, !"Debug Info Version", i32 3}
  !6 = distinct !DISubprogram(name: "foo", linkageName: "foo", scope: null, file: !1, line: 1, type: !7, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !8)
  !7 = !DISubroutineType(types: !2)
  !8 = !{!9, !11}
  !9 = !DILocalVariable(name: "1", scope: !6, file: !1, line: 1, type: !10)
  !10 = !DIBasicType(name: "ty32", size: 32, encoding: DW_ATE_unsigned)
  !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10)
  !12 = !DILocation(line: 1, column: 1, scope: !6)
  !13 = !DILocation(line: 2, column: 1, scope: !6)
  !14 = !DILocation(line: 3, column: 1, scope: !6)
  !15 = !DILocation(line: 4, column: 1, scope: !6)

...
---
name: foo
liveins:
  - { reg: '$edi', virtual-reg: '' }
stack:
  - { id: 0, name: '', type: spill-slot, offset: -12, size: 4, alignment: 4,
      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
body: |
  bb.0.entry:
    liveins: $edi
    DBG_VALUE renamable $edi, $noreg, !11, !DIExpression(), debug-location !14
    RETQ debug-location !14

...