This is an archive of the discontinued LLVM Phabricator instance.

Avoid underestimating the number of DIEs for a given debug info size.
Needs Review · Public

Authored by simon.giesecke on May 14 2021, 5:20 AM.

Details

Reviewers
clayborg
dblaikie

Diff Detail

Event Timeline

simon.giesecke created this revision. · May 14 2021, 5:20 AM
simon.giesecke requested review of this revision. · May 14 2021, 5:20 AM
Herald added a project: Restricted Project. · May 14 2021, 5:20 AM

I came across this while analyzing a perf profile of an llvm-gsymutil run. I noticed that quite a lot of time was spent resizing the DieArray, so I compared the actual number of entries against getDebugInfoSize() and found the average to be in the range of 6-12 bytes per DIE (in most cases actually 6-8). This was a RelWithDebInfo build made with clang 12.

Of course, this is purely anecdotal, and I don't know where the original estimate of 14-20 bytes came from either.
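
For context, a minimal sketch of the kind of reservation heuristic under discussion, assuming the estimate is applied roughly like this (DieArray and getDebugInfoSize() are the names from the discussion; the surrounding code is simplified and hypothetical, not the actual LLVM code):

```cpp
#include <cstddef>
#include <vector>

struct DWARFDebugInfoEntry { /* offset, abbreviation, ... */ };

class UnitSketch {
  std::vector<DWARFDebugInfoEntry> DieArray;
  std::size_t DebugInfoSize = 0;

public:
  std::size_t getDebugInfoSize() const { return DebugInfoSize; }

  void reserveDIEs() {
    // Old assumption: ~14-20 bytes per DIE on average. The numbers
    // reported above (6-12, mostly 6-8) suggest dividing by the lower
    // observed bound instead, so the reservation is not too small and
    // the vector is not repeatedly resized during parsing.
    constexpr std::size_t AssumedMinBytesPerDIE = 6; // was 14
    DieArray.reserve(getDebugInfoSize() / AssumedMinBytesPerDIE);
  }
};
```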

Could you upload this with context?
& /maybe/ if you're willing, try building with different settings (optimized, unoptimized, etc.) & see if that makes any difference?

You could add a statistic to llvm-dwarfdump --statistics if you like, to make it easy to gather the mean-bytes-per-DIE stat (maybe it already has the data to compute it? Not sure if it has a DIE count stat).
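
A minimal sketch of how such a statistic could be gathered as a standalone tool, assuming the DWARFUnit accessors getNumDIEs() and getDebugInfoSize() behave as discussed above; a real patch would hook this into llvm-dwarfdump --statistics instead:

```cpp
#include "llvm/DebugInfo/DWARF/DWARFContext.h"
#include "llvm/Object/ObjectFile.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main(int argc, char **argv) {
  if (argc != 2) {
    errs() << "usage: " << argv[0] << " <binary>\n";
    return 1;
  }
  auto BinOrErr = object::ObjectFile::createObjectFile(argv[1]);
  if (!BinOrErr) {
    logAllUnhandledErrors(BinOrErr.takeError(), errs());
    return 1;
  }
  std::unique_ptr<DWARFContext> DICtx =
      DWARFContext::create(*BinOrErr->getBinary());

  uint64_t TotalBytes = 0, TotalDIEs = 0;
  for (const auto &CU : DICtx->compile_units()) {
    TotalBytes += CU->getDebugInfoSize(); // this unit's contribution
    TotalDIEs += CU->getNumDIEs();        // forces DIE parsing
  }
  if (TotalDIEs)
    outs() << "mean bytes per DIE: "
           << double(TotalBytes) / double(TotalDIEs) << "\n";
  return 0;
}
```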

I had run statistics on a few hundred DWARF files way back when and came up with the original 14-20 byte number, but that was a long time ago (well over 10 years). With newer DWARF versions and with optimizations, this can easily change. So it would be great to know what the minimum value should be set to by default. I agree that adding a statistic would be nice so we can track this. I will test some recent DWARF files, see what my numbers show, and report back in this patch.

After thinking about this again, I wonder:

  • How bad would it be to overestimate the number of entries? There might obviously be cases where this would lead to memory exhaustion. Not sure if we have some standard means of recognizing a memory-pressure situation in the LLVM codebase?
  • As @dblaikie asked, whether this may vary significantly depending on the platform, compiler, optimization level, and other properties, in which case we had better probe this somehow for a given binary (see the sketch after this list).
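
A hypothetical sketch of that per-binary probing idea: fully parse one unit, measure its actual bytes-per-DIE ratio, and reuse it when reserving space for the remaining units. None of these names exist in LLVM; this is only an illustration of the approach.

```cpp
#include <algorithm>
#include <cstddef>

struct UnitStats {
  std::size_t DebugInfoBytes;
  std::size_t NumDIEs;
};

// Clamp the measured ratio to a sane range so a degenerate probe
// unit cannot cause a huge over- or under-reservation.
std::size_t estimateBytesPerDIE(const UnitStats &Probe) {
  if (Probe.NumDIEs == 0)
    return 6; // fallback: lower bound observed in this review
  return std::clamp<std::size_t>(Probe.DebugInfoBytes / Probe.NumDIEs,
                                 4, 20);
}
```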