This is an archive of the discontinued LLVM Phabricator instance.

Differential D40987

Rewrite the cached map used for locating the most precise DIE among inlined subroutines for a given address.
ClosedPublic

Authored by chandlerc on Dec 7 2017, 3:24 PM.

Details

Reviewers

dblaikie
probinson
aprantl
JDevlieghere

Commits

rG54a5ad3681d8: Rewrite the cached map used for locating the most precise DIE among inlined…
rL321345: Rewrite the cached map used for locating the most precise DIE among

Summary

This is essentially the hot path of llvm-symbolizer when extracting
inlined frames during symbolization. Previously, we would read every
subprogram and every inlined subroutine, building a std::map across the
entire PC space to the best DIE, and then do only a handful of queries
as we symbolized a backtrace. A huge fraction of the time was spent
building the map itself.

This patch changes it to a two-level system. First, we just build a map
from PC-interval to DWARF subprograms. These are required to be disjoint
and so constructing this is pretty easy. Second, we build a map *just*
for the inlined subroutines within the subprogram containing the query
address. This allows us to look at far fewer DIEs and build a *much*
smaller set of cached maps in the llvm-symbolizer case where only a few
addresses get symbolized during the entire run.

It also builds both interval maps in a very different way. It constructs
a single flat vector of pairs that maps from offset -> index. The
indices point into collections of DIE objects, but can also be
"tombstones" (-1) to mark gaps. In the case of subprograms, this mostly
just simplifies the data structure a bit. For inlined subroutines,
because we carefully split them as we build the map, we end up in many
cases having no holes and not having to store both start and stop
offsets.
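
To make the flat offset -> index layout concrete, here is a minimal sketch of how a lookup over such a vector could work. It is not the code from the patch: `Entry`, `Tombstone`, `lookupIndex`, and `exampleMap` are hypothetical names, and the DIE collections are reduced to plain indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical names throughout; this only illustrates the data layout.
// Each entry is {start PC, index}; an interval runs from one entry's start
// to the next entry's start.
using Entry = std::pair<uint64_t, int64_t>;

constexpr int64_t Tombstone = -1; // marks a gap with no DIE

// Returns the index stored for the interval containing PC, or Tombstone if
// PC falls in a gap or before the first entry.
int64_t lookupIndex(const std::vector<Entry> &Map, uint64_t PC) {
  assert(std::is_sorted(Map.begin(), Map.end(),
                        [](const Entry &L, const Entry &R) {
                          return L.first < R.first;
                        }) &&
         "lookups require a sorted map");
  // Find the first entry whose start is strictly greater than PC; the entry
  // just before it (if any) owns the interval containing PC.
  auto It = std::upper_bound(Map.begin(), Map.end(), PC,
                             [](uint64_t LHS, const Entry &RHS) {
                               return LHS < RHS.first;
                             });
  if (It == Map.begin())
    return Tombstone;
  return std::prev(It)->second;
}

// Two ranges, [0x1000, 0x1040) -> 0 and [0x2000, 0x2100) -> 1, with a gap
// between them and an open-ended gap after the last one.
std::vector<Entry> exampleMap() {
  return {{0x1000, 0}, {0x1040, Tombstone}, {0x2000, 1}, {0x2100, Tombstone}};
}
```

Because each interval ends where the next entry starts, only start offsets are stored; a tombstone entry is needed only where a real gap exists.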

Finally, the PC ranges for the inlined subroutines are compressed into
32 bits by making them relative to the base PC of the outer subprogram.
This means that if you have a single function body with over 2gb of
executable code in it, we will stop mapping addresses past the first 2gb
of that function into inlined subroutines and just give you the
subprogram. This doesn't seem like a problem. ;]
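
As an illustration of the relative-PC compression, the sketch below saturates an offset from the subprogram base into 32 bits. It uses an unsigned 32-bit offset, which is the form the review discussion later moves to, and `toRelativePC` is a hypothetical helper name rather than a function in the patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts an absolute PC into an offset from BasePC,
// saturating at the largest value a uint32_t can hold.
uint32_t toRelativePC(uint64_t PC, uint64_t BasePC) {
  const uint32_t MaxRelativePC = std::numeric_limits<uint32_t>::max();
  // Clamp PCs below the base up to the base so the subtraction cannot wrap.
  PC = std::max(PC, BasePC);
  const uint64_t Offset = PC - BasePC;
  // Anything farther than MaxRelativePC from the base collapses to the
  // maximum, so those PCs can no longer be told apart.
  return Offset > MaxRelativePC ? MaxRelativePC
                                : static_cast<uint32_t>(Offset);
}
```

Once both ends of a range saturate to the same value the range becomes empty, which is how PCs far from the base fall back to the enclosing subprogram.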

All of this combines to make llvm-symbolizer *well* over 2x faster for
symbolizing backtraces out of LLVM's unittests. Death-test heavy unit
tests are running >2x faster. I'm still going to look at completely
disabling symbolization there, but figured while I had a good benchmark
we should make symbolization a bit better.

Sadly, the logic to build the flat interval map for the inlined
subroutines is fairly complex. I'm not super happy about this and
welcome any simplifying suggestions.

Also, the names of various components here seem a bit confusing and/or
redundant. I've tried a bunch of options and this is the least bad one
I've found but I'd love better naming patterns to use.

And last but not least, some aspects of the algorithm for this changed
several times while I was working on this. I may have some stale
comments or have failed to comment things that really should be
commented; don't hesitate to let me know about these or to just ignore
them and I'll do a thorough once over tomorrow.

Huge thanks to Dave Blaikie who helped walk me through the various
things I needed to do in DWARF to make this work.

Diff Detail

Repository: rL LLVM

Event Timeline

chandlerc created this revision. · Dec 7 2017, 3:24 PM

Herald added subscribers: mgrang, JDevlieghere, hiraditya and 3 others. · Dec 7 2017, 3:24 PM

Harbormaster completed remote builds in B12889: Diff 126062. · Dec 7 2017, 3:24 PM

aprantl added reviewers: probinson, aprantl, JDevlieghere. · Dec 7 2017, 3:48 PM

Code mostly looks plausible. Haven't quite understood the "ParentIntervals*" variables/processing.

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
358–359 ↗	(On Diff #126062)	for(DWARFDie Child : Die.children())
371–382 ↗	(On Diff #126062)	This does slightly change the claim in the patch description about only 2GB functions being a problem. Since this at least in theory supports fragmented functions, even a small function, say one split into a hot and a cold section, could end up with the hot part 2GB of address space away from the cold part, perhaps more readily than a 2GB function would exist. But still probably nothing to worry about.
384–388 ↗	(On Diff #126062)	Subprograms that are children of other subprograms might be arbitrarily nested, not only direct children (it looks like this code only handles direct children?), though there's no /good/ reason they would be nested inside an inlined subroutine. E.g., you could have code like `void f1() { if (int x = ...) { void f2() { ... } ... } }`, in which case you could have DWARF like: `DW_TAG_subprogram` (`DW_AT_name "f1"`) containing `DW_TAG_lexical_block` containing `DW_TAG_subprogram` (`DW_AT_name "f2"`). Which, yes, does mean you have to walk all the DIEs to find all the subprograms... not sure how much that defeats the goals here (I guess not having to build up so much in the way of data structures while doing so is still advantageous).
457 ↗	(On Diff #126062)	What's the "first" in this case - is it deterministically the most nested? Least nested? Unspecified? We'd need to make sure it's the most nested.
471 ↗	(On Diff #126062)	Nit: Usually I'm of the opinion if a lambda doesn't leak out of scope (via a std::function or similar), just use [&] rather than an explicit list. Same as a loop/conditional/etc doesn't need documentation about which variables are used in the scope. One more thing to have to touch if the code changes, especially since there's a -Wunused-lambda-capture that comes up a bit on the bots regularly. But up to you.
474–475 ↗	(On Diff #126062)	for (DWARFDie Child : Die)
482 ↗	(On Diff #126062)	Spurious semi
507–509 ↗	(On Diff #126062)	This is just for bad input, I take it? (where the address range of an inlined subroutine is outside that of the enclosing subprogram)
535–536 ↗	(On Diff #126062)	What's the behavior if they do overlap? While the DWARF spec & a reasonable person would say they can't, physically the data could contain such ranges. Will this crash? Have unreliable behavior? What's reasonable on invalid input here? (it's probably fine, just checking it's articulated)
561 ↗	(On Diff #126062)	Nit: Usually I see "i != e" in C++ code. Is < preferred?
618–620 ↗	(On Diff #126062)	Nit: {} on a single line scope.

aprantl added inline comments. · Dec 7 2017, 3:59 PM

llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
457 ↗	(On Diff #126062)	I think it would be nice to copy the high level overview of the two-level approach outlined in the phabricator review into a comment somewhere here.
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
561 ↗	(On Diff #126062)	LLVM style wants this to be `I` and `E`.
690 ↗	(On Diff #126062)	That `--J` looks scary since we are using J later in the same expression. Perhaps factor this into a separate statement?

Update based on code review comments.

Harbormaster completed remote builds in B12901: Diff 126113. · Dec 8 2017, 3:16 AM

Thanks for all the feedback!

llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
457 ↗	(On Diff #126062)	Yeah, added comments to both of these routines in the source file (since they're implementation functions) and also added the two-level descriptive comments to the implementation of getSubroutineForAddress as that seems to be the most cohesive place to show the pattern of access.
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
358–359 ↗	(On Diff #126062)	This is what I get for cargo culting. =]
371–382 ↗	(On Diff #126062)	Yeah, I've switched to at least use uint32_t so we have 4gb. I'll document it as well.
384–388 ↗	(On Diff #126062)	Ow, OK. This does make things slower, but we still get to avoid decoding every inlined subroutine. We just walk the graph w/o reading the address ranges. So my patch still helps. This change costs us about 10% total. =[
457 ↗	(On Diff #126062)	We aggressively split as we insert so that we should end up with essentially a single (most nested) offset which has a valid DIE index.
507–509 ↗	(On Diff #126062)	Yes, I just wanted to bound how bad that got below when I re-base the offsets.
535–536 ↗	(On Diff #126062)	It should produce some unpredictable result, but never crash. Regardless of whether this holds, we can still sort and unique them. It's just that the resulting map may be... surprising in the results it gives. But it still ends up sorted, so all our upper_bound queries should still work, etc. I don't think we assert on anything other than "it isn't empty" and "it is sorted", which should hold regardless.
561 ↗	(On Diff #126062)	If these were iterators, I would use `I` and `E`. We are increasingly commonly using `i` for an `int` for loop variable. I used `e` but am not super happy about it. More typically we have something like `Size` which makes `i < Size` much nicer (IMO). Here, the end point isn't the size and I didn't come up with a good name for it. But the `!=` I think largely comes from iterators and unsigned integers.... `<` seems much more clear for "normal" signed integers.
690 ↗	(On Diff #126062)	Yeah, never was super happy w/ this. I'm just exploding it and dealing w/ repeated code.
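
The "sort and unique" behavior on overlapping input can be sketched with plain pairs. This is a simplified illustration under assumed names (`Entry`, `normalize`), not the patch itself, though the two predicates mirror the ones shown in the diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// {start PC, DIE index or -1 for "no DIE"}; hypothetical alias.
using Entry = std::pair<uint64_t, int64_t>;

// Sorts entries by start PC (preferring entries that carry a DIE when two
// start at the same PC), then drops entries that share a start PC or a DIE
// index with the entry before them.
void normalize(std::vector<Entry> &Map) {
  std::stable_sort(Map.begin(), Map.end(),
                   [](const Entry &LHS, const Entry &RHS) {
                     if (LHS.first != RHS.first)
                       return LHS.first < RHS.first;
                     return LHS.second != -1 && RHS.second == -1;
                   });
  Map.erase(std::unique(Map.begin(), Map.end(),
                        [](const Entry &LHS, const Entry &RHS) {
                          return LHS.first == RHS.first ||
                                 LHS.second == RHS.second;
                        }),
            Map.end());
}
```

Feeding it two overlapping ranges, [0x10, 0x30) -> 0 and [0x20, 0x40) -> 1, leaves {0x10, 0}, {0x20, 1}, {0x30, -1}. The result is still sorted and searchable, but a query at 0x35 finds no DIE even though the second range covered it, which is the kind of surprising-but-safe outcome described above.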

Thanks for working on this, Chandler! It took me a while to fully grasp why you needed the parent interval in the second layer, but it makes sense to me. I was considering a slightly different approach where you would do a depth-first traversal and add the address ranges as you encounter them, but that would presumably just complicate getting the LowPC mapped to the most nested (most precise) inlined subroutine.

llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h
216 ↗	(On Diff #126113)	s/indexinto/index into/
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
378 ↗	(On Diff #126113)	I'm curious if there is any particular reason you prefer `::push_back({})` over `::emplace_back`?
521 ↗	(On Diff #126113)	Maybe extract `(uint64_t)std::numeric_limits<uint32_t>::max()` to make this a little more readable? Not relevant here (as it actually improves readability) but I wonder what the consensus is on C-style casts in LLVM? I don't think the style guide actually mentions it.

chandlerc marked an inline comment as done. · Dec 8 2017, 11:53 AM

chandlerc added inline comments.

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
378 ↗	(On Diff #126113)	Unless the code will only compile with emplace_back, I strongly prefer push_back. I find the code substantially easier to read and failures easier to debug. Forwarding is a horrifically complicated thing in C++ and it tends to not be worth the cost it imposes. We generally insist upon cheap-to-move objects, making push_back's "extra" move not an important consideration.

JDevlieghere added inline comments. · Dec 11 2017, 3:38 AM

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
378 ↗	(On Diff #126113)	Got it, thanks!

I'm pretty OK with this - haven't thought /deeeply/ about the algorithm from the code (more from the discussions we've had), mostly glossed over, looked at the code style, etc, seems plausible.

Wonder if it's worth doing some kind of stress test for this? (take an optimized clang, symbolize all the addresses, compare before/after this change - that would've likely caught the "there can be non-subroutine scopes that need to be recursed through" case I brought up in the first pass of review & maybe some others I've not spotted?)

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
496 ↗	(On Diff #126113)	I know the naming here is subtle/annoying, but maybe "inlined subroutines" rather than "nested subroutines" would make it more clear what we're searching for?
521–527 ↗	(On Diff #126113)	Worth pulling out a named constant for std::numeric_limits<uint32_t>::max() ? it's a long expression used 4 times in these long/wrapped lines.
571–573 ↗	(On Diff #126113)	top level const on locals is a bit uncommon (don't have to change it, just mentioning it in case there's some special motivation, or in case the choice is worth revisiting)

This revision is now accepted and ready to land. · Dec 14 2017, 11:58 AM

Thanks all! Going to land this after a touch more testing so that my test runs start being faster. =D

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
521–527 ↗	(On Diff #126113)	Not sure that any name I come up with is better than `std::numeric_limits<uint32_t>::max()`, but sure. Combined with the suggestion above.
571–573 ↗	(On Diff #126113)	I added these to catch bugs where I was mutating something that wasn't a reference and I meant to be mutating something that was a reference. ::shrug:: I generally try to be pragmatic about this rather than dogmatic and make things const when useful to avoid mistakes.

Closed by commit rL321345: Rewrite the cached map used for locating the most precise DIE among (authored by chandlerc). · Dec 21 2017, 10:42 PM

This revision was automatically updated to reflect the committed changes.

chandlerc marked an inline comment as done.

Sorry, missed this somehow... I didn't really look at the second layer, but I have a suggestion for the initial tree walk.

llvm/trunk/lib/DebugInfo/DWARF/DWARFUnit.cpp
380	Would the overall algorithm be faster if it adds the child to the worklist only if the child itself is a subprogram, or has children? Currently this loop adds uninteresting leaf DIEs to the worklist, only to discover they are uninteresting on a later iteration. I'd think keeping the size of the worklist down could be beneficial. There are DIEs that can have children but do not represent scopes (array_type and enum_type come to mind) or otherwise cannot have subprogram children, and it would be possible to come up with a list of those. But checking for a long list of tag types might get too expensive. With my above suggestion, you still (for example) add enum_type to the worklist, but not the individual enumerator DIEs, and that gets you the bulk of the performance benefit.
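
The pruning idea in this comment can be sketched on a toy tree. Everything here is hypothetical (`ToyDie`, `WalkStats`, `walk`, `makeExample`); it only counts worklist pushes with and without the suggested check and makes no claim about real DWARF performance.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a DIE; this is not LLVM's DWARFDie.
struct ToyDie {
  bool IsSubprogram = false;
  std::vector<ToyDie> Children;
};

struct WalkStats {
  std::size_t SubprogramsFound = 0;
  std::size_t WorklistPushes = 0;
};

// Worklist walk over the tree. With PruneLeaves set, a child is queued only
// if it is a subprogram or has children of its own.
WalkStats walk(const ToyDie &Root, bool PruneLeaves) {
  WalkStats Stats;
  std::vector<const ToyDie *> Worklist;
  Worklist.push_back(&Root);
  ++Stats.WorklistPushes;
  while (!Worklist.empty()) {
    const ToyDie *Die = Worklist.back();
    Worklist.pop_back();
    if (Die->IsSubprogram)
      ++Stats.SubprogramsFound;
    for (const ToyDie &Child : Die->Children) {
      if (PruneLeaves && !Child.IsSubprogram && Child.Children.empty())
        continue; // uninteresting leaf: never queued
      Worklist.push_back(&Child);
      ++Stats.WorklistPushes;
    }
  }
  return Stats;
}

// A unit holding an enum-like DIE with three leaf children and one
// subprogram with two leaf children.
ToyDie makeExample() {
  ToyDie Enum;
  Enum.Children.resize(3);
  ToyDie Func;
  Func.IsSubprogram = true;
  Func.Children.resize(2);
  ToyDie Unit;
  Unit.Children.push_back(Enum);
  Unit.Children.push_back(Func);
  return Unit;
}
```

On this toy input both walks find the single subprogram, but the pruned walk queues 3 DIEs instead of 8: the enum-like DIE is still visited, while its enumerator leaves are not.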

Revision Contents

Path	Size
llvm/trunk/include/llvm/DebugInfo/DWARF/DWARFUnit.h	44 lines
llvm/trunk/lib/DebugInfo/DWARF/DWARFUnit.cpp	386 lines

Diff 127984

llvm/trunk/include/llvm/DebugInfo/DWARF/DWARFUnit.h

[214 lines not shown]	class DWARFUnit {
uint32_t Length;		uint32_t Length;
mutable const DWARFAbbreviationDeclarationSet *Abbrevs;		mutable const DWARFAbbreviationDeclarationSet *Abbrevs;
uint64_t AbbrOffset;		uint64_t AbbrOffset;
uint8_t UnitType;		uint8_t UnitType;
llvm::Optional<BaseAddress> BaseAddr;		llvm::Optional<BaseAddress> BaseAddr;
/// The compile unit debug information entry items.		/// The compile unit debug information entry items.
std::vector<DWARFDebugInfoEntry> DieArray;		std::vector<DWARFDebugInfoEntry> DieArray;

-  /// Map from range's start address to end address and corresponding DIE.
-  /// IntervalMap does not support range removal, as a result, we use the
-  /// std::map::upper_bound for address range lookup.
-  std::map<uint64_t, std::pair<uint64_t, DWARFDie>> AddrDieMap;
+  /// The vector of inlined subroutine DIEs that we can map directly to from
+  /// their subprogram below.
+  std::vector<DWARFDie> InlinedSubroutineDIEs;
+
		/// A type representing a subprogram DIE and a map (built using a sorted
		/// vector) into that subprogram's inlined subroutine DIEs.
		struct SubprogramDIEAddrInfo {
		DWARFDie SubprogramDIE;

		uint64_t SubprogramBasePC;

		/// A vector sorted to allow mapping from a relative PC to the inlined
		/// subroutine DIE with the most specific address range covering that PC.
		///
		/// The PCs are relative to the `SubprogramBasePC`.
		///
		/// The vector is sorted in ascending order of the first int which
		/// represents the relative PC for an interval in the map. The second int
		/// represents the index into the `InlinedSubroutineDIEs` vector of the DIE
		/// that interval maps to. An index of `-1` indicates an empty mapping. The
		/// interval covered is from the `.first` relative PC to the next entry's
		/// `.first` relative PC.
		std::vector<std::pair<uint32_t, int32_t>> InlinedSubroutineDIEAddrMap;
		};

		/// Vector of the subprogram DIEs and their subroutine address maps.
		std::vector<SubprogramDIEAddrInfo> SubprogramDIEAddrInfos;

		/// A vector sorted to allow mapping from a PC to the subprogram DIE (and
		/// associated addr map) index. Subprograms with overlapping PC ranges aren't
		/// supported here. Nothing will crash, but the mapping may be inaccurate.
		/// This vector may also contain "empty" ranges marked by an address with
		/// a DIE index of '-1'.
		std::vector<std::pair<uint64_t, int64_t>> SubprogramDIEAddrMap;

using die_iterator_range =		using die_iterator_range =
iterator_range<std::vector<DWARFDebugInfoEntry>::iterator>;		iterator_range<std::vector<DWARFDebugInfoEntry>::iterator>;

std::shared_ptr<DWARFUnit> DWO;		std::shared_ptr<DWARFUnit> DWO;

const DWARFUnitIndex::Entry *IndexEntry;		const DWARFUnitIndex::Entry *IndexEntry;

[42 lines not shown]	const DWARFSection &getStringOffsetSection() const {
return StringOffsetSection;		return StringOffsetSection;
}		}

void setAddrOffsetSection(const DWARFSection *AOS, uint32_t Base) {		void setAddrOffsetSection(const DWARFSection *AOS, uint32_t Base) {
AddrOffsetSection = AOS;		AddrOffsetSection = AOS;
AddrOffsetSectionBase = Base;		AddrOffsetSectionBase = Base;
}		}

-  /// Recursively update address to Die map.
-  void updateAddressDieMap(DWARFDie Die);

void setRangesSection(const DWARFSection *RS, uint32_t Base) {		void setRangesSection(const DWARFSection *RS, uint32_t Base) {
RangeSection = RS;		RangeSection = RS;
RangeSectionBase = Base;		RangeSectionBase = Base;
}		}

bool getAddrOffsetSectionItem(uint32_t Index, uint64_t &Result) const;		bool getAddrOffsetSectionItem(uint32_t Index, uint64_t &Result) const;
bool getStringOffsetSectionItem(uint32_t Index, uint64_t &Result) const;		bool getStringOffsetSectionItem(uint32_t Index, uint64_t &Result) const;

[179 lines not shown]	void extractDIEsToVector(bool AppendCUDie, bool AppendNonCUDIEs,
std::vector<DWARFDebugInfoEntry> &DIEs) const;		std::vector<DWARFDebugInfoEntry> &DIEs) const;

/// clearDIEs - Clear parsed DIEs to keep memory usage low.		/// clearDIEs - Clear parsed DIEs to keep memory usage low.
void clearDIEs(bool KeepCUDie);		void clearDIEs(bool KeepCUDie);

/// parseDWO - Parses .dwo file for current compile unit. Returns true if		/// parseDWO - Parses .dwo file for current compile unit. Returns true if
/// it was actually constructed.		/// it was actually constructed.
bool parseDWO();		bool parseDWO();

		void buildSubprogramDIEAddrMap();
		void buildInlinedSubroutineDIEAddrMap(SubprogramDIEAddrInfo &SPInfo);
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_DEBUGINFO_DWARF_DWARFUNIT_H		#endif // LLVM_DEBUGINFO_DWARF_DWARFUNIT_H

llvm/trunk/lib/DebugInfo/DWARF/DWARFUnit.cpp

//===- DWARFUnit.cpp ------------------------------------------------------===//		//===- DWARFUnit.cpp ------------------------------------------------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/DebugInfo/DWARF/DWARFUnit.h"		#include "llvm/DebugInfo/DWARF/DWARFUnit.h"
		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/DebugInfo/DWARF/DWARFAbbreviationDeclaration.h"		#include "llvm/DebugInfo/DWARF/DWARFAbbreviationDeclaration.h"
#include "llvm/DebugInfo/DWARF/DWARFContext.h"		#include "llvm/DebugInfo/DWARF/DWARFContext.h"
#include "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"		#include "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
#include "llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h"		#include "llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h"
#include "llvm/DebugInfo/DWARF/DWARFDie.h"		#include "llvm/DebugInfo/DWARF/DWARFDie.h"
#include "llvm/DebugInfo/DWARF/DWARFFormValue.h"		#include "llvm/DebugInfo/DWARF/DWARFFormValue.h"
[335 lines not shown]	if (DWOCreated)
DWO.reset();		DWO.reset();

// Keep memory down by clearing DIEs if this generate function		// Keep memory down by clearing DIEs if this generate function
// caused them to be parsed.		// caused them to be parsed.
if (ClearDIEs)		if (ClearDIEs)
clearDIEs(true);		clearDIEs(true);
}		}

-void DWARFUnit::updateAddressDieMap(DWARFDie Die) {
-  if (Die.isSubroutineDIE()) {
+// Populates a map from PC addresses to subprogram DIEs.
+//
+// This routine tries to look at the smallest amount of the debug info it can
+// to locate the DIEs. This is because many subprograms will never end up being
+// read or needed at all. We want to be as lazy as possible.
+void DWARFUnit::buildSubprogramDIEAddrMap() {
+  assert(SubprogramDIEAddrMap.empty() && "Must only build this map once!");
+  SmallVector<DWARFDie, 16> Worklist;
+  Worklist.push_back(getUnitDIE());
+  do {
+    DWARFDie Die = Worklist.pop_back_val();
+
+    // Queue up child DIEs to recurse through.
+    // FIXME: This causes us to read a lot more debug info than we really need.
+    // We should look at pruning out DIEs which cannot transitively hold
+    // separate subprograms.
+    for (DWARFDie Child : Die.children())
+      Worklist.push_back(Child);
+
+    // If handling a non-subprogram DIE, nothing else to do.
+    if (!Die.isSubprogramDIE())
+      continue;
+
+    // For subprogram DIEs, store them, and insert relevant markers into the
+    // address map. We don't care about overlap at all here as DWARF doesn't
+    // meaningfully support that, so we simply will insert a range with no DIE
+    // starting from the high PC. In the event there are overlaps, sorting
+    // these may truncate things in surprising ways but still will allow
+    // lookups to proceed.
+    int DIEIndex = SubprogramDIEAddrInfos.size();
+    SubprogramDIEAddrInfos.push_back({Die, (uint64_t)-1, {}});
     for (const auto &R : Die.getAddressRanges()) {
       // Ignore 0-sized ranges.
       if (R.LowPC == R.HighPC)
         continue;
-      auto B = AddrDieMap.upper_bound(R.LowPC);
-      if (B != AddrDieMap.begin() && R.LowPC < (--B)->second.first) {
-        // The range is a sub-range of existing ranges, we need to split the
-        // existing range.
-        if (R.HighPC < B->second.first)
-          AddrDieMap[R.HighPC] = B->second;
-        if (R.LowPC > B->first)
-          AddrDieMap[B->first].first = R.LowPC;
-      }
-      AddrDieMap[R.LowPC] = std::make_pair(R.HighPC, Die);
-    }
-  }
-  // Parent DIEs are added to the AddrDieMap prior to the Children DIEs to
-  // simplify the logic to update AddrDieMap. The child's range will always
-  // be equal or smaller than the parent's range. With this assumption, when
-  // adding one range into the map, it will at most split a range into 3
-  // sub-ranges.
-  for (DWARFDie Child = Die.getFirstChild(); Child; Child = Child.getSibling())
-    updateAddressDieMap(Child);
+
+      SubprogramDIEAddrMap.push_back({R.LowPC, DIEIndex});
+      SubprogramDIEAddrMap.push_back({R.HighPC, -1});
+
+      if (R.LowPC < SubprogramDIEAddrInfos.back().SubprogramBasePC)
+        SubprogramDIEAddrInfos.back().SubprogramBasePC = R.LowPC;
+    }
+  } while (!Worklist.empty());
+
+  if (SubprogramDIEAddrMap.empty()) {
+    // If we found no ranges, create a no-op map so that lookups remain simple
+    // but never find anything.
+    SubprogramDIEAddrMap.push_back({0, -1});
+    return;
+  }
+
+  // Next, sort the ranges and remove both exact duplicates and runs with the
+  // same DIE index. We order the ranges so that non-empty ranges are
+  // preferred. Because there may be ties, we also need to use stable sort.
+  std::stable_sort(SubprogramDIEAddrMap.begin(), SubprogramDIEAddrMap.end(),
+                   [](const std::pair<uint64_t, int64_t> &LHS,
+                      const std::pair<uint64_t, int64_t> &RHS) {
+                     if (LHS.first < RHS.first)
+                       return true;
+                     if (LHS.first > RHS.first)
+                       return false;
+
+                     // For ranges that start at the same address, keep the one
+                     // with a DIE.
+                     if (LHS.second != -1 && RHS.second == -1)
+                       return true;
+
+                     return false;
+                   });
+  SubprogramDIEAddrMap.erase(
+      std::unique(SubprogramDIEAddrMap.begin(), SubprogramDIEAddrMap.end(),
+                  [](const std::pair<uint64_t, int64_t> &LHS,
+                     const std::pair<uint64_t, int64_t> &RHS) {
+                    // If the start addresses are exactly the same, we can
+                    // remove all but the first one as it is the only one that
+                    // will be found and used.
+                    //
+                    // If the DIE indices are the same, we can "merge" the
+                    // ranges by eliminating the second.
+                    return LHS.first == RHS.first || LHS.second == RHS.second;
+                  }),
+      SubprogramDIEAddrMap.end());
+
+  assert(SubprogramDIEAddrMap.back().second == -1 &&
+         "The last interval must not have a DIE as each DIE's address range is "
+         "bounded.");
+}

		// Build the second level of mapping from PC to DIE, specifically one that maps
		// a PC within a particular DWARF subprogram into a precise, maximally nested
		// inlined subroutine DIE (if any exists). We build a separate map for each
		// subprogram because many subprograms will never get queried for an address
		// and this allows us to be significantly lazier in reading the DWARF itself.
		void DWARFUnit::buildInlinedSubroutineDIEAddrMap(
		SubprogramDIEAddrInfo &SPInfo) {
		auto &AddrMap = SPInfo.InlinedSubroutineDIEAddrMap;
		uint64_t BasePC = SPInfo.SubprogramBasePC;

		auto SubroutineAddrMapSorter = [](const std::pair<int, int> &LHS,
		const std::pair<int, int> &RHS) {
		if (LHS.first < RHS.first)
		return true;
		if (LHS.first > RHS.first)
		return false;

		// For ranges that start at the same address, keep the
		// non-empty one.
		if (LHS.second != -1 && RHS.second == -1)
		return true;

		return false;
		};
		auto SubroutineAddrMapUniquer = [](const std::pair<int, int> &LHS,
		const std::pair<int, int> &RHS) {
		// If the start addresses are exactly the same, we can
		// remove all but the first one as it is the only one that
		// will be found and used.
		//
		// If the DIE indices are the same, we can "merge" the
		// ranges by eliminating the second.
		return LHS.first == RHS.first || LHS.second == RHS.second;
		};

		struct DieAndParentIntervalRange {
		DWARFDie Die;
		int ParentIntervalsBeginIdx, ParentIntervalsEndIdx;
		};

		SmallVector<DieAndParentIntervalRange, 16> Worklist;
		auto EnqueueChildDIEs = [&](const DWARFDie &Die, int ParentIntervalsBeginIdx,
		int ParentIntervalsEndIdx) {
		for (DWARFDie Child : Die.children())
		Worklist.push_back(
		{Child, ParentIntervalsBeginIdx, ParentIntervalsEndIdx});
		};
		EnqueueChildDIEs(SPInfo.SubprogramDIE, 0, 0);
		while (!Worklist.empty()) {
		DWARFDie Die = Worklist.back().Die;
		int ParentIntervalsBeginIdx = Worklist.back().ParentIntervalsBeginIdx;
		int ParentIntervalsEndIdx = Worklist.back().ParentIntervalsEndIdx;
		Worklist.pop_back();

		// If we encounter a nested subprogram, simply ignore it. We map to
		// (disjoint) subprograms before arriving here and we don't want to examine
		// any inlined subroutines of an unrelated subprogram.
		if (Die.getTag() == DW_TAG_subprogram)
		continue;

		// For non-subroutines, just recurse to keep searching for inlined
		// subroutines.
		if (Die.getTag() != DW_TAG_inlined_subroutine) {
		EnqueueChildDIEs(Die, ParentIntervalsBeginIdx, ParentIntervalsEndIdx);
		continue;
		}

		// Capture the inlined subroutine DIE that we will reference from the map.
		int DIEIndex = InlinedSubroutineDIEs.size();
		InlinedSubroutineDIEs.push_back(Die);

		int DieIntervalsBeginIdx = AddrMap.size();
		// First collect the PC ranges for this DIE into our subroutine interval
		// map.
		for (auto R : Die.getAddressRanges()) {
		// Clamp the PCs to be above the base.
		R.LowPC = std::max(R.LowPC, BasePC);
		R.HighPC = std::max(R.HighPC, BasePC);
		// Compute relative PCs from the subprogram base and drop down to an
		// unsigned 32-bit int to represent them within the data structure. This
		// lets us cover a 4gb single subprogram. Because subprograms may be
		// partitioned into distant parts of a binary (think hot/cold
		// partitioning) we want to preserve as much as we can here without
		// burning extra memory. Past that, we will simply truncate and lose the
		// ability to map those PCs to a DIE more precise than the subprogram.
		const uint32_t MaxRelativePC = std::numeric_limits<uint32_t>::max();
		uint32_t RelativeLowPC = (R.LowPC - BasePC) > (uint64_t)MaxRelativePC
		? MaxRelativePC
		: (uint32_t)(R.LowPC - BasePC);
		uint32_t RelativeHighPC = (R.HighPC - BasePC) > (uint64_t)MaxRelativePC
		? MaxRelativePC
		: (uint32_t)(R.HighPC - BasePC);
		// Ignore empty or bogus ranges.
		if (RelativeLowPC >= RelativeHighPC)
		continue;
		AddrMap.push_back({RelativeLowPC, DIEIndex});
		AddrMap.push_back({RelativeHighPC, -1});
		}

		// If there are no address ranges, there is nothing to do to map into them
		// and there cannot be any child subroutine DIEs with address ranges of
		// interest as those would all be required to nest within this DIE's
		// non-existent ranges, so we can immediately continue to the next DIE in
		// the worklist.
		if (DieIntervalsBeginIdx == (int)AddrMap.size())
		continue;

		// The PCs from this DIE should never overlap, so we can easily sort them
		// here.
		std::sort(AddrMap.begin() + DieIntervalsBeginIdx, AddrMap.end(),
		SubroutineAddrMapSorter);
		// Remove any dead ranges. These should only come from "empty" ranges that
		// were clobbered by some other range.
		AddrMap.erase(std::unique(AddrMap.begin() + DieIntervalsBeginIdx,
		AddrMap.end(), SubroutineAddrMapUniquer),
		AddrMap.end());

		// Compute the end index of this DIE's addr map intervals.
		int DieIntervalsEndIdx = AddrMap.size();

		assert(DieIntervalsBeginIdx != DieIntervalsEndIdx &&
		"Must not have an empty map for this layer!");
		assert(AddrMap.back().second == -1 && "Must end with an empty range!");
		assert(std::is_sorted(AddrMap.begin() + DieIntervalsBeginIdx, AddrMap.end(),
		less_first()) &&
		"Failed to sort this DIE's intervals!");

		// If we have any parent intervals, walk the newly added ranges and find
		// the parent ranges they were inserted into. Both of these are sorted and
		// neither has any overlaps. We need to append new ranges to split up any
		// parent ranges these new ranges would overlap when we merge them.
		if (ParentIntervalsBeginIdx != ParentIntervalsEndIdx) {
		int ParentIntervalIdx = ParentIntervalsBeginIdx;
		for (int i = DieIntervalsBeginIdx, e = DieIntervalsEndIdx - 1; i < e;
		++i) {
		const uint32_t IntervalStart = AddrMap[i].first;
		const uint32_t IntervalEnd = AddrMap[i + 1].first;
		const int IntervalDieIdx = AddrMap[i].second;
		if (IntervalDieIdx == -1) {
		// For empty intervals, nothing is required. This is a bit surprising
		// however. If the prior interval overlaps a parent interval and this
		// would be necessary to mark the end, we will synthesize a new end
		// that switches back to the parent DIE below. And this interval will
		// get dropped in favor of one with a DIE attached. However, we'll
		// still include this and so worst-case, it will still end the prior
		// interval.
		continue;
		}

		// We are walking the new ranges in order, so search forward from the
		// last point for a parent range that might overlap.
		auto ParentIntervalsRange =
		make_range(AddrMap.begin() + ParentIntervalIdx,
		AddrMap.begin() + ParentIntervalsEndIdx);
		assert(std::is_sorted(ParentIntervalsRange.begin(),
		ParentIntervalsRange.end(), less_first()) &&
		"Unsorted parent intervals can't be searched!");
		auto PI = std::upper_bound(
		ParentIntervalsRange.begin(), ParentIntervalsRange.end(),
		IntervalStart,
		[](uint32_t LHS, const std::pair<uint32_t, int32_t> &RHS) {
		return LHS < RHS.first;
		});
		if (PI == ParentIntervalsRange.begin() ||
		PI == ParentIntervalsRange.end())
		continue;

		ParentIntervalIdx = PI - AddrMap.begin();
		int32_t &ParentIntervalDieIdx = std::prev(PI)->second;
		uint32_t &ParentIntervalStart = std::prev(PI)->first;
		const uint32_t ParentIntervalEnd = PI->first;

		// If the new range starts exactly at the position of the parent range,
		// we need to adjust the parent range. Note that these collisions can
		// only happen with the original parent range because we will merge any
		// adjacent ranges in the child.
		if (IntervalStart == ParentIntervalStart) {
		// If there will be a tail, just shift the start of the parent
		// forward. Note that this cannot change the parent ordering.
		if (IntervalEnd < ParentIntervalEnd) {
		ParentIntervalStart = IntervalEnd;
		continue;
		}
		// Otherwise, mark this as becoming empty so we'll remove it and
		// prefer the child range.
		ParentIntervalDieIdx = -1;
		continue;
		}

		// Finally, if the parent interval will need to remain as a prefix to
		// this one, insert a new interval to cover any tail.
		if (IntervalEnd < ParentIntervalEnd)
		AddrMap.push_back({IntervalEnd, ParentIntervalDieIdx});
		}
		}

		// Note that we don't need to re-sort even this DIE's address map intervals
		// after this. All of the newly added intervals actually fill in gaps in
		// this DIE's address map, and we know that children won't need to look up
		// into those gaps.

		// Recurse through its children, giving them the interval map range of this
		// DIE to use as their parent intervals.
		EnqueueChildDIEs(Die, DieIntervalsBeginIdx, DieIntervalsEndIdx);
		}

		if (AddrMap.empty()) {
		AddrMap.push_back({0, -1});
		return;
		}

		// Now that we've added all of the intervals needed, we need to re-sort and
		// unique them. Most notably, this will remove all the empty ranges that are
		// covered by a parent range. We only expect a single non-empty interval
		// at any given start point, so we just use std::sort. This could potentially
		// produce non-deterministic maps for invalid DWARF.
		std::sort(AddrMap.begin(), AddrMap.end(), SubroutineAddrMapSorter);
		AddrMap.erase(
		std::unique(AddrMap.begin(), AddrMap.end(), SubroutineAddrMapUniquer),
		AddrMap.end());
		}

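The merge step above (append child intervals, append a tail entry where the parent resumes, then sort and unique) can be illustrated with a minimal standalone sketch. The names `Entry`, `sorter`, `uniquer`, and `overlayExample` are hypothetical, and the tie-break rule (a real DIE index sorts ahead of a `-1` tombstone at the same start) is an assumption made for illustration, not necessarily what `SubroutineAddrMapSorter` and `SubroutineAddrMapUniquer` do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// {start offset, DIE index}; an index of -1 is a tombstone marking a gap.
using Entry = std::pair<uint32_t, int32_t>;

// Assumed ordering: by start offset, and on ties the larger index first so
// that a real DIE is preferred over a tombstone at the same start.
static bool sorter(const Entry &L, const Entry &R) {
  if (L.first != R.first)
    return L.first < R.first;
  return L.second > R.second;
}

// Assumed uniquing rule: entries that share a start offset are redundant;
// only the first one (after sorting) can ever be found by a lookup.
static bool uniquer(const Entry &L, const Entry &R) {
  return L.first == R.first;
}

std::vector<Entry> overlayExample() {
  // Parent DIE 0 covers [0x10, 0x50).
  std::vector<Entry> Map = {{0x10, 0}, {0x50, -1}};
  // Child DIE 1 covers [0x20, 0x30), nested inside the parent.
  Map.push_back({0x20, 1});
  Map.push_back({0x30, -1});
  // The child ends before the parent does, so append a tail entry that
  // switches back to the parent DIE at the child's end.
  Map.push_back({0x30, 0});
  // One final sort and unique drops the child's now-redundant tombstone.
  std::sort(Map.begin(), Map.end(), sorter);
  Map.erase(std::unique(Map.begin(), Map.end(), uniquer), Map.end());
  return Map;
}
```

Under these assumptions the result holds four entries: the parent prefix at `0x10`, the child at `0x20`, the parent tail at `0x30`, and the closing tombstone at `0x50`. No end addresses are stored; each interval ends where the next entry starts.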
		DWARFDie DWARFUnit::getSubroutineForAddress(uint64_t Address) {
		extractDIEsIfNeeded(false);
-		if (AddrDieMap.empty())
-		updateAddressDieMap(getUnitDIE());
-		auto R = AddrDieMap.upper_bound(Address);
-		if (R == AddrDieMap.begin())
-		return DWARFDie();
-		// upper_bound's previous item contains Address.
-		--R;
-		if (Address >= R->second.first)
-		return DWARFDie();
-		return R->second.second;
		// We use a two-level mapping structure to locate subroutines for a given PC
		// address.
		//
		// First, we map the address to a subprogram. This can be done more cheaply
		// because subprograms cannot nest within each other. It also allows us to
		// avoid detailed examination of many subprograms, instead only focusing on
		// the ones which we end up actively querying.
		if (SubprogramDIEAddrMap.empty())
		buildSubprogramDIEAddrMap();

		assert(!SubprogramDIEAddrMap.empty() &&
		"We must always end up with a non-empty map!");

		auto I = std::upper_bound(
		SubprogramDIEAddrMap.begin(), SubprogramDIEAddrMap.end(), Address,
		[](uint64_t LHS, const std::pair<uint64_t, int64_t> &RHS) {
		return LHS < RHS.first;
		});
		// If we find the beginning, then the address is before the first subprogram.
		if (I == SubprogramDIEAddrMap.begin())
		return DWARFDie();
		// Back up to the interval containing the address and see if it
		// has a DIE associated with it.
		--I;
		if (I->second == -1)
		return DWARFDie();

		auto &SPInfo = SubprogramDIEAddrInfos[I->second];

		// Now that we have the subprogram for this address, we do the second level
		// mapping by building a map within a subprogram's PC range to any specific
		// inlined subroutine.
		if (SPInfo.InlinedSubroutineDIEAddrMap.empty())
		buildInlinedSubroutineDIEAddrMap(SPInfo);

		// We look up within the inlined subroutine map using a subprogram-relative
		// address.
		assert(Address >= SPInfo.SubprogramBasePC &&
		"Address isn't above the start of the subprogram!");
		uint32_t RelativeAddr = ((Address - SPInfo.SubprogramBasePC) >
		(uint64_t)std::numeric_limits<uint32_t>::max())
		? std::numeric_limits<uint32_t>::max()
		: (uint32_t)(Address - SPInfo.SubprogramBasePC);

		auto J =
		std::upper_bound(SPInfo.InlinedSubroutineDIEAddrMap.begin(),
		SPInfo.InlinedSubroutineDIEAddrMap.end(), RelativeAddr,
		[](uint32_t LHS, const std::pair<uint32_t, int32_t> &RHS) {
		return LHS < RHS.first;
		});
		// If we find the beginning, the address is before any inlined subroutine so
		// return the subprogram DIE.
		if (J == SPInfo.InlinedSubroutineDIEAddrMap.begin())
		return SPInfo.SubprogramDIE;
		// Back up `J` and return the inlined subroutine if we have one or the
		// subprogram if we don't.
		--J;
		return J->second == -1 ? SPInfo.SubprogramDIE
		: InlinedSubroutineDIEs[J->second];
		}

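Both lookups in `getSubroutineForAddress` follow the same pattern: `std::upper_bound` on the start offsets, step back one entry, then check for a tombstone. The following is a minimal standalone sketch of that pattern together with the 32-bit clamp on the subprogram-relative address. `Entry`, `lookup`, and `relativeOffset` are hypothetical names introduced here for illustration; they are not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <limits>
#include <utility>
#include <vector>

// {start offset, DIE index}; an index of -1 is a tombstone marking a gap.
using Entry = std::pair<uint32_t, int32_t>;

// Returns the index stored for the interval containing Offset, or -1 if
// Offset lies before the first entry or inside a gap. Map must be sorted
// by start offset.
int32_t lookup(const std::vector<Entry> &Map, uint32_t Offset) {
  auto I = std::upper_bound(
      Map.begin(), Map.end(), Offset,
      [](uint32_t LHS, const Entry &RHS) { return LHS < RHS.first; });
  if (I == Map.begin())
    return -1; // Offset precedes every interval.
  // The entry before upper_bound's result starts at or before Offset.
  return std::prev(I)->second;
}

// Converts an absolute address to a subprogram-relative 32-bit offset,
// saturating at UINT32_MAX. Assumes Address >= BasePC.
uint32_t relativeOffset(uint64_t Address, uint64_t BasePC) {
  const uint64_t Delta = Address - BasePC;
  const uint64_t Max = std::numeric_limits<uint32_t>::max();
  return Delta > Max ? static_cast<uint32_t>(Max)
                     : static_cast<uint32_t>(Delta);
}
```

For a map `{{0x10, 0}, {0x20, 1}, {0x30, 0}, {0x50, -1}}`, an offset of `0x2F` resolves to index 1, `0x30` falls back to index 0, and anything at or past `0x50` hits the trailing tombstone. A caller in the style of the patch would then translate `-1` into the enclosing subprogram DIE.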
		void
		DWARFUnit::getInlinedChainForAddress(uint64_t Address,
		SmallVectorImpl<DWARFDie> &InlinedChain) {
		assert(InlinedChain.empty());
		// Try to look for subprogram DIEs in the DWO file.
		parseDWO();