This is an archive of the discontinued LLVM Phabricator instance.

[WIP][DebugInfo][LiveDebugValues] Index variable location IDs by machine location
AbandonedPublic

Authored by jmorse on Feb 17 2020, 5:28 AM.

Details

Reviewers
None
Summary

This patch isn't intended for review right now, instead for discussion in comparison with D74633.

This patch morphs the "VarMap" map of VarLoc -> ID numbers into a class that also indexes all variable ID numbers by their register / machine location. This speeds up various parts of LiveDebugValues that step over all variable locations in search of a particular machine location, at the expense of additional memory consumption.
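As a rough illustration of the summary above (not the patch itself -- the real VarLoc and VarLocMap are considerably richer), a minimal sketch of handing out IDs while also indexing them by machine location might look like this; the member names and the reduction of a VarLoc to a (variable, register) pair are assumptions for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical, heavily simplified VarLoc: just (variable, register).
struct VarLoc {
  unsigned Var; // which source variable
  unsigned Reg; // machine location (0 = non-register location)
  bool operator<(const VarLoc &O) const {
    return std::tie(Var, Reg) < std::tie(O.Var, O.Reg);
  }
};

// The VarLocMap idea: hand out stable ID numbers for VarLocs, and in
// addition index those IDs by register, so that "all variable locations
// based in register R" becomes a single lookup instead of a scan over
// every known VarLoc.
class VarLocMap {
  std::map<VarLoc, uint32_t> IDs;               // VarLoc -> ID
  std::vector<VarLoc> Locs;                     // ID -> VarLoc
  std::map<unsigned, std::set<uint32_t>> ByReg; // Reg -> IDs (the new index)

public:
  uint32_t insert(const VarLoc &VL) {
    auto It = IDs.find(VL);
    if (It != IDs.end())
      return It->second; // Already known: return the existing ID.
    uint32_t ID = Locs.size();
    IDs[VL] = ID;
    Locs.push_back(VL);
    ByReg[VL.Reg].insert(ID); // Maintain the per-register index.
    return ID;
  }

  const VarLoc &operator[](uint32_t ID) const { return Locs[ID]; }

  // All variable-location IDs currently based in Reg.
  const std::set<uint32_t> &idsForReg(unsigned Reg) const {
    static const std::set<uint32_t> Empty;
    auto It = ByReg.find(Reg);
    return It == ByReg.end() ? Empty : It->second;
  }
};
```

The extra memory cost mentioned in the summary is the `ByReg` index, which duplicates every ID once.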

Diff Detail

Event Timeline

jmorse created this revision.Feb 17 2020, 5:28 AM
jmorse edited the summary of this revision. (Show Details)Feb 17 2020, 5:30 AM
vsk added a comment.Feb 17 2020, 1:22 PM

Nice. I hacked up a similar version of this patch that defined VarLocMap as DenseMap<unsigned, UniqueVector<VarLoc>>. Similar to here, the idea was to maintain a UniqueVector<VarLoc> for each register. The trick here was to make "ID"s 64-bit, so that they could efficiently encode a (register, index into the UniqueVector for that register) pair. To do a lookup given an ID, we'd break apart the 64-bit ID to grab the UniqueVector for the register (non-register locations were all slapped into the '0'th UniqueVector), then index into the vector. This seems less general than your patch.
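The packed-ID scheme described above isn't in the archive, but the encoding can be sketched as follows; the function names and the 32/32 bit split are assumptions, not the actual prototype:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit ID packing: the high 32 bits hold the register
// number (0 for all non-register locations), the low 32 bits hold the
// index into that register's UniqueVector<VarLoc>.
inline uint64_t makeID(uint32_t Reg, uint32_t Index) {
  return (uint64_t(Reg) << 32) | Index;
}
inline uint32_t regOfID(uint64_t ID) { return uint32_t(ID >> 32); }
inline uint32_t indexOfID(uint64_t ID) { return uint32_t(ID); }
```

A lookup then splits the ID, grabs the `UniqueVector` for `regOfID(ID)`, and indexes it with `indexOfID(ID)`.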

When I was putting the finishing touches on this, I took a closer look at my profiling run with D74633 applied and saw this:

2.49 min   27.8%   0 s        LiveDebugValues::process
2.41 min   27.0%   5.40 s     LiveDebugValues::transferRegisterDef
1.51 min   16.9%   1.51 min   LiveDebugValues::VarLoc::isDescribedByReg() const
32.73 s     6.1%   8.70 s     llvm::SparseBitVector<128u>::SparseBitVectorIterator::operator++()

It looks like it costs us 33 seconds to iterate over OpenLocs /once/. If we were to do this once per location, I fear performance would regress. Well, at least for this particular WebKit test case I'm looking at, which has truly massive basic blocks.

So, while I think there is potential here, I'm not sure it's architecturally a good fit for transferRegisterDef (although, I readily admit it could be a good fit elsewhere).

What I was thinking of as an alternative (haven't had time to hack this up, might get to it today if my bug queue permits) is to replace our use of SparseBitVector with an IntervalSet. For prototyping purposes I was thinking of implementing this as an llvm::IntervalMap, equipped with set ops like intersectWithComplement, and a fancy iterator that lets us write for (unsigned ID : theIntervalSet). My intuition/expectation is that the pointer chasing in SparseBitVector is causing all kinds of issues: if that's correct, and we fix that first, that makes space for investigating cool alternative layouts for VarLocMap.
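(For the record, what eventually landed is `llvm::CoalescingBitVector`.) A toy sketch of the coalesced-interval idea described above, assuming half-open intervals over unsigned IDs -- all names are hypothetical, and the real implementation builds on `llvm::IntervalMap` rather than `std::map`:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <vector>

// Toy coalesced interval set: stores disjoint, merged half-open
// intervals [start, end). A dense run of IDs collapses into a single
// map entry, so iteration touches one node per run rather than chasing
// one heap-allocated element per 128 bits as SparseBitVector does.
class IntervalSet {
  std::map<unsigned, unsigned> Ivals; // start -> end, disjoint, sorted

public:
  void insert(unsigned ID) { insertRange(ID, ID + 1); }

  void insertRange(unsigned Start, unsigned End) {
    // Back up to the first interval that could overlap or abut [Start, End).
    auto It = Ivals.upper_bound(Start);
    if (It != Ivals.begin() && std::prev(It)->second >= Start)
      --It;
    // Absorb every overlapping or adjacent interval into one.
    while (It != Ivals.end() && It->first <= End) {
      Start = std::min(Start, It->first);
      End = std::max(End, It->second);
      It = Ivals.erase(It);
    }
    Ivals[Start] = End;
  }

  bool contains(unsigned ID) const {
    auto It = Ivals.upper_bound(ID);
    return It != Ivals.begin() && std::prev(It)->second > ID;
  }

  // Enumerate members; a real version would expose an iterator type so
  // callers could write `for (unsigned ID : Set)`.
  std::vector<unsigned> members() const {
    std::vector<unsigned> Out;
    for (const auto &[S, E] : Ivals)
      for (unsigned I = S; I < E; ++I)
        Out.push_back(I);
    return Out;
  }
};
```

Set operations like `intersectWithComplement` would walk the two sorted interval lists in lockstep, again proportional to the number of runs rather than the number of set bits.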

wdyt?

jmorse abandoned this revision.Apr 6 2020, 9:00 AM

Hi Vedant,

Sorry for the complete blank -- the usual distractions and some illness (but not the dreaded lurgi!) are to blame.

For the record, on the massive-blocks-full-of-debug-instrs inputs we've been seeing, this particular patch got LiveDebugValues to about eight seconds from >100, but your coalescing bitvector implementation got it down to 0.3 seconds, many thanks for that!

In the meantime I've been kicking around ideas for different ways of doing LiveDebugValues' job -- I get the feeling that we artificially inflate the N-size of the problem by pairing up variables and registers too early, when we might instead be able to keep the analysis proportionate to the number of DBG_VALUEs in a function. I'm looking at various inputs, but would you be able to point out which parts of WebKit were producing the pathological behaviour? (Assuming it's part of an open-source release.) It'd greatly help in settling on a solution that's acceptable to all.

(I think the number of VarLocs for a function would be even larger than today if these [0, 1] transfer-bailout returns didn't exist; I don't know why they do.)

[0] https://github.com/llvm/llvm-project/blob/ddd2f4b96f9f3967a66e744a98b6ecec25c55de8/llvm/lib/CodeGen/LiveDebugValues.cpp#L1371
[1] https://github.com/llvm/llvm-project/blob/ddd2f4b96f9f3967a66e744a98b6ecec25c55de8/llvm/lib/CodeGen/LiveDebugValues.cpp#L1301

vsk added a comment.Apr 6 2020, 12:11 PM

I'm glad you've recovered!

I was looking at a benchmark for glyph rendering in WebKit. Unfortunately, the file appears to reference headers from an Apple-internal SDK, and I can't find an analogous benchmark in open source. I've stashed the bitcode for now: happy to benchmark future patches against it if requested.

Re: the early-exits [0, 1] you linked to, I don't understand why these exist. It could be a compile-time hack, but I'm not sure about that, because the loop over OpenLocs remains O(n). Do you expect getting rid of the early exits to result in a large increase in VarLocs? I suppose I'd expect a small/modest increase.

jmorse added a comment.Apr 7 2020, 3:03 AM

> I was looking at a benchmark for glyph rendering in WebKit. Unfortunately, the file appears to reference headers from an Apple-internal SDK, and I can't find an analogous benchmark in open source. I've stashed the bitcode for now: happy to benchmark future patches against it if requested.

Righty-ho, I think I've got a reasonable body of benchmarks otherwise.

> Re: the early-exits [0, 1] you linked to, I don't understand why these exist. It could be a compile-time hack, but I'm not sure about that, because the loop over OpenLocs remains O(n). Do you expect getting rid of the early exits to result in a large increase in VarLocs? I suppose I'd expect a small/modest increase.

It's odd: the early return turned up in transferSpillInst in [2] and was duplicated in [3]; possibly it's just an oversight. My fear is that large numbers of inlined variables could be transferred but currently aren't. This is a hunch, though; at some point I'll have numbers to analyse.

[2] https://reviews.llvm.org/D29500
[3] https://reviews.llvm.org/D44016