This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Speed up placeDbgValues, NFC
AbandonedPublic

Authored by vsk on Feb 14 2020, 1:16 PM.

Download Raw Diff

Details

Reviewers

jmorse
aprantl
rnk

Summary

For a file in WebKit, this brings the time spent in CodeGenPrepare from
10 minutes to 50 milliseconds. The reduction comes from using a cache of
visited instructions to speed up block-local dominance queries.

Testing: I built LNT using the Os-g cmake cache with & without this
patch, then diffed the object files to verify there was no binary diff
(and check-llvm of course).

rdar://59446577

Diff Detail

Event Timeline

vsk created this revision.Feb 14 2020, 1:16 PM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2020, 1:16 PM

Herald added a subscriber: hiraditya. · View Herald Transcript

lgtm

llvm/lib/CodeGen/CodeGenPrepare.cpp
7273	I guess by the same reasoning this is still useful even if we had D51664. Although, if no reordering is required because all defs are before uses, D51664 would still help, and I think that is the common case. I assume the webkit case is a long BB of: def A ; dbgval(A) ; def B ; dbgval(B)...

This revision is now accepted and ready to land.Feb 14 2020, 1:26 PM

vsk marked an inline comment as done.Feb 14 2020, 2:33 PM

vsk added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
7273	I believe D51664 makes this patch unnecessary. I grabbed the NumDbgValueMoved statistic from the WebKit test case: it was 0. This patch might still improve performance if CGP were to encounter lots of weird dbg.value use-before-def. But I think we want to fix those bugs earlier in the pipeline anyway. Should we just hold off on landing this, or put it in as a stopgap until D51664 is merged (and file a PR for it to be reverted afterwards)?

rnk added inline comments.Feb 14 2020, 3:14 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
7273	I guess in the long run it would be best to not have this complexity here. I think the numbering invalidation logic could use whatever MLIR does to avoid invalidating the numbering too often, and then this code can remain simple. So, I agree we should either land this as a stop gap with a plan to revert, or wait. I plan to leave the RFC alone until next Thursday, and hopefully land D51664 then. Given that timeline, you can either land or wait. I suspect this will not be the only code in LLVM that can be simplified after the instruction numbering lands.

LGTM landed whichever way; am I right in thinking that this approach will still be faster than the domtree query even if the block is very large? (On account of the pointer set lookup being ordered, instead of walking through each instruction in a block).

Longer term, placeDbgValues should be entirely eliminated anyway, IIRC it's only still around because CodeGenPrepare doesn't try to update variable locations when it optimises things.

In D74642#1878873, @jmorse wrote:

LGTM landed whichever way;

I'm hoping we can just land D51664 and avoid making this code more complex.

am I right in thinking that this approach will still be faster than the domtree query even if the block is very large? (On account of the pointer set lookup being ordered, instead of walking through each instruction in a block).

For very large basic blocks, I think this patch might make placeDbgValues a little faster, because the set lookup is theoretically O(1) whereas Instruction::comesBefore is O(n). The worst-case bound is arguably the wrong thing to look at though because a) we'd cache the instruction order, and b) (more speculative) we don't expect to reorder very many debug intrinsics within a block? Note that not all reordering invalidates the instruction order because Reid made removeFromParent() preserve the order.

I think for the case of reordering two existing instructions, the ordering will be invalidated. So, after D51664, this is still O(n^2) in the case where every other instruction is a dbg.value that needs sinking. However, that's a pretty pathological input, and I think we could address that use case in a general way by leaving holes in the numbering, rather than by keeping this patch around.

In D74642#1881690, @rnk wrote:

I think for the case of reordering two existing instructions, the ordering will be invalidated. So, after D51664, this is still O(n^2) in the case where every other instruction is a dbg.value that needs sinking. However, that's a pretty pathological input, and I think we could address that use case in a general way by leaving holes in the numbering, rather than by keeping this patch around.

Agreed (point "b" in my last comment was my hand-waving about this being a fairly hard-to-hit case). I don't expect it to turn up, but if it does I'll look at poking holes in the instruction ordering.

vsk abandoned this revision.Feb 18 2020, 3:52 PM

Revision Contents

Path

Size

llvm/

lib/

CodeGen/

CodeGenPrepare.cpp

25 lines

Diff 244747

llvm/lib/CodeGen/CodeGenPrepare.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 7,258 Lines • ▼ Show 20 Lines

	// A llvm.dbg.value may be using a value before its definition, due to			// A llvm.dbg.value may be using a value before its definition, due to
	// optimizations in this pass and others. Scan for such dbg.values, and rescue			// optimizations in this pass and others. Scan for such dbg.values, and rescue
	// them by moving the dbg.value to immediately after the value definition.			// them by moving the dbg.value to immediately after the value definition.
	// FIXME: Ideally this should never be necessary, and this has the potential			// FIXME: Ideally this should never be necessary, and this has the potential
	// to re-order dbg.value intrinsics.			// to re-order dbg.value intrinsics.
	bool CodeGenPrepare::placeDbgValues(Function &F) {			bool CodeGenPrepare::placeDbgValues(Function &F) {
	bool MadeChange = false;			bool MadeChange = false;
	DominatorTree DT(F);			DominatorTree &DT = getDT(F);

	for (BasicBlock &BB : F) {			for (BasicBlock &BB : F) {
				// Maintain an ersatz OrderedBasicBlock here, as it dramatically speeds up
				// dominance queries. For a file in WebKit, this reduces time spent in CGP
				// from 10 minutes to 50 milliseconds. We can't just use OrderedBasicBlock
				// because moving around dbg.values causes invalidation.
				rnkUnsubmitted Not Done Reply Inline Actions I guess by the same reasoning this is still useful even if we had D51664. Although, if no reordering is required because all defs are before uses, D51664 would still help, and I think that is the common case. I assume the webkit case is a long BB of: def A ; dbgval(A) ; def B ; dbgval(B)... rnk: I guess by the same reasoning this is still useful even if we had D51664. Although, if no…
				vskAuthorUnsubmitted Done Reply Inline Actions I believe D51664 makes this patch unnecessary. I grabbed the NumDbgValueMoved statistic from the WebKit test case: it was 0. This patch might still improve performance if CGP were to encounter lots of weird dbg.value use-before-def. But I think we want to fix those bugs earlier in the pipeline anyway. Should we just hold off on landing this, or put it in as a stopgap until D51664 is merged (and file a PR for it to be reverted afterwards)? vsk: I believe D51664 makes this patch unnecessary. I grabbed the NumDbgValueMoved statistic from…
				rnkUnsubmitted Not Done Reply Inline Actions I guess in the long run it would be best to not have this complexity here. I think the numbering invalidation logic could use whatever MLIR does to avoid invalidating the numbering too often, and then this code can remain simple. So, I agree we should either land this as a stop gap with a plan to revert, or wait. I plan to leave the RFC alone until next Thursday, and hopefully land D51664 then. Given that timeline, you can either land or wait. I suspect this will not be the only code in LLVM that can be simplified after the instruction numbering lands. rnk: I guess in the long run it would be best to not have this complexity here. I think the…
				SmallPtrSet<const Instruction *, 32> VisitedInsts;
	for (BasicBlock::iterator BI = BB.begin(), BE = BB.end(); BI != BE;) {			for (BasicBlock::iterator BI = BB.begin(), BE = BB.end(); BI != BE;) {
	Instruction Insn = &BI++;			Instruction Insn = &BI++;
				VisitedInsts.insert(Insn);

	DbgValueInst *DVI = dyn_cast<DbgValueInst>(Insn);			DbgValueInst *DVI = dyn_cast<DbgValueInst>(Insn);
	if (!DVI)			if (!DVI)
	continue;			continue;

	Instruction *VI = dyn_cast_or_null<Instruction>(DVI->getValue());			Instruction *VI = dyn_cast_or_null<Instruction>(DVI->getValue());

	if (!VI \|\| VI->isTerminator())			if (!VI \|\| VI->isTerminator())
	continue;			continue;

	// If VI is a phi in a block with an EHPad terminator, we can't insert			// If VI is a phi in a block with an EHPad terminator, we can't insert
	// after it.			// after it.
	if (isa<PHINode>(VI) && VI->getParent()->getTerminator()->isEHPad())			BasicBlock *VIParent = VI->getParent();
				if (isa<PHINode>(VI) && VIParent->getTerminator()->isEHPad())
	continue;			continue;

	// If the defining instruction dominates the dbg.value, we do not need			// If the defining instruction dominates the dbg.value, we do not need
	// to move the dbg.value.			// to move the dbg.value. Query the visited instruction set first as it's
	if (DT.dominates(VI, DVI))			// much cheaper than the domtree query: if we've seen VI, we must have
				// have either seen it before DVI or moved DVI, so `VI dom DVI`.
				bool UseAfterDef =
				(VIParent == &BB) ? VisitedInsts.count(VI) : DT.dominates(VI, DVI);
				if (UseAfterDef)
	continue;			continue;

	LLVM_DEBUG(dbgs() << "Moving Debug Value before :\n"			LLVM_DEBUG(dbgs() << "Moving Debug Value before :\n"
	<< DVI << ' ' << VI);			<< DVI << ' ' << VI);
	DVI->removeFromParent();			DVI->removeFromParent();
				// We will either move DVI out of this basic block, or move it to a later
				// position within the block. In the latter case, we would visit VI before
				// visiting DVI a second time. There's no need to update the visited set.
	if (isa<PHINode>(VI))			if (isa<PHINode>(VI))
	DVI->insertBefore(&*VI->getParent()->getFirstInsertionPt());			DVI->insertBefore(&*VIParent->getFirstInsertionPt());
	else			else
	DVI->insertAfter(VI);			DVI->insertAfter(VI);
	MadeChange = true;			MadeChange = true;
	++NumDbgValueMoved;			++NumDbgValueMoved;
	}			}
	}			}
	return MadeChange;			return MadeChange;
	}			}
	▲ Show 20 Lines • Show All 196 Lines • Show Last 20 Lines