
Details

Reviewers

fhahn
nikic
mkazantsev

Commits

rG220f6e5271f2: [SimplifyCFG] Ignore ephemeral values when counting insts for threading

Summary

Ignore ephemeral values (only feeding llvm.assume intrinsics) when
computing the instruction count to decide if a block is small enough for
threading. This is similar to the handling of these values in the
InlineCost computation. These instructions will eventually be removed
and shouldn't count against code size (similar to the existing ignoring
of phis).

Without this change, when enabling -fwhole-program-vtables, which causes
type test / assume sequences to be inserted by clang, we can get
different threading decisions. In particular, when building with
instrumentation FDO it can affect the optimization decisions before FDO
matching, leading to some mismatches.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tejohnson created this revision. Apr 28 2021, 3:27 PM

Herald added subscribers: wenlei, hiraditya. Apr 28 2021, 3:27 PM

tejohnson requested review of this revision. Apr 28 2021, 3:27 PM

Herald added a project: Restricted Project. Apr 28 2021, 3:27 PM

tejohnson added inline comments. Apr 28 2021, 3:29 PM

llvm/test/Transforms/SimplifyCFG/unprofitable-pr.ll
3	The change to the max small block size is needed because the below test cases all include an llvm.assume sequence, which is now ignored.

Harbormaster completed remote builds in B101509: Diff 341331. Apr 28 2021, 4:57 PM

Unless I'm missing something, this has a cache invalidation issue and will likely lead to non-deterministic builds. Say you have a dead block with an assume, which gets added to ephemeral values. Then that block and the instructions in it are removed. Now EphValues contains dangling pointers. Then new instructions get allocated and reuse the same memory. Now EphValues claims that values are ephemeral that aren't ephemeral.
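The failure mode described in this comment can be illustrated without LLVM. The following is a minimal sketch, assuming a toy allocator that reuses freed slots; `SlotArena`, `EphCache`, and `staleCacheMisclassifies` are hypothetical names, and slot indices stand in for pointer values. It is not how SimplifyCFG or the LLVM allocator is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for an allocator that hands freed slots back out.
// A slot index plays the role of an instruction's address.
struct SlotArena {
  std::vector<bool> InUse;

  std::size_t allocate() {
    for (std::size_t I = 0; I < InUse.size(); ++I)
      if (!InUse[I]) {
        InUse[I] = true;
        return I; // Reuse the first free slot.
      }
    InUse.push_back(true);
    return InUse.size() - 1;
  }

  void release(std::size_t Slot) { InUse[Slot] = false; }
};

// Returns true if a long-lived cache keyed by "address" misclassifies a
// freshly allocated instruction as ephemeral.
bool staleCacheMisclassifies() {
  SlotArena Arena;
  std::set<std::size_t> EphCache; // Hypothetical cache of ephemeral values.

  std::size_t AssumeOperand = Arena.allocate();
  EphCache.insert(AssumeOperand); // Recorded as ephemeral.
  Arena.release(AssumeOperand);   // Dead block erased; cache not updated.

  std::size_t NewStore = Arena.allocate(); // Lands in the same slot.
  return EphCache.count(NewStore) != 0;    // Stale entry matches.
}
```

Calling `staleCacheMisclassifies()` returns true in this toy model because the new value reuses the released slot. With a real allocator, whether reuse happens depends on allocation order, which is why the result can vary between builds.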

In D101494#2726074, @nikic wrote:

Unless I'm missing something, this has a cache invalidation issue and will likely lead to non-deterministic builds. Say you have a dead block with an assume, which gets added to ephemeral values. Then that block and the instructions in it are removed. Now EphValues contains dangling pointers. Then new instructions get allocated and reuse the same memory. Now EphValues claims that values are ephemeral that aren't ephemeral.

It's basically the same problem we have/had with LoopHeaders, yep.

In D101494#2726080, @lebedev.ri wrote:

It's basically the same problem we have/had with LoopHeaders, yep.

Ah, thanks for pointing that out. I see there is a loop scoped collectEphemeralValues, is that how it was solved for LoopHeaders? I should probably just add a BB scoped collectEphemeralValues and use it in BlockIsSimpleEnoughToThreadThrough. That would avoid this issue and also avoid the need to pass it around.

Sorry, I know nothing about ephemeral values, so I don't think I can give any useful feedback here.

Compute ephemeral values on per-BB basis when needed

Harbormaster completed remote builds in B102024: Diff 342045. Apr 30 2021, 4:56 PM

nikic added inline comments. May 3 2021, 2:31 PM

llvm/lib/Analysis/CodeMetrics.cpp
131 ↗	(On Diff #342045)	So there's two potential compile-time problems here: The first is that while this starts off with assume inside the block, it may scan uses outside the block as well. Additionally it scans over all assumes in order to find those in the block. I'm not sure whether this is really important in practice, but having seen pathological ephemeral value collection during inlining, I'm being a bit cautious here.

I think for the particular case it is used for here, it might make the most sense to not collect ephemeral values upfront, instead compute them on the fly. We can do this by changing the direction of the instruction walk from end to start, and then doing something like:

    SmallPtrSet<const Value *, 32> EphValues;
    auto IsEphemeral = [&](const Value *V) {
      if (isa<AssumeInst>(V))
        return true;
      return isSafeToSpeculativelyExecute(V) &&
             all_of(V->users(),
                    [&](const User *U) { return EphValues.count(U); });
    };
    for (Instruction &I : reverse()) {
      if (IsEphemeral(&I))
        EphValues.insert(&I);
      // Otherwise normal code.
    }

This also has the advantage that we don't need to compute any ephemeral values past the ten or so instructions we look at.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
2551–2552	This comment is confusing, in that ephemeral values will not be deleted while threading (unlike phis). They will only be deleted during codegen.
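The on-the-fly collection suggested for CodeMetrics.cpp can be tried out on a toy model. The sketch below is an illustration only, assuming a hypothetical `ToyInst` struct in which `IsAssume` stands in for `isa<AssumeInst>`, `IsSpeculatable` stands in for `isSafeToSpeculativelyExecute`, and `Users` holds the block-local indices of the instructions that use a value. None of these names exist in LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy instruction; not an LLVM type.
struct ToyInst {
  bool IsAssume;                  // Stand-in for isa<AssumeInst>.
  bool IsSpeculatable;            // Stand-in for isSafeToSpeculativelyExecute.
  std::vector<std::size_t> Users; // Indices of users within the block.
};

// Walk the block from last to first instruction. A value is treated as
// ephemeral if it is an assume, or if it is speculatable and every user
// was already classified as ephemeral. Returns the number of remaining
// (non-ephemeral) instructions.
std::size_t countNonEphemeral(const std::vector<ToyInst> &Block) {
  std::set<std::size_t> Eph;
  std::size_t Size = 0;
  for (std::size_t I = Block.size(); I-- > 0;) {
    const ToyInst &Inst = Block[I];
    bool AllUsersEph =
        std::all_of(Inst.Users.begin(), Inst.Users.end(),
                    [&](std::size_t U) { return Eph.count(U) != 0; });
    if (Inst.IsAssume || (Inst.IsSpeculatable && AllUsersEph))
      Eph.insert(I);
    else
      ++Size;
  }
  return Size;
}

// Toy version of a bitcast / type test / assume sequence followed by a
// volatile store and a branch.
std::vector<ToyInst> toyTypeTestBlock(bool BitcastAlsoFeedsStore) {
  std::vector<std::size_t> BitcastUsers = {1};
  if (BitcastAlsoFeedsStore)
    BitcastUsers.push_back(3);
  return {
      {false, true, BitcastUsers}, // 0: bitcast
      {false, true, {2}},          // 1: type test, used by the assume
      {true, false, {}},           // 2: assume
      {false, false, {}},          // 3: volatile store
      {false, false, {}},          // 4: branch
  };
}
```

In this model `countNonEphemeral(toyTypeTestBlock(false))` counts only the store and the branch. If the bitcast also feeds the store, it has a non-ephemeral user and is counted as well. Because users are visited before their operands, a single reverse pass is enough.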

tejohnson marked an inline comment as done. May 4 2021, 6:35 PM

tejohnson added inline comments.

llvm/lib/Analysis/CodeMetrics.cpp
131 ↗	(On Diff #342045)	I was concerned about this too at first. I collected some stats for a large application build and found that on average there were very few assumptions being checked. That being said, pathological cases could occur, and I agree that it is straightforward to reverse the loop and collect on demand, so I changed it to do that.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
2530–2531	One issue with the reversed loop is that the Size is checked against the limit the iteration after it is incremented. When iterating in forward order, this means that the branch is not counted against the limit, since the loop exits before the subsequent check. But with the reversed order it gets counted and the test started failing since we no longer did the threading. I decided to consolidate the Size increment and check to make it more consistent, and simply bumped up the default limit and the one used in the test so that there is no change to the status quo in terms of non-ephemeral values.

Address comments

Harbormaster completed remote builds in B102660: Diff 342929. May 4 2021, 7:24 PM

LGTM, though personally I'd keep the current value of the limit and change the condition to Size++ > MaxSmallBlockSize instead (using post-increment instead of pre-increment). I think that should retain the behavior. I'm okay either way though.
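The off-by-one between the check styles discussed here can be checked with a small standalone model. This is a sketch under the simplifying assumption that every instruction in the block is counted (no phis or ephemeral values); the three function names are hypothetical.

```cpp
#include <cassert>

// Pre-patch shape: the limit is checked at the top of each iteration and
// Size is incremented at the bottom, so the last instruction is never
// checked against the limit.
bool fitsOldForward(int NumCounted, int Max) {
  int Size = 0;
  for (int I = 0; I < NumCounted; ++I) {
    if (Size > Max)
      return false;
    ++Size;
  }
  return true;
}

// Consolidated increment and check using pre-increment.
bool fitsPreIncrement(int NumCounted, int Max) {
  int Size = 0;
  for (int I = 0; I < NumCounted; ++I)
    if (++Size > Max)
      return false;
  return true;
}

// Consolidated increment and check using post-increment.
bool fitsPostIncrement(int NumCounted, int Max) {
  int Size = 0;
  for (int I = 0; I < NumCounted; ++I)
    if (Size++ > Max)
      return false;
  return true;
}
```

With a limit of 10, a block of 11 counted instructions is accepted by the old loop and by the post-increment form but rejected by the pre-increment form. In this model, that is why the pre-increment variant needed the limit raised by one to keep the old behavior, while the post-increment variant does not.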

This revision is now accepted and ready to land. May 8 2021, 1:23 PM

In D101494#2746229, @nikic wrote:

LGTM, though personally I'd keep the current value of the limit and change the condition to Size++ > MaxSmallBlockSize instead (using post-increment instead of pre-increment). I think that should retain the behavior. I'm okay either way though.

Yep, went ahead and switched to that approach.

Address comment

This revision was landed with ongoing or failed builds. May 9 2021, 7:08 PM

Closed by commit rG220f6e5271f2: [SimplifyCFG] Ignore ephemeral values when counting insts for threading (authored by tejohnson).

This revision was automatically updated to reflect the committed changes.

tejohnson added a commit: rG220f6e5271f2: [SimplifyCFG] Ignore ephemeral values when counting insts for threading.

Harbormaster completed remote builds in B103416: Diff 343952. May 9 2021, 8:05 PM

Diff 343953

llvm/lib/Transforms/Utils/SimplifyCFG.cpp


@@ 145 lines not shown @@
     cl::desc("Allow exactly one expensive instruction to be speculatively "
              "executed"));
 
 static cl::opt<unsigned> MaxSpeculationDepth(
     "max-speculation-depth", cl::Hidden, cl::init(10),
     cl::desc("Limit maximum recursion depth when calculating costs of "
              "speculatively executed instructions"));
 
 static cl::opt<int>
-MaxSmallBlockSize("simplifycfg-max-small-block-size", cl::Hidden, cl::init(10),
+MaxSmallBlockSize("simplifycfg-max-small-block-size", cl::Hidden,
+                  cl::init(10),
                   cl::desc("Max size of a block which is still considered "
                            "small enough to thread through"));
 
 // Two is chosen to allow one negation and a logical combine.
 static cl::opt<unsigned>
     BranchFoldThreshold("simplifycfg-branch-fold-threshold", cl::Hidden,
                         cl::init(2),
                         cl::desc("Maximum cost of combining conditions when "
                                  "folding branches"));
 
@@ 2,356 lines not shown @@ bool SimplifyCFGOpt::SpeculativelyExecuteBB(BranchInst *BI, BasicBlock *ThenBB,
 
   ++NumSpeculations;
   return true;
 }
 
 /// Return true if we can thread a branch across this block.
 static bool BlockIsSimpleEnoughToThreadThrough(BasicBlock *BB) {
   int Size = 0;
 
-  for (Instruction &I : BB->instructionsWithoutDebug()) {
-    if (Size > MaxSmallBlockSize)
-      return false; // Don't clone large BB's.
+  SmallPtrSet<const Value *, 32> EphValues;
+  auto IsEphemeral = [&](const Value *V) {
+    if (isa<AssumeInst>(V))
+      return true;
+    return isSafeToSpeculativelyExecute(V) &&
+           all_of(V->users(),
+                  [&](const User *U) { return EphValues.count(U); });
+  };
 
+  // Walk the loop in reverse so that we can identify ephemeral values properly
+  // (values only feeding assumes).
+  for (Instruction &I : reverse(BB->instructionsWithoutDebug())) {
     // Can't fold blocks that contain noduplicate or convergent calls.
     if (CallInst *CI = dyn_cast<CallInst>(&I))
       if (CI->cannotDuplicate() || CI->isConvergent())
         return false;
 
+    // Ignore ephemeral values which are deleted during codegen.
+    if (IsEphemeral(&I))
+      EphValues.insert(&I);
     // We will delete Phis while threading, so Phis should not be accounted in
-    // block's size
-    if (!isa<PHINode>(I))
-      ++Size;
+    // block's size.
+    else if (!isa<PHINode>(I)) {
+      if (Size++ > MaxSmallBlockSize)
+        return false; // Don't clone large BB's.
+    }
 
     // We can only support instructions that do not define values that are
     // live outside of the current basic block.
     for (User *U : I.users()) {
       Instruction *UI = cast<Instruction>(U);
       if (UI->getParent() != BB || isa<PHINode>(UI))
         return false;
     }
@@ 4,280 lines not shown @@

llvm/test/Transforms/SimplifyCFG/unprofitable-pr.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -simplifycfg-max-small-block-size=10 -S < %s | FileCheck %s
-; RUN: opt -passes=simplify-cfg -simplifycfg-max-small-block-size=10 -S < %s | FileCheck %s
+; RUN: opt -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -simplifycfg-max-small-block-size=6 -S < %s | FileCheck %s
+; RUN: opt -passes=simplify-cfg -simplifycfg-max-small-block-size=6 -S < %s | FileCheck %s
 
 target datalayout = "e-p:64:64-p5:32:32-A5"
 
 declare void @llvm.assume(i1)
+declare i1 @llvm.type.test(i8*, metadata) nounwind readnone
 
 define void @test_01(i1 %c, i64* align 1 %ptr) local_unnamed_addr #0 {
 ; CHECK-LABEL: @test_01(
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[TRUE2_CRITEDGE:%.*]], label [[FALSE1:%.*]]
 ; CHECK: false1:
 ; CHECK-NEXT: store volatile i64 1, i64* [[PTR:%.*]], align 4
 ; CHECK-NEXT: [[PTRINT:%.*]] = ptrtoint i64* [[PTR]] to i64
 ; CHECK-NEXT: [[MASKEDPTR:%.*]] = and i64 [[PTRINT]], 7
@@ 144 lines not shown @@
 true2: ; preds = %true1
   store volatile i64 2, i64* %ptr, align 8
   ret void
 
 false2: ; preds = %true1
   store volatile i64 3, i64* %ptr, align 8
   ret void
 }
 
+; Try the max block size for PRE again but with the bitcast/type test/assume
+; sequence used for whole program devirt.
+define void @test_04(i1 %c, i64* align 1 %ptr, [3 x i8]* %vtable) local_unnamed_addr #0 {
+; CHECK-LABEL: @test_04(
+; CHECK-NEXT: br i1 [[C:%.*]], label [[TRUE2_CRITEDGE:%.*]], label [[FALSE1:%.*]]
+; CHECK: false1:
+; CHECK-NEXT: store volatile i64 1, i64* [[PTR:%.*]], align 4
+; CHECK-NEXT: [[VTABLE:%.*]] = bitcast [3 x i8]* %vtable to i8*
+; CHECK-NEXT: [[P:%.*]] = call i1 @llvm.type.test(i8* [[VTABLE]], metadata !"foo")
+; CHECK-NEXT: tail call void @llvm.assume(i1 [[P]])
+; CHECK-NEXT: store volatile i64 0, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 3, i64* [[PTR]], align 8
+; CHECK-NEXT: ret void
+; CHECK: true2.critedge:
+; CHECK-NEXT: [[VTABLE:%.*]] = bitcast [3 x i8]* %vtable to i8*
+; CHECK-NEXT: [[P:%.*]] = call i1 @llvm.type.test(i8* [[VTABLE]], metadata !"foo")
+; CHECK-NEXT: tail call void @llvm.assume(i1 [[P]])
+; CHECK-NEXT: store volatile i64 0, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 -1, i64* [[PTR]], align 8
+; CHECK-NEXT: store volatile i64 2, i64* [[PTR]], align 8
+; CHECK-NEXT: ret void
+;
+  br i1 %c, label %true1, label %false1
+
+true1: ; preds = %false1, %0
+  %vtablei8 = bitcast [3 x i8]* %vtable to i8*
+  %p = call i1 @llvm.type.test(i8* %vtablei8, metadata !"foo")
+  tail call void @llvm.assume(i1 %p)
+  store volatile i64 0, i64* %ptr, align 8
+  store volatile i64 -1, i64* %ptr, align 8
+  store volatile i64 -1, i64* %ptr, align 8
+  store volatile i64 -1, i64* %ptr, align 8
+  store volatile i64 -1, i64* %ptr, align 8
+  store volatile i64 -1, i64* %ptr, align 8
+  br i1 %c, label %true2, label %false2
+
+false1: ; preds = %0
+  store volatile i64 1, i64* %ptr, align 4
+  br label %true1
+
+true2: ; preds = %true1
+  store volatile i64 2, i64* %ptr, align 8
+  ret void
+
+false2: ; preds = %true1
+  store volatile i64 3, i64* %ptr, align 8
+  ret void
+}

This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Ignore ephemeral values when counting insts for threading
Closed, Public
