Sometimes SimplifyCFG is too aggressive in converting branches to selects (if-conversion). The cost of speculation is underestimated, which can cause significant overhead. This patch sets a minimum cost for speculating any instruction in the context of if-conversion. This speeds up std::map::lower_bound by 2X.
lib/Transforms/Utils/SimplifyCFG.cpp:325
Why can't ComputeSpeculationCost return the correct cost here?
lib/Transforms/Utils/SimplifyCFG.cpp:255
My real question is why TTI.getUserCost(I) returns the wrong cost here :)
lib/Transforms/Utils/SimplifyCFG.cpp:255
For this specific case, it's that the GEP operator is mostly considered free (as long as the addressing mode is legal). The reasoning is that if you dereference the output of a GEP, the memory instruction can encode the addressing. But here the GEP is not immediately used by a memory op. Instead, it's used later in another basic block, and the GEP actually lowers to a real LEA instruction, so it is not free. I don't think it's quite right to treat most GEP operators as 0-cost, but just by looking at the GEP itself, it's hard to tell whether it will be folded into the memory op's addressing mode or will need a separate instruction. In this case, it would be better to be conservative. So what should I do? Change the GEP operator's cost to TCC_Basic for all cases?
lib/Transforms/Utils/SimplifyCFG.cpp:255
Please don't do that; you'll be making the common case worse. The right answer is to look at the addressing mode and make a more intelligent decision. As it turns out, I recall having a patch that did something like this (see gep-add-cost-v2.patch in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140303/206866.html). We never really finished discussing it, but you might see if something like that patch helps here too.
I think the correct heuristic change is: a GEP's cost can be considered 0 only when its result has a single use which is also the address of a memory operation. A more elaborate heuristic is that if a GEP has a single use which is another GEP it can be merged with (and folded), its cost can also be considered zero. See the related bug, where GEP merging does not behave the way it should when there are multiple uses (and folding does not happen with merging): https://llvm.org/bugs/show_bug.cgi?id=23163
The lower_bound case is quite interesting. If the input value is close to first or last, which means branch prediction within the loop will be mostly correct, the non-if-converted version is 2X faster than the if-converted version. If the input value is completely randomly distributed, the if-converted version is 2X faster than the non-if-converted version.
So overall, it's hard to make the decision without looking at the branch misprediction rate for the specific branch.
Branch probability info needs to be used here for guidance if possible:
- a highly biased branch should be marked as predictable and skipped;
- the cost of speculation needs to be weighted using the probability info.
Regarding the iterative nature of SampleFDO, I think the right approach is for the second-iteration build to force if-conversion if the first iteration did it (which leads to missing branch probability data in the second iteration).
David