This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Do not perform tail sinking if there are extra moves introduced
Needs Review · Public

Authored by xur on Aug 15 2017, 9:54 AM.

Details

Reviewers
davidxl
jmolloy
Summary

SinkThenElseCodeToEnd() tries to sink the common instructions into a new BB (sink.split).
But it does not take into account the potential moves introduced in the original BBs.

As shown in the newly added test case (test19 in sink-common-code.ll), we get
extra moves in the final assembly (which also prevent constant folding). The
optimization hurts performance and does not reduce code size either.

This patch checks whether the phi operands are constants. If they are, an extra
move is likely needed for each one. If the total number of moves inserted is greater
than the number of instructions saved by tail sinking, the sinking is not performed.
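
In rough terms, the check described above amounts to something like the following C++ sketch (the function name and interface are hypothetical, not the patch's actual code):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hypothetical helper: count PHI operands that are constants. Each such
// operand is likely to need a separate register materialization (a mov) in
// its predecessor once the common instructions are sunk, so compare that
// count against the number of instructions the sink would remove.
static bool sinkingIsProfitable(ArrayRef<PHINode *> SinkPHIs,
                                unsigned NumInstructionsSunk) {
  unsigned LikelyExtraMoves = 0;
  for (PHINode *PN : SinkPHIs)
    for (Value *Incoming : PN->incoming_values())
      if (isa<Constant>(Incoming))
        ++LikelyExtraMoves;
  // Bail out when materializing the constants costs more than the sink saves.
  return LikelyExtraMoves <= NumInstructionsSunk;
}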

It only applies to the following pattern (an example is sketched below):

if (a)
  f1;
else if (b)
  f2;
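
For illustration, a hypothetical source-level example of this pattern (not the actual code of test19; the constants and types are made up):

// Each branch stores a different constant.
void foo(int a, int b, int *p) {
  if (a)
    *p = 10; // can lower to a single store with an immediate operand
  else if (b)
    *p = 20; // likewise
}
// If the two stores are sunk into a common block, the stored value becomes a
// PHI of 10 and 20, so each predecessor must first materialize its constant
// in a register: one extra mov per path, and the immediate no longer folds
// into the store instruction.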

For the pattern

if (a)
  f1;
else
  f2;

if the condition is simple and the constants in f1 and f2 are in a close range,
tail sinking can remove the branch entirely, so this patch does not touch that case.
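
For contrast, a hypothetical sketch of that untouched case (the exact conditions under which SimplifyCFG forms a select depend on its cost checks):

// A plain if/else assigning two constants.
void bar(bool a, int *p) {
  if (a)
    *p = 10;
  else
    *p = 20;
}
// Sinking the store and selecting between the constants, roughly
// *p = a ? 10 : 20, can eliminate the branch altogether, which is usually a
// win even if a constant has to be materialized in a register.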

Diff Detail

Event Timeline

xur created this revision. Aug 15 2017, 9:54 AM
davidxl added inline comments. Aug 18 2017, 4:25 PM
lib/Transforms/Utils/SimplifyCFG.cpp
1683

Fix variable naming.

1725

Is the first check needed?

1734

reuced --> reduced.

1735

Should it be InstructionsToSink.size() -1?

1738

Explain a little more on the 'Select' part.

1741

For O3, should we disable the sinking whenever there is runtime overhead introduced (unless there is runtime savings compensating it)?

test/Transforms/SimplifyCFG/sink-common-code.ll
870

Here the store can still be sunk, while other instructions remain in their original location. This can slightly increase register pressure.

I don't feel like this is really the right approach. Maybe it is the only approach that will work, but I'd like to at least try to solve this differently.

Specifically, reasoning about 'mov's at the IR level really doesn't make sense. The IR is much more abstract than that, and in fact is SSA form! =/ I feel like we'll end up with a heuristic that is pretty brittle and also have to deal with canonicalization regressions due to slicing up basic blocks differently...

It is also probably wrong for some architectures. Think about a very RISC architecture, or even x86 when the constants are too large to fold into an immediate operand of an instruction. In those cases, commoning the actual logic and just setting up the constants to flow into it seems like a real win.

Given how much target awareness you end up needing to make this decision even for x86 (a relatively CISC-y architecture), I think I'd suggest an MI-level pass that works something like the following...

  1. Build up a table of x86 arithmetic instructions that we can fold immediates into, and the size of immediate that can be folded. These will be similar to the tables in X86InstrInfo.cpp. Eventually, we should encode this in the .td files and extract it, but I'm not suggesting crossing that bridge today. (A rough sketch of such a table follows after this list.)
  2. Use this to try to hoist instructions into their predecessors if doing so allows folding an immediate operand, thereby reducing # uops and potentially register pressure. If the predecessors aren't reachable from each other, you can do this even if only a single predecessor allows the fold, without increasing the dynamically executed instruction count. But we don't have to try to be that fancy at first. On x86 at least, replacing mov <imm>, %reg; <op> %reg, %reg with <op> <imm>, %reg seems like a solid win.
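
A minimal sketch of what the table in step 1 might look like, assuming it lives inside the X86 backend; the opcode names exist in LLVM's X86 target, but the struct, the table, and its entries are purely illustrative:

// Assumes this sketch sits in lib/Target/X86; in-tree, this header pulls in
// the X86::* opcode enum.
#include "MCTargetDesc/X86MCTargetDesc.h"

// Map the register-register form of an arithmetic instruction to its
// register-immediate form and the widest immediate that form can encode.
struct FoldableImm {
  unsigned RegRegOpc;  // e.g. X86::ADD64rr
  unsigned RegImmOpc;  // e.g. X86::ADD64ri32
  unsigned MaxImmBits; // immediate width the ri form accepts
};

static const FoldableImm FoldableImmTable[] = {
    {llvm::X86::ADD64rr, llvm::X86::ADD64ri32, 32},
    {llvm::X86::SUB64rr, llvm::X86::SUB64ri32, 32},
    {llvm::X86::AND64rr, llvm::X86::AND64ri32, 32},
    // ... to be extended, and eventually generated from the .td files.
};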

Does this make sense? Are there problems with this approach?

A specific place where this approach seems like it would help would be when the immediates require >32 bits and thus will always have a mov regardless of CFG (movabsq).

(if this does make sense, I'm happy to take a stab at implementing it)