This is an archive of the discontinued LLVM Phabricator instance.

FastISel fails to remove dead code
Closed, Public

Authored by wolfgangp on Aug 5 2015, 10:27 AM.

Details

Reviewers

hans
echristo

Commits

rGaccc3e03765f: FastISel needs to remove dead code when it bails out.
rL255520: FastISel needs to remove dead code when it bails out.

Summary

When FastISel fails to translate an instruction, it hands off code generation to SelectionDAG. Before it does so, it may already have generated local value instructions to feed phi nodes in successor blocks. SelectionDAG then generates these instructions again, causing duplication and less efficient code, including extra spill instructions.
Consider the following example:

define zeroext i1 @_Z3fooee(x86_fp80 %x, x86_fp80 %y) {
entry:
  %x.addr = alloca x86_fp80, align 16
  %y.addr = alloca x86_fp80, align 16
  store x86_fp80 %x, x86_fp80* %x.addr, align 16
  store x86_fp80 %y, x86_fp80* %y.addr, align 16
  %0 = load x86_fp80, x86_fp80* %x.addr, align 16
  %1 = load x86_fp80, x86_fp80* %y.addr, align 16
  %cmp = fcmp oeq x86_fp80 %0, %1
  br i1 %cmp, label %lor.end, label %lor.rhs

lor.rhs: ; preds = %entry
  %2 = load x86_fp80, x86_fp80* %x.addr, align 16
  %call = call zeroext i1 @_Z3bare(x86_fp80 %2)
  br label %lor.end

lor.end: ; preds = %lor.rhs, %entry
  %3 = phi i1 [ true, %entry ], [ %call, %lor.rhs ]
  ret i1 %3
}

FastISel fails to translate one of the instructions in the entry block and leaves the rest of code generation to SelectionDAG. However, it fails to remove the instructions it generated to supply the phi node in the lor.end block:

...
subq    $64, %rsp
fldt    32(%rbp)
fldt    16(%rbp)
movb    $1, %al         <=======
fstpt   -16(%rbp)
fld     %st(0)
fstpt   -32(%rbp)
fldt    -16(%rbp)
movb    $1, %cl         <=======
fucompi %st(1)
fstp    %st(0)
movb    %al, -33(%rbp)          # 1-byte Spill <======== not used later
movb    %cl, -34(%rbp)          # 1-byte Spill <======== used
jne     .LBB0_1
jp      .LBB0_1
jmp     .LBB0_2

.LBB0_1:

...

The patch proposes to remove all phi-node handling instructions as dead code when FastISel quits.
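As a rough illustration of the proposed cleanup, the following self-contained C++ sketch models a block as a list of instruction strings and erases everything emitted after a remembered "last local value". It is a toy under stated assumptions, not the patch itself: `Block`, `InstIt`, and `removeDeadLocalValues` are hypothetical names, and `block.end()` stands in for the null-pointer case of the real code.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for a machine basic block: an ordered list of
// instruction strings. This is not LLVM's MachineBasicBlock.
using Block = std::list<std::string>;
using InstIt = Block::iterator;

// Toy analogue of the cleanup described above.
// `saved` points at the last local value instruction that existed before
// phi handling ran, or equals block.end() if there was none.
// Everything after `saved`, up to but not including `insertPt`, is erased.
inline void removeDeadLocalValues(Block &block, InstIt saved, InstIt insertPt) {
  // If there was no earlier local value, deletion starts at the first
  // instruction of the block; otherwise at the one following `saved`.
  InstIt firstDead = (saved == block.end()) ? block.begin() : std::next(saved);
  block.erase(firstDead, insertPt);
}
```

For example, if `saved` points at a pre-existing `movq $0, %rdx` and a `movb $1, %al` was appended afterwards to feed a phi node, calling `removeDeadLocalValues(block, saved, block.end())` leaves only the `movq`. A `std::list` is used here because its iterators stay valid across insertions, which keeps the remembered position usable.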

Diff Detail

Repository: rL LLVM

Event Timeline

wolfgangp updated this revision to Diff 31363. Aug 5 2015, 10:27 AM

wolfgangp retitled this revision to "FastISel fails to remove dead code".

wolfgangp updated this object.

wolfgangp added a subscriber: llvm-commits.

wolfgangp added reviewers: echristo, hans. Sep 17 2015, 5:44 PM

This seems reasonable, but I don't know this code well enough to feel comfortable signing off here.

I'm afraid you'll have to get Eric or someone else to take a look.

lib/CodeGen/SelectionDAG/FastISel.cpp
Line 1330 (On Diff #31363): How about SavedLastLocalValue, to make the name a noun?

test/CodeGen/X86/fast-isel-deadcode.ll
Line 15 (On Diff #31363): Let's pretend the code above was C code and just call this "foo". Same thing for "_Z3bare" below: I'd just call it "bar".
Line 38 (On Diff #31363): I would probably have a CHECK-LABEL, and move these lines closer to the IR that they're generated from. As it is now, the reader only sees "check that movb $1 isn't followed by another movb $1", and that really doesn't provide much of a clue to what's going on.
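
To make the intent of such a check sequence concrete (a label, one expected match, a forbidden repeat, then the next label), here is a small C++ sketch. It only approximates the idea and is not how FileCheck is implemented; the function name `occursOnceBetween` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy approximation of a CHECK-LABEL / CHECK / CHECK-NOT / CHECK-LABEL
// sequence. Returns true if `pattern` occurs exactly once between the first
// occurrence of `startLabel` and the next occurrence of `endLabel`.
inline bool occursOnceBetween(const std::string &text,
                              const std::string &startLabel,
                              const std::string &endLabel,
                              const std::string &pattern) {
  const std::size_t start = text.find(startLabel);
  if (start == std::string::npos)
    return false;
  const std::size_t bodyBegin = start + startLabel.size();
  const std::size_t end = text.find(endLabel, bodyBegin);
  if (end == std::string::npos)
    return false;
  // The pattern must appear at least once inside the labelled region...
  const std::size_t first = text.find(pattern, bodyBegin);
  if (first == std::string::npos || first >= end)
    return false;
  // ...and must not appear a second time before the closing label.
  const std::size_t second = text.find(pattern, first + pattern.size());
  return second == std::string::npos || second >= end;
}
```

Anchoring the search between two labels is what gives the reader the missing clue: a second `movb $1,` is only an error if it shows up inside the same labelled region.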

Sorry, it's on my list, but there are a few patches ahead of it. Know that it isn't forgotten though.

Incorporated feedback by Hans. Sorry for the delay.

Ping ...

One inline comment requesting more commenting :)

Otherwise LGTM.

Thanks!

-eric

lib/CodeGen/SelectionDAG/FastISel.cpp
Lines 1392–1396 (On Diff #36039): This feels a little weird, can you comment it a bit?

This revision is now accepted and ready to land. Nov 19 2015, 6:13 PM

Sorry, Eric, I had to revise the patch because we found another instance of this issue: handlePHINodesInSuccessorBlocks() can return false AND leave dead code behind. I ended up breaking out the removal code into its own function and calling it from two different places. The test case has 2 subcases covering both code paths.

I beefed up the commentary a bit.
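
The two bail-out paths can be pictured with a small C++ model. This is a sketch under stated assumptions, not the committed code: `ToySelector` and all of its members are hypothetical, and a vector index plays the role of the saved last local value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy selector; none of these names come from LLVM.
struct ToySelector {
  std::vector<std::string> localValues; // local value instructions emitted so far

  // Pretend phi handling: emits one local value per incoming value and gives
  // up (returns false) at the first entry it cannot handle. Values emitted
  // before the failure stay behind unless the caller cleans up.
  bool handlePhis(const std::vector<std::string> &incoming) {
    for (const std::string &v : incoming) {
      if (v == "unsupported")
        return false;
      localValues.push_back("materialize " + v);
    }
    return true;
  }

  // Shared cleanup: drop everything emitted after the checkpoint.
  void removeDeadLocalValues(std::size_t checkpoint) {
    localValues.resize(checkpoint);
  }

  // Returns false when selection is handed over to the fallback selector.
  bool selectTerminator(const std::vector<std::string> &incoming,
                        bool canSelectTerminator) {
    const std::size_t checkpoint = localValues.size();
    if (!handlePhis(incoming)) {
      removeDeadLocalValues(checkpoint); // path 1: phi handling itself failed
      return false;
    }
    if (!canSelectTerminator) {
      removeDeadLocalValues(checkpoint); // path 2: phis fine, selection failed
      return false;
    }
    return true;
  }
};
```

Factoring the cleanup into one helper means both failure paths restore the same checkpoint, which is the property the two test subcases are meant to exercise.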

Still LGTM.

Thanks!

-eric

Closed by commit rL255520: FastISel needs to remove dead code when it bails out. (authored by probinson). Dec 14 2015, 10:36 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                 Size
llvm/trunk/include/llvm/CodeGen/FastISel.h           3 lines
llvm/trunk/lib/CodeGen/SelectionDAG/FastISel.cpp     34 lines
llvm/trunk/test/CodeGen/X86/fast-isel-deadcode.ll    147 lines

Diff 42735

llvm/trunk/include/llvm/CodeGen/FastISel.h

[553 lines not shown; enclosing scope: private:]
   /// instructions.
   unsigned materializeRegForValue(const Value *V, MVT VT);

   /// \brief Clears LocalValueMap and moves the area for the new local variables
   /// to the beginning of the block. It helps to avoid spilling cached variables
   /// across heavy instructions like calls.
   void flushLocalValueMap();

+  /// \brief Removes dead local value instructions after SavedLastLocalvalue.
+  void removeDeadLocalValueCode(MachineInstr *SavedLastLocalValue);
+
   /// \brief Insertion point before trying to select the current instruction.
   MachineBasicBlock::iterator SavedInsertPt;

   /// \brief Add a stackmap or patchpoint intrinsic call's live variable
   /// operands to a stackmap or patchpoint machine instruction.
   bool addStackMapLiveVars(SmallVectorImpl<MachineOperand> &Ops,
                            const CallInst *CI, unsigned StartIdx);
   bool lowerCallOperands(const CallInst *CI, unsigned ArgIdx, unsigned NumArgs,
                          const Value *Callee, bool ForceRetVoidTy,
                          CallLoweringInfo &CLI);
 };

 } // end namespace llvm

 #endif

llvm/trunk/lib/CodeGen/SelectionDAG/FastISel.cpp

[1,316 lines not shown; enclosing function: bool FastISel::selectBitCast(const User *I) {]

   if (!ResultReg)
     return false;

   updateValueMap(I, ResultReg);
   return true;
 }

+// Remove local value instructions starting from the instruction after
+// SavedLastLocalValue to the current function insert point.
+void FastISel::removeDeadLocalValueCode(MachineInstr *SavedLastLocalValue)
+{
+  MachineInstr *CurLastLocalValue = getLastLocalValue();
+  if (CurLastLocalValue != SavedLastLocalValue) {
+    // Find the first local value instruction to be deleted.
+    // This is the instruction after SavedLastLocalValue if it is non-NULL.
+    // Otherwise it's the first instruction in the block.
+    MachineBasicBlock::iterator FirstDeadInst(SavedLastLocalValue);
+    if (SavedLastLocalValue)
+      ++FirstDeadInst;
+    else
+      FirstDeadInst = FuncInfo.MBB->getFirstNonPHI();
+    setLastLocalValue(SavedLastLocalValue);
+    removeDeadCode(FirstDeadInst, FuncInfo.InsertPt);
+  }
+}
+
 bool FastISel::selectInstruction(const Instruction *I) {
+  MachineInstr *SavedLastLocalValue = getLastLocalValue();
   // Just before the terminator instruction, insert instructions to
   // feed PHI nodes in successor blocks.
   if (isa<TerminatorInst>(I))
-    if (!handlePHINodesInSuccessorBlocks(I->getParent()))
+    if (!handlePHINodesInSuccessorBlocks(I->getParent())) {
+      // PHI node handling may have generated local value instructions,
+      // even though it failed to handle all PHI nodes.
+      // We remove these instructions because SelectionDAGISel will generate
+      // them again.
+      removeDeadLocalValueCode(SavedLastLocalValue);
       return false;
+    }

   DbgLoc = I->getDebugLoc();

   SavedInsertPt = FuncInfo.InsertPt;

   if (const auto *Call = dyn_cast<CallInst>(I)) {
     const Function *F = Call->getCalledFunction();
     LibFunc::Func Func;
[32 lines not shown; still inside FastISel::selectInstruction]
   }
   // Remove dead code.
   recomputeInsertPt();
   if (SavedInsertPt != FuncInfo.InsertPt)
     removeDeadCode(FuncInfo.InsertPt, SavedInsertPt);

   DbgLoc = DebugLoc();
   // Undo phi node updates, because they will be added again by SelectionDAG.
-  if (isa<TerminatorInst>(I))
+  if (isa<TerminatorInst>(I)) {
+    // PHI node handling may have generated local value instructions.
+    // We remove them because SelectionDAGISel will generate them again.
+    removeDeadLocalValueCode(SavedLastLocalValue);
     FuncInfo.PHINodesToUpdate.resize(FuncInfo.OrigNumPHINodesToUpdate);
+  }
   return false;
 }

 /// Emit an unconditional branch to the given block, unless it is the immediate
 /// (fall-through) successor, and update the CFG.
 void FastISel::fastEmitBranch(MachineBasicBlock *MSucc, DebugLoc DbgLoc) {
   if (FuncInfo.MBB->getBasicBlock()->size() > 1 &&
       FuncInfo.MBB->isLayoutSuccessor(MSucc)) {
[802 lines not shown]

llvm/trunk/test/CodeGen/X86/fast-isel-deadcode.ll

				; RUN: llc < %s | FileCheck %s
				;
				; Generated with clang -O2 -S -emit-llvm
				;
				; /* Test 1 */
				; extern "C" bool bar (long double);
				; __attribute__((optnone))
				; extern "C" bool foo(long double x, long double y)
				; {
				; return (x == y) || (bar(x));
				; }
				;
				; /* Test 2 */
				; struct FVector {
				; float x, y, z;
				; inline __attribute__((always_inline)) FVector(float f): x(f), y(f), z(f) {}
				; inline __attribute__((always_inline)) FVector func(float p) const
				; {
				; if( x == 1.f ) {
				; return *this;
				; } else if( x < p ) {
				; return FVector(0.f);
				; }
				; return FVector(x);
				; }
				; };
				;
				; __attribute__((optnone))
				; int main()
				; {
				; FVector v(1.0);
				; v = v.func(1.e-8);
				; return 0;
				; }
				;
				; ModuleID = 'test.cpp'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				%struct.FVector = type { float, float, float }

				define zeroext i1 @foo(x86_fp80 %x, x86_fp80 %y) noinline optnone {
				entry:
				%x.addr = alloca x86_fp80, align 16
				%y.addr = alloca x86_fp80, align 16
				store x86_fp80 %x, x86_fp80* %x.addr, align 16
				store x86_fp80 %y, x86_fp80* %y.addr, align 16
				%0 = load x86_fp80, x86_fp80* %x.addr, align 16
				%1 = load x86_fp80, x86_fp80* %y.addr, align 16
				%cmp = fcmp oeq x86_fp80 %0, %1

				; Test 1
				; Make sure that there is no dead code generated
				; from Fast-ISel Phi-node handling. We should only
				; see one movb of the constant 1, feeding the PHI
				; node in lor.end. This covers the code path with
				; handlePHINodesInSuccessorBlocks() returning true.
				;
				; CHECK-LABEL: foo:
				; CHECK: movb $1,
				; CHECK-NOT: movb $1,
				; CHECK-LABEL: .LBB0_1:

				br i1 %cmp, label %lor.end, label %lor.rhs

				lor.rhs: ; preds = %entry
				%2 = load x86_fp80, x86_fp80* %x.addr, align 16
				%call = call zeroext i1 @bar(x86_fp80 %2)
				br label %lor.end

				lor.end: ; preds = %lor.rhs, %entry
				%3 = phi i1 [ true, %entry ], [ %call, %lor.rhs ]
				ret i1 %3
				}

				declare zeroext i1 @bar(x86_fp80)

				define i32 @main() noinline optnone {
				entry:
				%retval = alloca i32, align 4
				%v = alloca %struct.FVector, align 4
				%ref.tmp = alloca %struct.FVector, align 4
				%tmp = alloca { <2 x float>, float }, align 8
				store i32 0, i32* %retval, align 4
				%0 = bitcast %struct.FVector* %v to i8*
				call void @llvm.lifetime.start(i64 12, i8* %0) nounwind
				%x.i = getelementptr inbounds %struct.FVector, %struct.FVector* %v, i64 0, i32 0
				store float 1.000000e+00, float* %x.i, align 4
				%y.i = getelementptr inbounds %struct.FVector, %struct.FVector* %v, i64 0, i32 1
				store float 1.000000e+00, float* %y.i, align 4
				%z.i = getelementptr inbounds %struct.FVector, %struct.FVector* %v, i64 0, i32 2
				store float 1.000000e+00, float* %z.i, align 4
				%x.i.1 = getelementptr inbounds %struct.FVector, %struct.FVector* %v, i64 0, i32 0
				%1 = load float, float* %x.i.1, align 4
				%cmp.i = fcmp oeq float %1, 1.000000e+00
				br i1 %cmp.i, label %if.then.i, label %if.else.i

				if.then.i: ; preds = %entry
				%retval.sroa.0.0..sroa_cast.i = bitcast %struct.FVector* %v to <2 x float>*
				%retval.sroa.0.0.copyload.i = load <2 x float>, <2 x float>* %retval.sroa.0.0..sroa_cast.i, align 4
				%retval.sroa.6.0..sroa_idx16.i = getelementptr inbounds %struct.FVector, %struct.FVector* %v, i64 0, i32 2
				%retval.sroa.6.0.copyload.i = load float, float* %retval.sroa.6.0..sroa_idx16.i, align 4
				br label %func.exit

				if.else.i: ; preds = %entry

				; Test 2
				; In order to feed the first PHI node in func.exit handlePHINodesInSuccessorBlocks()
				; generates a local value instruction, but it cannot handle the second PHI node and
				; returns false to let SelectionDAGISel handle both cases. Make sure the generated
				; local value instruction is removed.
				; CHECK-LABEL: main:
				; CHECK-LABEL: .LBB1_2:
				; CHECK: xorps [[REG:%xmm[0-7]]], [[REG]]
				; CHECK-NOT: xorps [[REG]], [[REG]]
				; CHECK-LABEL: .LBB1_3:

				%cmp3.i = fcmp olt float %1, 0x3E45798EE0000000
				br i1 %cmp3.i, label %func.exit, label %if.end.5.i

				if.end.5.i: ; preds = %if.else.i
				%retval.sroa.0.0.vec.insert13.i = insertelement <2 x float> undef, float %1, i32 0
				%retval.sroa.0.4.vec.insert15.i = insertelement <2 x float> %retval.sroa.0.0.vec.insert13.i, float %1, i32 1
				br label %func.exit

				func.exit: ; preds = %if.then.i, %if.else.i, %if.end.5.i
				%retval.sroa.6.0.i = phi float [ %retval.sroa.6.0.copyload.i, %if.then.i ], [ %1, %if.end.5.i ], [ 0.000000e+00, %if.else.i ]
				%retval.sroa.0.0.i = phi <2 x float> [ %retval.sroa.0.0.copyload.i, %if.then.i ], [ %retval.sroa.0.4.vec.insert15.i, %if.end.5.i ], [ zeroinitializer, %if.else.i ]
				%.fca.0.insert.i = insertvalue { <2 x float>, float } undef, <2 x float> %retval.sroa.0.0.i, 0
				%.fca.1.insert.i = insertvalue { <2 x float>, float } %.fca.0.insert.i, float %retval.sroa.6.0.i, 1
				store { <2 x float>, float } %.fca.1.insert.i, { <2 x float>, float }* %tmp, align 8
				%2 = bitcast { <2 x float>, float }* %tmp to i8*
				%3 = bitcast %struct.FVector* %ref.tmp to i8*
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 12, i32 4, i1 false)
				%4 = bitcast %struct.FVector* %v to i8*
				%5 = bitcast %struct.FVector* %ref.tmp to i8*
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* %4, i8* %5, i64 12, i32 4, i1 false)
				%6 = bitcast %struct.FVector* %v to i8*
				call void @llvm.lifetime.end(i64 12, i8* %6) nounwind
				ret i32 0
				}

				declare void @llvm.lifetime.start(i64, i8* nocapture) argmemonly nounwind

				declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32, i1) argmemonly nounwind

				declare void @llvm.lifetime.end(i64, i8* nocapture) argmemonly nounwind
