This is an archive of the discontinued LLVM Phabricator instance.

JumpThreading: enhance JT to handle BB with no successor and address comparison
Needs Review · Public

Authored by davidxl on Oct 11 2019, 4:25 PM.

Details

Reviewers

efriedma
wmi

Summary

Currently, JT only processes (clones) BBs with multiple successors, with the aim of threading a predecessor to a successor BB. This misses opportunities to handle a return BB whose return value can be simplified by threading (cloning).

Example:

#include <array>
#include <algorithm>

constexpr std::array<int, 3> x = {1, 7, 17};

bool Contains(int i) {
  return std::find(x.begin(), x.end(), i) != x.end();
}

Clang produces inefficient code:

_Z8Containsi: # @_Z8Containsi

.cfi_startproc

# %bb.0:
cmpl    $1, %edi
je      .LBB0_1
# %bb.2:
cmpl    $7, %edi
jne     .LBB0_3
# %bb.4:
movl    $_ZL1x+4, %eax
jmp     .LBB0_5

.LBB0_1:

movl    $_ZL1x, %eax
jmp     .LBB0_5

.LBB0_3:

cmpl    $17, %edi
movl    $_ZL1x+8, %ecx
movl    $_ZL1x+12, %eax
cmoveq  %rcx, %rax

.LBB0_5:

movl    $_ZL1x+12, %ecx
cmpq    %rcx, %rax
setne   %al
retq

While GCC produces:

_Z8Containsi:
.LFB1534:

.cfi_startproc
movl    $1, %eax
cmpl    $1, %edi
je      .L1
cmpl    $7, %edi
je      .L1
cmpl    $17, %edi
sete    %al

.L1:

ret

This patch addresses the issue. After the fix, the generated code looks like:

_Z8Containsi: # @_Z8Containsi

.cfi_startproc
addl    $-1, %edi
cmpl    $16, %edi
ja      .LBB0_2
movl    $65601, %eax            # imm = 0x10041
movl    %edi, %ecx
shrl    %cl, %eax
andb    $1, %al
retq

.LBB0_2: # %_ZSt4findIPKiiET_S2_S2_RKT0_.exit.thread

xorl    %eax, %eax
retq
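The bit test in the patched output can be read back as C++. The sketch below is a hand translation of the assembly above, not compiler output; `ContainsReference`, `ContainsBitTest`, and `CheckBitTest` are names made up for illustration. The constant 65601 (0x10041) has bits 0, 6, and 16 set, which correspond to i - 1 for i in {1, 7, 17}.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::array<int, 3> x = {1, 7, 17};

// Reference version, same body as Contains() in the summary.
bool ContainsReference(int i) {
  return std::find(x.begin(), x.end(), i) != x.end();
}

// Hand translation of the patched assembly (illustrative only).
bool ContainsBitTest(int i) {
  // addl $-1, %edi ; cmpl $16, %edi ; ja .LBB0_2  (unsigned comparison)
  std::uint32_t idx = static_cast<std::uint32_t>(i) - 1u;
  if (idx > 16u)
    return false;
  // movl $65601, %eax ; shrl %cl, %eax ; andb $1, %al
  constexpr std::uint32_t kMask = 0x10041u;  // bits 0, 6 and 16 are set
  static_assert(kMask == 65601u, "mask must match the immediate in the asm");
  return ((kMask >> idx) & 1u) != 0;
}

// Checks that both versions agree on a range around the interesting values.
void CheckBitTest() {
  for (int i = -40; i <= 40; ++i)
    assert(ContainsBitTest(i) == ContainsReference(i));
}
```

The unsigned comparison is what lets a single range check reject both i < 1 and i > 17 before the shift.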

Diff Detail

Event Timeline

davidxl created this revision. Oct 11 2019, 4:25 PM

Herald added a subscriber: jfb. Oct 11 2019, 4:25 PM

If the terminator is a "ret", or some arbitrary terminator that doesn't simplify, it's not really "threading"; it's just tail duplication. That's likely profitable in some cases, but using ThreadEdge to perform the transform seems confusing.

JumpThreading is basically basic block cloning followed by control flow simplification. This is just a special case where the second part is missing.

There is already another special case in JT: if all the predecessors target the same successor, there is no threading either; there is only the control flow simplification part, without the basic block cloning. These two cases are just at opposite ends of the spectrum.
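A source-level view may help make "cloning without threading" concrete. The sketch below is only an illustration of the two CFG shapes written as C++, under the assumption that the IR looks like the switch-plus-select form in the tests; it is not what the pass emits, and `ContainsMerged`, `ContainsCloned`, and `CheckCloning` are hypothetical names.

```cpp
#include <array>
#include <cassert>

constexpr std::array<int, 3> x = {1, 7, 17};

// Shape before the transform: every path merges into one return block,
// which compares the selected address against end().
bool ContainsMerged(int i) {
  const int *end = x.data() + x.size();
  const int *p;
  if (i == 1)
    p = x.data();
  else if (i == 7)
    p = x.data() + 1;
  else
    p = (i == 17) ? x.data() + 2 : end;
  return p != end;  // single return block, fed by a PHI of addresses
}

// Shape after cloning the return block into each predecessor: the address
// comparison folds to a constant wherever the incoming address is known.
bool ContainsCloned(int i) {
  if (i == 1)
    return true;   // &x[0] != end
  if (i == 7)
    return true;   // &x[1] != end
  return i == 17;  // select between &x[2] and end, compared against end
}

// Checks that the two shapes compute the same result.
void CheckCloning() {
  for (int i = -5; i <= 25; ++i)
    assert(ContainsMerged(i) == ContainsCloned(i));
}
```

No edge is redirected to a successor here; the benefit comes entirely from simplifying the cloned compare and return.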

I changed the test case a little so the terminator won't be ret, but the generated code pattern is the same. Should it be handled as well?

------------------------------------
#include <array>
#include <algorithm>

constexpr std::array<int, 3> x = {1, 7, 17};
bool global, cond;

void Contains(int i) {
  global = std::find(x.begin(), x.end(), i) != x.end();
  if (cond)
    __builtin_printf("hello\n");
}
------------------------------------

Handling Wei's case would be nice to have, but it may require more significant changes in JT. Currently, JT selects candidate BBs by checking the condition value used by a branch or, with this patch, the return value of a ret instruction.

To handle this case, JT would need to check the values used by arbitrary instructions (the stored value in the example). Another thing to consider is the cost model difference. In Wei's case, cloning really becomes tail duplication and increases the complexity of the control flow (handling the ret instruction, on the other hand, does not have this issue).
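To illustrate the control-flow concern, here is a rough source-level sketch of what duplicating the tail of Wei's example could look like. It is an assumption about the resulting shape, not output from the patch. `hello_count` is a stand-in for the `__builtin_printf` call so the behavior can be checked, and `ContainsStore`, `ContainsStoreDuplicated`, and `CheckDuplication` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <initializer_list>

constexpr std::array<int, 3> x = {1, 7, 17};
bool global, cond;
int hello_count = 0;  // stand-in for __builtin_printf("hello\n")

// Wei's example: the compare result is stored, and the block ends in a branch.
void ContainsStore(int i) {
  global = std::find(x.begin(), x.end(), i) != x.end();
  if (cond)
    ++hello_count;
}

// Hypothetical shape after tail duplication: the store folds to a constant
// on the known paths, but the branch on 'cond' now exists twice.
void ContainsStoreDuplicated(int i) {
  if (i == 1 || i == 7) {
    global = true;
    if (cond)
      ++hello_count;  // first copy of the tail
    return;
  }
  global = (i == 17);
  if (cond)
    ++hello_count;  // second copy of the tail
}

// Checks that both versions store the same value and log the same number
// of times, for both settings of 'cond'.
void CheckDuplication() {
  for (bool c : {false, true}) {
    cond = c;
    for (int i = -5; i <= 20; ++i) {
      hello_count = 0;
      ContainsStore(i);
      bool expected_global = global;
      int expected_count = hello_count;
      hello_count = 0;
      ContainsStoreDuplicated(i);
      assert(global == expected_global);
      assert(hello_count == expected_count);
    }
  }
}
```

With a ret terminator the duplicated tail is a single instruction; here every clone carries its own copy of the conditional branch.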

Handling Wei's case would be nice to have, but it may require more significant changes in JT.

I think we need to have a plan for what this is going to look like, so the new code here doesn't immediately become obsolete.

Another thing to consider is the cost model difference.

I think the necessary cost model is effectively the same for ret vs. other instructions. In particular, the "ret" might go away after inlining.

lib/Transforms/Scalar/JumpThreading.cpp
2005 ↗	(On Diff #224704)	I'm surprised threadEdge still works. Probably want to update the documentation for that API if we really want to allow SuccBB to be null.

Ping.

wmi added inline comments. Oct 31 2019, 4:53 PM

lib/Transforms/Scalar/JumpThreading.cpp
1076–1081 ↗	(On Diff #224704)	Nit: if (!RV || !(Condition = dyn_cast<CmpInst>(RV))) return false;
1726 ↗	(On Diff #224704)	For the BB being handled in this patch (containing a Ret instruction), OnlyDest is always nullptr, so this block will not be executed for it. Is it for some other purpose?
1769–1780 ↗	(On Diff #224704)	For a BB with a Ret instruction, we don't factor predecessors with the same PredVal here. If we factored predecessors with the same PredVal, we might have fewer clones?
1946–1948 ↗	(On Diff #224704)	It would be helpful to add a comment for the case where SuccBB is nullptr, and maybe a TODO for the extension and refactoring needed.
test/Transforms/JumpThreading/addr.ll
15–16 ↗	(On Diff #224704)	Tests need to be processed with opt -instnamer.

davidxl marked 6 inline comments as done. Nov 12 2019, 8:16 PM

davidxl added inline comments.

lib/Transforms/Scalar/JumpThreading.cpp
1076–1081 ↗	(On Diff #224704)	ok
1726 ↗	(On Diff #224704)	This simplifies the ret instruction after the condition is removed.
1769–1780 ↗	(On Diff #224704)	Right. This can be a TODO.
1946–1948 ↗	(On Diff #224704)	Ok.
2005 ↗	(On Diff #224704)	ok. Will document as a follow up.
test/Transforms/JumpThreading/addr.ll
15–16 ↗	(On Diff #224704)	ok
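The TODO about factoring predecessors with the same PredVal amounts to grouping predecessors by the constant the return value folds to, so that one clone can serve each group. The sketch below shows only that grouping step on stand-in types; it does not use the LLVM API, and `PredList` and `GroupByPredVal` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-ins (assumptions for illustration): a predecessor is a block name,
// and its PredVal is the constant the i1 return value folds to on that edge.
using PredList = std::vector<std::pair<std::string, bool>>;

// Groups predecessors by PredVal; each group could share a single clone.
std::map<bool, std::vector<std::string>> GroupByPredVal(const PredList &preds) {
  std::map<bool, std::vector<std::string>> groups;
  for (const auto &entry : preds)
    groups[entry.second].push_back(entry.first);
  return groups;
}
```

For the `foo1` test in return.ll, the edges from `bb` and `bb3` both fold the return value to true, so they would land in the same group.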

Rebased and addressed review feedback.

Herald added a project: Restricted Project. Nov 15 2019, 11:31 AM

Herald added a subscriber: hiraditya.

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Scalar/JumpThreading.h	6 lines
llvm/lib/Transforms/Scalar/JumpThreading.cpp	136 lines
llvm/test/Transforms/JumpThreading/addr.ll	79 lines
llvm/test/Transforms/JumpThreading/return.ll	103 lines

Diff 229606

llvm/include/llvm/Transforms/Scalar/JumpThreading.h

Context not available.
	DenseMap<Instruction *, Value *> CloneInstructions(BasicBlock::iterator BI,	DenseMap<Instruction *, Value *> CloneInstructions(BasicBlock::iterator BI,
	BasicBlock::iterator BE,	BasicBlock::iterator BE,
	BasicBlock *NewBB,	BasicBlock *NewBB,
	BasicBlock *PredBB);	BasicBlock *PredBB,
		BasicBlock *SuccBB,
		Value *PredVal);
	bool ThreadEdge(BasicBlock *BB, const SmallVectorImpl<BasicBlock *> &PredBBs,	bool ThreadEdge(BasicBlock *BB, const SmallVectorImpl<BasicBlock *> &PredBBs,
	BasicBlock *SuccBB);	BasicBlock *SuccBB, Value *PredVal = nullptr);
	bool DuplicateCondBranchOnPHIIntoPred(	bool DuplicateCondBranchOnPHIIntoPred(
	BasicBlock *BB, const SmallVectorImpl<BasicBlock *> &PredBBs);	BasicBlock *BB, const SmallVectorImpl<BasicBlock *> &PredBBs);

Context not available.

llvm/lib/Transforms/Scalar/JumpThreading.cpp

Context not available.
	if (IB->getNumSuccessors() == 0) return false;	if (IB->getNumSuccessors() == 0) return false;
	Condition = IB->getAddress()->stripPointerCasts();	Condition = IB->getAddress()->stripPointerCasts();
	Preference = WantBlockAddress;	Preference = WantBlockAddress;
		} else if (ReturnInst *RetInst = dyn_cast<ReturnInst>(Terminator)) {
		auto *RV = RetInst->getReturnValue();
		if (!RV || !(Condition = dyn_cast<CmpInst>(RV)))
		return false;
	} else {	} else {
	return false; // Must be an invoke or callbr.	return false; // Must be an invoke or callbr.
	}	}
Context not available.

	// If the terminator is branching on an undef, we can pick any of the	// If the terminator is branching on an undef, we can pick any of the
	// successors to branch to. Let GetBestDestForJumpOnUndef decide.	// successors to branch to. Let GetBestDestForJumpOnUndef decide.
	if (isa<UndefValue>(Condition)) {	if (isa<UndefValue>(Condition) && BB->getTerminator()->getNumSuccessors()) {
	unsigned BestSucc = GetBestDestForJumpOnUndef(BB);	unsigned BestSucc = GetBestDestForJumpOnUndef(BB);
	std::vector<DominatorTree::UpdateType> Updates;	std::vector<DominatorTree::UpdateType> Updates;

Context not available.
	Constant *Val = PredValue.first;	Constant *Val = PredValue.first;

	BasicBlock *DestBB;	BasicBlock *DestBB;
	if (isa<UndefValue>(Val))	auto *TI = BB->getTerminator();
	DestBB = nullptr;	if (isa<UndefValue>(Val) && TI->getNumSuccessors() != 0)
	else if (BranchInst *BI = dyn_cast<BranchInst>(BB->getTerminator())) {	DestBB = TI->getSuccessor(GetBestDestForJumpOnUndef(BB));
		else if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
	assert(isa<ConstantInt>(Val) && "Expecting a constant integer");	assert(isa<ConstantInt>(Val) && "Expecting a constant integer");
	DestBB = BI->getSuccessor(cast<ConstantInt>(Val)->isZero());	DestBB = BI->getSuccessor(cast<ConstantInt>(Val)->isZero());
	} else if (SwitchInst *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {	} else if (SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {
	assert(isa<ConstantInt>(Val) && "Expecting a constant integer");	assert(isa<ConstantInt>(Val) && "Expecting a constant integer");
	DestBB = SI->findCaseValue(cast<ConstantInt>(Val))->getCaseSuccessor();	DestBB = SI->findCaseValue(cast<ConstantInt>(Val))->getCaseSuccessor();
		} else if (ReturnInst *RI = dyn_cast<ReturnInst>(TI)) {
		(void) RI;
		assert(RI->getReturnValue() == Cond &&
		"Expecting comparison result returned");
		DestBB = nullptr;
	} else {	} else {
	assert(isa<IndirectBrInst>(BB->getTerminator())	assert(isa<IndirectBrInst>(TI) && "Unexpected terminator");
	&& "Unexpected terminator");
	assert(isa<BlockAddress>(Val) && "Expecting a constant blockaddress");	assert(isa<BlockAddress>(Val) && "Expecting a constant blockaddress");
	DestBB = cast<BlockAddress>(Val)->getBasicBlock();	DestBB = cast<BlockAddress>(Val)->getBasicBlock();
	}	}
Context not available.
	CondInst->getParent() == BB)	CondInst->getParent() == BB)
	ReplaceFoldableUses(CondInst, OnlyVal);	ReplaceFoldableUses(CondInst, OnlyVal);
	}	}
		SimplifyInstructionsInBlock(BB, TLI);
	return true;	return true;
	}	}
	}	}
Context not available.
	// Now that we know what the most popular destination is, factor all	// Now that we know what the most popular destination is, factor all
	// predecessors that will jump to it into a single predecessor.	// predecessors that will jump to it into a single predecessor.
	SmallVector<BasicBlock*, 16> PredsToFactor;	SmallVector<BasicBlock*, 16> PredsToFactor;
	for (const auto &PredToDest : PredToDestList)	Value *PredVal = nullptr;
	if (PredToDest.second == MostPopularDest) {	if (MostPopularDest) {
	BasicBlock *Pred = PredToDest.first;	for (const auto &PredToDest : PredToDestList)
		if (PredToDest.second == MostPopularDest) {
	// This predecessor may be a switch or something else that has multiple	BasicBlock *Pred = PredToDest.first;
	// edges to the block. Factor each of these edges by listing them
	// according to # occurrences in PredsToFactor.	// This predecessor may be a switch or something else that has multiple
	for (BasicBlock *Succ : successors(Pred))	// edges to the block. Factor each of these edges by listing them
	if (Succ == BB)	// according to # occurrences in PredsToFactor.
	PredsToFactor.push_back(Pred);	for (BasicBlock *Succ : successors(Pred))
		if (Succ == BB)
		PredsToFactor.push_back(Pred);
		}
		} else {
		LLVM_DEBUG(auto *TI = BB->getTerminator();
		assert(isa<ReturnInst>(TI)););
		auto *Pred = PredToDestList.begin()->first;
		PredsToFactor.push_back(Pred);
		// We need to find the pred val associated with the selected pred:
		for (const auto &PredValue : PredValues) {
		if (PredValue.second == Pred) {
		PredVal = PredValue.first;
		break;
		}
	}	}
		// FIXME: add support for factoring predecessors with
	// If the threadable edges are branching on an undefined value, we get to pick	// the same PredVal
	// the destination that these predecessors should get to.	assert(PredVal && "non null pred val expected!");
	if (!MostPopularDest)	}
	MostPopularDest = BB->getTerminator()->
	getSuccessor(GetBestDestForJumpOnUndef(BB));

	// Ok, try to thread it!	// Ok, try to thread it!
	return ThreadEdge(BB, PredsToFactor, MostPopularDest);	return ThreadEdge(BB, PredsToFactor, MostPopularDest, PredVal);
	}	}

	/// ProcessBranchOnPHI - We have an otherwise unthreadable conditional branch on	/// ProcessBranchOnPHI - We have an otherwise unthreadable conditional branch on
Context not available.
	DenseMap<Instruction *, Value *>	DenseMap<Instruction *, Value *>
	JumpThreadingPass::CloneInstructions(BasicBlock::iterator BI,	JumpThreadingPass::CloneInstructions(BasicBlock::iterator BI,
	BasicBlock::iterator BE, BasicBlock *NewBB,	BasicBlock::iterator BE, BasicBlock *NewBB,
	BasicBlock *PredBB) {	BasicBlock *PredBB,
		BasicBlock *SuccBB,
		Value *PredVal) {
	// We are going to have to map operands from the source basic block to the new	// We are going to have to map operands from the source basic block to the new
	// copy of the block 'NewBB'. If there are PHI nodes in the source basic	// copy of the block 'NewBB'. If there are PHI nodes in the source basic
	// block, evaluate them to account for entry from PredBB.	// block, evaluate them to account for entry from PredBB.
Context not available.
	// keeping track of the mapping and using it to remap operands in the cloned	// keeping track of the mapping and using it to remap operands in the cloned
	// instructions.	// instructions.
	for (; BI != BE; ++BI) {	for (; BI != BE; ++BI) {
		if (SuccBB && BI->isTerminator())
		break;
	Instruction *New = BI->clone();	Instruction *New = BI->clone();
		assert(!BI->isTerminator() || isa<ReturnInst>(New));
	New->setName(BI->getName());	New->setName(BI->getName());
	NewBB->getInstList().push_back(New);	NewBB->getInstList().push_back(New);
	ValueMapping[&*BI] = New;	ValueMapping[&*BI] = New;
Context not available.
	if (I != ValueMapping.end())	if (I != ValueMapping.end())
	New->setOperand(i, I->second);	New->setOperand(i, I->second);
	}	}
		if (ReturnInst *RetInst = dyn_cast<ReturnInst>(New)) {
		assert(PredVal && !SuccBB);
		New->replaceUsesOfWith(RetInst->getReturnValue(), PredVal);
		}
	}	}

	return ValueMapping;	return ValueMapping;
Context not available.
	/// ThreadEdge - We have decided that it is safe and profitable to factor the	/// ThreadEdge - We have decided that it is safe and profitable to factor the
	/// blocks in PredBBs to one predecessor, then thread an edge from it to SuccBB	/// blocks in PredBBs to one predecessor, then thread an edge from it to SuccBB
	/// across BB. Transform the IR to reflect this change.	/// across BB. Transform the IR to reflect this change.
		/// When SuccBB is nullptr, there is no edge threading to be done. Instead, this
		/// method clones BB into the predecessor block to allow simplification of
		/// of instructions inside BB. For this case (SuccBB == nullptr) currently, only
		/// BB with return instruction is handled. FIXME: handle it for general case, and
		/// make this a utility.
	bool JumpThreadingPass::ThreadEdge(BasicBlock *BB,	bool JumpThreadingPass::ThreadEdge(BasicBlock *BB,
	const SmallVectorImpl<BasicBlock *> &PredBBs,	const SmallVectorImpl<BasicBlock *> &PredBBs,
	BasicBlock *SuccBB) {	BasicBlock *SuccBB, Value *PredVal) {
	// If threading to the same block as we come from, we would infinite loop.	// If threading to the same block as we come from, we would infinite loop.
	if (SuccBB == BB) {	if (SuccBB == BB) {
	LLVM_DEBUG(dbgs() << " Not threading across BB '" << BB->getName()	LLVM_DEBUG(dbgs() << " Not threading across BB '" << BB->getName()
Context not available.
	}	}

	// And finally, do it!	// And finally, do it!
	LLVM_DEBUG(dbgs() << " Threading edge from '" << PredBB->getName()	if (SuccBB)
	<< "' to '" << SuccBB->getName()	LLVM_DEBUG(dbgs() << " Threading edge from '" << PredBB->getName()
	<< "' with cost: " << JumpThreadCost	<< "' to '" << SuccBB->getName()
	<< ", across block:\n " << *BB << "\n");	<< "' with cost: " << JumpThreadCost << "\n");
		else
		LLVM_DEBUG(dbgs() << " Cloning BB '" << BB->getName() << "' into '"
		<< PredBB->getName() << "' with cost: " << JumpThreadCost
		<< ", across block:\n " << *BB << "\n");

	if (DTU->hasPendingDomTreeUpdates())	if (DTU->hasPendingDomTreeUpdates())
	LVI->disableDT();	LVI->disableDT();
Context not available.
	}	}

	// Copy all the instructions from BB to NewBB except the terminator.	// Copy all the instructions from BB to NewBB except the terminator.
		auto BE = BB->end();
		if (SuccBB)
		BE = std::prev(BE);
	DenseMap<Instruction *, Value *> ValueMapping =	DenseMap<Instruction *, Value *> ValueMapping =
	CloneInstructions(BB->begin(), std::prev(BB->end()), NewBB, PredBB);	CloneInstructions(BB->begin(), BE, NewBB, PredBB,
		SuccBB, PredVal);

	// We didn't copy the terminator from BB over to NewBB, because there is now	// We didn't copy the terminator from BB over to NewBB, because there is now
	// an unconditional jump to SuccBB. Insert the unconditional jump.	// an unconditional jump to SuccBB. Insert the unconditional jump.
	BranchInst *NewBI = BranchInst::Create(SuccBB, NewBB);	BranchInst *NewBI = SuccBB ? BranchInst::Create(SuccBB, NewBB) : nullptr;
	NewBI->setDebugLoc(BB->getTerminator()->getDebugLoc());	if (NewBI) {
		NewBI->setDebugLoc(BB->getTerminator()->getDebugLoc());
	// Check to see if SuccBB has PHI nodes. If so, we need to add entries to the	// Check to see if SuccBB has PHI nodes. If so, we need to add entries to
	// PHI nodes for NewBB now.	// the PHI nodes for NewBB now.
	AddPHINodeEntriesForMappedBlock(SuccBB, BB, NewBB, ValueMapping);	AddPHINodeEntriesForMappedBlock(SuccBB, BB, NewBB, ValueMapping);
		}

	// Update the terminator of PredBB to jump to NewBB instead of BB. This	// Update the terminator of PredBB to jump to NewBB instead of BB. This
	// eliminates predecessors from BB, which requires us to simplify any PHI	// eliminates predecessors from BB, which requires us to simplify any PHI
Context not available.
	SimplifyInstructionsInBlock(NewBB, TLI);	SimplifyInstructionsInBlock(NewBB, TLI);

	// Update the edge weight from BB to SuccBB, which should be less than before.	// Update the edge weight from BB to SuccBB, which should be less than before.
	UpdateBlockFreqAndEdgeWeight(PredBB, BB, NewBB, SuccBB);	if (NewBI)
		UpdateBlockFreqAndEdgeWeight(PredBB, BB, NewBB, SuccBB);

	// Threaded an edge!	// Threaded an edge!
	++NumThreads;	++NumThreads;
Context not available.
	if (LoopHeaders.count(BB))	if (LoopHeaders.count(BB))
	return false;	return false;

		auto IsConstOrAddr = [](Value *V) {
		if (isa<ConstantInt>(V))
		return true;
		V = V->stripInBoundsConstantOffsets();
		if (isa<ConstantArray>(V) || isa<AllocaInst>(V) || isa<GlobalVariable>(V))
		return true;
		return false;
		};

	for (BasicBlock::iterator BI = BB->begin();	for (BasicBlock::iterator BI = BB->begin();
	PHINode *PN = dyn_cast<PHINode>(BI); ++BI) {	PHINode *PN = dyn_cast<PHINode>(BI); ++BI) {
	// Look for a Phi having at least one constant incoming value.	// Look for a Phi having at least one constant incoming value.
	if (llvm::all_of(PN->incoming_values(),	if (llvm::none_of(PN->incoming_values(), IsConstOrAddr))
	[](Value *V) { return !isa<ConstantInt>(V); }))
	continue;	continue;

	auto isUnfoldCandidate = [BB](SelectInst *SI, Value *V) {	auto isUnfoldCandidate = [BB](SelectInst *SI, Value *V) {
Context not available.
	// Look for a ICmp in BB that compares PN with a constant and is the	// Look for a ICmp in BB that compares PN with a constant and is the
	// condition of a Select.	// condition of a Select.
	if (Cmp->getParent() == BB && Cmp->hasOneUse() &&	if (Cmp->getParent() == BB && Cmp->hasOneUse() &&
	isa<ConstantInt>(Cmp->getOperand(1 - U.getOperandNo())))	IsConstOrAddr(Cmp->getOperand(1 - U.getOperandNo())))
	if (SelectInst *SelectI = dyn_cast<SelectInst>(Cmp->user_back()))	if (SelectInst *SelectI = dyn_cast<SelectInst>(Cmp->user_back()))
	if (isUnfoldCandidate(SelectI, Cmp->use_begin()->get())) {	if (isUnfoldCandidate(SelectI, Cmp->use_begin()->get())) {
	SI = SelectI;	SI = SelectI;
Context not available.

llvm/test/Transforms/JumpThreading/addr.ll

This file was added.

				; RUN: opt -jump-threading -S < %s | FileCheck %s
				%"struct.std::array" = type { [3 x i32] }

				@r = dso_local local_unnamed_addr global i32 0, align 4
				@_ZL1x = internal constant %"struct.std::array" { [3 x i32] [i32 1, i32 7, i32 17] }, align 4

				; Function Attrs: nounwind uwtable
				define dso_local void @foo1(i32 %arg) local_unnamed_addr #0 {
				bb:
				switch i32 %arg, label %bb1 [
				i32 1, label %bb4
				i32 7, label %bb3
				]

				bb1: ; preds = %bb
				%tmp = icmp eq i32 %arg, 17
				%tmp2 = select i1 %tmp, i32* getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 0, i32 0, i64 2), i32* getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 1, i32 0, i64 0)
				br label %bb4

				bb3: ; preds = %bb
				br label %bb4

				bb4: ; preds = %bb3, %bb1, %bb
				%tmp5 = phi i32* [ getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 0, i32 0, i64 0), %bb ], [ %tmp2, %bb1 ], [ getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 0, i32 0, i64 1), %bb3 ]
				; CHECK: [[VAL:%.*]] = phi i32 [ 10,{{.*}}], [ 20, {{.*}}]
				%tmp6 = icmp eq i32* %tmp5, getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 1, i32 0, i64 0)
				; CHECK-NOT: icmp
				; CHECK-NOT: select
				%tmp7 = select i1 %tmp6, i32 20, i32 10
				store i32 %tmp7, i32* @r, align 4
				; CHECK: store i32 [[VAL]], i32* @r
				ret void
				}

				; Function Attrs: nounwind uwtable
				define dso_local i32 @foo2(i32 %arg) local_unnamed_addr #0 {
				bb:
				%tmp = alloca [100 x i32], align 16
				%tmp1 = bitcast [100 x i32]* %tmp to i8*
				call void @llvm.lifetime.start.p0i8(i64 400, i8* nonnull %tmp1) #2
				switch i32 %arg, label %bb2 [
				i32 1, label %bb6
				i32 7, label %bb5
				]

				bb2: ; preds = %bb
				%tmp3 = icmp eq i32 %arg, 17
				%tmp4 = select i1 %tmp3, i32* getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 0, i32 0, i64 2), i32* getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 1, i32 0, i64 0)
				br label %bb6

				bb5: ; preds = %bb
				br label %bb6

				bb6: ; preds = %bb5, %bb2, %bb
				%tmp7 = phi i32* [ getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 0, i32 0, i64 0), %bb ], [ %tmp4, %bb2 ], [ getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 0, i32 0, i64 1), %bb5 ]
				; CHECK: [[VAL:%.*]] = phi i32 [ 10,{{.*}}], [ 20, {{.*}}]
				%tmp8 = icmp eq i32* %tmp7, getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 1, i32 0, i64 0)
				%tmp9 = select i1 %tmp8, i32 20, i32 10
				; CHECK-NOT: select
				store i32 %tmp9, i32* @r, align 4
				; CHECK: store i32 [[VAL]]
				%tmp10 = zext i32 %tmp9 to i64
				%tmp11 = getelementptr inbounds [100 x i32], [100 x i32]* %tmp, i64 0, i64 %tmp10
				store i32 10, i32* %tmp11, align 8
				%tmp12 = getelementptr inbounds [100 x i32], [100 x i32]* %tmp, i64 0, i64 10
				%tmp13 = load i32, i32* %tmp12, align 8
				call void @llvm.lifetime.end.p0i8(i64 400, i8* nonnull %tmp1) #2
				ret i32 %tmp13
				}

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1

				attributes #0 = { nounwind uwtable }
				attributes #1 = { argmemonly nounwind willreturn }
				attributes #2 = { nounwind }

llvm/test/Transforms/JumpThreading/return.ll

This file was added.

				; RUN: opt -jump-threading -S < %s | FileCheck %s
				%"struct.std::array" = type { [3 x i32] }

				@_ZL1x = internal constant %"struct.std::array" { [3 x i32] [i32 1, i32 7, i32 17] }, align 4
				@g = dso_local global [10 x i32] zeroinitializer, align 16

				; Function Attrs: norecurse nounwind readonly uwtable
				define dso_local zeroext i1 @foo1(i32 %arg) local_unnamed_addr #0 {
				bb:
				switch i32 %arg, label %bb1 [
				i32 1, label %bb4
				i32 7, label %bb3
				]

				bb1: ; preds = %bb
				%tmp = icmp eq i32 %arg, 17
				%tmp2 = select i1 %tmp, i32* getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 0, i32 0, i64 2), i32* getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 1, i32 0, i64 0)
				br label %bb4
				; CHECK: ret i1 true

				bb3: ; preds = %bb
				br label %bb4
				; CHECK: ret i1 true

				bb4: ; preds = %bb3, %bb1, %bb
				%tmp5 = phi i32* [ getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 0, i32 0, i64 0), %bb ], [ %tmp2, %bb1 ], [ getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 0, i32 0, i64 1), %bb3 ]
				%tmp6 = icmp ne i32* %tmp5, getelementptr inbounds (%"struct.std::array", %"struct.std::array"* @_ZL1x, i64 1, i32 0, i64 0)
				ret i1 %tmp6
				}

				; Function Attrs: norecurse nounwind readonly uwtable
				define dso_local zeroext i1 @foo2(i32 %arg, i32* readnone %arg1) local_unnamed_addr #0 {
				bb:
				%tmp = icmp sgt i32 %arg, 5
				br i1 %tmp, label %bb8, label %bb2
				; CHECK: ret i1 true

				bb2: ; preds = %bb
				%tmp3 = icmp sgt i32 %arg, 1
				br i1 %tmp3, label %bb8, label %bb4
				; CHECK: ret i1 false

				bb4: ; preds = %bb2
				%tmp5 = icmp eq i32 %arg, 1
				%tmp6 = getelementptr inbounds i32, i32* %arg1, i64 1
				%tmp7 = select i1 %tmp5, i32* %arg1, i32* %tmp6
				br label %bb8

				bb8: ; preds = %bb4, %bb2, %bb
				%tmp9 = phi i32* [ getelementptr inbounds ([10 x i32], [10 x i32]* @g, i64 0, i64 0), %bb ], [ getelementptr inbounds ([10 x i32], [10 x i32]* @g, i64 0, i64 1), %bb2 ], [ %tmp7, %bb4 ]
				%tmp10 = icmp eq i32* %tmp9, getelementptr inbounds ([10 x i32], [10 x i32]* @g, i64 0, i64 0)
				ret i1 %tmp10
				}

				define linkonce_odr hidden i1 @foo3() {
				entry:
				br label %land.rhs7

				land.rhs7: ; preds = %entry
				br i1 undef, label %land.rhs22, label %lor.lhs.false17

				lor.lhs.false17: ; preds = %land.rhs7
				unreachable

				land.rhs22: ; preds = %land.rhs7
				%tobool30 = icmp ne i32 undef, 0
				ret i1 %tobool30
				; CHECK: ret i1 undef
				}

				define linkonce_odr hidden i1 @foo4() {
				entry:
				%neg = and i32 undef, 2097152
				%tobool = icmp eq i32 %neg, 0
				br label %land.rhs7

				land.rhs7: ; preds = %entry
				br i1 %tobool, label %land.rhs22, label %lor.lhs.false17
				; CHECK: ret i1 false

				lor.lhs.false17: ; preds = %land.rhs7
				unreachable

				land.rhs22: ; preds = %land.rhs7
				%tobool30 = icmp ne i32 %neg, 0
				ret i1 %tobool30
				}

				define i1 @foo5() {
				bb:
				%l = load i8, i8* undef, align 8
				%i = icmp ne i8 %l, 0
				br i1 %i, label %t1, label %t2
				; CHECK: ret i1 false

				t1: ; preds = %bb
				unreachable

				t2: ; preds = %bb
				ret i1 %i
				}

				attributes #0 = { norecurse nounwind readonly uwtable }
