This is an archive of the discontinued LLVM Phabricator instance.

Differential D29916

[CGP] Split some critical edges coming out of indirect branches
ClosedPublic

Authored by mkuper on Feb 13 2017, 3:25 PM.

Details

Reviewers

echristo
efriedma

Commits

rG13bf8a2684a1: [CGP] Split some critical edges coming out of indirect branches
rG46b131e3f85b: [CGP] Split some critical edges coming out of indirect branches
rG12e79d50020b: [CGP] Split some critical edges coming out of indirect branches
rL296416: [CGP] Split some critical edges coming out of indirect branches
rL296149: [CGP] Split some critical edges coming out of indirect branches
rL296060: [CGP] Split some critical edges coming out of indirect branches

Summary

Split critical edges coming out of indirect branches, when it is easy to do: we match a "jump table" pattern where the jump table has local linkage, is used by exactly one indirectbr, and has no other uses.

Having a critical edge survive until MI means that when we go out of SSA, we have to define all the live-ins of a destination block within the origin block. Normally, MachineSink tries to split critical edges and sink these definitions into the destination block, but teaching it to split indirect edges on the MI level in a generic way looks problematic. So, instead, this tries to split these edges in IR.
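
As a rough illustration of the term used above (not part of the patch): an edge is critical when its source block has more than one successor and its destination block has more than one predecessor. The sketch below checks that condition on a toy graph. `ToyCFG`, `countIncomingEdges`, and `isCriticalEdge` are hypothetical names for this example and are unrelated to LLVM's own CFG utilities; duplicate edges are counted separately, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy CFG: each block name maps to the names of its successors.
// Hypothetical stand-in for illustration only, not LLVM's BasicBlock API.
using ToyCFG = std::map<std::string, std::vector<std::string>>;

// Count the edges in G that end at Dest.
static unsigned countIncomingEdges(const ToyCFG &G, const std::string &Dest) {
  unsigned N = 0;
  for (const auto &Entry : G)
    N += static_cast<unsigned>(
        std::count(Entry.second.begin(), Entry.second.end(), Dest));
  return N;
}

// Src -> Dest is critical if Src has several successors and Dest has
// several predecessors.
bool isCriticalEdge(const ToyCFG &G, const std::string &Src,
                    const std::string &Dest) {
  auto It = G.find(Src);
  assert(It != G.end() && "Src must be a block in the graph");
  const std::vector<std::string> &Succs = It->second;
  bool HasEdge = std::find(Succs.begin(), Succs.end(), Dest) != Succs.end();
  return HasEdge && Succs.size() > 1 && countIncomingEdges(G, Dest) > 1;
}
```

With an indirectbr block `I` that can jump to `D` and `E`, and a regular block `A` that branches to `D`, only the edge `I -> D` is critical in this model.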

This is motivated by the use of computed gotos in Python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. This causes us to emit roughly 100 defs of registers containing constants, which we then fail to sink because the edge is critical (the destination block has incoming edges from both the indirectbr and from a switch). So, at each goto, we "spill" about a hundred constants.
The end result is that a clang-compiled Python interpreter is about 2.5x (!) slower on a simple Python reduction loop than a gcc-compiled interpreter.
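
For context, the dispatch style in question looks roughly like the sketch below. It is a hypothetical three-opcode toy, not Python's actual loop: `runToy`, the opcode constants, and the `DISPATCH` macro are made up for illustration. It relies on the GNU labels-as-values extension (`&&label`, `goto *ptr`), which GCC and Clang accept, and it assumes the program contains only valid opcodes and ends with `OP_HALT`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bytecode, used only for this sketch.
constexpr unsigned char OP_INC = 0, OP_DOUBLE = 1, OP_HALT = 2;

// Interpret Program using computed gotos for dispatch.
long runToy(const std::vector<unsigned char> &Program) {
  assert(!Program.empty() && Program.back() == OP_HALT);

  // One label address per opcode; indexed directly by the opcode value.
  static void *const Targets[] = {&&do_inc, &&do_double, &&do_halt};
  long Acc = 0;
  std::size_t PC = 0;

// Fetch the next opcode and jump straight to its handler.
#define DISPATCH() goto *Targets[Program[PC++]]
  DISPATCH();

do_inc:
  Acc += 1;
  DISPATCH();
do_double:
  Acc *= 2;
  DISPATCH();
do_halt:
  return Acc;
}
#undef DISPATCH
```

Each `DISPATCH()` is a computed goto. With about a hundred handlers instead of three, the resulting indirect branch has about a hundred successors, which is the shape described above.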

Diff Detail

Repository: rL LLVM

Event Timeline

mkuper created this revision. Feb 13 2017, 3:25 PM

Herald added a subscriber: mehdi_amini. Feb 13 2017, 3:25 PM

Would it be possible to split the edges in some less fragile manner? I mean, in general you can "split" the edge from an indirectbr to a block with exactly one indirectbr predecessor by splitting the other predecessors.

In D29916#675732, @efriedma wrote:

Would it be possible to split the edges in some less fragile manner? I mean, in general you can "split" the edge from an indirectbr to a block with exactly one indirectbr predecessor by splitting the other predecessors.

I agree this is fragile in the sense that we need to hit a very specific pattern. But I don't have significantly better ideas.
I could try to generalize this, but I think we'd basically need to run both backward data-flow from the IBRs to make sure they're only fed by addresses from GVs (and at most one GV each), and forward data-flow from the GVs to make sure they aren't potentially written to. I'm not sure this is worth the trouble.

I don't really understand what you mean regarding splitting the other predecessors. I want to avoid having a phi on a block where one of the predecessor blocks on the phi is an indirectbr. I don't see how doing anything to the other incoming edges would help, you'll still end with this phi, just change the other predecessors. (The reason this matters only when the edge is critical is because if the indirectbr has only one successor, it doesn't matter where the defs end up living.)

Regardless, I'd like to also extend this to split cases where we have several indirectbr predecessors (of this form).

I don't really understand what you mean regarding splitting the other predecessors. I want to avoid having a phi on a block where one of the predecessor blocks on the phi is an indirectbr. I don't see how doing anything to the other incoming edges would help, you'll still end with this phi, just change the other predecessors.

Err, sorry, that was a terrible explanation; I started out thinking one thing, and writing something different.

Anyway, suppose you have a block B with one indirectbr predecessor and some other non-indirectbr predecessors. First, you split B before the first non-PHI instruction, making B2. Then you clone B, making B3. Then you change all the non-indirectbr predecessors of B to point at B3. (Then you fix up the PHI nodes.) You now have exactly the same CFG as your code produces, without messing with the global variable.
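
The steps above can be sketched on a toy CFG as follows. This is only an illustration under simplifying assumptions: `ToyBlock`, `ToyFunction`, and `splitIndirectEdge` are hypothetical names rather than LLVM APIs, blocks are identified by name, and the PHI fix-up is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy CFG for illustration; these are not LLVM types.
struct ToyBlock {
  bool EndsInIndirectBr;           // true if the terminator is an indirectbr
  std::vector<std::string> Succs;  // names of the successor blocks
};
using ToyFunction = std::map<std::string, ToyBlock>;

// Split the edge from the single indirectbr predecessor of block B:
//   1. Split B before its first non-PHI instruction; the body becomes B2.
//   2. Clone the now PHI-only block B as B3.
//   3. Point every non-indirectbr predecessor of B at B3.
// Returns false (leaving F unchanged) unless B has exactly one indirectbr
// predecessor and at least one other predecessor.
bool splitIndirectEdge(ToyFunction &F, const std::string &B) {
  assert(F.count(B) && "B must be a block of F");

  unsigned NumIndirectPreds = 0, NumDirectPreds = 0;
  for (const auto &Entry : F) {
    const ToyBlock &Pred = Entry.second;
    if (std::find(Pred.Succs.begin(), Pred.Succs.end(), B) == Pred.Succs.end())
      continue;
    if (Pred.EndsInIndirectBr)
      ++NumIndirectPreds;
    else
      ++NumDirectPreds;
  }
  if (NumIndirectPreds != 1 || NumDirectPreds == 0)
    return false;

  const std::string B2 = B + ".split", B3 = B + ".clone";

  // Step 1: B2 takes over B's body, including its terminator. B keeps only
  // its PHIs and branches to B2.
  F[B2] = F.at(B);
  F[B] = ToyBlock{false, {B2}};

  // Step 2: B3 is a copy of the PHI-only block B.
  F[B3] = ToyBlock{false, {B2}};

  // Step 3: retarget the regular (br/switch) predecessors. The indirectbr
  // still jumps to B, so no block address has to change.
  for (auto &Entry : F) {
    if (Entry.second.EndsInIndirectBr)
      continue;
    std::replace(Entry.second.Succs.begin(), Entry.second.Succs.end(), B, B3);
  }
  return true;
}
```

After the call, the only predecessor of the original block is the indirectbr block, so in this model that edge is no longer critical.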

Regardless, I'd like to also extend this to split cases where we have several indirectbr predecessors (of this form).

clang doesn't generate code like that, even if a function has multiple indirect goto statements. Have you seen this come up in practice?

In D29916#675801, @efriedma wrote:

I don't really understand what you mean regarding splitting the other predecessors. I want to avoid having a phi on a block where one of the predecessor blocks on the phi is an indirectbr. I don't see how doing anything to the other incoming edges would help, you'll still end with this phi, just change the other predecessors.

Err, sorry, that was a terrible explanation; I started out thinking one thing, and writing something different.

Anyway, suppose you have a block B with one indirectbr predecessor and some other non-indirectbr predecessors. First, you split B before the first non-PHI instruction, making B2. Then you clone B, making B3. Then you change all the non-indirectbr predecessors of B to point at B3. (Then you fix up the PHI nodes.) You now have exactly the same CFG as your code produces, without messing with the global variable.

Oh, got it! That's really nice, thanks!

Regardless, I'd like to also extend this to split cases where we have several indirectbr predecessors (of this form).

clang doesn't generate code like that, even if a function has multiple indirect goto statements. Have you seen this come up in practice?

No, I haven't seen that, it was just the generalization step that seemed obvious to me. You're right, it's probably better to generalize in the direction you suggested above.
I'll try that and see if anything unexpected breaks.

Changed to work the way Eli suggested.

(I need to add a couple more tests, will do that if this really does look better.)

Yes, this approach looks good.

Can we add the Python interpreter loop in question to the testsuite so we don't regress this in the future?

Do we have any tests in the testsuite which use indirect goto which might be affected? If not, what else might be affected?

In D29916#677746, @efriedma wrote:

Yes, this approach looks good.

Can we add the Python interpreter loop in question to the testsuite so we don't regress this in the future?

I can try to craft a test that approximates the performance impact. I'm not sure it would make sense to add python itself to the test suite.

Do we have any tests in the testsuite which use indirect goto which might be affected? If not, what else might be affected?

I believe Eric saw this in the wild a while ago, I'm not entirely sure what the context is. But I think the main use-case of computed gotos is implementing interpreters. :-)

I haven't bothered with the test-suite, because I assumed this is rare enough not to appear. A cursory check shows we have one case that explicitly tests correctness of computed gotos (SingleSource/Regression/C/2004-03-15-IndirectGoto.c), but I doubt we have anything where this matters for performance. I'll check.

I've verified that 2004-03-15-IndirectGoto.c really is the only place in the test-suite where this fires.

(To be clear - that should have been "Eric *also* saw this in the wild", the python issue is very much not synthetic.)

Added two more tests.

In D29916#677819, @mkuper wrote:

I've verified that 2004-03-15-IndirectGoto.c really is the only place in the test-suite where this fires.

(To be clear - that should have been "Eric *also* saw this in the wild", the python issue is very much not synthetic.)

Yes. I saw this in the main interpreter loop in WebKit a number of years ago.

A gentle ping.

I can try to craft a test that approximates the performance impact. I'm not sure it would make sense to add python itself to the test suite.

The key is that we need some coverage for this fix; without a test, if indirectbr handling breaks somehow in the future, we won't know until someone complains about a huge performance regression a year later.

Patch looks fine, but I'd like to see perf numbers from the testsuite showing an improvement before we merge this.

In D29916#684036, @efriedma wrote:

I can try to craft a test that approximates the performance impact. I'm not sure it would make sense to add python itself to the test suite.

The key is that we need some coverage for this fix; without a test, if indirectbr handling breaks somehow in the future, we won't know until someone complains about a huge performance regression a year later.

I understand, I'm just not sure what you're suggesting. Would you like me to commit a synthetic test for this into the test-suite, to serve as a performance regression test?

Patch looks fine, but I'd like to see perf numbers from the testsuite showing an improvement before we merge this.

As I said above, this code-path only gets hit in exactly one source file in the testsuite, and that file doesn't belong to anything that can be used as a performance test: http://llvm.org/svn/llvm-project/test-suite/trunk/SingleSource/Regression/C/2004-03-15-IndirectGoto.c
So there are really no meaningful testsuite numbers to provide. I'm pretty sure computed gotos are really not common except when implementing interpreters, emulators, etc.

I can see whether it gets hit in SPEC, maybe perlbench...

Would you like me to commit a synthetic test for this into the test-suite, to serve as a performance regression test?

How big is Python, anyway? It might not be that ridiculous to put it into the testsuite.

Failing that, some artificial test which reflects the usage in Python would be okay, I guess.

In D29916#684203, @efriedma wrote:

Would you like me to commit a synthetic test for this into the test-suite, to serve as a performance regression test?

How big is Python, anyway? It might not be that ridiculous to put it into the testsuite.

It's not huge. On my machine, "make -j32" takes about 30 seconds wall time.
But it's still a fairly large yak (e.g. the build system is non-trivial, at least to me, and it is not cmake-based. There's an external project that provides cmake build files, but I don't know how stable that is). I would strongly prefer not to shave it at the moment.

A synthetic test would be based on something like this: https://gist.github.com/anonymous/652ff876293730601cccd1091f1ed446 modified to do something sane.

Tweaked patch a bit so that we always eliminate trivial phis.

Herald added a subscriber: nemanjai. Feb 22 2017, 6:07 PM

If there isn't any existing application you can easily import, a toy interpreter loop is fine.

mkuper mentioned this in D30313: [test-suite] Add regression test for indirect branch critical edge splitting. Feb 23 2017, 3:18 PM

Test-suite test is in. Let me know if you have any further comments on the patch itself.

LGTM.

This revision is now accepted and ready to land. Feb 23 2017, 4:13 PM

Closed by commit rL296060: [CGP] Split some critical edges coming out of indirect branches (authored by mkuper). Feb 23 2017, 5:08 PM

This revision was automatically updated to reflect the committed changes.

pftbest mentioned this in D29069: [MSP430] Add SRet support to MSP430 target. Mar 1 2017, 2:13 AM

Revision Contents

Path                                                        Size
llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp                   155 lines
llvm/trunk/test/CodeGen/ARM/indirectbr.ll                   1 line
llvm/trunk/test/CodeGen/MSP430/indirectbr2.ll               2 lines
llvm/trunk/test/CodeGen/PowerPC/indirectbr.ll               36 lines
llvm/trunk/test/Transforms/CodeGenPrepare/computedgoto.ll   254 lines

Diff 89593

llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp

[9 lines not shown]
 // This pass munges the code in the input function to better prepare it for
 // SelectionDAG-based code generation. This works around limitations in it's
 // basic-block-at-a-time approach. It should eventually be removed.
 //
 //===----------------------------------------------------------------------===//

 #include "llvm/CodeGen/Passes.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/BlockFrequencyInfo.h"
 #include "llvm/Analysis/BranchProbabilityInfo.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/CodeGen/Analysis.h"
[18 lines not shown]
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetLowering.h"
 #include "llvm/Target/TargetSubtargetInfo.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/BuildLibCalls.h"
 #include "llvm/Transforms/Utils/BypassSlowDivision.h"
+#include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include "llvm/Transforms/Utils/SimplifyLibCalls.h"
+#include "llvm/Transforms/Utils/ValueMapper.h"
 using namespace llvm;
 using namespace llvm::PatternMatch;

 #define DEBUG_TYPE "codegenprepare"

 STATISTIC(NumBlocksElim, "Number of blocks eliminated");
 STATISTIC(NumPHIsElim, "Number of trivial PHIs eliminated");
 STATISTIC(NumGEPsElim, "Number of GEPs converted to casts");
[151 lines not shown] private:
     bool dupRetToEnableTailCallOpts(BasicBlock *BB);
     bool placeDbgValues(Function &F);
     bool extLdPromotion(TypePromotionTransaction &TPT, LoadInst *&LI,
                         Instruction *&Inst,
                         const SmallVectorImpl<Instruction *> &Exts,
                         unsigned CreatedInstCost);
     bool splitBranchCondition(Function &F);
     bool simplifyOffsetableRelocate(Instruction &I);
+    bool splitIndirectCriticalEdges(Function &F);
   };
 }

 char CodeGenPrepare::ID = 0;
 INITIALIZE_TM_PASS_BEGIN(CodeGenPrepare, "codegenprepare",
                          "Optimize for code generation", false, false)
 INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
 INITIALIZE_TM_PASS_END(CodeGenPrepare, "codegenprepare",
[58 lines not shown] bool CodeGenPrepare::runOnFunction(Function &F) {
   // llvm.dbg.value is far away from the value then iSel may not be able
   // handle it properly. iSel will drop llvm.dbg.value if it can not
   // find a node corresponding to the value.
   EverMadeChange |= placeDbgValues(F);

   if (!DisableBranchOpts)
     EverMadeChange |= splitBranchCondition(F);

+  // Split some critical edges where one of the sources is an indirect branch,
+  // to help generate sane code for PHIs involving such edges.
+  EverMadeChange |= splitIndirectCriticalEdges(F);
+
   bool MadeChange = true;
   while (MadeChange) {
     MadeChange = false;
     for (Function::iterator I = F.begin(); I != F.end(); ) {
       BasicBlock *BB = &*I++;
       bool ModifiedDTOnIteration = false;
       MadeChange |= optimizeBlock(*BB, ModifiedDTOnIteration);

[117 lines not shown] if (DestBB == BB)
     return nullptr;

   if (!canMergeBlocks(BB, DestBB))
     DestBB = nullptr;

   return DestBB;
 }

+// Return the unique indirectbr predecessor of a block. This may return null
+// even if such a predecessor exists, if it's not useful for splitting.
+// If a predecessor is found, OtherPreds will contain all other (non-indirectbr)
+// predecessors of BB.
+static BasicBlock *
+findIBRPredecessor(BasicBlock *BB, SmallVectorImpl<BasicBlock *> &OtherPreds) {
+  // If the block doesn't have any PHIs, we don't care about it, since there's
+  // no point in splitting it.
+  PHINode *PN = dyn_cast<PHINode>(BB->begin());
+  if (!PN)
+    return nullptr;
+
+  // Verify we have exactly one IBR predecessor.
+  // Conservatively bail out if one of the other predecessors is not a "regular"
+  // terminator (that is, not a switch or a br).
+  BasicBlock *IBB = nullptr;
+  for (unsigned Pred = 0, E = PN->getNumIncomingValues(); Pred != E; ++Pred) {
+    BasicBlock *PredBB = PN->getIncomingBlock(Pred);
+    TerminatorInst *PredTerm = PredBB->getTerminator();
+    switch (PredTerm->getOpcode()) {
+    case Instruction::IndirectBr:
+      if (IBB)
+        return nullptr;
+      IBB = PredBB;
+      break;
+    case Instruction::Br:
+    case Instruction::Switch:
+      OtherPreds.push_back(PredBB);
+      continue;
+    default:
+      return nullptr;
+    }
+  }
+
+  return IBB;
+}

+// Split critical edges where the source of the edge is an indirectbr
+// instruction. This isn't always possible, but we can handle some easy cases.
+// This is useful because MI is unable to split such critical edges,
+// which means it will not be able to sink instructions along those edges.
+// This is especially painful for indirect branches with many successors, where
+// we end up having to prepare all outgoing values in the origin block.
+//
+// Our normal algorithm for splitting critical edges requires us to update
+// the outgoing edges of the edge origin block, but for an indirectbr this
+// is hard, since it would require finding and updating the block addresses
+// the indirect branch uses. But if a block only has a single indirectbr
+// predecessor, with the others being regular branches, we can do it in a
+// different way.
+// Say we have A -> D, B -> D, I -> D where only I -> D is an indirectbr.
+// We can split D into D0 and D1, where D0 contains only the PHIs from D,
+// and D1 is the D block body. We can then duplicate D0 as D0A and D0B, and
+// create the following structure:
+// A -> D0A, B -> D0A, I -> D0B, D0A -> D1, D0B -> D1
+bool CodeGenPrepare::splitIndirectCriticalEdges(Function &F) {
+  // Check whether the function has any indirectbrs, and collect which blocks
+  // they may jump to. Since most functions don't have indirect branches,
+  // this lowers the common case's overhead to O(Blocks) instead of O(Edges).
+  SmallSetVector<BasicBlock *, 16> Targets;
+  for (auto &BB : F) {
+    auto *IBI = dyn_cast<IndirectBrInst>(BB.getTerminator());
+    if (!IBI)
+      continue;
+
+    for (unsigned Succ = 0, E = IBI->getNumSuccessors(); Succ != E; ++Succ)
+      Targets.insert(IBI->getSuccessor(Succ));
+  }
+
+  if (Targets.empty())
+    return false;
+
+  bool Changed = false;
+  for (BasicBlock *Target : Targets) {
+    SmallVector<BasicBlock *, 16> OtherPreds;
+    BasicBlock *IBRPred = findIBRPredecessor(Target, OtherPreds);
+    if (!IBRPred)
+      continue;
+
+    // Don't even think about ehpads/landingpads.
+    Instruction *FirstNonPHI = Target->getFirstNonPHI();
+    if (FirstNonPHI->isEHPad() || Target->isLandingPad())
+      continue;
+
+    BasicBlock *BodyBlock = Target->splitBasicBlock(FirstNonPHI, ".split");
+    // It's possible Target was its own successor through an indirectbr.
+    // In this case, the indirectbr now comes from BodyBlock.
+    if (IBRPred == Target)
+      IBRPred = BodyBlock;
+
+    // At this point Target only has PHIs, and BodyBlock has the rest of the
+    // block's body. Create a copy of Target that will be used by the "direct"
+    // preds.
+    ValueToValueMapTy VMap;
+    BasicBlock *DirectSucc = CloneBasicBlock(Target, VMap, ".clone", &F);
+
+    for (BasicBlock *Pred : OtherPreds)
+      Pred->getTerminator()->replaceUsesOfWith(Target, DirectSucc);
+
+    // Ok, now fix up the PHIs. We know the two blocks only have PHIs, and that
+    // they are clones, so the number of PHIs are the same.
+    // (a) Remove the edge coming from IBRPred from the "Direct" PHI
+    // (b) Leave that as the only edge in the "Indirect" PHI.
+    // (c) Merge the two in the body block.
+    BasicBlock::iterator Indirect = Target->begin(),
+                         End = Target->getFirstNonPHI()->getIterator();
+    BasicBlock::iterator Direct = DirectSucc->begin();
+    BasicBlock::iterator MergeInsert = BodyBlock->getFirstInsertionPt();
+
+    assert(&*End == Target->getTerminator() &&
+           "Block was expected to only contain PHIs");
+
+    while (Indirect != End) {
+      PHINode *DirPHI = cast<PHINode>(Direct);
+      PHINode *IndPHI = cast<PHINode>(Indirect);
+
+      // Now, clean up - the direct block shouldn't get the indirect value,
+      // and vice versa.
+      DirPHI->removeIncomingValue(IBRPred);
+      Direct++;
+
+      // Advance the pointer here, to avoid invalidation issues when the old
+      // PHI is erased.
+      Indirect++;
+
+      PHINode *NewIndPHI = PHINode::Create(IndPHI->getType(), 1, "ind", IndPHI);
+      NewIndPHI->addIncoming(IndPHI->getIncomingValueForBlock(IBRPred),
+                             IBRPred);
+
+      // Create a PHI in the body block, to merge the direct and indirect
+      // predecessors.
+      PHINode *MergePHI =
+          PHINode::Create(IndPHI->getType(), 2, "merge", &*MergeInsert);
+      MergePHI->addIncoming(NewIndPHI, Target);
+      MergePHI->addIncoming(DirPHI, DirectSucc);
+
+      IndPHI->replaceAllUsesWith(MergePHI);
+      IndPHI->eraseFromParent();
+    }
+
+    Changed = true;
+  }
+
+  return Changed;
+}
+
 /// Eliminate blocks that contain only PHI nodes, debug info directives, and an
 /// unconditional branch. Passes before isel (e.g. LSR/loopsimplify) often split
 /// edges in ways that are non-optimal for isel. Start by eliminating these
 /// blocks so we can split them the way we want them.
 bool CodeGenPrepare::eliminateMostlyEmptyBlocks(Function &F) {
   SmallPtrSet<BasicBlock *, 16> Preheaders;
   SmallVector<Loop *, 16> LoopList(LI->begin(), LI->end());
   while (!LoopList.empty()) {
[5,529 lines not shown]

llvm/trunk/test/CodeGen/ARM/indirectbr.ll

[41 lines not shown] L4: ; preds = %L5, %bb2
   %res.0 = phi i32 [ 385, %L5 ], [ 35, %bb2 ] ; <i32> [#uses=1]
   br label %L3

 L3: ; preds = %L4, %bb2
   %res.1 = phi i32 [ %res.0, %L4 ], [ 5, %bb2 ] ; <i32> [#uses=1]
   br label %L2

 L2: ; preds = %L3, %bb2
+; THUMB-LABEL: %L1.clone
 ; THUMB: muls
   %res.2 = phi i32 [ %res.1, %L3 ], [ 1, %bb2 ] ; <i32> [#uses=1]
   %phitmp = mul i32 %res.2, 6 ; <i32> [#uses=1]
   br label %L1

 L1: ; preds = %L2, %bb2
   %res.3 = phi i32 [ %phitmp, %L2 ], [ 2, %bb2 ] ; <i32> [#uses=1]
 ; ARM-LABEL: %L1
[18 lines not shown]

llvm/trunk/test/CodeGen/MSP430/indirectbr2.ll

 ; RUN: llc -march=msp430 < %s | FileCheck %s
 @C.0.2070 = private constant [5 x i8*] [i8* blockaddress(@foo, %L1), i8* blockaddress(@foo, %L2), i8* blockaddress(@foo, %L3), i8* blockaddress(@foo, %L4), i8* blockaddress(@foo, %L5)] ; <[5 x i8*]*> [#uses=1]

 define internal i16 @foo(i16 %i) nounwind {
 entry:
   %tmp1 = getelementptr inbounds [5 x i8*], [5 x i8*]* @C.0.2070, i16 0, i16 %i ; <i8**> [#uses=1]
   %gotovar.4.0 = load i8*, i8** %tmp1, align 4 ; <i8*> [#uses=1]
-; CHECK: br .LC.0.2070(r12)
+; CHECK: br .LC.0.2070(r15)
   indirectbr i8* %gotovar.4.0, [label %L5, label %L4, label %L3, label %L2, label %L1]

 L5: ; preds = %bb2
   br label %L4

 L4: ; preds = %L5, %bb2
   %res.0 = phi i16 [ 385, %L5 ], [ 35, %entry ] ; <i16> [#uses=1]
   br label %L3
[13 lines not shown]

llvm/trunk/test/CodeGen/PowerPC/indirectbr.ll

[11 lines not shown]
 entry:
   %0 = load i8*, i8** @nextaddr, align 4 ; <i8*> [#uses=2]
   %1 = icmp eq i8* %0, null ; <i1> [#uses=1]
   br i1 %1, label %bb3, label %bb2

 bb2: ; preds = %entry, %bb3
   %gotovar.4.0 = phi i8* [ %gotovar.4.0.pre, %bb3 ], [ %0, %entry ] ; <i8*> [#uses=1]
 ; PIC: mtctr
-; PIC-NEXT: li
-; PIC-NEXT: li
-; PIC-NEXT: li
-; PIC-NEXT: li
 ; PIC-NEXT: bctr
+; PIC: li
+; PIC: b LBB
+; PIC: li
+; PIC: b LBB
+; PIC: li
+; PIC: b LBB
+; PIC: li
+; PIC: b LBB
 ; STATIC: mtctr
-; STATIC-NEXT: li
-; STATIC-NEXT: li
-; STATIC-NEXT: li
-; STATIC-NEXT: li
 ; STATIC-NEXT: bctr
+; STATIC: li
+; STATIC: b LBB
+; STATIC: li
+; STATIC: b LBB
+; STATIC: li
+; STATIC: b LBB
+; STATIC: li
+; STATIC: b LBB
 ; PPC64: mtctr
-; PPC64-NEXT: li
-; PPC64-NEXT: li
-; PPC64-NEXT: li
-; PPC64-NEXT: li
 ; PPC64-NEXT: bctr
+; PPC64: li
+; PPC64: b LBB
+; PPC64: li
+; PPC64: b LBB
+; PPC64: li
+; PPC64: b LBB
+; PPC64: li
+; PPC64: b LBB
   indirectbr i8* %gotovar.4.0, [label %L5, label %L4, label %L3, label %L2, label %L1]

 bb3: ; preds = %entry
   %2 = getelementptr inbounds [5 x i8*], [5 x i8*]* @C.0.2070, i32 0, i32 %i ; <i8**> [#uses=1]
   %gotovar.4.0.pre = load i8*, i8** %2, align 4 ; <i8*> [#uses=1]
   br label %bb2

 L5: ; preds = %bb2
[27 lines not shown]

llvm/trunk/test/Transforms/CodeGenPrepare/computedgoto.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -codegenprepare -S < %s \| FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				declare void @use(i32) local_unnamed_addr
				declare void @useptr([2 x i8]) local_unnamed_addr

				; CHECK: @simple.targets = constant [2 x i8] [i8 blockaddress(@simple, %bb0), i8* blockaddress(@simple, %bb1)], align 16
				@simple.targets = constant [2 x i8] [i8 blockaddress(@simple, %bb0), i8* blockaddress(@simple, %bb1)], align 16

				; CHECK: @multi.targets = constant [2 x i8] [i8 blockaddress(@multi, %bb0), i8* blockaddress(@multi, %bb1)], align 16
				@multi.targets = constant [2 x i8] [i8 blockaddress(@multi, %bb0), i8* blockaddress(@multi, %bb1)], align 16

				; CHECK: @loop.targets = constant [2 x i8] [i8 blockaddress(@loop, %bb0), i8* blockaddress(@loop, %bb1)], align 16
				@loop.targets = constant [2 x i8] [i8 blockaddress(@loop, %bb0), i8* blockaddress(@loop, %bb1)], align 16

				; CHECK: @nophi.targets = constant [2 x i8] [i8 blockaddress(@nophi, %bb0), i8* blockaddress(@nophi, %bb1)], align 16
				@nophi.targets = constant [2 x i8] [i8 blockaddress(@nophi, %bb0), i8* blockaddress(@nophi, %bb1)], align 16

				; Check that we break the critical edge when an jump table has only one use.
				define void @simple(i32* nocapture readonly %p) {
				; CHECK-LABEL: @simple(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[INCDEC_PTR:%.]] = getelementptr inbounds i32, i32 [[P:%.*]], i64 1
				; CHECK-NEXT: [[INITVAL:%.]] = load i32, i32 [[P]], align 4
				; CHECK-NEXT: [[INITOP:%.]] = load i32, i32 [[INCDEC_PTR]], align 4
				; CHECK-NEXT: switch i32 [[INITOP]], label [[EXIT:%.*]] [
				; CHECK-NEXT: i32 0, label [[BB0_CLONE:%.*]]
				; CHECK-NEXT: i32 1, label [[BB1_CLONE:%.*]]
				; CHECK-NEXT: ]
				; CHECK: bb0:
				; CHECK-NEXT: br label [[DOTSPLIT:%.*]]
				; CHECK: .split:
				; CHECK-NEXT: [[MERGE:%.]] = phi i32 [ [[PTR:%.]], [[BB0:%.]] ], [ [[INCDEC_PTR]], [[BB0_CLONE]] ]
				; CHECK-NEXT: [[MERGE2:%.*]] = phi i32 [ 0, [[BB0]] ], [ [[INITVAL]], [[BB0_CLONE]] ]
				; CHECK-NEXT: tail call void @use(i32 [[MERGE2]])
				; CHECK-NEXT: br label [[INDIRECTGOTO:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: br label [[DOTSPLIT3:%.*]]
				; CHECK: .split3:
				; CHECK-NEXT: [[MERGE5:%.*]] = phi i32* [ [[PTR]], [[BB1:%.*]] ], [ [[INCDEC_PTR]], [[BB1_CLONE]] ]
				; CHECK-NEXT: [[MERGE7:%.*]] = phi i32 [ 1, [[BB1]] ], [ [[INITVAL]], [[BB1_CLONE]] ]
				; CHECK-NEXT: tail call void @use(i32 [[MERGE7]])
				; CHECK-NEXT: br label [[INDIRECTGOTO]]
				; CHECK: indirectgoto:
				; CHECK-NEXT: [[P_ADDR_SINK:%.*]] = phi i32* [ [[MERGE5]], [[DOTSPLIT3]] ], [ [[MERGE]], [[DOTSPLIT]] ]
				; CHECK-NEXT: [[PTR]] = getelementptr inbounds i32, i32* [[P_ADDR_SINK]], i64 1
				; CHECK-NEXT: [[NEWP:%.*]] = load i32, i32* [[P_ADDR_SINK]], align 4
				; CHECK-NEXT: [[IDX:%.*]] = sext i32 [[NEWP]] to i64
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x i8*], [2 x i8*]* @simple.targets, i64 0, i64 [[IDX]]
				; CHECK-NEXT: [[NEWOP:%.*]] = load i8*, i8** [[ARRAYIDX]], align 8
				; CHECK-NEXT: indirectbr i8* [[NEWOP]], [label [[BB0]], label %bb1]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				; CHECK: bb0.clone:
				; CHECK-NEXT: br label [[DOTSPLIT]]
				; CHECK: bb1.clone:
				; CHECK-NEXT: br label [[DOTSPLIT3]]
				;
				entry:
				%incdec.ptr = getelementptr inbounds i32, i32* %p, i64 1
				%initval = load i32, i32* %p, align 4
				%initop = load i32, i32* %incdec.ptr, align 4
				switch i32 %initop, label %exit [
				i32 0, label %bb0
				i32 1, label %bb1
				]

				bb0:
				%p.addr.0 = phi i32* [ %incdec.ptr, %entry ], [ %ptr, %indirectgoto ]
				%opcode.0 = phi i32 [ %initval, %entry ], [ 0, %indirectgoto ]
				tail call void @use(i32 %opcode.0)
				br label %indirectgoto

				bb1:
				%p.addr.1 = phi i32* [ %incdec.ptr, %entry ], [ %ptr, %indirectgoto ]
				%opcode.1 = phi i32 [ %initval, %entry ], [ 1, %indirectgoto ]
				tail call void @use(i32 %opcode.1)
				br label %indirectgoto

				indirectgoto:
				%p.addr.sink = phi i32* [ %p.addr.1, %bb1 ], [ %p.addr.0, %bb0 ]
				%ptr = getelementptr inbounds i32, i32* %p.addr.sink, i64 1
				%newp = load i32, i32* %p.addr.sink, align 4
				%idx = sext i32 %newp to i64
				%arrayidx = getelementptr inbounds [2 x i8*], [2 x i8*]* @simple.targets, i64 0, i64 %idx
				%newop = load i8*, i8** %arrayidx, align 8
				indirectbr i8* %newop, [label %bb0, label %bb1]

				exit:
				ret void
				}

				; Don't try to break critical edges when several indirectbr point to a single block
				define void @multi(i32* nocapture readonly %p) {
				; CHECK-LABEL: @multi(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 1
				; CHECK-NEXT: [[INITVAL:%.*]] = load i32, i32* [[P]], align 4
				; CHECK-NEXT: [[INITOP:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
				; CHECK-NEXT: switch i32 [[INITOP]], label [[EXIT:%.*]] [
				; CHECK-NEXT: i32 0, label [[BB0:%.*]]
				; CHECK-NEXT: i32 1, label [[BB1:%.*]]
				; CHECK-NEXT: ]
				; CHECK: bb0:
				; CHECK-NEXT: [[P_ADDR_0:%.*]] = phi i32* [ [[INCDEC_PTR]], [[ENTRY:%.*]] ], [ [[NEXT0:%.*]], [[BB0]] ], [ [[NEXT1:%.*]], [[BB1]] ]
				; CHECK-NEXT: [[OPCODE_0:%.*]] = phi i32 [ [[INITVAL]], [[ENTRY]] ], [ 0, [[BB0]] ], [ 1, [[BB1]] ]
				; CHECK-NEXT: tail call void @use(i32 [[OPCODE_0]])
				; CHECK-NEXT: [[NEXT0]] = getelementptr inbounds i32, i32* [[P_ADDR_0]], i64 1
				; CHECK-NEXT: [[NEWP0:%.*]] = load i32, i32* [[P_ADDR_0]], align 4
				; CHECK-NEXT: [[IDX0:%.*]] = sext i32 [[NEWP0]] to i64
				; CHECK-NEXT: [[ARRAYIDX0:%.*]] = getelementptr inbounds [2 x i8*], [2 x i8*]* @multi.targets, i64 0, i64 [[IDX0]]
				; CHECK-NEXT: [[NEWOP0:%.*]] = load i8*, i8** [[ARRAYIDX0]], align 8
				; CHECK-NEXT: indirectbr i8* [[NEWOP0]], [label [[BB0]], label %bb1]
				; CHECK: bb1:
				; CHECK-NEXT: [[P_ADDR_1:%.*]] = phi i32* [ [[INCDEC_PTR]], [[ENTRY]] ], [ [[NEXT0]], [[BB0]] ], [ [[NEXT1]], [[BB1]] ]
				; CHECK-NEXT: [[OPCODE_1:%.*]] = phi i32 [ [[INITVAL]], [[ENTRY]] ], [ 0, [[BB0]] ], [ 1, [[BB1]] ]
				; CHECK-NEXT: tail call void @use(i32 [[OPCODE_1]])
				; CHECK-NEXT: [[NEXT1]] = getelementptr inbounds i32, i32* [[P_ADDR_1]], i64 1
				; CHECK-NEXT: [[NEWP1:%.*]] = load i32, i32* [[P_ADDR_1]], align 4
				; CHECK-NEXT: [[IDX1:%.*]] = sext i32 [[NEWP1]] to i64
				; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds [2 x i8*], [2 x i8*]* @multi.targets, i64 0, i64 [[IDX1]]
				; CHECK-NEXT: [[NEWOP1:%.*]] = load i8*, i8** [[ARRAYIDX1]], align 8
				; CHECK-NEXT: indirectbr i8* [[NEWOP1]], [label [[BB0]], label %bb1]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%incdec.ptr = getelementptr inbounds i32, i32* %p, i64 1
				%initval = load i32, i32* %p, align 4
				%initop = load i32, i32* %incdec.ptr, align 4
				switch i32 %initop, label %exit [
				i32 0, label %bb0
				i32 1, label %bb1
				]

				bb0:
				%p.addr.0 = phi i32* [ %incdec.ptr, %entry ], [ %next0, %bb0 ], [ %next1, %bb1 ]
				%opcode.0 = phi i32 [ %initval, %entry ], [ 0, %bb0 ], [ 1, %bb1 ]
				tail call void @use(i32 %opcode.0)
				%next0 = getelementptr inbounds i32, i32* %p.addr.0, i64 1
				%newp0 = load i32, i32* %p.addr.0, align 4
				%idx0 = sext i32 %newp0 to i64
				%arrayidx0 = getelementptr inbounds [2 x i8*], [2 x i8*]* @multi.targets, i64 0, i64 %idx0
				%newop0 = load i8*, i8** %arrayidx0, align 8
				indirectbr i8* %newop0, [label %bb0, label %bb1]

				bb1:
				%p.addr.1 = phi i32* [ %incdec.ptr, %entry ], [ %next0, %bb0 ], [ %next1, %bb1 ]
				%opcode.1 = phi i32 [ %initval, %entry ], [ 0, %bb0 ], [ 1, %bb1 ]
				tail call void @use(i32 %opcode.1)
				%next1 = getelementptr inbounds i32, i32* %p.addr.1, i64 1
				%newp1 = load i32, i32* %p.addr.1, align 4
				%idx1 = sext i32 %newp1 to i64
				%arrayidx1 = getelementptr inbounds [2 x i8*], [2 x i8*]* @multi.targets, i64 0, i64 %idx1
				%newop1 = load i8*, i8** %arrayidx1, align 8
				indirectbr i8* %newop1, [label %bb0, label %bb1]

				exit:
				ret void
				}

				; Make sure we do the right thing for cases where the indirectbr branches to
				; the block it terminates.
				define void @loop(i64* nocapture readonly %p) {
				; CHECK-LABEL: @loop(
				; CHECK-NEXT: bb0.clone:
				; CHECK-NEXT: br label [[DOTSPLIT:%.*]]
				; CHECK: bb0:
				; CHECK-NEXT: br label [[DOTSPLIT]]
				; CHECK: .split:
				; CHECK-NEXT: [[MERGE:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[BB0:%.*]] ], [ 0, [[BB0_CLONE:%.*]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i64, i64* [[P:%.*]], i64 [[MERGE]]
				; CHECK-NEXT: store i64 [[MERGE]], i64* [[TMP0]], align 4
				; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[MERGE]], 1
				; CHECK-NEXT: [[IDX:%.*]] = srem i64 [[MERGE]], 2
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x i8*], [2 x i8*]* @loop.targets, i64 0, i64 [[IDX]]
				; CHECK-NEXT: [[TARGET:%.*]] = load i8*, i8** [[ARRAYIDX]], align 8
				; CHECK-NEXT: indirectbr i8* [[TARGET]], [label [[BB0]], label %bb1]
				; CHECK: bb1:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %bb0

				bb0:
				%i = phi i64 [ %i.next, %bb0 ], [ 0, %entry ]
				%tmp0 = getelementptr inbounds i64, i64* %p, i64 %i
				store i64 %i, i64* %tmp0, align 4
				%i.next = add nuw nsw i64 %i, 1
				%idx = srem i64 %i, 2
				%arrayidx = getelementptr inbounds [2 x i8*], [2 x i8*]* @loop.targets, i64 0, i64 %idx
				%target = load i8*, i8** %arrayidx, align 8
				indirectbr i8* %target, [label %bb0, label %bb1]

				bb1:
				ret void
				}

				; Don't do anything for cases that contain no phis.
				define void @nophi(i32* %p) {
				; CHECK-LABEL: @nophi(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 1
				; CHECK-NEXT: [[INITOP:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
				; CHECK-NEXT: switch i32 [[INITOP]], label [[EXIT:%.*]] [
				; CHECK-NEXT: i32 0, label [[BB0:%.*]]
				; CHECK-NEXT: i32 1, label [[BB1:%.*]]
				; CHECK-NEXT: ]
				; CHECK: bb0:
				; CHECK-NEXT: tail call void @use(i32 0)
				; CHECK-NEXT: br label [[INDIRECTGOTO:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: tail call void @use(i32 1)
				; CHECK-NEXT: br label [[INDIRECTGOTO]]
				; CHECK: indirectgoto:
				; CHECK-NEXT: [[SUNKADDR:%.*]] = ptrtoint i32* [[P]] to i64
				; CHECK-NEXT: [[SUNKADDR1:%.*]] = add i64 [[SUNKADDR]], 4
				; CHECK-NEXT: [[SUNKADDR2:%.*]] = inttoptr i64 [[SUNKADDR1]] to i32*
				; CHECK-NEXT: [[NEWP:%.*]] = load i32, i32* [[SUNKADDR2]], align 4
				; CHECK-NEXT: [[IDX:%.*]] = sext i32 [[NEWP]] to i64
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x i8*], [2 x i8*]* @nophi.targets, i64 0, i64 [[IDX]]
				; CHECK-NEXT: [[NEWOP:%.*]] = load i8*, i8** [[ARRAYIDX]], align 8
				; CHECK-NEXT: indirectbr i8* [[NEWOP]], [label [[BB0]], label %bb1]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%incdec.ptr = getelementptr inbounds i32, i32* %p, i64 1
				%initop = load i32, i32* %incdec.ptr, align 4
				switch i32 %initop, label %exit [
				i32 0, label %bb0
				i32 1, label %bb1
				]

				bb0:
				tail call void @use(i32 0)
				br label %indirectgoto

				bb1:
				tail call void @use(i32 1)
				br label %indirectgoto

				indirectgoto:
				%newp = load i32, i32* %incdec.ptr, align 4
				%idx = sext i32 %newp to i64
				%arrayidx = getelementptr inbounds [2 x i8*], [2 x i8*]* @nophi.targets, i64 0, i64 %idx
				%newop = load i8*, i8** %arrayidx, align 8
				indirectbr i8* %newop, [label %bb0, label %bb1]

				exit:
				ret void
				}
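
As a reading aid for the `@simple` test above, here is a minimal sketch (not the pass's code) that models the function's control-flow graph before and after the transform and counts critical edges. It uses the usual definition: an edge is critical when its source has more than one successor and its destination has more than one predecessor. The names `Cfg`, `criticalEdges`, `simpleBefore`, and `simpleAfter` are hypothetical; the block names and edges are read off the input IR and the CHECK lines.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy CFG: block name -> list of successor block names.
using Cfg = std::map<std::string, std::vector<std::string>>;
using Edge = std::pair<std::string, std::string>;

// Returns every edge whose source has more than one successor and whose
// destination has more than one incoming edge. This toy model counts
// incoming edges, which equals the predecessor count as long as no
// successor list names the same block twice.
std::vector<Edge> criticalEdges(const Cfg &cfg) {
  std::map<std::string, std::size_t> predCount;
  for (const auto &block : cfg)
    for (const auto &succ : block.second)
      ++predCount[succ];

  std::vector<Edge> result;
  for (const auto &block : cfg) {
    if (block.second.size() < 2)
      continue; // A single-successor block cannot start a critical edge.
    for (const auto &succ : block.second)
      if (predCount[succ] > 1)
        result.emplace_back(block.first, succ);
  }
  return result;
}

// CFG of @simple as written in the test input.
Cfg simpleBefore() {
  return {
      {"entry", {"exit", "bb0", "bb1"}},
      {"bb0", {"indirectgoto"}},
      {"bb1", {"indirectgoto"}},
      {"indirectgoto", {"bb0", "bb1"}},
      {"exit", {}},
  };
}

// CFG of @simple as described by the CHECK lines.
Cfg simpleAfter() {
  return {
      {"entry", {"exit", "bb0.clone", "bb1.clone"}},
      {"bb0", {".split"}},
      {"bb1", {".split3"}},
      {".split", {"indirectgoto"}},
      {".split3", {"indirectgoto"}},
      {"indirectgoto", {"bb0", "bb1"}},
      {"bb0.clone", {".split"}},
      {"bb1.clone", {".split3"}},
      {"exit", {}},
  };
}
```

In this model, `criticalEdges(simpleBefore())` reports the four edges into `bb0` and `bb1` (two from `entry`, two from `indirectgoto`), while `criticalEdges(simpleAfter())` is empty: the merge points `.split` and `.split3` are reached only from blocks that end in an unconditional branch.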