This is an archive of the discontinued LLVM Phabricator instance.

[Loads] Skip non load/store instructions when finding available load
Needs Review · Public

Authored by tejohnson on Apr 29 2021, 10:32 AM.


Details

Reviewers

nikic
lebedev.ri

Summary

Detect and skip non-load/store instructions, up to a limit,
when finding available loads for a pointer. The scan limit is only 6
instructions, which is fairly small, so, for example, the type test and
assume intrinsics inserted for whole program devirtualization would prevent
analysis, sometimes causing profile matching issues. Debug and pseudo
instructions, on the other hand, were skipped without any limit.

We now skip all of the unrelated instruction types when scanning for an
available load or intervening memory write, up to a default limit of 32
instructions. This supersedes the unlimited skipping of debug and pseudo
instructions that previously existed.
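
As a rough illustration of the two budgets described above, here is a minimal, self-contained sketch. It does not use LLVM: `Inst`, `Kind` and `findAvailableStore` are hypothetical stand-ins, and the toy treats any non-matching store or may-write instruction as a clobber instead of consulting alias analysis.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for llvm::Instruction; only the properties the scan
// cares about are modelled.
enum class Kind { Load, Store, Other };

struct Inst {
  Kind kind;
  bool mayWrite = false; // e.g. a call that may write memory
  int ptr = -1;          // toy "address" for loads and stores
  int value = 0;         // value written, for stores
};

// Walk backward from index `from` (exclusive) looking for a store to `ptr`.
// Unrelated instructions draw from `skipLimit`; loads, stores and may-write
// instructions draw from `scanLimit`.
std::optional<int> findAvailableStore(const std::vector<Inst> &block,
                                      std::size_t from, int ptr,
                                      unsigned scanLimit, unsigned skipLimit) {
  for (std::size_t i = from; i-- > 0;) {
    const Inst &I = block[i];
    bool related = I.kind != Kind::Other || I.mayWrite;
    if (!related) {
      if (skipLimit-- == 0)
        return std::nullopt; // too many unrelated instructions
      continue;
    }
    if (scanLimit-- == 0)
      return std::nullopt; // too many memory instructions
    if (I.kind == Kind::Store && I.ptr == ptr)
      return I.value; // the stored value can be forwarded
    if (I.kind == Kind::Store || I.mayWrite)
      return std::nullopt; // toy model: conservatively treat as a clobber
  }
  return std::nullopt;
}
```

With a store followed by six unrelated instructions, this sketch finds the store when the skip budget is 32 and gives up when the skip budget is 5, which mirrors the behaviour the new `-available-load-skip-limit` test exercises.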

This exposed an issue where a caller from jump threading could end up
looping infinitely due to an unreachable loop with a single predecessor
(the backedge). It was not exposed before because the branch instruction
was counted towards the scan limit of 6 instructions, so iteration would
terminate. Fix this and add a test case.
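
The fix can be pictured with a small standalone sketch. `Block` and `walkSinglePreds` are hypothetical names, not LLVM API; the point is only that a visited set ends the walk when the single-predecessor chain loops back on itself.

```cpp
#include <unordered_set>

// Hypothetical stand-in for llvm::BasicBlock: each block records its single
// predecessor, or nullptr if it has none or more than one.
struct Block {
  Block *singlePred = nullptr;
};

// Count how many blocks are visited when walking up the single-predecessor
// chain from `start`. Inserting into `visited` fails on a repeat, which ends
// the walk instead of cycling forever.
unsigned walkSinglePreds(Block *start) {
  std::unordered_set<Block *> visited;
  unsigned steps = 0;
  for (Block *bb = start; bb && visited.insert(bb).second; bb = bb->singlePred)
    ++steps;
  return steps;
}
```

A block whose only predecessor is itself (an unreachable self-loop) is visited once and the walk stops, where an unguarded loop would keep following the backedge.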

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tejohnson created this revision. Apr 29 2021, 10:32 AM

Herald added subscribers: dexonsmith, wenlei, hiraditya, Prazek. Apr 29 2021, 10:32 AM

tejohnson requested review of this revision. Apr 29 2021, 10:32 AM

Herald added a project: Restricted Project. Apr 29 2021, 10:32 AM

Harbormaster completed remote builds in B101680: Diff 341572. Apr 29 2021, 11:43 AM

Not a big fan of this -- this is effectively whitelisting a specific pattern, and doesn't generalize.

For this code, there are two limits of interest: The total number of instructions we look at, and the number of alias-analysis queries we perform. Currently we limit both through a small number of total instructions. However, the only expensive part, and what needs to be aggressively limited, is the number of alias analysis queries.

I think what we can do here is to have a relatively liberal upper limit on the number of scanned instructions (say 32), while having a tight limit on the number of AA queries or AA query candidates (say 4). This should make the code more resilient against additional bitcast/assume instructions in between, while still limiting compile-time impact.

Do you think something along those lines would address your motivation here?
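
A minimal sketch of the two-limit idea suggested above, under stated assumptions: `Inst`, `ScanResult` and `scanBackward` are hypothetical, `mayAlias` is a placeholder for an alias-analysis query, and the defaults of 32 and 4 are just the example numbers from the comment.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an instruction: all that matters here is whether
// it may write memory and would therefore need an alias query.
struct Inst {
  bool mayWrite;
};

enum class ScanResult { ReachedTop, InstLimit, AAQueryLimit, Clobbered };

// Scan the block backward with a liberal cap on instructions looked at and a
// tight cap on (expensive) alias queries.
ScanResult scanBackward(const std::vector<Inst> &block,
                        const std::function<bool(const Inst &)> &mayAlias,
                        unsigned maxInsts = 32, unsigned maxAAQueries = 4) {
  for (std::size_t i = block.size(); i-- > 0;) {
    if (maxInsts-- == 0)
      return ScanResult::InstLimit;
    const Inst &I = block[i];
    if (!I.mayWrite)
      continue; // cheap: no alias query needed
    if (maxAAQueries-- == 0)
      return ScanResult::AAQueryLimit;
    if (mayAlias(I))
      return ScanResult::Clobbered;
  }
  return ScanResult::ReachedTop;
}
```

In this sketch, twenty non-writing instructions are scanned to the top of the block, while a fifth may-write instruction exhausts the alias-query budget even though the instruction budget is far from spent.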

In D101553#2726674, @nikic wrote:

Not a big fan of this -- this is effectively whitelisting a specific pattern, and doesn't generalize.

For this code, there are two limits of interest: The total number of instructions we look at, and the number of alias-analysis queries we perform. Currently we limit both through a small number of total instructions. However, the only expensive part, and what needs to be aggressively limited, is the number of alias analysis queries.

There's sort of 3 categories of instructions from what I can tell, in increasing order of expense:

1. Insts that are not loads or stores and don't modify memory - these can be skipped immediately
2. Loads and stores which are compared to the given pointer by comparing the pointer values but not with AA (in getAvailableLoadStore)
3. (Subset of category 2) Instructions that are compared using AA analysis (insts that may write memory)
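
One way to picture these categories is as a classification by the most expensive check an instruction might need. The sketch below is illustrative only: `Inst`, `Kind`, `Cost` and `classify` are hypothetical names, and mapping every store or may-write instruction to the alias-query bucket is a simplifying assumption.

```cpp
// Hypothetical stand-in for an instruction.
enum class Kind { Load, Store, Call, Other };

struct Inst {
  Kind kind;
  bool mayWrite;
};

// Most expensive check the scan might need for an instruction.
enum class Cost { Skip, PointerCompare, AliasQuery };

Cost classify(const Inst &I) {
  // Category 1: not a load or store and cannot write memory.
  if (I.kind != Kind::Load && I.kind != Kind::Store && !I.mayWrite)
    return Cost::Skip;
  // Category 2: a load only needs its pointer compared with the query.
  if (I.kind == Kind::Load)
    return Cost::PointerCompare;
  // Category 3: stores and other may-write instructions may need AA if the
  // pointer comparison does not settle the question.
  return Cost::AliasQuery;
}
```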

I think what we can do here is to have a relatively liberal upper limit on the number of scanned instructions (say 32), while having a tight limit on the number of AA queries or AA query candidates (say 4). This should make the code more resilient against additional bitcast/assume instructions in between, while still limiting compile-time impact.

Do you think something along those lines would address your motivation here?

It would help, but we could still end up with corner cases where with -fwhole-program-vtables enabled for the optimization build we might end up with IR profile matching issues due to different simplification optimizations prior. Although with a larger window the chances of that would presumably be reduced.

One thing I am wondering is whether for the first category above (non load/store/may write memory), if there needs to be any limit since they can be skipped immediately (e.g. instead of just dbg or pseudo insts as it does now). I am not sure how expensive it would be to allow quickly skipping through all of them in the block as we scan for memory accesses. I did some measurements for a large binary build with ~20k input files. With that change, essentially to convert:

if (isa<DbgInfoIntrinsic>(Inst))
  continue;

to:

if (!isa<LoadInst>(&Inst) && !isa<StoreInst>(&Inst) && !Inst.mayWriteToMemory())
  continue;

across all the input files there are about 40% more iterations of the loop in findAvailablePtrLoadStore and about 47% more iterations in FindAvailableLoadedValue. I left the existing default scan limit of 6 for the loads/stores/maywrite insts, so these were largely just skipping through without any analysis. I didn't compare the compile times of all of the individual files in detail, but the slowest compiling files did not increase in compile time nor did the overall build.

If that isn't acceptable, perhaps a larger limit such as 100? Note this could either subsume the existing unlimited skipping of dbg and pseudo instructions, or allow those to be skipped unconditionally as now. Right now with the current skipping of all dbg and pseudo instructions, the max iteration of this loop is ~1K, so if we include those instructions in the limit it will have a potentially negative effect on optimization over the status quo.

Prazek added inline comments. May 2 2021, 6:22 AM

llvm/lib/Analysis/Loads.cpp
649–655	nit: This comment (which is very useful) is largely duplicated in the code. What do you think of wrapping this chunk of code in a function, so that neither the logic nor the comment gets out of sync? Something like:

if (isUnrelatedToLoadOrStore(Inst)) // Probably could be named better.
  continue;

tejohnson added inline comments. May 4 2021, 11:47 AM

llvm/lib/Analysis/Loads.cpp
649–655	I ended up simplifying this code which meant it no longer required a big comment about these particular instruction types.

tejohnson retitled this revision from [Loads] Ignore type test assume sequences inserted for devirtualization to [Loads] Skip non load/store instructions when finding available load. May 4 2021, 11:48 AM

tejohnson edited the summary of this revision.

Update patch as discussed, to skip a larger number of non ld/st instructions

ping

In D101553#2726674, @nikic wrote:

Not a big fan of this -- this is effectively whitelisting a specific pattern, and doesn't generalize.

For this code, there are two limits of interest: The total number of instructions we look at, and the number of alias-analysis queries we perform. Currently we limit both through a small number of total instructions. However, the only expensive part, and what needs to be aggressively limited, is the number of alias analysis queries.

I think what we can do here is to have a relatively liberal upper limit on the number of scanned instructions (say 32), while having a tight limit on the number of AA queries or AA query candidates (say 4). This should make the code more resilient against additional bitcast/assume instructions in between, while still limiting compile-time impact.

Do you think something along those lines would address your motivation here?

@nikic ping, I believe I implemented your suggested alternative, ptal

nikic mentioned this in D129003: [WPD] Filter out intrinsics inserted from whole-program-vtables. Jul 1 2022, 2:18 PM

This review may be stuck/dead, consider abandoning if no longer relevant.
Removing myself as reviewer in attempt to clean dashboard.

Herald added a project: Restricted Project. Jan 12 2023, 5:30 PM

Herald added a subscriber: StephenFan.

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicInst.h	11 lines
llvm/lib/Analysis/Loads.cpp	47 lines
llvm/lib/Transforms/Scalar/JumpThreading.cpp	5 lines
llvm/test/Transforms/InstCombine/load.ll	37 lines
llvm/test/Transforms/JumpThreading/thread-loads.ll	129 lines

Diff 342812

llvm/include/llvm/IR/IntrinsicInst.h

Show First 20 Lines • Show All 1,276 Lines • ▼ Show 20 Lines	public:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::assume;		return I->getIntrinsicID() == Intrinsic::assume;
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
};		};

		/// This represents the llvm.type.test intrinsic.
		class TypeTestInst : public IntrinsicInst {
		public:
		static bool classof(const IntrinsicInst *I) {
		return I->getIntrinsicID() == Intrinsic::type_test;
		}
		static bool classof(const Value *V) {
		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
		}
		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_IR_INTRINSICINST_H		#endif // LLVM_IR_INTRINSICINST_H

llvm/lib/Analysis/Loads.cpp

Show First 20 Lines • Show All 410 Lines • ▼ Show 20 Lines	bool llvm::isSafeToLoadUnconditionally(Value *V, Type *Ty, Align Alignment,
const DataLayout &DL,		const DataLayout &DL,
Instruction *ScanFrom,		Instruction *ScanFrom,
const DominatorTree *DT,		const DominatorTree *DT,
const TargetLibraryInfo *TLI) {		const TargetLibraryInfo *TLI) {
APInt Size(DL.getIndexTypeSizeInBits(V->getType()), DL.getTypeStoreSize(Ty));		APInt Size(DL.getIndexTypeSizeInBits(V->getType()), DL.getTypeStoreSize(Ty));
return isSafeToLoadUnconditionally(V, Alignment, Size, DL, ScanFrom, DT, TLI);		return isSafeToLoadUnconditionally(V, Alignment, Size, DL, ScanFrom, DT, TLI);
}		}

/// DefMaxInstsToScan - the default number of maximum instructions		/// DefMaxInstsToScan - the default number of maximum ld/st instructions
/// to scan in the block, used by FindAvailableLoadedValue().		/// to scan in the block, used by FindAvailableLoadedValue().
/// FindAvailableLoadedValue() was introduced in r60148, to improve jump		/// FindAvailableLoadedValue() was introduced in r60148, to improve jump
/// threading in part by eliminating partially redundant loads.		/// threading in part by eliminating partially redundant loads.
/// At that point, the value of MaxInstsToScan was already set to '6'		/// At that point, the value of MaxInstsToScan was already set to '6'
/// without documented explanation.		/// without documented explanation.
cl::opt<unsigned>		cl::opt<unsigned> llvm::DefMaxInstsToScan(
llvm::DefMaxInstsToScan("available-load-scan-limit", cl::init(6), cl::Hidden,		"available-load-scan-limit", cl::init(6), cl::Hidden,
cl::desc("Use this to specify the default maximum number of instructions "		cl::desc(
		"Use this to specify the default maximum number of ld/st instructions "
"to scan backward from a given instruction, when searching for "		"to scan backward from a given instruction, when searching for "
"available loaded value"));		"available loaded value"));

		// The default maximum number of non ld/st instructions to skip when finding
		// available loads.
		static cl::opt<unsigned>
		DefMaxInstsToSkip("available-load-skip-limit", cl::init(32), cl::Hidden,
		cl::desc("Use this to specify the default maximum number "
		"of non-ld/st instructions "
		"to skip when scanning backward from a given "
		"instruction, when searching for "
		"available loaded value"));

Value *llvm::FindAvailableLoadedValue(LoadInst *Load,		Value *llvm::FindAvailableLoadedValue(LoadInst *Load,
BasicBlock *ScanBB,		BasicBlock *ScanBB,
BasicBlock::iterator &ScanFrom,		BasicBlock::iterator &ScanFrom,
unsigned MaxInstsToScan,		unsigned MaxInstsToScan,
AAResults *AA, bool *IsLoad,		AAResults *AA, bool *IsLoad,
unsigned *NumScanedInst) {		unsigned *NumScanedInst) {
// Don't CSE load that is volatile or anything stronger than unordered.		// Don't CSE load that is volatile or anything stronger than unordered.
if (!Load->isUnordered())		if (!Load->isUnordered())
▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines
}		}

Value *llvm::findAvailablePtrLoadStore(		Value *llvm::findAvailablePtrLoadStore(
const MemoryLocation &Loc, Type *AccessTy, bool AtLeastAtomic,		const MemoryLocation &Loc, Type *AccessTy, bool AtLeastAtomic,
BasicBlock *ScanBB, BasicBlock::iterator &ScanFrom, unsigned MaxInstsToScan,		BasicBlock *ScanBB, BasicBlock::iterator &ScanFrom, unsigned MaxInstsToScan,
AAResults *AA, bool *IsLoadCSE, unsigned *NumScanedInst) {		AAResults *AA, bool *IsLoadCSE, unsigned *NumScanedInst) {
if (MaxInstsToScan == 0)		if (MaxInstsToScan == 0)
MaxInstsToScan = ~0U;		MaxInstsToScan = ~0U;
		// If the max insts to scan is larger, presumably the caller wanted to give a
		// larger budget (potentially unlimited), so bump up the insts to skip
		// accordingly.
		int MaxNonLdSt =
		DefMaxInstsToSkip > MaxInstsToScan ? DefMaxInstsToSkip : MaxInstsToScan;

const DataLayout &DL = ScanBB->getModule()->getDataLayout();		const DataLayout &DL = ScanBB->getModule()->getDataLayout();
const Value *StrippedPtr = Loc.Ptr->stripPointerCasts();		const Value *StrippedPtr = Loc.Ptr->stripPointerCasts();

while (ScanFrom != ScanBB->begin()) {		while (ScanFrom != ScanBB->begin()) {
// We must ignore debug info directives when counting (otherwise they		// We must ignore debug info directives when counting (otherwise they
// would affect codegen).		// would affect codegen).
Instruction *Inst = &*--ScanFrom;		Instruction *Inst = &*--ScanFrom;
if (Inst->isDebugOrPseudoInst())		if (!isa<LoadInst>(Inst) && !isa<StoreInst>(Inst) &&
		!Inst->mayWriteToMemory()) {
		if (MaxNonLdSt-- == 0) {
		return nullptr;
		}
continue;		continue;
		}

// Restore ScanFrom to expected value in case next test succeeds		// Restore ScanFrom to expected value in case next test succeeds
ScanFrom++;		ScanFrom++;

if (NumScanedInst)		if (NumScanedInst)
++(*NumScanedInst);		++(*NumScanedInst);

// Don't scan huge blocks.		// Don't scan huge blocks.
▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	Value *llvm::FindAvailableLoadedValue(LoadInst *Load, AAResults &AA,

if (!Load->isUnordered())		if (!Load->isUnordered())
return nullptr;		return nullptr;

// Try to find an available value first, and delay expensive alias analysis		// Try to find an available value first, and delay expensive alias analysis
// queries until later.		// queries until later.
Value *Available = nullptr;;		Value *Available = nullptr;;
SmallVector<Instruction *> MustNotAliasInsts;		SmallVector<Instruction *> MustNotAliasInsts;
		// If the max insts to scan is larger, presumably the caller wanted to give a
		// larger budget (potentially unlimited), so bump up the insts to skip
		// accordingly.
		int MaxNonLdSt =
		DefMaxInstsToSkip > MaxInstsToScan ? DefMaxInstsToSkip : MaxInstsToScan;
for (Instruction &Inst : make_range(++Load->getReverseIterator(),		for (Instruction &Inst : make_range(++Load->getReverseIterator(),
ScanBB->rend())) {		ScanBB->rend())) {
if (Inst.isDebugOrPseudoInst())		if (!isa<LoadInst>(&Inst) && !isa<StoreInst>(&Inst) &&
		!Inst.mayWriteToMemory()) {
		if (MaxNonLdSt-- == 0) {
		return nullptr;
		}
continue;		continue;
		}

if (MaxInstsToScan-- == 0)		if (MaxInstsToScan-- == 0)
return nullptr;		return nullptr;

Available = getAvailableLoadStore(&Inst, StrippedPtr, AccessTy,		Available = getAvailableLoadStore(&Inst, StrippedPtr, AccessTy,
AtLeastAtomic, DL, IsLoadCSE);		AtLeastAtomic, DL, IsLoadCSE);
if (Available)		if (Available)
break;		break;
Show All 36 Lines

llvm/lib/Transforms/Scalar/JumpThreading.cpp

Show First 20 Lines • Show All 1,398 Lines • ▼ Show 20 Lines	MemoryLocation Loc(LoadedPtr->DoPHITranslation(LoadBB, PredBB),
AATags);		AATags);
PredAvailable = findAvailablePtrLoadStore(Loc, AccessTy, LoadI->isAtomic(),		PredAvailable = findAvailablePtrLoadStore(Loc, AccessTy, LoadI->isAtomic(),
PredBB, BBIt, DefMaxInstsToScan,		PredBB, BBIt, DefMaxInstsToScan,
AA, &IsLoadCSE, &NumScanedInst);		AA, &IsLoadCSE, &NumScanedInst);

// If PredBB has a single predecessor, continue scanning through the		// If PredBB has a single predecessor, continue scanning through the
// single predecessor.		// single predecessor.
BasicBlock *SinglePredBB = PredBB;		BasicBlock *SinglePredBB = PredBB;
		// Make sure we don't walk backwards into an unreachable loop.
		SmallPtrSet<BasicBlock *, 16> Visited;
while (!PredAvailable && SinglePredBB && BBIt == SinglePredBB->begin() &&		while (!PredAvailable && SinglePredBB && BBIt == SinglePredBB->begin() &&
NumScanedInst < DefMaxInstsToScan) {		NumScanedInst < DefMaxInstsToScan &&
		Visited.insert(SinglePredBB).second) {
SinglePredBB = SinglePredBB->getSinglePredecessor();		SinglePredBB = SinglePredBB->getSinglePredecessor();
if (SinglePredBB) {		if (SinglePredBB) {
BBIt = SinglePredBB->end();		BBIt = SinglePredBB->end();
PredAvailable = findAvailablePtrLoadStore(		PredAvailable = findAvailablePtrLoadStore(
Loc, AccessTy, LoadI->isAtomic(), SinglePredBB, BBIt,		Loc, AccessTy, LoadI->isAtomic(), SinglePredBB, BBIt,
(DefMaxInstsToScan - NumScanedInst), AA, &IsLoadCSE,		(DefMaxInstsToScan - NumScanedInst), AA, &IsLoadCSE,
&NumScanedInst);		&NumScanedInst);
}		}
▲ Show 20 Lines • Show All 1,659 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/load.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -instcombine -S < %s | FileCheck %s			; Set available-load-scan-limit to the current default of 6 so that the test
	; RUN: opt -passes=instcombine -S < %s | FileCheck %s			; works the same if the limit is adjusted in the future.
				; RUN: opt -instcombine -available-load-scan-limit=6 -S < %s | FileCheck %s
				; RUN: opt -passes=instcombine -available-load-scan-limit=6 -S < %s | FileCheck %s

	target datalayout = "e-m:e-p:64:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:1"			target datalayout = "e-m:e-p:64:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:1"

	@X = constant i32 42 ; <i32*> [#uses=2]			@X = constant i32 42 ; <i32*> [#uses=2]
	@X2 = constant i32 47 ; <i32*> [#uses=1]			@X2 = constant i32 47 ; <i32*> [#uses=1]
	@Y = constant [2 x { i32, float }] [ { i32, float } { i32 12, float 1.000000e+00 }, { i32, float } { i32 37, float 0x3FF3B2FEC0000000 } ] ; <[2 x { i32, float }]*> [#uses=2]			@Y = constant [2 x { i32, float }] [ { i32, float } { i32 12, float 1.000000e+00 }, { i32, float } { i32 37, float 0x3FF3B2FEC0000000 } ] ; <[2 x { i32, float }]*> [#uses=2]
	@Z = constant [2 x { i32, float }] zeroinitializer ; <[2 x { i32, float }]*> [#uses=1]			@Z = constant [2 x { i32, float }] zeroinitializer ; <[2 x { i32, float }]*> [#uses=1]

	▲ Show 20 Lines • Show All 83 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: store i32 1, i32* [[P:%.*]], align 4			; CHECK-NEXT: store i32 1, i32* [[P:%.*]], align 4
	; CHECK-NEXT: ret i32 1			; CHECK-NEXT: ret i32 1
	;			;
	store i32 1, i32* %P			store i32 1, i32* %P
	%X = load i32, i32* %P ; <i32> [#uses=1]			%X = load i32, i32* %P ; <i32> [#uses=1]
	ret i32 %X			ret i32 %X
	}			}

				declare void @llvm.assume(i1)
				declare i1 @llvm.type.test(i8*, metadata) nounwind readnone

				; Ensure that load store forwarding is not prevented by the bitcast / type test /
				; assume sequences inserted for whole program devirtualization.
				define i32 @test8_type_test_assume(i32* %P, [3 x i8*]* %vtable) {
				; CHECK-LABEL: @test8_type_test_assume(
				; CHECK-NEXT: store i32 1, i32* [[P:%.*]], align 4
				; CHECK-NEXT: bitcast [3 x i8*]* %vtable to i8*
				; CHECK-NEXT: call i1 @llvm.type.test
				; CHECK-NEXT: call void @llvm.assume
				; CHECK-NEXT: bitcast [3 x i8*]* %vtable to i8*
				; CHECK-NEXT: call i1 @llvm.type.test
				; CHECK-NEXT: call void @llvm.assume
				; CHECK-NEXT: ret i32 1
				;
				store i32 1, i32* %P

				; Insert 2 bitcast / type test / assume sequences so that we would be above
				; the scan limit of 6 if they were not ignored.
				%vtablei8 = bitcast [3 x i8*]* %vtable to i8*
				%p = call i1 @llvm.type.test(i8* %vtablei8, metadata !"foo")
				tail call void @llvm.assume(i1 %p)
				%vtablei82 = bitcast [3 x i8*]* %vtable to i8*
				%p2 = call i1 @llvm.type.test(i8* %vtablei82, metadata !"foo")
				tail call void @llvm.assume(i1 %p2)

				%X = load i32, i32* %P ; <i32> [#uses=1]
				ret i32 %X
				}

	define i32 @test9(i32* %P) {			define i32 @test9(i32* %P) {
	; CHECK-LABEL: @test9(			; CHECK-LABEL: @test9(
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%X = load i32, i32* %P ; <i32> [#uses=1]			%X = load i32, i32* %P ; <i32> [#uses=1]
	%Y = load i32, i32* %P ; <i32> [#uses=1]			%Y = load i32, i32* %P ; <i32> [#uses=1]
	%Z = sub i32 %X, %Y ; <i32> [#uses=1]			%Z = sub i32 %X, %Y ; <i32> [#uses=1]
	ret i32 %Z			ret i32 %Z
	▲ Show 20 Lines • Show All 314 Lines • Show Last 20 Lines

llvm/test/Transforms/JumpThreading/thread-loads.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -jump-threading -S | FileCheck %s			; Set available-load-scan-limit to the current default of 6 so that the test
	; RUN: opt < %s -aa-pipeline=basic-aa -passes=jump-threading -S | FileCheck %s			; works the same if the limit is adjusted in the future.
				; RUN: opt < %s -jump-threading -available-load-scan-limit=6 -S | FileCheck %s
				; RUN: opt < %s -aa-pipeline=basic-aa -passes=jump-threading -available-load-scan-limit=6 -S | FileCheck %s
				; Try again with a low skip limit.
				; RUN: opt < %s -aa-pipeline=basic-aa -passes=jump-threading -available-load-scan-limit=6 -available-load-skip-limit=6 -S | FileCheck %s --check-prefix=SKIPLIM6

	target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"			target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
	target triple = "i386-apple-darwin7"			target triple = "i386-apple-darwin7"

	; Test that we can thread through the block with the partially redundant load (%2).			; Test that we can thread through the block with the partially redundant load (%2).
	; rdar://6402033			; rdar://6402033
	define i32 @test1(i32* %P) nounwind {			define i32 @test1(i32* %P) nounwind {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	▲ Show 20 Lines • Show All 665 Lines • ▼ Show 20 Lines

	left_x:			left_x:
	ret i32 20			ret i32 20

	right_x:			right_x:
	ret i32 10			ret i32 10
	}			}

				declare void @llvm.assume(i1)
				declare i1 @llvm.type.test(i8*, metadata) nounwind readnone

				; Test that we can thread through the block with the partially redundant load (%2),
				; ensuring that the threading is not prevented by the bitcast / type test /
				; assume sequences inserted for whole program devirtualization.
				define i32 @test1_type_test_assume(i32* %P, [3 x i8*]* %vtable) nounwind {
				; CHECK-LABEL: @test1_type_test_assume(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = tail call i32 (...) @f1() #[[ATTR0:[0-9]+]]
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[TMP0]], 0
				; CHECK-NEXT: br i1 [[TMP1]], label [[BB1:%.*]], label [[BB1_THREAD:%.*]]
				; CHECK: bb1.thread:
				; CHECK-NEXT: store i32 42, i32* [[P:%.*]], align 4
				; CHECK-NEXT: bitcast [3 x i8*]* %vtable to i8*
				; CHECK-NEXT: call i1 @llvm.type.test
				; CHECK-NEXT: call void @llvm.assume
				; CHECK-NEXT: bitcast [3 x i8*]* %vtable to i8*
				; CHECK-NEXT: call i1 @llvm.type.test
				; CHECK-NEXT: call void @llvm.assume
				; CHECK-NEXT: bitcast [3 x i8*]* %vtable to i8*
				; CHECK-NEXT: call i1 @llvm.type.test
				; CHECK-NEXT: call void @llvm.assume
				; CHECK-NEXT: br label [[BB3:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: [[DOTPR:%.*]] = load i32, i32* [[P]], align 4
				; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 [[DOTPR]], 36
				; CHECK-NEXT: br i1 [[TMP2]], label [[BB3]], label [[BB2:%.*]]
				; CHECK: bb2:
				; CHECK-NEXT: [[TMP3:%.*]] = tail call i32 (...) @f2() #[[ATTR0]]
				; CHECK-NEXT: ret i32 0
				; CHECK: bb3:
				; CHECK-NEXT: [[RES_02:%.*]] = phi i32 [ 1, [[BB1_THREAD]] ], [ 0, [[BB1]] ]
				; CHECK-NEXT: ret i32 [[RES_02]]
				;
				; We should not thread if we've set the skip limit low.
				; SKIPLIM6-LABEL: @test1_type_test_assume
				; SKIPLIM6-NOT: thread
				; SKIPLIM6: ret
				entry:
				%0 = tail call i32 (...) @f1() nounwind ; <i32> [#uses=1]
				%1 = icmp eq i32 %0, 0 ; <i1> [#uses=1]
				br i1 %1, label %bb1, label %bb

				bb: ; preds = %entry
				store i32 42, i32* %P, align 4

				; Insert 3 bitcast / type test / assume sequences so that we would be above
				; a skip limit of 6.
				%vtablei8 = bitcast [3 x i8*]* %vtable to i8*
				%p = call i1 @llvm.type.test(i8* %vtablei8, metadata !"foo")
				tail call void @llvm.assume(i1 %p)
				%vtablei82 = bitcast [3 x i8*]* %vtable to i8*
				%p2 = call i1 @llvm.type.test(i8* %vtablei82, metadata !"foo")
				tail call void @llvm.assume(i1 %p2)
				%vtablei83 = bitcast [3 x i8*]* %vtable to i8*
				%p3 = call i1 @llvm.type.test(i8* %vtablei83, metadata !"foo")
				tail call void @llvm.assume(i1 %p3)

				br label %bb1

				bb1: ; preds = %entry, %bb
				%res.0 = phi i32 [ 1, %bb ], [ 0, %entry ] ; <i32> [#uses=2]
				%2 = load i32, i32* %P, align 4 ; <i32> [#uses=1]
				%3 = icmp sgt i32 %2, 36 ; <i1> [#uses=1]
				br i1 %3, label %bb3, label %bb2

				bb2: ; preds = %bb1
				%4 = tail call i32 (...) @f2() nounwind ; <i32> [#uses=0]
				ret i32 %res.0

				bb3: ; preds = %bb1
				ret i32 %res.0
				}

				; Make sure we don't infinite loop in the presence of an unreachable loop
				; when searching up through single preds for available pointer loads.
				define i32 @fn_SinglePredUnreachable(i1 %c2,i64* %P) {
				; CHECK-LABEL: @fn_SinglePredUnreachable(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[L1:%.*]] = load i64, i64* [[P:%.*]], align 4
				; CHECK-NEXT: [[C:%.*]] = icmp eq i64 [[L1]], 0
				; CHECK-NEXT: br label [[COND2:%.*]]
				; CHECK: cond1:
				; CHECK-NEXT: br i1 false, label [[COND1:%.*]], label %[[CONDPRESPLIT:.*]]
				; CHECK: [[CONDPRESPLIT]]:
				; CHECK-NEXT: [[L2PR:%.*]] = load i64, i64* [[P]], align 4
				; CHECK-NEXT: br label [[COND2:%.*]]
				; CHECK: cond2:
				; CHECK-NEXT: [[L2:%.*]] = phi i64 [ [[L2PR]], %[[CONDPRESPLIT]] ]
				; CHECK-NEXT: call void @fn2(i64 [[L2]])
				; CHECK-NEXT: [[C3:%.*]] = icmp eq i64 [[L2]], 0
				; CHECK-NEXT: br i1 %c3, label [[COND3:%.*]], label [[END:%.*]]
				; CHECK: cond3:
				; CHECK-NEXT: call void @fn3(i64 [[L2]])
				; CHECK-NEXT: br label [[END]]
				; CHECK: end:
				; CHECK-NEXT: ret i32 0
				;

				entry:
				%l1 = load i64, i64* %P
				%c = icmp eq i64 %l1, 0
				br label %cond2

				cond1:
				br i1 false, label %cond1, label %cond2

				cond2:
				%l2 = load i64, i64* %P
				call void @fn2(i64 %l2)
				%c3 = icmp eq i64 %l2, 0
				br i1 %c3, label %cond3, label %end

				cond3:
				call void @fn3(i64 %l2)
				br label %end

				end:
				ret i32 0
				}

	; CHECK: [[RNG4]] = !{i32 0, i32 1}			; CHECK: [[RNG4]] = !{i32 0, i32 1}

	!0 = !{!3, !3, i64 0}			!0 = !{!3, !3, i64 0}
	!1 = !{!"omnipotent char", !2}			!1 = !{!"omnipotent char", !2}
	!2 = !{!"Simple C/C++ TBAA"}			!2 = !{!"Simple C/C++ TBAA"}
	!3 = !{!"int", !1}			!3 = !{!"int", !1}
	!4 = !{ i32 0, i32 1 }			!4 = !{ i32 0, i32 1 }
	!5 = !{ i32 8, i32 10 }			!5 = !{ i32 8, i32 10 }
	!6 = !{!6}			!6 = !{!6}
	!7 = !{!7, !6}			!7 = !{!7, !6}
	!8 = !{!8, !6}			!8 = !{!8, !6}
	!9 = !{!7}			!9 = !{!7}
	!10 = !{!8}			!10 = !{!8}