This is an archive of the discontinued LLVM Phabricator instance.

In the future, can you please analyze why this is happening?
This is another example of a test case that is unrelated to the pattern you are trying to optimize. That means if this broke optimizing this tomorrow, we'd have to run spec again and analyze it.
We also really need to understand what optimizations need to be happening, not just randomly implement stuff in various passes because it increases SPEC

In D30352#688862, @dberlin wrote:

In the future, can you please analyze why this is happening?
This is another example of a test case that is unrelated to the pattern you are trying to optimize. That means if this broke optimizing this tomorrow, we'd have to run spec again and analyze it.
We also really need to understand what optimizations need to be happening, not just randomly implement stuff in various passes because it increases SPEC

Hi Daniel,

I didn't see this as a "new optimisation that needs doing", just as an "omitted analysis that got finally added", and obviously, if you have AA, then alias analysis will be done and the case where that matters will be better.

IIUC, the test is just to make sure it doesn't get removed or changed in the future.

The comment about SPEC improvement, from my POV was irrelevant, given that this is an obvious win.

Unless I'm missing something obvious here, of course... :)

--renato

In D30352#688924, @rengolin wrote:

Hi Daniel,

I didn't see this as a "new optimisation that needs doing", just as an "omitted analysis that got finally added", and obviously, if you have AA, then alias analysis will be done and the case where that matters will be better.

IIUC, the test is just to make sure it doesn't get removed or changed in the future.

The comment about SPEC improvement, from my POV was irrelevant, given that this is an obvious win.

Unless I'm missing something obvious here, of course... :)

--renato

Hey Renato,
This is not the first patch in this line of extending the PRE in jump threading so I get that you may be missing context. :)
So far we've extended and then made this optimization more expensive without a single real test we care about, and every single testcase added is already caught by other passes. Rather than simply make every pass do everything, because it seems to improve spec score, I would really like to see us actually analyze and understand why it is improving spec and whether extending and making this optimization is really the right plan. I agree that otherwise this particular change is innocuous, but I asked for analysis and real cases for the last patches and haven't gotten them yet.

Obviously, if analysis shows this pass is the right thing to improve, we should do it. But I would like to see that happen before we keep going.

My original intention for this change was to handle a missed opportunity that belongs here, so that it can open up more threading, which I believe matches the purpose of SimplifyPartiallyRedundantLoad() in jump threading. As for why this pattern is exposed in this pass, I can add another test case reduced from a real case, as I did in D29571 for my previous patch. Would that be acceptable to you?
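For readers missing the context: a load is partially redundant when its value is already known at the end of some, but not all, predecessor blocks. The sketch below is only a toy model of that classification, not the code in JumpThreading.cpp; `PredInfo`, `Redundancy`, and `classify` are hypothetical names invented for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of one predecessor block: the value of the load that
// is known to reach the end of the block, if any.
struct PredInfo {
  std::string name;
  std::optional<int> available;
};

enum class Redundancy { None, Partial, Full };

// Classify a load by how many predecessors already provide its value.
Redundancy classify(const std::vector<PredInfo> &preds) {
  std::size_t known = 0;
  for (const PredInfo &p : preds)
    if (p.available)
      ++known;
  if (known == 0)
    return Redundancy::None;
  return known == preds.size() ? Redundancy::Full : Redundancy::Partial;
}
```

In the partial case, a load can be inserted on the paths that lack the value and the results merged with a PHI, which is what may expose a branch that can then be threaded.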

In D30352#688933, @dberlin wrote:

So far we've extended and then made this optimization more expensive without a single real test we care about, and every single testcase added is already caught by other passes. Rather than simply make every pass do everything, because it seems to improve spec score, I would really like to see us actually analyze and understand why it is improving spec and whether extending and making this optimization is really the right plan. I agree that otherwise this particular change is innocuous, but I asked for analysis and real cases for the last patches and haven't gotten them yet.

Right, it seems I jumped the gun on what I thought was just an obvious patch. Apologies.

Obviously, if analysis shows this pass is the right thing to improve, we should do it. But I would like to see that happen before we keep going.

I totally agree. We're already slower than GCC for usually less performance. I agree that a better analysis of at least execution time is worth doing.

Though I wonder how this one patch would fare (and whether it is being wrongly picked on) compared with all the previous patches. Not that this should stop any further investigation before commit, but maybe we should look at the bigger picture (if this has been happening consistently) and plot relative performance against compile time for a particular set of patches...

Makes sense?

--renato

Added another test reduced from spec2000/crafty. This test shows the case caught by this change at O3. I didn't see why other passes before jump threading didn't handle this. Please let me know if you believe this needs to be handled in another pass before jump threading.

As I mentioned before, the main purpose of SimplifyPartiallyRedundantLoad() is to encourage jump threading opportunities, and it runs interleaved with other jump threading tasks. Given that, improving SimplifyPartiallyRedundantLoad() would be profitable even though another pass can do the same (or even better) job elsewhere. Hopefully, the added test properly shows the case caught by this change and also exposes opportunities for other passes.
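The effect of passing AA instead of nullptr can be sketched with a toy backward scan. This is a deliberately crude model, not LLVM's FindAvailableLoadedValue: `Inst`, `MayAlias`, and `findAvailableValue` are hypothetical names, and the oracle merely stands in for alias analysis. Without an oracle the scan gives up at any store to a different pointer; with one, it can step over stores that are shown not to alias.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Toy instruction: a load from, or a store to, a named location.
struct Inst {
  enum Kind { Load, Store } kind;
  std::string ptr; // location accessed
  int value;       // value produced by the load or written by the store
};

// Hypothetical oracle: returns true when two locations may overlap.
using MayAlias = std::function<bool(const std::string &, const std::string &)>;

// Walk a block backwards looking for a value already known for `ptr`.
// `aa` may be null, in which case every store to another pointer is
// conservatively treated as a clobber (assumption of this toy model).
std::optional<int> findAvailableValue(const std::vector<Inst> &block,
                                      const std::string &ptr,
                                      const MayAlias *aa) {
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    if (it->ptr == ptr)
      return it->value; // earlier load of, or store to, the same location
    if (it->kind == Inst::Store && (!aa || (*aa)(it->ptr, ptr)))
      return std::nullopt; // possible clobber: stop scanning
  }
  return std::nullopt; // reached the top of the block without an answer
}
```

Modeling block cond1 of @fn_noalias (a load of %P followed by a store to %P2), the scan recovers the loaded value only when the oracle reports that the two pointers do not alias.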

Herald added a subscriber: aemerson. · Mar 1 2017, 2:28 PM

junbuml added a reviewer: trentxintong. · Mar 2 2017, 7:30 AM

junbuml mentioned this in D30543: [JumpThreading] Perform phi-translation in SimplifyPartiallyRedundantLoad. · Mar 2 2017, 12:34 PM

Minor update to a comment: removed the mention of crafty.

Gentle ping. Please let me know if any further investigation is needed or if you have more comments.

Thanks, with this testcase, I think this change is fine.
This is something we are unlikely to want to do elsewhere ATM.

dberlin accepted this revision. · Mar 7 2017, 11:50 AM

Closed by commit rL297284: [JumpThread] Use AA in SimplifyPartiallyRedundantLoad() (authored by junbuml). · Mar 8 2017, 7:34 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                       Size
llvm/trunk/include/llvm/Transforms/Scalar/JumpThreading.h  5 lines
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp         31 lines
llvm/trunk/test/Transforms/JumpThreading/thread-loads.ll   111 lines

Diff 91013

llvm/trunk/include/llvm/Transforms/Scalar/JumpThreading.h

@@ ... 11 lines not shown ... @@
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_TRANSFORMS_SCALAR_JUMPTHREADING_H
 #define LLVM_TRANSFORMS_SCALAR_JUMPTHREADING_H

 #include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallSet.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/BlockFrequencyInfo.h"
 #include "llvm/Analysis/BlockFrequencyInfoImpl.h"
 #include "llvm/Analysis/BranchProbabilityInfo.h"
 #include "llvm/Analysis/LazyValueInfo.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/ValueHandle.h"
@@ ... 27 lines not shown ... @@
 /// if (X < 3) {
 ///
 /// In this case, the unconditional branch at the end of the first if can be
 /// revectored to the false side of the second if.
 ///
 class JumpThreadingPass : public PassInfoMixin<JumpThreadingPass> {
   TargetLibraryInfo *TLI;
   LazyValueInfo *LVI;
+  AliasAnalysis *AA;
   std::unique_ptr<BlockFrequencyInfo> BFI;
   std::unique_ptr<BranchProbabilityInfo> BPI;
   bool HasProfileData = false;
   bool HasGuards = false;
 #ifdef NDEBUG
   SmallPtrSet<const BasicBlock *, 16> LoopHeaders;
 #else
   SmallSet<AssertingVH<const BasicBlock>, 16> LoopHeaders;
@@ ... 14 lines not shown ... @@ struct RecursionSetRemover {
     ~RecursionSetRemover() { TheSet.erase(ThePair); }
   };

 public:
   JumpThreadingPass(int T = -1);

   // Glue for old PM.
   bool runImpl(Function &F, TargetLibraryInfo *TLI_, LazyValueInfo *LVI_,
-               bool HasProfileData_, std::unique_ptr<BlockFrequencyInfo> BFI_,
+               AliasAnalysis *AA_, bool HasProfileData_,
+               std::unique_ptr<BlockFrequencyInfo> BFI_,
                std::unique_ptr<BranchProbabilityInfo> BPI_);

   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

   void releaseMemory() {
     BFI.reset();
     BPI.reset();
   }
@@ ... 40 lines not shown ... @@

llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp

@@ ... 11 lines not shown ... @@
 //===----------------------------------------------------------------------===//

 #include "llvm/Transforms/Scalar/JumpThreading.h"
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/BlockFrequencyInfoImpl.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/Analysis/Loads.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
@@ ... 58 lines not shown ... @@ public:
   static char ID; // Pass identification
   JumpThreading(int T = -1) : FunctionPass(ID), Impl(T) {
     initializeJumpThreadingPass(*PassRegistry::getPassRegistry());
   }

   bool runOnFunction(Function &F) override;

   void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.addRequired<AAResultsWrapperPass>();
     AU.addRequired<LazyValueInfoWrapperPass>();
     AU.addPreserved<LazyValueInfoWrapperPass>();
     AU.addPreserved<GlobalsAAWrapperPass>();
     AU.addRequired<TargetLibraryInfoWrapperPass>();
   }

   void releaseMemory() override { Impl.releaseMemory(); }
 };
 }

 char JumpThreading::ID = 0;
 INITIALIZE_PASS_BEGIN(JumpThreading, "jump-threading",
                 "Jump Threading", false, false)
 INITIALIZE_PASS_DEPENDENCY(LazyValueInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
 INITIALIZE_PASS_END(JumpThreading, "jump-threading",
                 "Jump Threading", false, false)

 // Public interface to the Jump Threading pass
 FunctionPass *llvm::createJumpThreadingPass(int Threshold) { return new JumpThreading(Threshold); }

 JumpThreadingPass::JumpThreadingPass(int T) {
   BBDupThreshold = (T == -1) ? BBDuplicateThreshold : unsigned(T);
 }

 /// runOnFunction - Top level algorithm.
 ///
 bool JumpThreading::runOnFunction(Function &F) {
   if (skipFunction(F))
     return false;
   auto TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
   auto LVI = &getAnalysis<LazyValueInfoWrapperPass>().getLVI();
+  auto AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
   std::unique_ptr<BlockFrequencyInfo> BFI;
   std::unique_ptr<BranchProbabilityInfo> BPI;
   bool HasProfileData = F.getEntryCount().hasValue();
   if (HasProfileData) {
     LoopInfo LI{DominatorTree(F)};
     BPI.reset(new BranchProbabilityInfo(F, LI));
     BFI.reset(new BlockFrequencyInfo(F, *BPI, LI));
   }
-  return Impl.runImpl(F, TLI, LVI, HasProfileData, std::move(BFI),
+  return Impl.runImpl(F, TLI, LVI, AA, HasProfileData, std::move(BFI),
                       std::move(BPI));
 }

 PreservedAnalyses JumpThreadingPass::run(Function &F,
                                          FunctionAnalysisManager &AM) {

   auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
   auto &LVI = AM.getResult<LazyValueAnalysis>(F);
+  auto &AA = AM.getResult<AAManager>(F);

   std::unique_ptr<BlockFrequencyInfo> BFI;
   std::unique_ptr<BranchProbabilityInfo> BPI;
   bool HasProfileData = F.getEntryCount().hasValue();
   if (HasProfileData) {
     LoopInfo LI{DominatorTree(F)};
     BPI.reset(new BranchProbabilityInfo(F, LI));
     BFI.reset(new BlockFrequencyInfo(F, *BPI, LI));
   }
-  bool Changed =
-      runImpl(F, &TLI, &LVI, HasProfileData, std::move(BFI), std::move(BPI));
+  bool Changed = runImpl(F, &TLI, &LVI, &AA, HasProfileData, std::move(BFI),
+                         std::move(BPI));

   if (!Changed)
     return PreservedAnalyses::all();
   PreservedAnalyses PA;
   PA.preserve<GlobalsAA>();
   return PA;
 }

 bool JumpThreadingPass::runImpl(Function &F, TargetLibraryInfo *TLI_,
-                                LazyValueInfo *LVI_, bool HasProfileData_,
+                                LazyValueInfo *LVI_, AliasAnalysis *AA_,
+                                bool HasProfileData_,
                                 std::unique_ptr<BlockFrequencyInfo> BFI_,
                                 std::unique_ptr<BranchProbabilityInfo> BPI_) {

   DEBUG(dbgs() << "Jump threading on function '" << F.getName() << "'\n");
   TLI = TLI_;
   LVI = LVI_;
+  AA = AA_;
   BFI.reset();
   BPI.reset();
   // When profile data is available, we need to update edge weights after
   // successful jump threading, which requires both BPI and BFI being available.
   HasProfileData = HasProfileData_;
   auto *GuardDecl = F.getParent()->getFunction(
       Intrinsic::getName(Intrinsic::experimental_guard));
   HasGuards = GuardDecl && !GuardDecl->use_empty();
@@ ... 771 lines not shown ... @@ bool JumpThreadingPass::SimplifyPartiallyRedundantLoad(LoadInst *LI) {
   if (Instruction *PtrOp = dyn_cast<Instruction>(LoadedPtr))
     if (PtrOp->getParent() == LoadBB)
       return false;

   // Scan a few instructions up from the load, to see if it is obviously live at
   // the entry to its block.
   BasicBlock::iterator BBIt(LI);
   bool IsLoadCSE;
-  if (Value *AvailableVal =
-          FindAvailableLoadedValue(LI, LoadBB, BBIt, DefMaxInstsToScan, nullptr, &IsLoadCSE)) {
+  if (Value *AvailableVal = FindAvailableLoadedValue(
+          LI, LoadBB, BBIt, DefMaxInstsToScan, AA, &IsLoadCSE)) {
     // If the value of the load is locally available within the block, just use
     // it. This frequently occurs for reg2mem'd allocas.

     if (IsLoadCSE) {
       LoadInst *NLI = cast<LoadInst>(AvailableVal);
       combineMetadataForCSE(NLI, LI);
     };

@@ ... 30 lines not shown ... @@ bool JumpThreadingPass::SimplifyPartiallyRedundantLoad(LoadInst *LI) {
   for (BasicBlock *PredBB : predecessors(LoadBB)) {
     // If we already scanned this predecessor, skip it.
     if (!PredsScanned.insert(PredBB).second)
       continue;

     // Scan the predecessor to see if the value is available in the pred.
     BBIt = PredBB->end();
     unsigned NumScanedInst = 0;
-    Value *PredAvailable =
-        FindAvailableLoadedValue(LI, PredBB, BBIt, DefMaxInstsToScan, nullptr,
-                                 &IsLoadCSE, &NumScanedInst);
+    Value *PredAvailable = FindAvailableLoadedValue(
+        LI, PredBB, BBIt, DefMaxInstsToScan, AA, &IsLoadCSE, &NumScanedInst);

     // If PredBB has a single predecessor, continue scanning through the single
     // predecessor.
     BasicBlock *SinglePredBB = PredBB;
     while (!PredAvailable && SinglePredBB && BBIt == SinglePredBB->begin() &&
            NumScanedInst < DefMaxInstsToScan) {
       SinglePredBB = SinglePredBB->getSinglePredecessor();
       if (SinglePredBB) {
         BBIt = SinglePredBB->end();
         PredAvailable = FindAvailableLoadedValue(
-            LI, SinglePredBB, BBIt, (DefMaxInstsToScan - NumScanedInst),
-            nullptr, &IsLoadCSE, &NumScanedInst);
+            LI, SinglePredBB, BBIt, (DefMaxInstsToScan - NumScanedInst), AA,
+            &IsLoadCSE, &NumScanedInst);
       }
     }

     if (!PredAvailable) {
       OneUnavailablePred = PredBB;
       continue;
     }

@@ ... 1,142 lines not shown ... @@

llvm/trunk/test/Transforms/JumpThreading/thread-loads.ll

 ; RUN: opt < %s -jump-threading -S | FileCheck %s
-; RUN: opt < %s -passes=jump-threading -S | FileCheck %s
+; RUN: opt < %s -aa-pipeline=basic-aa -passes=jump-threading -S | FileCheck %s

 target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
 target triple = "i386-apple-darwin7"

 ; Test that we can thread through the block with the partially redundant load (%2).
 ; rdar://6402033
 define i32 @test1(i32* %P) nounwind {
 ; CHECK-LABEL: @test1(
@@ ... 286 lines not shown ... @@
 ret1:
   ret void

 ret2:
   %xxx = tail call i32 (...) @f1() nounwind
   ret void
 }

+define i32 @fn_noalias(i1 %c2,i64* noalias %P, i64* noalias %P2) {
+; CHECK-LABEL: @fn_noalias
+; CHECK-LABEL: cond1:
+; CHECK: %[[LD1:.*]] = load i64, i64* %P
+; CHECK: br i1 %c, label %[[THREAD:.*]], label %end
+; CHECK-LABEL: cond2:
+; CHECK: %[[LD2:.*]] = load i64, i64* %P
+; CHECK-LABEL: cond3:
+; CHECK: %[[PHI:.*]] = phi i64 [ %[[LD1]], %[[THREAD]] ], [ %[[LD2]], %cond2 ]
+; CHECK: call void @fn3(i64 %[[PHI]])
+entry:
+  br i1 %c2, label %cond2, label %cond1
+
+cond1:
+  %l1 = load i64, i64* %P
+  store i64 42, i64* %P2
+  %c = icmp eq i64 %l1, 0
+  br i1 %c, label %cond2, label %end
+
+cond2:
+  %l2 = load i64, i64* %P
+  call void @fn2(i64 %l2)
+  %c3 = icmp eq i64 %l2, 0
+  br i1 %c3, label %cond3, label %end
+
+cond3:
+  call void @fn3(i64 %l2)
+  br label %end
+
+end:
+  ret i32 0
+}
+
+; This tests if we can thread from %sw.bb.i to %do.body.preheader.i67 through
+; %sw.bb21.i. To make this happen, %l2 should be detected as a partially
+; redundant load with %l3 across the store to %phase in %sw.bb21.i.
+
+%struct.NEXT_MOVE = type { i32, i32, i32* }
+@hash_move = unnamed_addr global [65 x i32] zeroinitializer, align 4
+@current_move = internal global [65 x i32] zeroinitializer, align 4
+@last = internal unnamed_addr global [65 x i32*] zeroinitializer, align 8
+@next_status = internal unnamed_addr global [65 x %struct.NEXT_MOVE] zeroinitializer, align 8
+define fastcc i32 @Search(i64 %idxprom.i, i64 %idxprom.i89, i32 %c) {
+; CHECK-LABEL: @Search
+; CHECK-LABEL: sw.bb.i:
+; CHECK: %[[LD1:.*]] = load i32, i32* %arrayidx185, align 4
+; CHECK: %[[C1:.*]] = icmp eq i32 %[[LD1]], 0
+; CHECK: br i1 %[[C1]], label %sw.bb21.i.thread, label %if.then.i64
+; CHECK-LABEL: sw.bb21.i.thread:
+; CHECK: br label %[[THREAD_TO:.*]]
+; CHECK-LABEL: sw.bb21.i:
+; CHECK: %[[LD2:.*]] = load i32, i32* %arrayidx185, align 4
+; CHECK: %[[C2:.*]] = icmp eq i32 %[[LD2]], 0
+; CHECK:br i1 %[[C2]], label %[[THREAD_TO]], label %cleanup
+entry:
+  %arrayidx185 = getelementptr inbounds [65 x i32], [65 x i32]* @hash_move, i64 0, i64 %idxprom.i
+  %arrayidx307 = getelementptr inbounds [65 x i32], [65 x i32]* @current_move, i64 0, i64 %idxprom.i
+  %arrayidx89 = getelementptr inbounds [65 x i32*], [65 x i32*]* @last, i64 0, i64 %idxprom.i
+  %phase = getelementptr inbounds [65 x %struct.NEXT_MOVE], [65 x %struct.NEXT_MOVE]* @next_status, i64 0, i64 %idxprom.i, i32 0
+  br label %cond.true282
+
+cond.true282:
+  switch i32 %c, label %sw.default.i [
+    i32 1, label %sw.bb.i
+    i32 0, label %sw.bb21.i
+  ]
+
+sw.default.i:
+  br label %cleanup
+
+sw.bb.i:
+  %call.i62 = call fastcc i32* @GenerateCheckEvasions()
+  store i32* %call.i62, i32** %arrayidx89, align 8
+  %l2 = load i32, i32* %arrayidx185, align 4
+  %tobool.i63 = icmp eq i32 %l2, 0
+  br i1 %tobool.i63, label %sw.bb21.i, label %if.then.i64
+
+if.then.i64: ; preds = %sw.bb.i
+  store i32 7, i32* %phase, align 8
+  store i32 %l2, i32* %arrayidx307, align 4
+  %call16.i = call fastcc i32 @ValidMove(i32 %l2)
+  %tobool17.i = icmp eq i32 %call16.i, 0
+  br i1 %tobool17.i, label %if.else.i65, label %cleanup
+
+if.else.i65:
+  call void @f65()
+  br label %sw.bb21.i
+
+sw.bb21.i:
+  store i32 10, i32* %phase, align 8
+  %l3= load i32, i32* %arrayidx185, align 4
+  %tobool27.i = icmp eq i32 %l3, 0
+  br i1 %tobool27.i, label %do.body.preheader.i67, label %cleanup
+
+do.body.preheader.i67:
+  call void @f67()
+  ret i32 67
+
+cleanup:
+  call void @Cleanup()
+  ret i32 0
+}
+
+declare fastcc i32* @GenerateCheckEvasions()
+declare fastcc i32 @ValidMove(i32 %move)
+declare void @f67()
+declare void @Cleanup()
+declare void @f65()
+
 define i32 @fn_SinglePred(i1 %c2,i64* %P) {
 ; CHECK-LABEL: @fn_SinglePred
 ; CHECK-LABEL: entry:
 ; CHECK: %[[L1:.*]] = load i64, i64* %P
 ; CHECK: br i1 %c, label %cond3, label %cond1
 ; CHECK-LABEL: cond2:
 ; CHECK-NOT: load
 ; CHECK: %[[PHI:.*]] = phi i64 [ %[[L1]], %cond1 ]
@@ ... 82 lines not shown ... @@

[JumpThread] Use AA in SimplifyPartiallyRedundantLoad() (Closed, Public)
