This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Stop sinking instructions across function call.
Needs Review · Public

Authored by renlin on Jun 6 2018, 3:18 AM.

Details

Reviewers
hfinkel
Summary

The instcombine pass will try to sink an instruction to the place where its value
is used when the CFG is very simple. However, this doesn't take function
calls into account.

While sinking reduces the live range of the result, it increases the live ranges of the related operands.
It is not clear whether, overall, it is beneficial or not.
instcombine might not be the best place to make that kind of decision, especially across function calls.
Here, instruction sinking is prohibited across function calls.

TargetTransformInfo::isLoweredToCall is used to decide whether the function will become a real call or be simplified into a different form.
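
As a rough illustration, here is a minimal sketch of the kind of guard under discussion (the helper name and the iterator-based scan are assumptions for this sketch, not the actual patch):

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helper: return true if any instruction in [Begin, End)
// may be lowered to a real call, per TTI::isLoweredToCall. Sinking a
// definition across such a call would stretch the live ranges of its
// operands over the call.
static bool crossesRealCall(BasicBlock::iterator Begin,
                            BasicBlock::iterator End,
                            const TargetTransformInfo &TTI) {
  for (BasicBlock::iterator It = Begin; It != End; ++It) {
    if (auto *Call = dyn_cast<CallInst>(&*It)) {
      const Function *Callee = Call->getCalledFunction();
      // An indirect call has no static callee; conservatively treat it
      // as a real call.
      if (!Callee || TTI.isLoweredToCall(Callee))
        return true;
    }
  }
  return false;
}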

Diff Detail

Event Timeline

renlin created this revision. Jun 6 2018, 3:18 AM
hfinkel added a subscriber: hfinkel. Jun 6 2018, 6:29 AM

I don't see how inlining is relevant here. Sinking across function call boundaries is generally also bad (because increasing register pressure across calls tends to increase spilling). I feel like the test you want here is to call TTI::isLoweredToCall.

renlin added a comment. Jun 6 2018, 7:53 AM

I agree that, generally, it is not a good idea to sink across a function call. I am a little bit conservative about this change.

Thank you for your suggestion. I checked the documentation for TTI::isLoweredToCall.
IIUIC, it seems to me this backend information is still not strong enough.

  • If the backend returns false for a function, it will get expanded by some pass somewhere in the optimization pipeline. This might complicate the CFG, similar to the inlined-function case.
  • If the backend returns true for a function, it is a real function call. This will extend the live ranges of the related variables.

Should we just stop sinking instructions across any type of function call?
Normally, InstCombine is run multiple times; if a function gets expanded, we still have opportunities to sink instructions when it is beneficial.

I agree that, generally, it is not a good idea to sink across a function call. I am a little bit conservative about this change.

Thank you for your suggestion. I checked the documentation for TTI::isLoweredToCall.
IIUIC, it seems to me this backend information is still not strong enough.

  • If the backend returns false for a function, it will get expanded by some pass somewhere in the optimization pipeline. This might complicate the CFG, similar to the inlined-function case.

In theory, yes. In practice, it shouldn't (unless the backend generates a custom expansion including branches, but in such cases the costs are generally low). The purpose of the call, however, is to inform the mid-end optimizer whether or not it should view the call as actually having call-like overheads. This applies to information-only intrinsics and to calls that get lowered to single instructions (e.g., sin/cos on some targets).
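
For a sense of those semantics, a small sketch of how a pass might query the hook (illustrative only: the results depend on the target and on the default TTI implementation, so treat the comments as assumptions):

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Count the declared functions that keep call-like overheads after
// lowering. Intrinsics such as llvm.assume typically report false here,
// and math routines like sin/cos may also report false on targets that
// expand them inline; an ordinary external function reports true.
unsigned countRealCallees(const Module &M, const TargetTransformInfo &TTI) {
  unsigned RealCallees = 0;
  for (const Function &F : M)
    if (TTI.isLoweredToCall(&F))
      ++RealCallees;
  return RealCallees;
}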

  • If the backend returns true for a function, it is a real function call. This will extend the live ranges of the related variables.

Should we just stop sinking instructions across any type of function call?

No. You would at least need to exclude a whole list of intrinsics. isLoweredToCall will do this for you, and also handle a number of useful cases where the backend expands the call into something cheap. Please use isLoweredToCall, and if we find cases where this turns out to be problematic, we'll figure out how to address them.

Normally, InstCombine is run multiple times; if a function gets expanded, we still have opportunities to sink instructions when it is beneficial.

renlin updated this revision to Diff 150174. Jun 6 2018, 12:16 PM
renlin edited the summary of this revision.
  • Update to use TargetTransformInfo::isLoweredToCall
  • Simplify the logic; remove the backward search from the user.
renlin updated this revision to Diff 150178. Jun 6 2018, 12:22 PM

Generally, the sinking will increase the live range of variables

This isn't a good description. It increases the live range of the operands, and reduces the live range of the result. Which one matters more isn't obvious. And in some cases, the cost of the operation itself might be more important than the cost of spilling.

Granted, instcombine might not be the best place to make that kind of decision; it's very hard to estimate register pressure before isel.
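
A C++ analogue of that tradeoff (illustrative only: helper() and the function names are made up for this sketch, and the actual effect depends on the target's register allocator):

extern void helper(); // opaque callee; stands in for a real call

// Before sinking: 'a' and 'b' die at the add, so only 'sum' has to stay
// live (e.g., in a callee-saved register) across the call.
long computeBefore(long a, long b) {
  long sum = a + b;
  helper();
  return sum;
}

// After sinking the add past the call: both 'a' and 'b' must survive the
// call, which tends to increase register pressure and spilling.
long computeAfter(long a, long b) {
  helper();
  return a + b;
}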

Ka-Ka added a subscriber: Ka-Ka. Jun 6 2018, 2:47 PM
renlin retitled this revision from [InstCombine] Don't sink instructions across inlined function call. to [InstCombine] Stop sinking instructions across inlined function call. Jun 7 2018, 4:20 AM
renlin edited the summary of this revision.
renlin retitled this revision from [InstCombine] Stop sinking instructions across inlined function call. to [InstCombine] Stop sinking instructions across function call.
renlin updated this revision to Diff 151883. Jun 19 2018, 2:51 AM

remove a space

I'm seeing what, but not why.
What is the motivation behind this change?
What problem is it trying to solve?

test/Transforms/InstCombine/sink_across_call.ll
2

instcombine tests use utils/update_test_checks.py
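
For reference, a typical invocation might look like the following (the build-tree path is an assumption; adjust it to your checkout):

python llvm/utils/update_test_checks.py --opt-binary=build/bin/opt \
    test/Transforms/InstCombine/sink_across_call.ll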

renlin updated this revision to Diff 151895. Jun 19 2018, 4:34 AM

Use utils/update_test_checks.py for test case checking.

renlin marked an inline comment as done. Jun 19 2018, 4:37 AM
renlin added inline comments.
test/Transforms/InstCombine/sink_across_call.ll
2

thanks, updated

renlin marked an inline comment as done. Jun 19 2018, 4:41 AM

I'm seeing what, but not why.
What is the motivation behind this change?
What problem is it trying to solve?

In one of the test cases I have, sinking a load instruction (together with the operands it uses) across an inlined function call increases the number of instructions generated.
The inliner pass runs after the InstCombine pass. If I reorder the passes so that the inliner runs before InstCombine, the CFG becomes complex and InstCombine gives up on the sinking.

I'm seeing what, but not why.
What is the motivation behind this change?
What problem is it trying to solve?

In one of the test cases I have, sinking a load instruction (together with the operands it uses) across an inlined function call increases the number of instructions generated.

Instructions as in IR instructions? Or the instructions in the final assembly?

The inliner pass runs after the InstCombine pass. If I reorder the passes so that the inliner runs before InstCombine, the CFG becomes complex and InstCombine gives up on the sinking.

I'm seeing what, but not why.
What is the motivation behind this change?
What problem is it trying to solve?

In one of the test cases I have, sinking a load instruction (together with the operands it uses) across an inlined function call increases the number of instructions generated.

Instructions as in IR instructions? Or the instructions in the final assembly?

Final machine assembly.

I'm seeing what, but not why.
What is the motivation behind this change?
What problem is it trying to solve?

In one of the test cases I have, sinking a load instruction (together with the operands it uses) across an inlined function call increases the number of instructions generated.

Instructions as in IR instructions? Or the instructions in the final assembly?

Final machine assembly.

Perhaps you can add an example test?
There seems to be some precedent for that:

llvm/test/CodeGen/X86$ grep -r instcombine | grep RUN
2009-03-23-i80-fp80.ll:; RUN: opt < %s -instcombine -S | grep 302245289961712575840256
2009-03-23-i80-fp80.ll:; RUN: opt < %s -instcombine -S | grep K40018000000000000000
vec_udiv_to_shift.ll:; RUN: opt < %s -instcombine -S | FileCheck %s
vec_ins_extract.ll:; RUN: opt < %s -sroa -instcombine | \
no-plt-libcalls.ll:; RUN: opt < %s -instcombine -S | FileCheck %s

But I must say, right now this sounds like the problem is elsewhere,
and this change only papers over it by pessimizing all other cases.

I'm seeing what, but not why.
What is the motivation behind this change?
What problem is it trying to solve?

In one of the test cases I have, sinking a load instruction (together with the operands it uses) across an inlined function call increases the number of instructions generated.

Instructions as in IR instructions? Or the instructions in the final assembly?

Final machine assembly.

Perhaps you can add an example test?
There seems to be some precedent for that:

llvm/test/CodeGen/X86$ grep -r instcombine | grep RUN
2009-03-23-i80-fp80.ll:; RUN: opt < %s -instcombine -S | grep 302245289961712575840256
2009-03-23-i80-fp80.ll:; RUN: opt < %s -instcombine -S | grep K40018000000000000000
vec_udiv_to_shift.ll:; RUN: opt < %s -instcombine -S | FileCheck %s
vec_ins_extract.ll:; RUN: opt < %s -sroa -instcombine | \
no-plt-libcalls.ll:; RUN: opt < %s -instcombine -S | FileCheck %s

But I must say, right now this sounds like the problem is elsewhere,
and this change only papers over it by pessimizing all other cases.

Given the following test case (similar to the one in my initial patch):

define i64 @inline_func(i64* %in, i32 %radius) readonly noinline {
;;define i64 @inline_func(i64* %in, i32 %radius) readonly alwaysinline {
entry:
  %cmp12 = icmp sgt i32 %radius, 0
  br i1 %cmp12, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:
  %wide.trip.count = zext i32 %radius to i64
  br label %for.body

for.cond.cleanup:
  %max_val.0.lcssa = phi i64 [ 0, %entry ], [ %max_val.0., %for.body ]
  ret i64 %max_val.0.lcssa

for.body:
  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  %max_val.013 = phi i64 [ 0, %for.body.preheader ], [ %max_val.0., %for.body ]
  %arrayidx = getelementptr inbounds i64, i64* %in, i64 %indvars.iv
  %0 = load i64, i64* %arrayidx, align 4
  %cmp1 = icmp eq i64 %max_val.013, %0
  %max_val.0. = select i1 %cmp1, i64 %max_val.013, i64 %0
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

define void @test1(i64* nocapture %out, i64* %in, i32 %w, i32 %n)  {
entry:
  %idxprom = sext i32 %w to i64
  %arrayidx = getelementptr inbounds i64, i64* %in, i64 %idxprom
  %0 = load i64, i64* %arrayidx, align 4
  %call = tail call i64 @inline_func(i64* %in, i32 %n)
  %cmp = icmp eq i64 %call, -1
  br i1 %cmp, label %if.then, label %if.end

if.then: 
  %cmp1 = icmp eq i64 %0, %call
  %conv = sext i1 %cmp1 to i64
  store i64 %conv, i64* %out, align 4
  br label %if.end

if.end: 
  ret void
}

Compiled with Clang at -O2 for AArch64, without the change here there are more core-register saves/restores in the stack frame and one more register move.
The base pointer and the offset need to be held in callee-saved registers to make sure their values are preserved across the function call, and register-move instructions are needed to move them from the argument registers to the callee-saved registers.

With the change, only the register holding the loaded value is saved in the prologue.
Even with the function marked alwaysinline, one more register-move instruction is generated to hold the pointer %in, because the code in inline_func uses and changes the pointer.

I didn't add it as a code-generation test because I think checking the exact code-gen would be too fragile.

I agree that the optimization passes are related.
I might have missed something, but I didn't see any other pass that is obviously doing something wrong or not trying hard enough to optimize the code for this particular case.

renlin removed a subscriber: hfinkel.
renlin added a comment. Aug 8 2018, 2:31 AM

@hfinkel, I have updated the patch as you suggested, and in the latest comment I showed an artificial test case that is improved by the change.
Do you think you would be able to review this? Thanks!