This is an archive of the discontinued LLVM Phabricator instance.

Differential D120231

[SelectOpti][3/5] Base Heuristics
ClosedPublic

Authored by apostolakis on Feb 20 2022, 10:44 PM.

Details

Reviewers

davidxl

Commits

rG8b42bc5662ca: [SelectOpti][3/5] Base Heuristics

Summary

This patch adds the base heuristics for determining whether branches are more profitable than conditional moves.
Base heuristics apply to all code apart from inner-most loops.
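To make the decision order concrete, here is a minimal standalone C++ sketch of how the base heuristic (isConvertToBranchProfitableBase in the diff) classifies one select group. It is not the pass itself. The SelectGroupFacts struct, the Decision enum and decideBase are hypothetical names, and each boolean stands in for an analysis query that the real pass makes: ProfileSummaryInfo block coldness, !unpredictable metadata, TLI->isPredictableSelectExpensive(), and the expensive-cold-operand check.

```cpp
#include <cassert>

// Hypothetical summary of the facts the pass queries for one select group.
struct SelectGroupFacts {
  bool InColdBlock = false;                // PSI->isColdBlock(...)
  bool MarkedUnpredictable = false;        // !unpredictable metadata on the select
  bool HighlyPredictable = false;          // isSelectHighlyPredictable(SI)
  bool PredictableSelectExpensive = false; // TLI->isPredictableSelectExpensive()
  bool ExpensiveColdOperand = false;       // hasExpensiveColdOperand(ASI)
};

enum class Decision { KeepSelect, ConvertToBranch };

// Same order of checks as isConvertToBranchProfitableBase: two vetoes first,
// then the two reasons to convert; otherwise keep the select.
Decision decideBase(const SelectGroupFacts &F) {
  if (F.InColdBlock)
    return Decision::KeepSelect; // cold code: keep the smaller select form
  if (F.MarkedUnpredictable)
    return Decision::KeepSelect; // a branch here would mispredict often
  if (F.HighlyPredictable && F.PredictableSelectExpensive)
    return Decision::ConvertToBranch; // well-predicted branch, costly select
  if (F.ExpensiveColdOperand)
    return Decision::ConvertToBranch; // avoid computing a costly cold value
  return Decision::KeepSelect;
}
```

The predictability test only fires together with isPredictableSelectExpensive(), matching the final diff, where that target check was narrowed to the highly-predictable heuristic while the cold-operand heuristic applies regardless of select cost.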

Depends on D122259

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

apostolakis created this revision. Feb 20 2022, 10:44 PM

Herald added a subscriber: hiraditya. Feb 20 2022, 10:44 PM

apostolakis edited the summary of this revision. Feb 20 2022, 10:45 PM

apostolakis added a child revision: D120232: [SelectOpti][4/5] Loop Heuristics. Feb 20 2022, 10:49 PM

Harbormaster completed remote builds in B150636: Diff 410221. Feb 21 2022, 12:03 AM

Minor tweaks

Harbormaster completed remote builds in B150756: Diff 410385. Feb 21 2022, 3:08 PM

apostolakis published this revision for review. Feb 23 2022, 10:14 PM

Herald added a project: Restricted Project. Feb 23 2022, 10:14 PM

Herald added a subscriber: llvm-commits.

Skip selects with vector condition

Harbormaster completed remote builds in B151554: Diff 411535. Feb 25 2022, 4:04 PM

apostolakis mentioned this in D120230: [SelectOpti][1/5] Setup new select-optimize pass. Mar 9 2022, 4:53 PM

tschuett added a subscriber: tschuett. Mar 13 2022, 5:21 AM

Herald added a project: Restricted Project. Mar 13 2022, 5:21 AM

bsmith added a subscriber: bsmith. Mar 21 2022, 7:59 AM

Rebase and move tests

Herald added a subscriber: pengfei. Mar 22 2022, 2:02 PM

Harbormaster completed remote builds in B155699: Diff 417398. Mar 22 2022, 2:03 PM

apostolakis retitled this revision from [SelectOpti][2/4] Base Heuristics to [SelectOpti][3/5] Base Heuristics. Mar 22 2022, 2:04 PM

apostolakis edited the summary of this revision.

apostolakis added a parent revision: D122259: [SelectOpti][2/5] Select-to-branch base transformation.

modimo added a subscriber: modimo. Mar 23 2022, 2:44 PM

Use opt tests instead of llc ones

Harbormaster completed remote builds in B156001: Diff 417838. Mar 23 2022, 11:50 PM

Rebase

Herald added a subscriber: ormris. Mar 26 2022, 3:22 PM

Harbormaster completed remote builds in B156432: Diff 418423. Mar 26 2022, 3:23 PM

davidxl added a subscriber: davidxl. Mar 31 2022, 4:11 PM

davidxl added inline comments.

llvm/lib/CodeGen/SelectOptimize.cpp
109	add some comments to the methods.
385	is less profitable?
446	This function deserves more comments in the code.
464	non-loop regions?
477	Should the relative hotness of II be checked instead of looking at the loop context? Also, it might be worth checking whether the instructions in the slice can actually be wrapped in the cold branch. Checking hasOneUse is meant for this, but is it more restrictive than necessary?

davidxl added a reviewer: davidxl. Apr 1 2022, 1:21 PM

apostolakis marked an inline comment as done. Apr 1 2022, 2:14 PM

apostolakis added inline comments.

llvm/lib/CodeGen/SelectOptimize.cpp
109	Done.
385	Yes. Fixed.
446	Added more comments.
477	That's a good idea. It is better to check the relative hotness rather than looking at loop contexts. Changed the code. Skipping instructions less hot than the source instruction (the value operand of the select). Checking for hasOneUse is a bit conservative (e.g., an instruction could have two uses that are both meant for the computation of the cold value operand of the target select) but it is much simpler than accurately identifying all the instructions used solely for the computation of the select’s true/false value operands. I originally implemented the latter but the experimental results did not show improved performance and thus the extra implementation complexity was not justified.
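The slice collection described in this reply can be sketched as a breadth-first walk over operands. This is a hedged, self-contained model rather than LLVM code: Node, NumUses, Freq and collectColdSliceSketch are hypothetical stand-ins for Instruction, hasOneUse() and the BlockFrequencyInfo frequency of an instruction's block. An instruction joins the slice only if it has exactly one use and is not colder than the source instruction (the select's value operand); its operands are explored only after it joins.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction.
struct Node {
  std::vector<Node *> Operands; // values this instruction uses
  unsigned NumUses = 0;         // number of users (hasOneUse() <=> 1)
  uint64_t Freq = 0;            // frequency of the enclosing block
};

// Breadth-first walk from Source over operands, keeping only single-use
// instructions that are at least as hot as Source.
std::vector<Node *> collectColdSliceSketch(Node *Source) {
  assert(Source && "expected a source instruction");
  std::vector<Node *> Slice;
  std::unordered_set<Node *> Visited;
  std::queue<Node *> Worklist;
  Worklist.push(Source);
  while (!Worklist.empty()) {
    Node *N = Worklist.front();
    Worklist.pop();
    if (!Visited.insert(N).second)
      continue; // already processed (also guards against cycles)
    if (N->NumUses != 1)
      continue; // shared value: cannot be moved exclusively into the branch
    if (N->Freq < Source->Freq)
      continue; // colder than the source: not counted, not explored further
    Slice.push_back(N);
    for (Node *Op : N->Operands)
      Worklist.push(Op);
  }
  return Slice;
}
```

As the reply notes, the single-use rule is conservative: a value with two users that both feed the cold operand is still excluded, and so is anything reachable only through it.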

Address review comments

Harbormaster completed remote builds in B157498: Diff 419858. Apr 1 2022, 2:15 PM

davidxl added inline comments. Apr 5 2022, 2:31 PM

llvm/lib/CodeGen/SelectOptimize.cpp
394	If the SI is in a cold basic block (as determined by profile summary), it may not be worth converting.
419	When profile data is available but the select instruction does not have metadata attached, we may want to emit a missed-optimization warning.
435	assert ColdI != NULL?
439	what if all defs in the chain are cheap, but they add up to be expensive?
450	Perhaps make the name of the method more general to match its intention: get instructions that can be wrapped into the cold branch when converted to control flow. Also make it clear that current heuristic chooses only oneUse defs.
469	Is this possible?

apostolakis added inline comments. Apr 9 2022, 8:16 PM

llvm/lib/CodeGen/SelectOptimize.cpp
394	That's true. Selects in cold functions will not be considered for conversion since this is tested earlier by calling llvm::shouldOptimizeForSize. But cold basic blocks within a non-cold function are not checked. Added a check for coldness at the basic block level to prevent any conversion.
419	Sure, added.
435	The value operands of the select instructions are not necessarily instructions; they could be, for example, function arguments.
439	Yes, many cheap ones could add up to the cost of one expensive one. Tweaked the heuristic to account for that.
450	Good point. Changed the function name and the comments.
469	This branch can go both ways.
Example where freq(II) < freq(I):
    II = ...
    for () {
      I = II + ...
      x = select c, I, ...
    }
Example where freq(II) > freq(I):
    for () {
      II = ...
    }
    I = II + ...
    x = select c, I, ...

Address review.

Harbormaster completed remote builds in B158886: Diff 421763. Apr 9 2022, 8:19 PM

davidxl added inline comments. Apr 11 2022, 11:07 PM

llvm/lib/CodeGen/SelectOptimize.cpp
453	Should the cost also consider the frequency difference between the SI and the cold operand? Basically, the colder the operand, the more expensive it is to use CMOV.
455	Suggest adding an option for the multiplier of TCC_Expensive.

apostolakis added inline comments. Apr 18 2022, 9:51 PM

llvm/lib/CodeGen/SelectOptimize.cpp
453	Yes, that seems like a good idea. I tested a couple of different ways of adjusting the cost and evaluated the performance impact on a search workload, then added the one that showed a positive (although only slight) effect.
455	Good idea to make it customizable. Added.
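One way to read "the colder the operand, the more its cost counts": a select evaluates the cold operand on every execution, while a branch evaluates it only on the cold path, so the work a branch saves is roughly the slice cost times the hot fraction HotWeight / TotalWeight. The sketch below assumes exactly that scaling, rounds with a divideNearest-style helper like the one in the diff, and compares the result against a multiplier of TCC_Expensive (4 in TargetTransformInfo), with MaxCostMultiplier playing the role of -cold-operand-max-cost-multiplier. The exact expression in the landed code is cut off in this excerpt, so treat the formula, the >= comparison and the name isColdSliceExpensive as assumptions. Summing the whole slice also reflects the earlier point that many cheap instructions can add up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed to mirror TargetTransformInfo::TCC_Expensive.
constexpr uint64_t TCC_Expensive = 4;

// Unsigned division rounded to the nearest integer (round half up).
uint64_t divideNearest(uint64_t Numerator, uint64_t Denominator) {
  assert(Denominator != 0 && "division by zero");
  return (Numerator + Denominator / 2) / Denominator;
}

// Hypothetical check: sum the latencies of the cold operand's slice, scale by
// the fraction of executions that take the hot path (where a cmov computes the
// cold value for nothing), and compare with MaxCostMultiplier * TCC_Expensive.
bool isColdSliceExpensive(const std::vector<uint64_t> &SliceLatencies,
                          uint64_t HotWeight, uint64_t TotalWeight,
                          uint64_t MaxCostMultiplier = 1) {
  uint64_t SliceCost = 0;
  for (uint64_t Latency : SliceLatencies)
    SliceCost += Latency; // many cheap instructions can add up
  uint64_t AdjSliceCost = divideNearest(SliceCost * HotWeight, TotalWeight);
  return AdjSliceCost >= MaxCostMultiplier * TCC_Expensive;
}
```

With the default multiplier of 1, for example, a five-instruction slice of unit-latency operations on a path taken 10% of the time has an adjusted cost of 5, which meets the threshold of 4.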

Address comments.

Harbormaster completed remote builds in B160170: Diff 423533. Apr 18 2022, 9:53 PM

davidxl added inline comments. Apr 18 2022, 10:15 PM

llvm/lib/CodeGen/SelectOptimize.cpp
470	Is there a concern about overflow in the multiplication? Probably not if the metadata weights are 32-bit.

apostolakis added inline comments. Apr 18 2022, 10:27 PM

llvm/lib/CodeGen/SelectOptimize.cpp
470	The weight metadata indeed contains 32-bit values and the SliceCost is expected to be small (these cold slices contain a handful of instructions with typical costs of 1 to 4 for each instruction). So, no real concern of overflowing here.
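The overflow argument in this reply can be checked with a little arithmetic. Assuming the product is formed in 64-bit unsigned arithmetic and a weight is at most UINT32_MAX, the largest slice cost that cannot overflow is (2^64 - 1) / (2^32 - 1) = 2^32 + 1, far above a slice of a handful of instructions costing 1 to 4 each. The constants and the productFitsIn64Bits helper below are hypothetical illustrations, not code from the patch.

```cpp
#include <cstdint>
#include <limits>

// Largest value a 32-bit branch weight can take.
constexpr uint64_t MaxWeight = std::numeric_limits<uint32_t>::max();

// Largest slice cost for which SliceCost * MaxWeight still fits in uint64_t.
constexpr uint64_t MaxSafeSliceCost =
    std::numeric_limits<uint64_t>::max() / MaxWeight;

// 2^64 - 1 == (2^32 - 1) * (2^32 + 1), so the division above is exact.
static_assert(MaxSafeSliceCost == (uint64_t{1} << 32) + 1,
              "bound should be exactly 2^32 + 1");

// Hypothetical helper: true if SliceCost * Weight does not overflow uint64_t.
constexpr bool productFitsIn64Bits(uint64_t SliceCost, uint64_t Weight) {
  return Weight == 0 ||
         SliceCost <= std::numeric_limits<uint64_t>::max() / Weight;
}
```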

lgtm

This revision is now accepted and ready to land. Apr 18 2022, 10:30 PM

bsmith added inline comments. Apr 20 2022, 8:01 AM

llvm/lib/CodeGen/SelectOptimize.cpp
179–181	Is this the correct thing to check for here? Even if a select is cheap, the true/false values feeding it may not be, and given the later change you have that can sink these values when converted to a branch, this check may cause us to miss some cases where this pass is still useful. (Although perhaps changing this should be deferred to the patch that does the sinking).

davidxl added inline comments. Apr 20 2022, 9:10 AM

llvm/lib/CodeGen/SelectOptimize.cpp
179–181	This is a good point. Perhaps the target specific behavior should be modeled using a cost value (SelectCost) instead of using the binary knob.

apostolakis added inline comments. Apr 20 2022, 2:19 PM

llvm/lib/CodeGen/SelectOptimize.cpp
179–181	You are right that there are still cases where it is worth considering converting to a branch even for architectures with cheap selects. I left this check since it was used in CodeGenPrepare when optimizing selects and I wanted to avoid unexpected optimizations for non-x86 architectures. However, since this pass will be, at least initially, opt-in and this check is not always useful, I will remove it. It will allow for easier non-x86 testing of the pass. Actually, when I tested some small examples on AArch64, I had to comment out this check since this pass got skipped otherwise. The cost of the select is already taken into account in the loop-level heuristics. For the base heuristics, the heuristic looking for the expensive cold operand applies regardless of the cost of the select. Yet, for the heuristic that converts highly predictable selects to branches, this check still applies. If a predictable select is not expensive, then the high predictability of the select does not suffice on its own for conversion to a branch. So, I added this check for this heuristic. By the way, @bsmith, let me know if you do any performance testing of this pass on ARM architectures. I will be happy to incorporate feedback and refine this pass to profitably target both x86 and ARM archs.

Limit the isPredictableSelectExpensive check to only the highly-predictable heuristic.

Harbormaster completed remote builds in B160529: Diff 424026. Apr 20 2022, 2:22 PM

apostolakis removed a parent revision: D120230: [SelectOpti][1/5] Setup new select-optimize pass. May 14 2022, 2:05 PM

This revision was landed with ongoing or failed builds. May 23 2022, 7:02 PM

Closed by commit rG8b42bc5662ca: [SelectOpti][3/5] Base Heuristics (authored by apostolakis).

This revision was automatically updated to reflect the committed changes.

apostolakis added a commit: rG8b42bc5662ca: [SelectOpti][3/5] Base Heuristics.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectOptimize.cpp	295 lines
llvm/test/CodeGen/X86/select-optimize.ll	228 lines

Diff 431555

llvm/lib/CodeGen/SelectOptimize.cpp

[9 lines not shown]
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
		#include "llvm/Analysis/ProfileSummaryInfo.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetLowering.h"		#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/CodeGen/TargetSchedule.h"		#include "llvm/CodeGen/TargetSchedule.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"		#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
		#include "llvm/Transforms/Utils/SizeOpts.h"
		#include <algorithm>
		#include <memory>
		#include <queue>
		#include <stack>
		#include <string>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "select-optimize"		#define DEBUG_TYPE "select-optimize"

		STATISTIC(NumSelectOptAnalyzed,
		"Number of select groups considered for conversion to branch");
		STATISTIC(NumSelectConvertedExpColdOperand,
		"Number of select groups converted due to expensive cold operand");
		STATISTIC(NumSelectConvertedHighPred,
		"Number of select groups converted due to high-predictability");
		STATISTIC(NumSelectUnPred,
		"Number of select groups not converted due to unpredictability");
		STATISTIC(NumSelectColdBB,
		"Number of select groups not converted due to cold basic block");
STATISTIC(NumSelectsConverted, "Number of selects converted");		STATISTIC(NumSelectsConverted, "Number of selects converted");

		static cl::opt<unsigned> ColdOperandThreshold(
		"cold-operand-threshold",
		cl::desc("Maximum frequency of path for an operand to be considered cold."),
		cl::init(20), cl::Hidden);

		static cl::opt<unsigned> ColdOperandMaxCostMultiplier(
		"cold-operand-max-cost-multiplier",
		cl::desc("Maximum cost multiplier of TCC_expensive for the dependence "
		"slice of a cold operand to be considered inexpensive."),
		cl::init(1), cl::Hidden);

namespace {		namespace {

class SelectOptimize : public FunctionPass {		class SelectOptimize : public FunctionPass {
const TargetMachine *TM = nullptr;		const TargetMachine *TM = nullptr;
const TargetSubtargetInfo *TSI;		const TargetSubtargetInfo *TSI;
const TargetLowering *TLI = nullptr;		const TargetLowering *TLI = nullptr;
		const TargetTransformInfo *TTI = nullptr;
const LoopInfo *LI;		const LoopInfo *LI;
		DominatorTree *DT;
std::unique_ptr<BlockFrequencyInfo> BFI;		std::unique_ptr<BlockFrequencyInfo> BFI;
std::unique_ptr<BranchProbabilityInfo> BPI;		std::unique_ptr<BranchProbabilityInfo> BPI;
		ProfileSummaryInfo *PSI;
		OptimizationRemarkEmitter *ORE;

public:		public:
static char ID;		static char ID;

SelectOptimize() : FunctionPass(ID) {		SelectOptimize() : FunctionPass(ID) {
initializeSelectOptimizePass(*PassRegistry::getPassRegistry());		initializeSelectOptimizePass(*PassRegistry::getPassRegistry());
}		}

bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.addRequired<ProfileSummaryInfoWrapperPass>();
AU.addRequired<TargetPassConfig>();		AU.addRequired<TargetPassConfig>();
		AU.addRequired<TargetTransformInfoWrapperPass>();
		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<LoopInfoWrapperPass>();		AU.addRequired<LoopInfoWrapperPass>();
		AU.addRequired<OptimizationRemarkEmitterWrapperPass>();
}		}

private:		private:
// Select groups consist of consecutive select instructions with the same		// Select groups consist of consecutive select instructions with the same
// condition.		// condition.
using SelectGroup = SmallVector<SelectInst *, 2>;		using SelectGroup = SmallVector<SelectInst *, 2>;
using SelectGroups = SmallVector<SelectGroup, 2>;		using SelectGroups = SmallVector<SelectGroup, 2>;

		// Converts select instructions of a function to conditional jumps when deemed
		// profitable. Returns true if at least one select was converted.
bool optimizeSelects(Function &F);		bool optimizeSelects(Function &F);

		// Heuristics for determining which select instructions can be profitably
		// converted to branches. Separate heuristics for selects in inner-most loops
		// and the rest of code regions (base heuristics for non-inner-most loop
		// regions).
		void optimizeSelectsBase(Function &F, SelectGroups &ProfSIGroups);
		void optimizeSelectsInnerLoops(Function &F, SelectGroups &ProfSIGroups);

		// Converts to branches the select groups that were deemed
		// profitable-to-convert.
void convertProfitableSIGroups(SelectGroups &ProfSIGroups);		void convertProfitableSIGroups(SelectGroups &ProfSIGroups);

		// Splits selects of a given basic block into select groups.
void collectSelectGroups(BasicBlock &BB, SelectGroups &SIGroups);		void collectSelectGroups(BasicBlock &BB, SelectGroups &SIGroups);

		// Determines for which select groups it is profitable converting to branches
		// (base heuristics).
		void findProfitableSIGroupsBase(SelectGroups &SIGroups,
		SelectGroups &ProfSIGroups);
		// Determines if a select group should be converted to a branch (base
		// heuristics).
		bool isConvertToBranchProfitableBase(const SmallVector<SelectInst *, 2> &ASI);

		// Returns true if there are expensive instructions in the cold value
		// operand's (if any) dependence slice of any of the selects of the given
		// group.
		bool hasExpensiveColdOperand(const SmallVector<SelectInst *, 2> &ASI);

		// For a given source instruction, collect its backwards dependence slice
		// consisting of instructions exclusively computed for producing the operands
		// of the source instruction.
		void getExclBackwardsSlice(Instruction *I,
		SmallVector<Instruction *, 2> &Slice);

		// Returns true if the condition of the select is highly predictable.
		bool isSelectHighlyPredictable(const SelectInst *SI);

		// Returns true if the target architecture supports lowering a given select.
bool isSelectKindSupported(SelectInst *SI);		bool isSelectKindSupported(SelectInst *SI);
};		};
} // namespace		} // namespace

char SelectOptimize::ID = 0;		char SelectOptimize::ID = 0;

INITIALIZE_PASS_BEGIN(SelectOptimize, DEBUG_TYPE, "Optimize selects", false,		INITIALIZE_PASS_BEGIN(SelectOptimize, DEBUG_TYPE, "Optimize selects", false,
false)		false)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)		INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(OptimizationRemarkEmitterWrapperPass)
INITIALIZE_PASS_END(SelectOptimize, DEBUG_TYPE, "Optimize selects", false,		INITIALIZE_PASS_END(SelectOptimize, DEBUG_TYPE, "Optimize selects", false,
false)		false)

FunctionPass *llvm::createSelectOptimizePass() { return new SelectOptimize(); }		FunctionPass *llvm::createSelectOptimizePass() { return new SelectOptimize(); }

bool SelectOptimize::runOnFunction(Function &F) {		bool SelectOptimize::runOnFunction(Function &F) {
TM = &getAnalysis<TargetPassConfig>().getTM<TargetMachine>();		TM = &getAnalysis<TargetPassConfig>().getTM<TargetMachine>();
TSI = TM->getSubtargetImpl(F);		TSI = TM->getSubtargetImpl(F);
TLI = TSI->getTargetLowering();		TLI = TSI->getTargetLowering();

		// If none of the select types is supported then skip this pass.
		// This is an optimization pass. Legality issues will be handled by
		// instruction selection.
		if (!TLI->isSelectSupported(TargetLowering::ScalarValSelect) &&
		!TLI->isSelectSupported(TargetLowering::ScalarCondVectorVal) &&
		!TLI->isSelectSupported(TargetLowering::VectorMaskSelect))
		return false;

		TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
		DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();		LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
BPI.reset(new BranchProbabilityInfo(F, *LI));		BPI.reset(new BranchProbabilityInfo(F, *LI));
BFI.reset(new BlockFrequencyInfo(F, BPI, LI));		BFI.reset(new BlockFrequencyInfo(F, BPI, LI));
		PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
		ORE = &getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();

		// When optimizing for size, selects are preferable over branches.
		if (F.hasOptSize() || llvm::shouldOptimizeForSize(&F, PSI, BFI.get()))
		return false;

return optimizeSelects(F);		return optimizeSelects(F);
}		}

bool SelectOptimize::optimizeSelects(Function &F) {		bool SelectOptimize::optimizeSelects(Function &F) {
// Collect all the select groups.
SelectGroups SIGroups;
for (BasicBlock &BB : F) {
collectSelectGroups(BB, SIGroups);
}

// Determine for which select groups it is profitable converting to branches.		// Determine for which select groups it is profitable converting to branches.
SelectGroups ProfSIGroups;		SelectGroups ProfSIGroups;
// For now assume that all select groups can be profitably converted to		// Base heuristics apply only to non-loops and outer loops.
// branches.		optimizeSelectsBase(F, ProfSIGroups);
for (SelectGroup &ASI : SIGroups) {		// Separate heuristics for inner-most loops.
ProfSIGroups.push_back(ASI);		optimizeSelectsInnerLoops(F, ProfSIGroups);
}

// Convert to branches the select groups that were deemed		// Convert to branches the select groups that were deemed
// profitable-to-convert.		// profitable-to-convert.
convertProfitableSIGroups(ProfSIGroups);		convertProfitableSIGroups(ProfSIGroups);

// Code modified if at least one select group was converted.		// Code modified if at least one select group was converted.
return !ProfSIGroups.empty();		return !ProfSIGroups.empty();
}		}

		void SelectOptimize::optimizeSelectsBase(Function &F,
		SelectGroups &ProfSIGroups) {
		// Collect all the select groups.
		SelectGroups SIGroups;
		for (BasicBlock &BB : F) {
		// Base heuristics apply only to non-loops and outer loops.
		Loop *L = LI->getLoopFor(&BB);
		if (L && L->isInnermost())
		continue;
		collectSelectGroups(BB, SIGroups);
		}

		// Determine for which select groups it is profitable converting to branches.
		findProfitableSIGroupsBase(SIGroups, ProfSIGroups);
		}

		void SelectOptimize::optimizeSelectsInnerLoops(Function &F,
		SelectGroups &ProfSIGroups) {}

/// If \p isTrue is true, return the true value of \p SI, otherwise return		/// If \p isTrue is true, return the true value of \p SI, otherwise return
/// false value of \p SI. If the true/false value of \p SI is defined by any		/// false value of \p SI. If the true/false value of \p SI is defined by any
/// select instructions in \p Selects, look through the defining select		/// select instructions in \p Selects, look through the defining select
/// instruction until the true/false value is not defined in \p Selects.		/// instruction until the true/false value is not defined in \p Selects.
static Value *		static Value *
getTrueOrFalseValue(SelectInst *SI, bool isTrue,		getTrueOrFalseValue(SelectInst *SI, bool isTrue,
const SmallPtrSet<const Instruction *, 2> &Selects) {		const SmallPtrSet<const Instruction *, 2> &Selects) {
Value *V = nullptr;		Value *V = nullptr;
[125 lines not shown; context: if (SelectInst *SI = dyn_cast<SelectInst>(I)) {]
if (!isSelectKindSupported(SI))		if (!isSelectKindSupported(SI))
continue;		continue;

SIGroups.push_back(SIGroup);		SIGroups.push_back(SIGroup);
}		}
}		}
}		}

		void SelectOptimize::findProfitableSIGroupsBase(SelectGroups &SIGroups,
		SelectGroups &ProfSIGroups) {
		for (SelectGroup &ASI : SIGroups) {
		++NumSelectOptAnalyzed;
		if (isConvertToBranchProfitableBase(ASI))
		ProfSIGroups.push_back(ASI);
		}
		}

		bool SelectOptimize::isConvertToBranchProfitableBase(
		const SmallVector<SelectInst *, 2> &ASI) {
		SelectInst *SI = ASI.front();
		OptimizationRemark OR(DEBUG_TYPE, "SelectOpti", SI);
		OptimizationRemarkMissed ORmiss(DEBUG_TYPE, "SelectOpti", SI);

		// Skip cold basic blocks. Better to optimize for size for cold blocks.
		if (PSI->isColdBlock(SI->getParent(), BFI.get())) {
		++NumSelectColdBB;
		ORmiss << "Not converted to branch because of cold basic block. ";
		ORE->emit(ORmiss);
		return false;
		}

		// If unpredictable, branch form is less profitable.
		if (SI->getMetadata(LLVMContext::MD_unpredictable)) {
		++NumSelectUnPred;
		ORmiss << "Not converted to branch because of unpredictable branch. ";
		ORE->emit(ORmiss);
		return false;
		}

		// If highly predictable, branch form is more profitable, unless a
		// predictable select is inexpensive in the target architecture.
		if (isSelectHighlyPredictable(SI) && TLI->isPredictableSelectExpensive()) {
		++NumSelectConvertedHighPred;
		OR << "Converted to branch because of highly predictable branch. ";
		ORE->emit(OR);
		return true;
		}

		// Look for expensive instructions in the cold operand's (if any) dependence
		// slice of any of the selects in the group.
		if (hasExpensiveColdOperand(ASI)) {
		++NumSelectConvertedExpColdOperand;
		OR << "Converted to branch because of expensive cold operand.";
		ORE->emit(OR);
		return true;
		}

		ORmiss << "Not profitable to convert to branch (base heuristic).";
		ORE->emit(ORmiss);
		return false;
		}

		static InstructionCost divideNearest(InstructionCost Numerator,
		uint64_t Denominator) {
		return (Numerator + (Denominator / 2)) / Denominator;
		}

		bool SelectOptimize::hasExpensiveColdOperand(
		const SmallVector<SelectInst *, 2> &ASI) {
		bool ColdOperand = false;
		uint64_t TrueWeight, FalseWeight, TotalWeight;
		if (ASI.front()->extractProfMetadata(TrueWeight, FalseWeight)) {
		uint64_t MinWeight = std::min(TrueWeight, FalseWeight);
		TotalWeight = TrueWeight + FalseWeight;
		// Is there a path with frequency <ColdOperandThreshold% (default:20%) ?
		ColdOperand = TotalWeight * ColdOperandThreshold > 100 * MinWeight;
		} else if (PSI->hasProfileSummary()) {
		OptimizationRemarkMissed ORmiss(DEBUG_TYPE, "SelectOpti", ASI.front());
		ORmiss << "Profile data available but missing branch-weights metadata for "
		"select instruction. ";
		ORE->emit(ORmiss);
		}
		if (!ColdOperand)
		return false;
		// Check if the cold path's dependence slice is expensive for any of the
		// selects of the group.
		for (SelectInst *SI : ASI) {
		Instruction *ColdI = nullptr;
		uint64_t HotWeight;
		if (TrueWeight < FalseWeight) {
		ColdI = dyn_cast<Instruction>(SI->getTrueValue());
		HotWeight = FalseWeight;
		} else {
		ColdI = dyn_cast<Instruction>(SI->getFalseValue());
		davidxlUnsubmitted Not Done Reply Inline Actions Suggest adding an option which is the multiplier of the TCC_expensive davidxl: Suggest adding an option which is the multiplier of the TCC_expensive
		apostolakisAuthorUnsubmitted Done Reply Inline Actions Good idea to make it customizable. Added. apostolakis: Good idea to make it customizable. Added.
		HotWeight = TrueWeight;
		}
    if (ColdI) {
      SmallVector<Instruction *, 2> ColdSlice;
      getExclBackwardsSlice(ColdI, ColdSlice);
      InstructionCost SliceCost = 0;
      for (auto *ColdII : ColdSlice) {
        SliceCost +=
            TTI->getInstructionCost(ColdII, TargetTransformInfo::TCK_Latency);
davidxl: non-loop regions?
      }
      // The colder the cold value operand of the select is, the more expensive
      // the cmov becomes, because the cold value operand is computed every
      // time. Thus, the colder the cold operand is, the more its cost counts.
      // Get nearest integer cost adjusted for coldness.
davidxl: Is this possible?
apostolakis (author): This branch can go both ways.
Example where freq(II) < freq(I): II = ...; for () { I = II + ...; x = select c, I, ...; }
Example where freq(II) > freq(I): for () { II = ...; } I = II + ...; x = select c, I, ...;
      InstructionCost AdjSliceCost =
          divideNearest(SliceCost * HotWeight, TotalWeight);
davidxl: Is there a concern about overflow in the multiplication? Probably not if the metadata weight is 32-bit.
apostolakis (author): The weight metadata indeed contains 32-bit values and the SliceCost is expected to be small (these cold slices contain a handful of instructions with typical costs of 1 to 4 for each instruction). So, no real concern of overflowing here.
      if (AdjSliceCost >=
          ColdOperandMaxCostMultiplier * TargetTransformInfo::TCC_Expensive)
        return true;
    }
  }
  return false;
davidxl: Should the relative hotness of II be checked instead of looking at the loop context? Also, it might be worth checking whether the instructions in the slice can actually be wrapped in the cold branch; checking hasOneUse is for this, but is it more restrictive?
apostolakis (author): That's a good idea. It is better to check the relative hotness rather than looking at loop contexts. Changed the code. Skipping instructions less hot than the source instruction (the value operand of the select). Checking for hasOneUse is a bit conservative (e.g., an instruction could have two uses that are both meant for the computation of the cold value operand of the target select) but it is much simpler than accurately identifying all the instructions used solely for the computation of the select’s true/false value operands. I originally implemented the latter but the experimental results did not show improved performance and thus the extra implementation complexity was not justified.
}
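The coldness adjustment above scales the cold slice's latency cost by the hot side's share of the total branch weight before comparing it with `ColdOperandMaxCostMultiplier * TCC_Expensive`. The sketch below reproduces only that arithmetic on plain integers; it is not the pass's code. `divideNearestSketch` is assumed to round the same way as `llvm::divideNearest`, and the constants `kTCCExpensive = 4` and `kMaxCostMultiplier = 1` are illustrative assumptions standing in for `TargetTransformInfo::TCC_Expensive` and the option's value.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to match llvm::divideNearest: unsigned division rounded to nearest.
uint64_t divideNearestSketch(uint64_t Numerator, uint64_t Denominator) {
  assert(Denominator != 0 && "division by zero");
  return (Numerator + Denominator / 2) / Denominator;
}

// Illustrative stand-ins (assumptions) for TargetTransformInfo::TCC_Expensive
// and the value of the ColdOperandMaxCostMultiplier option.
constexpr uint64_t kTCCExpensive = 4;
constexpr uint64_t kMaxCostMultiplier = 1;

// Hypothetical helper: scales the cold slice's cost by the hot side's share
// of the total weight and reports whether it reaches the threshold.
bool isColdSliceExpensive(uint64_t SliceCost, uint64_t TrueWeight,
                          uint64_t FalseWeight) {
  uint64_t TotalWeight = TrueWeight + FalseWeight;
  if (TotalWeight == 0)
    return false;
  // The cold operand is the less likely one, so the hot weight is the larger.
  uint64_t HotWeight = TrueWeight < FalseWeight ? FalseWeight : TrueWeight;
  // Weights fit in 32 bits and slice costs are small, so this 64-bit product
  // stays far from overflow.
  uint64_t AdjSliceCost =
      divideNearestSketch(SliceCost * HotWeight, TotalWeight);
  return AdjSliceCost >= kMaxCostMultiplier * kTCCExpensive;
}
```

With 1:99 weights (as in `!17` in the test file), a slice cost of 4 becomes round(4 * 99 / 100) = 4 and meets the assumed threshold, while a cost of 3 stays at 3 and does not; with 50:50 weights a cost of 4 is halved to 2.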

// For a given source instruction, collect its backwards dependence slice
// consisting of instructions exclusively computed for the purpose of producing
// the operands of the source instruction. As an approximation
// (sufficiently accurate in practice), we populate this set with the
// instructions of the backwards dependence slice that only have one use and
// form a one-use chain that leads to the source instruction.
void SelectOptimize::getExclBackwardsSlice(
    Instruction *I, SmallVector<Instruction *, 2> &Slice) {
  SmallPtrSet<Instruction *, 2> Visited;
  std::queue<Instruction *> Worklist;
  Worklist.push(I);
  while (!Worklist.empty()) {
    Instruction *II = Worklist.front();
    Worklist.pop();

    // Avoid cycles.
    if (Visited.count(II))
      continue;
    Visited.insert(II);

    if (!II->hasOneUse())
      continue;

    // Avoid considering instructions with less frequency than the source
    // instruction (i.e., avoid colder code regions of the dependence slice).
    if (BFI->getBlockFreq(II->getParent()) < BFI->getBlockFreq(I->getParent()))
      continue;

    // Eligible one-use instruction added to the dependence slice.
    Slice.push_back(II);

    // Explore all the operands of the current instruction to expand the slice.
    for (unsigned k = 0; k < II->getNumOperands(); ++k)
      if (auto *OpI = dyn_cast<Instruction>(II->getOperand(k)))
        Worklist.push(OpI);
  }
}
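The walk in `getExclBackwardsSlice` is a breadth-first search over operands that stops at multi-use values and at blocks colder than the source. The following standalone sketch models instructions as a hypothetical `ToyInst` array (operand indices, use count, block frequency) instead of LLVM IR so that it can be compiled and tested in isolation; it follows the same filtering order as the patch but is not the pass's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <set>
#include <vector>

// Hypothetical toy instruction: indices of operand instructions, number of
// uses, and the execution frequency of its basic block.
struct ToyInst {
  std::vector<std::size_t> Operands;
  unsigned NumUses;
  uint64_t BlockFreq;
};

// Collects the one-use backwards slice of Insts[Source], skipping any
// instruction whose block is colder than the source's block.
std::vector<std::size_t> exclBackwardsSlice(const std::vector<ToyInst> &Insts,
                                            std::size_t Source) {
  assert(Source < Insts.size() && "source index out of range");
  std::vector<std::size_t> Slice;
  std::set<std::size_t> Visited;
  std::queue<std::size_t> Worklist;
  Worklist.push(Source);
  while (!Worklist.empty()) {
    std::size_t II = Worklist.front();
    Worklist.pop();
    // Avoid cycles: insert() reports false if II was already visited.
    if (!Visited.insert(II).second)
      continue;
    const ToyInst &Inst = Insts[II];
    if (Inst.NumUses != 1)
      continue;
    if (Inst.BlockFreq < Insts[Source].BlockFreq)
      continue;
    Slice.push_back(II);
    for (std::size_t Op : Inst.Operands) {
      assert(Op < Insts.size() && "operand index out of range");
      Worklist.push(Op);
    }
  }
  return Slice;
}
```

For a chain shaped like `@expensive_val_operand3` (a one-use load feeding a one-use add), the slice contains both instructions; if the load had a second use, as in `@expensive_val_operand4`, it would be left out.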

bool SelectOptimize::isSelectHighlyPredictable(const SelectInst *SI) {
  uint64_t TrueWeight, FalseWeight;
  if (SI->extractProfMetadata(TrueWeight, FalseWeight)) {
    uint64_t Max = std::max(TrueWeight, FalseWeight);
    uint64_t Sum = TrueWeight + FalseWeight;
    if (Sum != 0) {
      auto Probability = BranchProbability::getBranchProbability(Max, Sum);
      if (Probability > TTI->getPredictableBranchThreshold())
        return true;
    }
  }
  return false;
}
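`isSelectHighlyPredictable` treats a select as predictable when the more likely side's share of the branch weight strictly exceeds the target's predictable-branch threshold. A plain-integer sketch of that comparison follows. The 99/100 default threshold is an assumption (the patch reads the threshold from `TTI->getPredictableBranchThreshold()`), and the sketch uses exact cross-multiplication instead of `BranchProbability`, whose fixed-point rounding could differ in borderline cases.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Returns true if max(TrueWeight, FalseWeight) / (TrueWeight + FalseWeight)
// is strictly greater than ThresholdNum / ThresholdDen. The 99/100 default is
// an assumed stand-in for the target's predictable-branch threshold.
bool isHighlyPredictableSketch(uint64_t TrueWeight, uint64_t FalseWeight,
                               uint64_t ThresholdNum = 99,
                               uint64_t ThresholdDen = 100) {
  assert(ThresholdDen != 0 && ThresholdNum <= ThresholdDen);
  uint64_t Max = std::max(TrueWeight, FalseWeight);
  uint64_t Sum = TrueWeight + FalseWeight;
  if (Sum == 0)
    return false;
  // Max / Sum > Num / Den  <=>  Max * Den > Num * Sum, since Sum and Den > 0.
  return Max * ThresholdDen > ThresholdNum * Sum;
}
```

Under that assumed threshold, the test file's weights line up with its CHECK lines: 1:100 and 100:1 (`!15`, `!16`) give 100/101 ≈ 0.990, which exceeds 0.99, while 1:99 (`!17`) gives exactly 0.99 and is not treated as highly predictable.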

bool SelectOptimize::isSelectKindSupported(SelectInst *SI) {
  bool VectorCond = !SI->getCondition()->getType()->isIntegerTy(1);
  if (VectorCond)
    return false;
  TargetLowering::SelectSupportKind SelectKind;
  if (SI->getType()->isVectorTy())
    SelectKind = TargetLowering::ScalarCondVectorVal;
  else
    SelectKind = TargetLowering::ScalarValSelect;
  return TLI->isSelectSupported(SelectKind);
}

llvm/test/CodeGen/X86/select-optimize.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -mtriple=x86_64-unknown-unknown -select-optimize -S < %s | FileCheck %s
 
-; Single select converted to branch
-define i32 @single_select(i32 %a, i32 %b, i1 %cmp) {
-; CHECK-LABEL: @single_select(
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Test base heuristic 1:
+;; highly-biased selects assumed to be highly predictable, converted to branches
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+; If a select is obviously predictable, turn it into a branch.
+define i32 @weighted_select1(i32 %a, i32 %b, i1 %cmp) {
+; CHECK-LABEL: @weighted_select1(
 ; CHECK-NEXT: [[SEL_FROZEN:%.*]] = freeze i1 [[CMP:%.*]]
-; CHECK-NEXT: br i1 [[SEL_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF2:![0-9]+]]
+; CHECK-NEXT: br i1 [[SEL_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF16:![0-9]+]]
 ; CHECK: select.false:
 ; CHECK-NEXT: br label [[SELECT_END]]
 ; CHECK: select.end:
 ; CHECK-NEXT: [[SEL:%.*]] = phi i32 [ [[A:%.*]], [[TMP0:%.*]] ], [ [[B:%.*]], [[SELECT_FALSE]] ]
 ; CHECK-NEXT: ret i32 [[SEL]]
 ;
-  %sel = select i1 %cmp, i32 %a, i32 %b, !prof !0
+  %sel = select i1 %cmp, i32 %a, i32 %b, !prof !15
   ret i32 %sel
 }
 
-; Select group converted to branch
-define i32 @select_group(i32 %a, i32 %b, i32 %c, i1 %cmp) {
-; CHECK-LABEL: @select_group(
+; If a select is obviously predictable (reversed profile weights),
+; turn it into a branch.
+define i32 @weighted_select2(i32 %a, i32 %b, i1 %cmp) {
+; CHECK-LABEL: @weighted_select2(
+; CHECK-NEXT: [[SEL_FROZEN:%.*]] = freeze i1 [[CMP:%.*]]
+; CHECK-NEXT: br i1 [[SEL_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF17:![0-9]+]]
+; CHECK: select.false:
+; CHECK-NEXT: br label [[SELECT_END]]
+; CHECK: select.end:
+; CHECK-NEXT: [[SEL:%.*]] = phi i32 [ [[A:%.*]], [[TMP0:%.*]] ], [ [[B:%.*]], [[SELECT_FALSE]] ]
+; CHECK-NEXT: ret i32 [[SEL]]
+;
+  %sel = select i1 %cmp, i32 %a, i32 %b, !prof !16
+  ret i32 %sel
+}
+
+; Not obviously predictable select.
+define i32 @weighted_select3(i32 %a, i32 %b, i1 %cmp) {
+; CHECK-LABEL: @weighted_select3(
+; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP:%.*]], i32 [[A:%.*]], i32 [[B:%.*]], !prof [[PROF18:![0-9]+]]
+; CHECK-NEXT: ret i32 [[SEL]]
+;
+  %sel = select i1 %cmp, i32 %a, i32 %b, !prof !17
+  ret i32 %sel
+}
+
+; Unpredictable select should not form a branch.
+define i32 @unpred_select(i32 %a, i32 %b, i1 %cmp) {
+; CHECK-LABEL: @unpred_select(
+; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP:%.*]], i32 [[A:%.*]], i32 [[B:%.*]], !unpredictable !19
+; CHECK-NEXT: ret i32 [[SEL]]
+;
+  %sel = select i1 %cmp, i32 %a, i32 %b, !unpredictable !20
+  ret i32 %sel
+}
+
+; Predictable select in function with optsize attribute should not form a branch.
+define i32 @weighted_select_optsize(i32 %a, i32 %b, i1 %cmp) optsize {
+; CHECK-LABEL: @weighted_select_optsize(
+; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP:%.*]], i32 [[A:%.*]], i32 [[B:%.*]], !prof [[PROF16]]
+; CHECK-NEXT: ret i32 [[SEL]]
+;
+  %sel = select i1 %cmp, i32 %a, i32 %b, !prof !15
+  ret i32 %sel
+}
+
+define i32 @weighted_select_pgso(i32 %a, i32 %b, i1 %cmp) !prof !14 {
+; CHECK-LABEL: @weighted_select_pgso(
+; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP:%.*]], i32 [[A:%.*]], i32 [[B:%.*]], !prof [[PROF16]]
+; CHECK-NEXT: ret i32 [[SEL]]
+;
+  %sel = select i1 %cmp, i32 %a, i32 %b, !prof !15
+  ret i32 %sel
+}
+
+; If two selects in a row are predictable, turn them into branches.
+define i32 @weighted_selects(i32 %a, i32 %b) !prof !19 {
+; CHECK-LABEL: @weighted_selects(
+; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[A:%.*]], 0
+; CHECK-NEXT: [[SEL_FROZEN:%.*]] = freeze i1 [[CMP]]
+; CHECK-NEXT: br i1 [[SEL_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF16]]
+; CHECK: select.false:
+; CHECK-NEXT: br label [[SELECT_END]]
+; CHECK: select.end:
+; CHECK-NEXT: [[SEL:%.*]] = phi i32 [ [[A]], [[TMP0:%.*]] ], [ [[B:%.*]], [[SELECT_FALSE]] ]
+; CHECK-NEXT: [[CMP1:%.*]] = icmp ne i32 [[SEL]], 0
+; CHECK-NEXT: [[SEL1_FROZEN:%.*]] = freeze i1 [[CMP1]]
+; CHECK-NEXT: br i1 [[SEL1_FROZEN]], label [[SELECT_END1:%.*]], label [[SELECT_FALSE2:%.*]], !prof [[PROF16]]
+; CHECK: select.false2:
+; CHECK-NEXT: br label [[SELECT_END1]]
+; CHECK: select.end1:
+; CHECK-NEXT: [[SEL1:%.*]] = phi i32 [ [[B]], [[SELECT_END]] ], [ [[A]], [[SELECT_FALSE2]] ]
+; CHECK-NEXT: ret i32 [[SEL1]]
+;
+  %cmp = icmp ne i32 %a, 0
+  %sel = select i1 %cmp, i32 %a, i32 %b, !prof !15
+  %cmp1 = icmp ne i32 %sel, 0
+  %sel1 = select i1 %cmp1, i32 %b, i32 %a, !prof !15
+  ret i32 %sel1
+}
+
+; If a select group is predictable, turn it into a branch.
+define i32 @weighted_select_group(i32 %a, i32 %b, i32 %c, i1 %cmp) !prof !19 {
+; CHECK-LABEL: @weighted_select_group(
 ; CHECK-NEXT: [[SEL1_FROZEN:%.*]] = freeze i1 [[CMP:%.*]]
-; CHECK-NEXT: br i1 [[SEL1_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF2]]
+; CHECK-NEXT: br i1 [[SEL1_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF16]]
 ; CHECK: select.false:
 ; CHECK-NEXT: br label [[SELECT_END]]
 ; CHECK: select.end:
 ; CHECK-NEXT: [[SEL1:%.*]] = phi i32 [ [[A:%.*]], [[TMP0:%.*]] ], [ [[B:%.*]], [[SELECT_FALSE]] ]
 ; CHECK-NEXT: [[SEL2:%.*]] = phi i32 [ [[C:%.*]], [[TMP0]] ], [ [[A]], [[SELECT_FALSE]] ]
-; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 [[SEL1]], metadata [[META3:![0-9]+]], metadata !DIExpression()), !dbg [[DBG8:![0-9]+]]
+; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 [[SEL1]], metadata [[META22:![0-9]+]], metadata !DIExpression()), !dbg [[DBG26:![0-9]+]]
 ; CHECK-NEXT: [[ADD:%.*]] = add i32 [[SEL1]], [[SEL2]]
 ; CHECK-NEXT: ret i32 [[ADD]]
 ;
-  %sel1 = select i1 %cmp, i32 %a, i32 %b, !prof !0
-  call void @llvm.dbg.value(metadata i32 %sel1, metadata !4, metadata !DIExpression()), !dbg !DILocation(scope: !3)
-  %sel2 = select i1 %cmp, i32 %c, i32 %a, !prof !0
+  %sel1 = select i1 %cmp, i32 %a, i32 %b, !prof !15
+  call void @llvm.dbg.value(metadata i32 %sel1, metadata !24, metadata !DIExpression()), !dbg !DILocation(scope: !23)
+  %sel2 = select i1 %cmp, i32 %c, i32 %a, !prof !15
   %add = add i32 %sel1, %sel2
   ret i32 %add
 }
 
-; Select group with intra-group dependence converted to branch
+; Predictable select group with intra-group dependence converted to branch
 define i32 @select_group_intra_group(i32 %a, i32 %b, i32 %c, i1 %cmp) {
 ; CHECK-LABEL: @select_group_intra_group(
 ; CHECK-NEXT: [[SEL1_FROZEN:%.*]] = freeze i1 [[CMP:%.*]]
-; CHECK-NEXT: br i1 [[SEL1_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF2]]
+; CHECK-NEXT: br i1 [[SEL1_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF16]]
 ; CHECK: select.false:
 ; CHECK-NEXT: br label [[SELECT_END]]
 ; CHECK: select.end:
 ; CHECK-NEXT: [[SEL1:%.*]] = phi i32 [ [[A:%.*]], [[TMP0:%.*]] ], [ [[B:%.*]], [[SELECT_FALSE]] ]
 ; CHECK-NEXT: [[SEL2:%.*]] = phi i32 [ [[C:%.*]], [[TMP0]] ], [ [[B]], [[SELECT_FALSE]] ]
 ; CHECK-NEXT: [[SUB:%.*]] = sub i32 [[SEL1]], [[SEL2]]
 ; CHECK-NEXT: ret i32 [[SUB]]
 ;
-  %sel1 = select i1 %cmp, i32 %a, i32 %b, !prof !0
-  %sel2 = select i1 %cmp, i32 %c, i32 %sel1, !prof !0
+  %sel1 = select i1 %cmp, i32 %a, i32 %b,!prof !15
+  %sel2 = select i1 %cmp, i32 %c, i32 %sel1, !prof !15
   %sub = sub i32 %sel1, %sel2
   ret i32 %sub
 }
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Test base heuristic 2:
+;; look for expensive instructions in the one-use slice of the cold path
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+; Select with cold one-use load value operand should form a branch and
+; sink the load
+define i32 @expensive_val_operand1(i32* nocapture %a, i32 %y, i1 %cmp) {
+; CHECK-LABEL: @expensive_val_operand1(
+; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[A:%.*]], align 8
+; CHECK-NEXT: [[SEL_FROZEN:%.*]] = freeze i1 [[CMP:%.*]]
+; CHECK-NEXT: br i1 [[SEL_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF18]]
+; CHECK: select.false:
+; CHECK-NEXT: br label [[SELECT_END]]
+; CHECK: select.end:
+; CHECK-NEXT: [[SEL:%.*]] = phi i32 [ [[LOAD]], [[TMP0:%.*]] ], [ [[Y:%.*]], [[SELECT_FALSE]] ]
+; CHECK-NEXT: ret i32 [[SEL]]
+;
+  %load = load i32, i32* %a, align 8
+  %sel = select i1 %cmp, i32 %load, i32 %y, !prof !17
+  ret i32 %sel
+}
+
+; Expensive hot value operand and cheap cold value operand.
+define i32 @expensive_val_operand2(i32* nocapture %a, i32 %x, i1 %cmp) {
+; CHECK-LABEL: @expensive_val_operand2(
+; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[A:%.*]], align 8
+; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP:%.*]], i32 [[X:%.*]], i32 [[LOAD]], !prof [[PROF18]]
+; CHECK-NEXT: ret i32 [[SEL]]
+;
+  %load = load i32, i32* %a, align 8
+  %sel = select i1 %cmp, i32 %x, i32 %load, !prof !17
+  ret i32 %sel
+}
+
+; Cold value operand with load in its one-use dependence slice should result
+; in a branch with a sunk dependence slice.
+define i32 @expensive_val_operand3(i32* nocapture %a, i32 %b, i32 %y, i1 %cmp) {
+; CHECK-LABEL: @expensive_val_operand3(
+; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[A:%.*]], align 8
+; CHECK-NEXT: [[X:%.*]] = add i32 [[LOAD]], [[B:%.*]]
+; CHECK-NEXT: [[SEL_FROZEN:%.*]] = freeze i1 [[CMP:%.*]]
+; CHECK-NEXT: br i1 [[SEL_FROZEN]], label [[SELECT_END:%.*]], label [[SELECT_FALSE:%.*]], !prof [[PROF18]]
+; CHECK: select.false:
+; CHECK-NEXT: br label [[SELECT_END]]
+; CHECK: select.end:
+; CHECK-NEXT: [[SEL:%.*]] = phi i32 [ [[X]], [[TMP0:%.*]] ], [ [[Y:%.*]], [[SELECT_FALSE]] ]
+; CHECK-NEXT: ret i32 [[SEL]]
+;
+  %load = load i32, i32* %a, align 8
+  %x = add i32 %load, %b
+  %sel = select i1 %cmp, i32 %x, i32 %y, !prof !17
+  ret i32 %sel
+}
+
+; Multiple uses of the load value operand.
+define i32 @expensive_val_operand4(i32 %a, i32* nocapture %b, i32 %x, i1 %cmp) {
+; CHECK-LABEL: @expensive_val_operand4(
+; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[B:%.*]], align 4
+; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP:%.*]], i32 [[X:%.*]], i32 [[LOAD]]
+; CHECK-NEXT: [[ADD:%.*]] = add i32 [[SEL]], [[LOAD]]
+; CHECK-NEXT: ret i32 [[ADD]]
+;
+  %load = load i32, i32* %b, align 4
+  %sel = select i1 %cmp, i32 %x, i32 %load
+  %add = add i32 %sel, %load
+  ret i32 %add
+}
+
 ; Function Attrs: nounwind readnone speculatable willreturn
 declare void @llvm.dbg.value(metadata, metadata, metadata)
 
-!llvm.module.flags = !{!6, !7}
-!0 = !{!"branch_weights", i32 1, i32 100}
-!1 = !DIFile(filename: "test.c", directory: "/test")
-!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 15.0.0", isOptimized: true, emissionKind: FullDebug, globals: !5, splitDebugInlining: false, nameTableKind: None)
-!3 = distinct !DISubprogram(name: "test", scope: !1, file: !1, line: 1, unit: !2)
-!4 = !DILocalVariable(name: "x", scope: !3)
-!5 = !{}
-!6 = !{i32 2, !"Dwarf Version", i32 4}
-!7 = !{i32 1, !"Debug Info Version", i32 3}
+!llvm.module.flags = !{!0, !26, !27}
+!0 = !{i32 1, !"ProfileSummary", !1}
+!1 = !{!2, !3, !4, !5, !6, !7, !8, !9}
+!2 = !{!"ProfileFormat", !"InstrProf"}
+!3 = !{!"TotalCount", i64 10000}
+!4 = !{!"MaxCount", i64 10}
+!5 = !{!"MaxInternalCount", i64 1}
+!6 = !{!"MaxFunctionCount", i64 1000}
+!7 = !{!"NumCounts", i64 3}
+!8 = !{!"NumFunctions", i64 3}
+!9 = !{!"DetailedSummary", !10}
+!10 = !{!11, !12, !13}
+!11 = !{i32 10000, i64 100, i32 1}
+!12 = !{i32 999000, i64 100, i32 1}
+!13 = !{i32 999999, i64 1, i32 2}
+!14 = !{!"function_entry_count", i64 0}
+!15 = !{!"branch_weights", i32 1, i32 100}
+!16 = !{!"branch_weights", i32 100, i32 1}
+!17 = !{!"branch_weights", i32 1, i32 99}
+!18 = !{!"branch_weights", i32 50, i32 50}
+!19 = !{!"function_entry_count", i64 100}
+!20 = !{}
+!21 = !DIFile(filename: "test.c", directory: "/test")
+!22 = distinct !DICompileUnit(language: DW_LANG_C99, file: !21, producer: "clang version 15.0.0", isOptimized: true, emissionKind: FullDebug, globals: !25, splitDebugInlining: false, nameTableKind: None)
+!23 = distinct !DISubprogram(name: "test", scope: !21, file: !21, line: 1, unit: !22)
+!24 = !DILocalVariable(name: "x", scope: !23)
+!25 = !{}
+!26 = !{i32 2, !"Dwarf Version", i32 4}
+!27 = !{i32 1, !"Debug Info Version", i32 3}