This is an archive of the discontinued LLVM Phabricator instance.

Differential D108138

[WIP] Remove switch statements before vectorization
AbandonedPublic

Authored by kmclaughlin on Aug 16 2021, 8:35 AM.

Details

Reviewers

david-arm
fhahn
dmgreen
craig.topper
lebedev.ri

Summary

This patch changes the LowerSwitch pass so that when a flag is
passed (LoopUnswitch) the pass will only attempt to unswitch simple
switch statements (i.e. there are no ranges, each destination block is
unique) which are part of a loop. The purpose of this is to allow
vectorization of loops which is not possible at the moment due to
the presence of switch statements.

The LowerSwitch pass is now run just before the vectorizer, with
LoopUnswitch set to true. For simple switches, we create a series of
compares and branches which have a simpler structure which SimplifyCFG
can later replace with a switch again if the vectorizer made no changes.

The following tests have been added:

LowerSwitch/simple-switches.ll: Tests the changes to LowerSwitch that replace switch statements in loops.
LoopVectorize/AArch64/sve-remove-switches.ll: Tests that loops containing switch statements can be vectorized with scalable vectors. Also tests that the switch statement is created again where vectorization is not possible.
LoopVectorize/remove-switches.ll: Ensures that we do not vectorize the loop if the target doesn't support masked loads & stores, where the cost would be too high.
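
As an illustration of what the summary calls a "simple" switch (no ranges, each destination unique), the following is a minimal source-level sketch of the rewrite into compares and branches. The function names and case values are made up for the example; the patch itself operates on LLVM IR, not on C++ source.

```cpp
#include <cstdint>

// A "simple" switch in the sense of the summary: no case ranges, and every
// case jumps to its own destination.
int32_t withSwitch(int32_t x) {
  switch (x) {
  case 2:
    return x + 10;
  case 3:
    return x * 4;
  case 7:
    return x - 1;
  default:
    return 0;
  }
}

// The same control flow as a chain of equality compares and conditional
// branches, one per case, ending at the default destination.
int32_t withBranches(int32_t x) {
  if (x == 2)
    return x + 10;
  if (x == 3)
    return x * 4;
  if (x == 7)
    return x - 1;
  return 0;
}
```

Both functions return the same value for every input; only the shape of the control flow differs.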

Diff Detail

Unit Tests: Failed

Time	Test
230 ms	x64 debian > Clang.CodeGenOpenCL::builtins-amdgcn.cl
100 ms	x64 debian > Clang.CodeGenOpenCL::builtins-r600.cl
160 ms	x64 windows > Clang.CodeGenOpenCL::builtins-amdgcn.cl
120 ms	x64 windows > Clang.CodeGenOpenCL::builtins-r600.cl

Event Timeline

kmclaughlin created this revision. Aug 16 2021, 8:35 AM

Herald added subscribers: ctetreau, ormris, wenlei and 3 others. Aug 16 2021, 8:35 AM

kmclaughlin requested review of this revision. Aug 16 2021, 8:35 AM

Herald added projects: Restricted Project, Restricted Project. Aug 16 2021, 8:35 AM

Herald added subscribers: llvm-commits, cfe-commits.

Harbormaster completed remote builds in B119725: Diff 366637. Aug 16 2021, 9:13 AM

I'm not sure i'm sold on this, even though i'm aware that selects hurt vectorization.
How does this Simplify the CFG? I think it would be best to teach LV selects,
or at worst do this in LV itself.

In D108138#2947229, @lebedev.ri wrote:

I'm not sure i'm sold on this, even though i'm aware that selects hurt vectorization.
How does this Simplify the CFG? I think it would be best to teach LV selects,
or at worst do this in LV itself.

Hi @lebedev.ri, I'm under the impression that the vectoriser has a policy of never making scalar transformations so I doubt it would be acceptable to do this in the vectoriser pass. I think the only realistic alternative is to teach LV how to vectorise switch statements and create the vector compares and selects directly in the code, or scalarise them in the vector loop with creation of new blocks. @fhahn and @craig.topper do you have any thoughts on this or preference?
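
For readers unfamiliar with the "compares and selects" form mentioned above, here is a small scalar sketch of a flattened switch: every case is evaluated as a comparison and the result is chosen with selects, so the loop body has no branches. The function name, case values, and results are hypothetical and are not taken from the patch or its tests.

```cpp
#include <cstddef>
#include <cstdint>

// Branch-free form of:
//   switch (a[i]) { case 2: r = 2.0f; break;
//                   case 3: r = 3.0f; break;
//                   default: r = 1.0f; }
// Each case becomes a compare, and the value is picked with selects.
void flattenedSwitch(const int32_t *a, float *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const int32_t v = a[i];
    const bool is2 = (v == 2); // compare for case 2
    const bool is3 = (v == 3); // compare for case 3
    float r = 1.0f;            // default value
    r = is3 ? 3.0f : r;        // select for case 3
    r = is2 ? 2.0f : r;        // select for case 2
    out[i] = r;
  }
}
```

This form only works directly when the case bodies are cheap and free of side effects; cases containing loads or stores would need masking, which is why the summary singles out targets with masked load and store support.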

The alternative to teaching looprotate/LV about switches is to make switches non-canonical in the first half of the pipeline, before LV.
That is, don't form them, and aggressively expand any and all existing switches.

I'm under the impression that the vectoriser has a policy of never making scalar transformations

I'm not sure what you mean. I've not looked into the details, but it could presumably be done as some sort of VPlan transformation, possibly in the constructions of vplans to treat switches like multiple icmp's/branches?

In D108138#2948975, @dmgreen wrote:

I'm under the impression that the vectoriser has a policy of never making scalar transformations

I'm not sure what you mean. I've not looked into the details, but it could presumably be done as some sort of VPlan transformation, possibly in the constructions of vplans to treat switches like multiple icmp's/branches?

Hi @dmgreen, I just meant that if LV makes a scalar transformation prior to legality/cost-model checks, then for some reason we don't vectorise, we then end up with a changed scalar body without any vectorisation.

In D108138#2948995, @david-arm wrote:

In D108138#2948975, @dmgreen wrote:

I'm under the impression that the vectoriser has a policy of never making scalar transformations

I'm not sure what you mean. I've not looked into the details, but it could presumably be done as some sort of VPlan transformation, possibly in the constructions of vplans to treat switches like multiple icmp's/branches?

Hi @dmgreen, I just meant that if LV makes a scalar transformation prior to legality/cost-model checks, then for some reason we don't vectorise, we then end up with a changed scalar body without any vectorisation.

Oh yeah, that makes sense. I was wondering if we could teach VPlan to treat them as ICmp/Br without having to actually transform the IR, just doing it as part of constructing the VPlan.

Also check @nikic’s https://reviews.llvm.org/D95296

Matt added a subscriber: Matt. Aug 17 2021, 9:23 AM

junparser added a subscriber: junparser. Aug 17 2021, 11:33 PM

Since we already have LowerSwitchPass to transform SwitchInst, can we add a cost model and run it before vectorization?

Thanks all for the suggestions on this patch :)

I had a look at the LowerSwitch pass as suggested by @junparser, and I did find that running it before vectorisation transforms the switch and allows the same loops to be vectorised. However, I did find that if the loop is not vectorised then the switch is not created again later by SimplifyCFG (possibly because the pass is also arbitrarily splitting cases into ranges and creating multiple branches to the default block?). Tests such as Transforms/PhaseOrdering/X86/simplifycfg-late.ll, which attempts to convert a switch statement into a lookup table, then fail.

For example, running the @switch_no_vectorize test (from remove-switches.ll) with -lowerswitch results in:

for.body:                                         ; preds = %L3, %entry
  %i = phi i64 [ %inc, %L3 ], [ 0, %entry ]
  %sum.033 = phi float [ %conv20, %L3 ], [ 2.000000e+00, %entry ]
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
  %0 = load i32, i32* %arrayidx, align 4
  br label %NodeBlock

NodeBlock:                                        ; preds = %for.body
  %Pivot = icmp slt i32 %0, 3
  br i1 %Pivot, label %LeafBlock, label %LeafBlock1

LeafBlock1:                                       ; preds = %NodeBlock
  %SwitchLeaf2 = icmp eq i32 %0, 3
  br i1 %SwitchLeaf2, label %L3, label %NewDefault

LeafBlock:                                        ; preds = %NodeBlock
  %SwitchLeaf = icmp eq i32 %0, 2
  br i1 %SwitchLeaf, label %L2, label %NewDefault

NewDefault:                                       ; preds = %LeafBlock1, %LeafBlock
  br label %L1
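
The shape of the IR above can be mirrored in source form. The sketch below assumes, based on the branch targets in the listing, that the original switch had case 2 going to L2, case 3 going to L3, and a default going to L1; the enum and function names are invented for the illustration.

```cpp
#include <cstdint>

enum class Dest { L1, L2, L3 };

// Assumed original: a plain switch with two cases and a default.
Dest viaSwitch(int32_t v) {
  switch (v) {
  case 2:
    return Dest::L2;
  case 3:
    return Dest::L3;
  default:
    return Dest::L1;
  }
}

// Shape produced by -lowerswitch in the listing: a pivot comparison splits
// the cases, and each leaf tests a single value or falls to the default.
Dest viaPivotTree(int32_t v) {
  if (v < 3) {         // NodeBlock:  %Pivot = icmp slt i32 %0, 3
    if (v == 2)        // LeafBlock:  icmp eq i32 %0, 2
      return Dest::L2;
    return Dest::L1;   // NewDefault
  }
  if (v == 3)          // LeafBlock1: icmp eq i32 %0, 3
    return Dest::L3;
  return Dest::L1;     // NewDefault
}
```

The two forms agree on every input, but the tree has two separate edges into the default block and a signed less-than pivot, which is a different pattern from a flat chain of equality tests.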

I also found that any weights assigned to the switch statement are ignored when creating the new branches in LowerSwitch.
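
By contrast, the TurnSmallSwitchIntoICmps function in the first diff of this revision does carry weights over to the new branches. The following standalone sketch reproduces only that arithmetic, under the assumption (matching the indexing used in the diff) that weights[0] belongs to the default destination and weights[i] to the i-th case; the function name chainWeights is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// For each case in order, returns {weight of the taken edge, weight of the
// fall-through edge}. The fall-through weight is whatever remains of the
// total after subtracting the weights of all cases tested so far.
std::vector<std::pair<uint64_t, uint64_t>>
chainWeights(const std::vector<uint64_t> &weights) {
  uint64_t remaining = 0;
  for (uint64_t w : weights)
    remaining += w;

  std::vector<std::pair<uint64_t, uint64_t>> result;
  for (std::size_t i = 1; i < weights.size(); ++i) {
    remaining -= weights[i];
    result.emplace_back(weights[i], remaining);
  }
  return result;
}
```

With weights {10, 60, 30} (default, case 1, case 2), the first branch gets {60, 40} and the second gets {30, 10}, so the last fall-through edge carries exactly the default's weight.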

I'm not sure what the best approach to this is - I could try to change LowerSwitch to create branches which SimplifyCFG will be able to recognise and replace with a switch, or try to change SimplifyCFG to recognise this pattern of compares & branches. Alternatively, the changes in this patch could be used as the basis for a new pass which runs before the vectoriser. I wondered if anyone has any thoughts or preferences on which would be the best option here?

IMO anything other than enhancing LV is wrong.

In D108138#2967100, @lebedev.ri wrote:

IMO anything other than enhancing LV is wrong.

Hi @lebedev.ri I personally disagree here. Adding support to LV for this is significantly more work (and IMO unnecessary) because there are cases when LV has to handle a lot more than just the obvious flattened vectorisation case using vector comparisons and select instructions. We will also need to add support for vectorisation factors of 1 (with interleaving) and cases where VF>1, but we have to scalarise the switch statement. These latter two cases require basically doing exactly the same thing as @kmclaughlin's patch does here, i.e. unswitching the switch statement into compares/branches and new blocks. It seems far simpler to have a small pass that runs prior to the vectoriser (when enabled) that unswitches.

Not sure what others think here?

How is it conceptually different to break apart IR in LV itself, or do the same in a special pass just before that?
If we want to go this road, we need to completely make switches illegal/non-canonical before LV.

In D108138#2967133, @lebedev.ri wrote:

How is it conceptually different to break apart IR in LV itself, or do the same in a special pass just before that?
If we want to go this road, we need to completely make switches illegal/non-canonical before LV.

If I understand correctly you're suggesting that LV makes a scalar transformation prior to legalisation checks/cost model analysis? If that's the case then I don't think we can do that as this is beyond LV's remit and I don't see how that's any different to making a scalar transformation in a separate pass prior to LV.

In D108138#2967156, @david-arm wrote:

In D108138#2967133, @lebedev.ri wrote:

How is it conceptually different to break apart IR in LV itself, or do the same in a special pass just before that?
If we want to go this road, we need to completely make switches illegal/non-canonical before LV.

If I understand correctly you're suggesting that LV makes a scalar transformation prior to legalisation checks/cost model analysis? If that's the case then I don't think we can do that as this is beyond LV's remit and I don't see how that's any different to making a scalar transformation in a separate pass prior to LV.

Actually no, i'm saying that since LV is not allowed to do such scalar transformations,
doing the same scalar transformation, but just outside of LV, doesn't change the fact
that we've just made a preparatory transformation in hope that it will allow LV,
without actually knowing that. If it doesn't, we now need to undo it.

I could try to change LowerSwitch to create branches which SimplifyCFG will be able to recognise and replace with a switch, or try to change SimplifyCFG to recognise this pattern of compares & branches.

option is better.

I had a look at the LowerSwitch pass as suggested by @junparser, and I did find that running it before vectorisation transforms the switch and allows the same loops to be vectorised. However, I did find that if the loop is not vectorised then the switch is not created again later by SimplifyCFG

Maybe always lower switch in loops before LV? And some very late (simplifycfg) pass to form switches from branches? icmps are more friendly for further optimizations than switches anyway, or?

Removed changes to SimplifyCFG and instead run LowerSwitch before vectorisation.
Added SimpleSwitchConvert to LowerSwitch which is used if the pass is run before vectorisation - this only considers simple switches (where each destination block is unique) which are also part of a loop.

Herald added subscribers: kerbowa, nhaehnle, jvesely. Sep 15 2021, 8:20 AM

Hi all, I've updated this to take a different approach - the new patch runs LowerSwitch just before the vectoriser, where it will only consider simple switches which are part of a loop. For these switches, the pass will create a series of branches and compares which SimplifyCFG is able to replace with a switch again later if the vectoriser did not make any changes.

I'm happy to split this patch up to make it easier to review, but I thought I would first post the changes I have so far to gather some thoughts on whether this is a better direction than before? Thanks!

Harbormaster completed remote builds in B124015: Diff 372706. Sep 15 2021, 9:02 AM

Hi. I'm personally still not very okay with the approach as it currently is.

Do you need to run LoopRotate after lowering switches? Anything else?
But then you don't actually know that after spending all this compile time,
the vectorization will actually happen, and you won't just now need to undo all this,
correct? This seems conceptually wrong to me.

Will LV never have to learn to deal with switches properly?
I would assume it will, in which case what is the urgency of this temporary approach?

If you really don't want to fix this properly, i'm looking forward to an RFC on llvm-dev.

I just wanted to give an update on this patch, which I'm abandoning for the time being:

@lebedev.ri raised some good questions about the approach taken and whether the additional compile time spent would be worth the additional opportunities for vectorisation. After posting the last update, I collected some benchmark results using Spec2017 to get a better understanding of the impact of these changes and found that several benchmarks showed performance regressions for fixed-width.

The biggest outliers (in terms of percentage runtime change) were:
520.omnetpp_r: -3.00%
500.perlbench_r: -2.00%
502.gcc_r: -1.52%

I also collected the results after adding in a threshold number of cases to be unswitched (set to 4), as was included in the first draft of this patch. This also showed some regressions in the benchmarks run and no significant improvements. Both sets of results showed increased compile times for many benchmarks.
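
The eligibility test behind that threshold can be sketched in isolation. This is a simplified model of the two checks visible in the first diff (the number of cases must not exceed the threshold, and no two cases may share a destination); the function name and the use of integer block ids are assumptions made for the example.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// caseDests[i] is a hypothetical id for the block that case i branches to.
// A switch qualifies when it has at most `threshold` cases and every case
// has its own destination.
bool shouldUnswitch(const std::vector<int> &caseDests, std::size_t threshold) {
  if (caseDests.size() > threshold)
    return false;
  const std::set<int> unique(caseDests.begin(), caseDests.end());
  return unique.size() == caseDests.size();
}
```

With a threshold of 4, a three-case switch with distinct destinations qualifies, while a five-case switch or one where two cases share a block does not. A threshold of 0 disables the transformation for any switch that has at least one case.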

The same benchmarks as above, with the threshold of 4 set:
520.omnetpp_r: -3.46%
500.perlbench_r: -1.20%
502.gcc_r: -1.22%

Results were collected on a Neoverse-N1 machine. Given that these results indicate this isn't the best approach to take, I'm abandoning the patch for now. When this is picked up in future, it will likely be better to follow either the suggestion to prevent canonicalisation of branches & compares into switch statements (under a given number of cases) in the first place, or to teach the loop vectoriser to recognise switches.

:(
I'm sorry for derailing this.
I still think proper switch handling for loops would be nice.

lebedev.ri mentioned this in D116309: [WIP][LoopVectorize] Convert switch blocks into branch sequence. Dec 27 2021, 7:41 AM

Revision Contents

Path	Size
clang/test/Frontend/optimization-remark-analysis.c	4 lines
llvm/include/llvm/Transforms/Utils/SimplifyCFGOptions.h	6 lines
llvm/lib/Passes/PassBuilder.cpp	15 lines
llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp	7 lines
llvm/lib/Transforms/Utils/SimplifyCFG.cpp	126 lines
llvm/test/Other/new-pm-defaults.ll	1 line
llvm/test/Other/new-pm-lto-defaults.ll	1 line
llvm/test/Other/new-pm-thinlto-defaults.ll	1 line
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll	1 line
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll	1 line
llvm/test/Transforms/LoopVectorize/AArch64/sve-remove-switches.ll	277 lines
llvm/test/Transforms/LoopVectorize/remove-switches.ll	352 lines
llvm/test/Transforms/SimplifyCFG/nomerge.ll	2 lines
llvm/test/Transforms/SimplifyCFG/remove-switches.ll	142 lines

Diff 366637

clang/test/Frontend/optimization-remark-analysis.c

-// RUN: %clang -O1 -fvectorize -target x86_64-unknown-unknown -emit-llvm -Rpass-analysis -S %s -o - 2>&1 | FileCheck %s --check-prefix=RPASS
+// RUN: %clang -O1 -fvectorize -target x86_64-unknown-unknown -mllvm -remove-switch-blocks=false -emit-llvm -Rpass-analysis -S %s -o - 2>&1 | FileCheck %s --check-prefix=RPASS
-// RUN: %clang -O1 -fvectorize -target x86_64-unknown-unknown -emit-llvm -S %s -o - 2>&1 | FileCheck %s
+// RUN: %clang -O1 -fvectorize -target x86_64-unknown-unknown -mllvm -remove-switch-blocks=false -emit-llvm -S %s -o - 2>&1 | FileCheck %s

 // RPASS: {{.*}}:7:8: remark: loop not vectorized: loop contains a switch statement
 // CHECK-NOT: {{.*}}:7:8: remark: loop not vectorized: loop contains a switch statement

 double foo(int N, int *Array) {
   double v = 0.0;

 #pragma clang loop vectorize(enable)
@@ 11 lines not shown @@

llvm/include/llvm/Transforms/Utils/SimplifyCFGOptions.h

@@ 23 lines not shown @@ struct SimplifyCFGOptions {
   int BonusInstThreshold = 1;
   bool ForwardSwitchCondToPhi = false;
   bool ConvertSwitchToLookupTable = false;
   bool NeedCanonicalLoop = true;
   bool HoistCommonInsts = false;
   bool SinkCommonInsts = false;
   bool SimplifyCondBranch = true;
   bool FoldTwoEntryPHINode = true;
+  unsigned SwitchRemovalThreshold = 0;

   AssumptionCache *AC = nullptr;

   // Support 'builder' pattern to set members by name at construction time.
   SimplifyCFGOptions &bonusInstThreshold(int I) {
     BonusInstThreshold = I;
     return *this;
   }
@@ 25 lines not shown @@ SimplifyCFGOptions &setSimplifyCondBranch(bool B) {
     SimplifyCondBranch = B;
     return *this;
   }

   SimplifyCFGOptions &setFoldTwoEntryPHINode(bool B) {
     FoldTwoEntryPHINode = B;
     return *this;
   }
+
+  SimplifyCFGOptions &switchRemovalThreshold(int I) {
+    SwitchRemovalThreshold = I;
+    return *this;
+  }
 };

 } // namespace llvm

 #endif // LLVM_TRANSFORMS_UTILS_SIMPLIFYCFGOPTIONS_H
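
The chained setters in SimplifyCFGOptions follow the usual builder idiom: each setter assigns one member and returns a reference to the object so calls can be strung together. The standalone sketch below uses a hypothetical DemoOptions struct rather than the real LLVM type, and takes an unsigned parameter for the threshold setter (the diff passes an int into an unsigned member).

```cpp
// Hypothetical stand-in for an options struct with chained setters.
struct DemoOptions {
  int BonusInstThreshold = 1;
  bool ConvertSwitchToLookupTable = false;
  unsigned SwitchRemovalThreshold = 0;

  // Each setter returns *this so that calls can be chained.
  DemoOptions &bonusInstThreshold(int I) {
    BonusInstThreshold = I;
    return *this;
  }
  DemoOptions &convertSwitchToLookup(bool B) {
    ConvertSwitchToLookupTable = B;
    return *this;
  }
  DemoOptions &switchRemovalThreshold(unsigned N) {
    SwitchRemovalThreshold = N;
    return *this;
  }
};
```

A caller can then write DemoOptions().switchRemovalThreshold(4).bonusInstThreshold(2) and leave every other member at its default, which is the same pattern the PassBuilder change in this revision relies on.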

llvm/lib/Passes/PassBuilder.cpp

@@ 250 lines not shown @@ cl::values(clEnumValN(InliningAdvisorMode::Default, "default",
                clEnumValN(InliningAdvisorMode::Release, "release",
                           "Use release mode (AOT-compiled model).")));

 static cl::opt<bool> EnableSyntheticCounts(
     "enable-npm-synthetic-counts", cl::init(false), cl::Hidden, cl::ZeroOrMore,
     cl::desc("Run synthetic function entry count generation "
              "pass"));

+static cl::opt<bool>
+    RemoveSwitchBlocks("remove-switch-blocks", cl::init(true), cl::Hidden,
+                       cl::desc("Convert switch blocks into a branch sequence "
+                                "prior to vectorization."));
+
+// This value determines the point at which we stop removing switch statements
+// before the vectorizer pass. Removing switch blocks and replacing them with
+// compares and branches allows architectures that support predication to
+// vectorize.
+static const int RemoveSwitchCaseThreshold = 4;
+
 static const Regex DefaultAliasRegex(
     "^(default|thinlto-pre-link|thinlto|lto-pre-link|lto)<(O[0123sz])>$");

 /// Flag to enable inline deferral during PGO.
 static cl::opt<bool>
     EnablePGOInlineDeferral("enable-npm-pgo-inline-deferral", cl::init(true),
                             cl::Hidden,
                             cl::desc("Enable inline deferral during PGO"));
@@ 929 lines not shown @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
   }

   return MPM;
 }

 /// TODO: Should LTO cause any differences to this set of passes?
 void PassBuilder::addVectorPasses(OptimizationLevel Level,
                                   FunctionPassManager &FPM, bool IsFullLTO) {
+  if (RemoveSwitchBlocks)
+    FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions().switchRemovalThreshold(
+        RemoveSwitchCaseThreshold)));
+
   FPM.addPass(LoopVectorizePass(
       LoopVectorizeOptions(!PTO.LoopInterleaving, !PTO.LoopVectorization)));

   if (IsFullLTO) {
     // The vectorizer may have significantly shortened a loop body; unroll
     // again. Unroll small loops to hide loop backedge latency and saturate any
     // parallel execution resources of an out-of-order processor. We also then
     // need to clean up redundancies and loop invariant code.
@@ 2,013 lines not shown @@

llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp

@@ 49 lines not shown @@
 using namespace llvm;

 #define DEBUG_TYPE "simplifycfg"

 static cl::opt<unsigned> UserBonusInstThreshold(
     "bonus-inst-threshold", cl::Hidden, cl::init(1),
     cl::desc("Control the number of bonus instructions (default = 1)"));

+static cl::opt<unsigned> UserSwitchRemovalThreshold(
+    "switch-removal-threshold", cl::Hidden, cl::init(0),
+    cl::desc("Set the threshold for the number of switch cases where we"
+             "convert switch blocks to branches and compares"));
+
 static cl::opt<bool> UserKeepLoops(
     "keep-loops", cl::Hidden, cl::init(true),
     cl::desc("Preserve canonical loop structure (default = true)"));

 static cl::opt<bool> UserSwitchToLookup(
     "switch-to-lookup", cl::Hidden, cl::init(false),
     cl::desc("Convert switches to lookup tables (default = false)"));

@@ 237 lines not shown @@ static void applyCommandLineOverridesToOptions(SimplifyCFGOptions &Options) {
   if (UserSwitchToLookup.getNumOccurrences())
     Options.ConvertSwitchToLookupTable = UserSwitchToLookup;
   if (UserKeepLoops.getNumOccurrences())
     Options.NeedCanonicalLoop = UserKeepLoops;
   if (UserHoistCommonInsts.getNumOccurrences())
     Options.HoistCommonInsts = UserHoistCommonInsts;
   if (UserSinkCommonInsts.getNumOccurrences())
     Options.SinkCommonInsts = UserSinkCommonInsts;
+  if (UserSwitchRemovalThreshold.getNumOccurrences())
+    Options.SwitchRemovalThreshold = UserSwitchRemovalThreshold;
 }

 SimplifyCFGPass::SimplifyCFGPass() : Options() {
   applyCommandLineOverridesToOptions(Options);
 }

 SimplifyCFGPass::SimplifyCFGPass(const SimplifyCFGOptions &Opts)
     : Options(Opts) {
@@ 85 lines not shown @@

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 6,135 Lines • ▼ Show 20 Lines	for (auto Case : SI->cases()) {
auto *Orig = Case.getCaseValue();		auto *Orig = Case.getCaseValue();
auto Sub = Orig->getValue() - APInt(Ty->getBitWidth(), Base);		auto Sub = Orig->getValue() - APInt(Ty->getBitWidth(), Base);
Case.setValue(		Case.setValue(
cast<ConstantInt>(ConstantInt::get(Ty, Sub.lshr(ShiftC->getValue()))));		cast<ConstantInt>(ConstantInt::get(Ty, Sub.lshr(ShiftC->getValue()))));
}		}
return true;		return true;
}		}

		// Attempt to turn a switch statement into a series of conditional branches
		// which we may later be able to vectorize.
		static bool TurnSmallSwitchIntoICmps(SwitchInst *SI, IRBuilder<> &Builder) {
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for function 'TurnSmallSwitchIntoICmps' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function 'TurnSmallSwitchIntoICmps' [readability…
		assert(SI->getNumCases() > 1 && "Degenerate switch?");

		// Check to see if we have a genuine default, reachable block with executable
		// instructions in them.
		bool HasDefault =
		!isa<UnreachableInst>(SI->getDefaultDest()->getFirstNonPHIOrDbg());

		BasicBlock *DefaultBlock = HasDefault ? SI->getDefaultDest() : nullptr;
		BasicBlock *BB = SI->getParent();

		// Make sure each of the cases has a unique destination
		for (auto Case : SI->cases())
		if (!SI->findCaseDest(Case.getCaseSuccessor()))
		return false;

		// Record the total weighting for this switch block.
		uint64_t TotalWeight = 0;
		SmallVector<uint64_t, 8> Weights;
		if (HasBranchWeights(SI)) {
		GetBranchWeights(SI, Weights);
		if (Weights.size() == (SI->getNumCases() + 1))
		for (auto W : Weights)
		TotalWeight += W;
		}

		BasicBlock *FalseDest = nullptr;
		uint64_t FalseWeight = TotalWeight;
		for (auto CI : SI->cases()) {
		BasicBlock *TrueDest = CI.getCaseSuccessor();
		Value *Cmp =
		Builder.CreateICmpEQ(SI->getCondition(), CI.getCaseValue(), "switch");

		// Walk through PHIs in TrueDest and see which ones came
		// from the switch block, then remap them.
		if (FalseDest) {
		for (PHINode &PN : TrueDest->phis()) {
		for (auto PB : PN.blocks()) {
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: 'auto PB' can be declared as 'auto PB' [llvm-qualified-auto] not useful Lint: Pre-merge checks:* clang-tidy: warning: 'auto PB' can be declared as 'auto *PB' [llvm-qualified-auto] [[https…
		if (PB == BB) {
		Value *V = PN.getIncomingValueForBlock(BB);
		PN.removeIncomingValue(BB, false);
		PN.addIncoming(V, FalseDest);
		}
		}
		}
		}

		BasicBlock *MoveAfter = FalseDest ? FalseDest : BB;
		FalseDest = BasicBlock::Create(BB->getContext(), BB->getName() + ".switch",
		BB->getParent(), BB);
		Lint: Pre-merge checks Inline Actions clang-format: please reformat the code - BB->getParent(), BB); + BB->getParent(), BB); Lint: Pre-merge checks: clang-format: please reformat the code ``` - BB->getParent()…
		FalseDest->moveAfter(MoveAfter);

		Instruction *I = Builder.CreateCondBr(Cmp, TrueDest, FalseDest);
		// Update weight for the newly-created conditional branch.
		// We set the weight of the TrueDest to the weight for the successor
		// of the current case. The FalseDest is assigned the remaining total
		// weight, minus the weight assigned to TrueDest.
		if (TotalWeight) {
		int Index = CI.getSuccessorIndex();
		FalseWeight -= Weights[Index];
		setBranchWeights(I, Weights[Index], FalseWeight);
		}
		Builder.SetInsertPoint(FalseDest);
		}

		if (DefaultBlock) {
		Builder.CreateBr(DefaultBlock);

		// The block that we jump to may have had some PHIs that came
		// from the block containing the switch statement. Now that we
		// are removing the switch statement we need to fix up the PHIs.

		// Walk through PHIs in DefaultBlock and see which ones came
		// from the switch block, then remap them.
		for (PHINode &PN : DefaultBlock->phis()) {
		for (auto PB : PN.blocks()) {
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: 'auto PB' can be declared as 'auto PB' [llvm-qualified-auto] not useful Lint: Pre-merge checks:* clang-tidy: warning: 'auto PB' can be declared as 'auto *PB' [llvm-qualified-auto] [[https…
		if (PB == BB) {
		Value *V = PN.getIncomingValueForBlock(BB);
		PN.removeIncomingValue(BB, false);
		PN.addIncoming(V, FalseDest);
		}
		}
		}
		} else
		Builder.CreateUnreachable();

		// Drop the switch.
		SI->eraseFromParent();

		Builder.SetInsertPoint(BB);

		return true;
		}

bool SimplifyCFGOpt::simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {		bool SimplifyCFGOpt::simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {
BasicBlock *BB = SI->getParent();		BasicBlock *BB = SI->getParent();

if (isValueEqualityComparison(SI)) {		if (isValueEqualityComparison(SI)) {
// If we only have one predecessor, and if it is a branch on this value,		// If we only have one predecessor, and if it is a branch on this value,
// see if that predecessor totally determines the outcome of this switch.		// see if that predecessor totally determines the outcome of this switch.
if (BasicBlock *OnlyPred = BB->getSinglePredecessor())		if (BasicBlock *OnlyPred = BB->getSinglePredecessor())
if (SimplifyEqualityComparisonWithOnlyPredecessor(SI, OnlyPred, Builder))		if (SimplifyEqualityComparisonWithOnlyPredecessor(SI, OnlyPred, Builder))
return requestResimplify();		return requestResimplify();

Value *Cond = SI->getCondition();		Value *Cond = SI->getCondition();
if (SelectInst *Select = dyn_cast<SelectInst>(Cond))		if (SelectInst *Select = dyn_cast<SelectInst>(Cond))
if (SimplifySwitchOnSelect(SI, Select))		if (SimplifySwitchOnSelect(SI, Select))
return requestResimplify();		return requestResimplify();

// If the block only contains the switch, see if we can fold the block		// If the block only contains the switch, see if we can fold the block
// away into any preds.		// away into any preds.
if (SI == &*BB->instructionsWithoutDebug().begin())		if (SI == &*BB->instructionsWithoutDebug().begin())
if (FoldValueComparisonIntoPredecessors(SI, Builder))		if (FoldValueComparisonIntoPredecessors(SI, Builder))
return requestResimplify();		return requestResimplify();
}		}

		unsigned NumCases = SI->getNumCases();
		bool RemoveSwitches = Options.SwitchRemovalThreshold >= NumCases;

		if (RemoveSwitches && TurnSmallSwitchIntoICmps(SI, Builder))
		return simplifyCFG(BB, TTI, DTU, Options) \| true;

// Try to transform the switch into an icmp and a branch.		// Try to transform the switch into an icmp and a branch.
if (TurnSwitchRangeIntoICmp(SI, Builder))		if (!RemoveSwitches && TurnSwitchRangeIntoICmp(SI, Builder))
return requestResimplify();		return requestResimplify();

// Remove unreachable cases.		// Remove unreachable cases.
if (eliminateDeadSwitchCases(SI, DTU, Options.AC, DL))		if (eliminateDeadSwitchCases(SI, DTU, Options.AC, DL))
return requestResimplify();		return requestResimplify();

if (switchToSelect(SI, Builder, DTU, DL, TTI))		if (switchToSelect(SI, Builder, DTU, DL, TTI))
return requestResimplify();		return requestResimplify();
▲ Show 20 Lines • Show All 231 Lines • ▼ Show 20 Lines	bool SimplifyCFGOpt::simplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder) {
if (isValueEqualityComparison(BI)) {		if (isValueEqualityComparison(BI)) {
// If we only have one predecessor, and if it is a branch on this value,		// If we only have one predecessor, and if it is a branch on this value,
// see if that predecessor totally determines the outcome of this		// see if that predecessor totally determines the outcome of this
// switch.		// switch.
if (BasicBlock *OnlyPred = BB->getSinglePredecessor())		if (BasicBlock *OnlyPred = BB->getSinglePredecessor())
if (SimplifyEqualityComparisonWithOnlyPredecessor(BI, OnlyPred, Builder))		if (SimplifyEqualityComparisonWithOnlyPredecessor(BI, OnlyPred, Builder))
return requestResimplify();		return requestResimplify();

		if (Options.SwitchRemovalThreshold == 0) {
// This block must be empty, except for the setcond inst, if it exists.		// This block must be empty, except for the setcond inst, if it exists.
// Ignore dbg and pseudo intrinsics.		// Ignore dbg and pseudo intrinsics.
auto I = BB->instructionsWithoutDebug(true).begin();		auto I = BB->instructionsWithoutDebug(true).begin();
if (&*I == BI) {		if (&*I == BI) {
if (FoldValueComparisonIntoPredecessors(BI, Builder))		if (FoldValueComparisonIntoPredecessors(BI, Builder))
return requestResimplify();		return requestResimplify();
} else if (&*I == cast<Instruction>(BI->getCondition())) {		} else if (&*I == cast<Instruction>(BI->getCondition())) {
++I;		++I;
if (&*I == BI && FoldValueComparisonIntoPredecessors(BI, Builder))		if (&*I == BI && FoldValueComparisonIntoPredecessors(BI, Builder))
return requestResimplify();		return requestResimplify();
}		}
}		}
		}

// Try to turn "br (X == 0 | X == 1), T, F" into a switch instruction.		// Try to turn "br (X == 0 | X == 1), T, F" into a switch instruction.
if (SimplifyBranchOnICmpChain(BI, Builder, DL))		if (SimplifyBranchOnICmpChain(BI, Builder, DL))
return true;		return true;

// If this basic block has dominating predecessor blocks and the dominating		// If this basic block has dominating predecessor blocks and the dominating
// blocks' conditions imply BI's condition, we know the direction of BI.		// blocks' conditions imply BI's condition, we know the direction of BI.
Optional<bool> Imp = isImpliedByDomCondition(BI->getCondition(), BI, DL);		Optional<bool> Imp = isImpliedByDomCondition(BI->getCondition(), BI, DL);
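
The gate added in the simplifySwitch hunk above (`Options.SwitchRemovalThreshold >= NumCases`) can be illustrated at source level. The body of `TurnSmallSwitchIntoICmps` is not shown in this excerpt, so the following is only a minimal sketch, assuming the transform produces a chain of equality compares and branches like the `%switch = icmp eq ...` sequences in the tests below. All function names here (`shouldRemoveSwitch`, `dispatchWithSwitch`, `dispatchWithCompareChain`) are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>

// Hypothetical stand-in for the gate in the patch: a switch is only rewritten
// when its case count does not exceed the configured threshold.
bool shouldRemoveSwitch(unsigned switchRemovalThreshold, unsigned numCases) {
  return switchRemovalThreshold >= numCases;
}

// A simple switch: no case ranges, every case has its own destination.
int dispatchWithSwitch(int x) {
  switch (x) {
  case 2:
    return 20;
  case 3:
    return 30;
  case 4:
    return 40;
  default:
    return -1;
  }
}

// The same control flow written as a chain of equality compares and branches.
// The order of the compares does not matter because the case values are
// distinct.
int dispatchWithCompareChain(int x) {
  if (x == 4)
    return 40;
  if (x == 2)
    return 20;
  if (x == 3)
    return 30;
  return -1;
}
```

Both dispatch functions return the same value for every input; only the shape of the control flow differs.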

llvm/test/Other/new-pm-defaults.ll

	; CHECK-EP-VECTORIZER-START-NEXT: Running pass: NoOpFunctionPass			; CHECK-EP-VECTORIZER-START-NEXT: Running pass: NoOpFunctionPass
	; CHECK-EXT: Running pass: {{.*}}::Bye on foo			; CHECK-EXT: Running pass: {{.*}}::Bye on foo
	; CHECK-NOEXT: {{^}}			; CHECK-NOEXT: {{^}}
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LoopRotatePass			; CHECK-O-NEXT: Running pass: LoopRotatePass
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: InjectTLIMappings			; CHECK-O-NEXT: Running pass: InjectTLIMappings
				; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis			; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis
	; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis			; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass

llvm/test/Other/new-pm-lto-defaults.ll

	; CHECK-O23SZ-NEXT: Running analysis: PostDominatorTreeAnalysis on foo			; CHECK-O23SZ-NEXT: Running analysis: PostDominatorTreeAnalysis on foo
	; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass on foo			; CHECK-O23SZ-NEXT: Running pass: MergedLoadStoreMotionPass on foo
	; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass on foo			; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass on foo
	; CHECK-O23SZ-NEXT: Running pass: LCSSAPass on foo			; CHECK-O23SZ-NEXT: Running pass: LCSSAPass on foo
	; CHECK-O23SZ-NEXT: Running pass: IndVarSimplifyPass on Loop			; CHECK-O23SZ-NEXT: Running pass: IndVarSimplifyPass on Loop
	; CHECK-O23SZ-NEXT: Running pass: LoopDeletionPass on Loop			; CHECK-O23SZ-NEXT: Running pass: LoopDeletionPass on Loop
	; CHECK-O23SZ-NEXT: Running pass: LoopFullUnrollPass on Loop			; CHECK-O23SZ-NEXT: Running pass: LoopFullUnrollPass on Loop
	; CHECK-O23SZ-NEXT: Running pass: LoopDistributePass on foo			; CHECK-O23SZ-NEXT: Running pass: LoopDistributePass on foo
				; CHECK-O23SZ-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O23SZ-NEXT: Running pass: LoopVectorizePass on foo			; CHECK-O23SZ-NEXT: Running pass: LoopVectorizePass on foo
	; CHECK-O23SZ-NEXT: Running analysis: BlockFrequencyAnalysis on foo			; CHECK-O23SZ-NEXT: Running analysis: BlockFrequencyAnalysis on foo
	; CHECK-O23SZ-NEXT: Running analysis: BranchProbabilityAnalysis on foo			; CHECK-O23SZ-NEXT: Running analysis: BranchProbabilityAnalysis on foo
	; CHECK-O23SZ-NEXT: Running analysis: DemandedBitsAnalysis on foo			; CHECK-O23SZ-NEXT: Running analysis: DemandedBitsAnalysis on foo
	; CHECK-O23SZ-NEXT: Running pass: LoopUnrollPass on foo			; CHECK-O23SZ-NEXT: Running pass: LoopUnrollPass on foo
	; CHECK-O23SZ-NEXT: WarnMissedTransformationsPass on foo			; CHECK-O23SZ-NEXT: WarnMissedTransformationsPass on foo
	; CHECK-O23SZ-NEXT: Running pass: InstCombinePass on foo			; CHECK-O23SZ-NEXT: Running pass: InstCombinePass on foo
	; CHECK-O23SZ-NEXT: Running pass: SimplifyCFGPass on foo			; CHECK-O23SZ-NEXT: Running pass: SimplifyCFGPass on foo

llvm/test/Other/new-pm-thinlto-defaults.ll

	; CHECK-POSTLINK-O-NEXT: Running pass: Float2IntPass			; CHECK-POSTLINK-O-NEXT: Running pass: Float2IntPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LowerConstantIntrinsicsPass			; CHECK-POSTLINK-O-NEXT: Running pass: LowerConstantIntrinsicsPass
	; CHECK-EXT: Running pass: {{.*}}::Bye			; CHECK-EXT: Running pass: {{.*}}::Bye
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass			; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopRotatePass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopRotatePass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-POSTLINK-O-NEXT: Running pass: InjectTLIMappings			; CHECK-POSTLINK-O-NEXT: Running pass: InjectTLIMappings
				; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis
	; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass			; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
	; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-POSTLINK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-POSTLINK-O2-NEXT: Running pass: SLPVectorizerPass

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

	; CHECK-O-NEXT: Running pass: Float2IntPass			; CHECK-O-NEXT: Running pass: Float2IntPass
	; CHECK-O-NEXT: Running pass: LowerConstantIntrinsicsPass			; CHECK-O-NEXT: Running pass: LowerConstantIntrinsicsPass
	; CHECK-EXT: Running pass: {{.*}}::Bye			; CHECK-EXT: Running pass: {{.*}}::Bye
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass on foo			; CHECK-O-NEXT: Running pass: LoopSimplifyPass on foo
	; CHECK-O-NEXT: Running pass: LCSSAPass on foo			; CHECK-O-NEXT: Running pass: LCSSAPass on foo
	; CHECK-O-NEXT: Running pass: LoopRotatePass			; CHECK-O-NEXT: Running pass: LoopRotatePass
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: InjectTLIMappings			; CHECK-O-NEXT: Running pass: InjectTLIMappings
				; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-Os-NEXT: Running pass: SLPVectorizerPass

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

	; CHECK-O-NEXT: Running pass: Float2IntPass			; CHECK-O-NEXT: Running pass: Float2IntPass
	; CHECK-O-NEXT: Running pass: LowerConstantIntrinsicsPass			; CHECK-O-NEXT: Running pass: LowerConstantIntrinsicsPass
	; CHECK-EXT: Running pass: {{.*}}::Bye			; CHECK-EXT: Running pass: {{.*}}::Bye
	; CHECK-O-NEXT: Running pass: LoopSimplifyPass			; CHECK-O-NEXT: Running pass: LoopSimplifyPass
	; CHECK-O-NEXT: Running pass: LCSSAPass			; CHECK-O-NEXT: Running pass: LCSSAPass
	; CHECK-O-NEXT: Running pass: LoopRotatePass			; CHECK-O-NEXT: Running pass: LoopRotatePass
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
	; CHECK-O-NEXT: Running pass: InjectTLIMappings			; CHECK-O-NEXT: Running pass: InjectTLIMappings
				; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	; CHECK-O2-NEXT: Running pass: SLPVectorizerPass			; CHECK-O2-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O3-NEXT: Running pass: SLPVectorizerPass			; CHECK-O3-NEXT: Running pass: SLPVectorizerPass
	; CHECK-Os-NEXT: Running pass: SLPVectorizerPass			; CHECK-Os-NEXT: Running pass: SLPVectorizerPass

llvm/test/Transforms/LoopVectorize/AArch64/sve-remove-switches.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -O3 -loop-vectorize -mtriple aarch64-linux-gnu -mattr=+sve -scalable-vectorization=on -S | FileCheck %s

				define void @switch(i32* noalias %a, i32* noalias %b, i32* noalias %c, i64 %N) #0 {
				; CHECK-LABEL: @switch(
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <vscale x 4 x i32>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* [[TMP7]], align 4
				; CHECK-NEXT: [[TMP8:%.*]] = icmp eq <vscale x 4 x i32> [[WIDE_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 4, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq <vscale x 4 x i32> [[WIDE_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP10:%.*]] = icmp eq <vscale x 4 x i32> [[WIDE_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP12:%.*]] = xor <vscale x 4 x i1> [[TMP9]], shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP13:%.*]] = select <vscale x 4 x i1> [[TMP8]], <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 false, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i1> [[TMP12]]
				; CHECK-NEXT: [[TMP14:%.*]] = xor <vscale x 4 x i1> [[TMP10]], shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP15:%.*]] = select <vscale x 4 x i1> [[TMP13]], <vscale x 4 x i1> [[TMP14]], <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 false, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*
				; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP16]], i32 4, <vscale x 4 x i1> [[TMP15]], <vscale x 4 x i32> poison)
				; CHECK-NEXT: [[TMP17:%.*]] = mul nsw <vscale x 4 x i32> [[WIDE_MASKED_LOAD]], [[WIDE_LOAD]]
				; CHECK-NEXT: [[TMP18:%.*]] = add nsw <vscale x 4 x i32> [[TMP17]], [[WIDE_LOAD]]
				; CHECK-NEXT: [[TMP19:%.*]] = select <vscale x 4 x i1> [[TMP8]], <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 false, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i1> [[TMP9]]
				; CHECK-NEXT: [[TMP20:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*
				; CHECK-NEXT: [[WIDE_MASKED_LOAD6:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP20]], i32 4, <vscale x 4 x i1> [[TMP19]], <vscale x 4 x i32> poison)
				; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 4 x i1> [[TMP15]], <vscale x 4 x i32> [[WIDE_MASKED_LOAD]], <vscale x 4 x i32> [[WIDE_MASKED_LOAD6]]
				; CHECK-NEXT: [[PREDPHI7:%.*]] = select <vscale x 4 x i1> [[TMP15]], <vscale x 4 x i32> [[TMP18]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP21:%.*]] = mul nsw <vscale x 4 x i32> [[PREDPHI]], [[PREDPHI]]
				; CHECK-NEXT: [[TMP22:%.*]] = add nsw <vscale x 4 x i32> [[TMP21]], [[PREDPHI7]]
				; CHECK-NEXT: [[TMP23:%.*]] = or <vscale x 4 x i1> [[TMP19]], [[TMP15]]
				; CHECK-NEXT: [[TMP24:%.*]] = select <vscale x 4 x i1> [[TMP13]], <vscale x 4 x i1> [[TMP10]], <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 false, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[PREDPHI8:%.*]] = select <vscale x 4 x i1> [[TMP24]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> [[TMP22]]
				; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP26:%.*]] = or <vscale x 4 x i1> [[TMP24]], [[TMP23]]
				; CHECK-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP25]] to <vscale x 4 x i32>*
				; CHECK-NEXT: [[WIDE_MASKED_LOAD9:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP27]], i32 4, <vscale x 4 x i1> [[TMP26]], <vscale x 4 x i32> poison)
				; CHECK-NEXT: [[TMP28:%.*]] = mul nsw <vscale x 4 x i32> [[WIDE_MASKED_LOAD9]], [[PREDPHI8]]
				; CHECK-NEXT: [[TMP29:%.*]] = add nsw <vscale x 4 x i32> [[TMP28]], [[PREDPHI8]]
				; CHECK-NEXT: [[TMP30:%.*]] = bitcast i32* [[TMP25]] to <vscale x 4 x i32>*
				; CHECK-NEXT: [[WIDE_MASKED_LOAD10:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP30]], i32 4, <vscale x 4 x i1> [[TMP8]], <vscale x 4 x i32> poison)
				; CHECK-NEXT: [[PREDPHI11:%.*]] = select <vscale x 4 x i1> [[TMP26]], <vscale x 4 x i32> [[WIDE_MASKED_LOAD9]], <vscale x 4 x i32> [[WIDE_MASKED_LOAD10]]
				; CHECK-NEXT: [[PREDPHI12:%.*]] = select <vscale x 4 x i1> [[TMP26]], <vscale x 4 x i32> [[TMP29]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 4, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP31:%.*]] = mul nsw <vscale x 4 x i32> [[PREDPHI11]], [[PREDPHI11]]
				; CHECK-NEXT: [[TMP32:%.*]] = add nsw <vscale x 4 x i32> [[TMP31]], [[PREDPHI12]]
				; CHECK-NEXT: [[TMP33:%.*]] = bitcast i32* [[TMP6]] to <vscale x 4 x i32>*
				; CHECK-NEXT: store <vscale x 4 x i32> [[TMP32]], <vscale x 4 x i32>* [[TMP33]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], {{.*}}
				; CHECK-NEXT: [[TMP34:%.*]] = icmp eq i64 [[INDEX_NEXT]], {{.*}}
				; CHECK-NEXT: br i1 [[TMP34]], label [[MIDDLE_BLOCK:%.*]], label %vector.body, !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INC:%.*]], [[L4:%.*]] ], [ {{.*}}, %for.body.preheader ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[I]]
				; CHECK-NEXT: [[TMP35:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: switch i32 [[TMP35]], label [[FOR_BODY_SWITCH5:%.*]] [
				; CHECK-NEXT: i32 4, label [[FOR_BODY_L4_CRIT_EDGE:%.*]]
				; CHECK-NEXT: i32 2, label [[FOR_BODY_L2_CRIT_EDGE:%.*]]
				; CHECK-NEXT: i32 3, label [[L3:%.*]]
				; CHECK-NEXT: ]

				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %inc, %L4 ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx
				switch i32 %0, label %L1 [
				i32 4, label %L4
				i32 2, label %L2
				i32 3, label %L3
				]

				L1:
				%arrayidx5 = getelementptr inbounds i32, i32* %b, i64 %i
				%1 = load i32, i32* %arrayidx5
				%mul = mul nsw i32 %1, %0
				%add = add nsw i32 %mul, %0
				store i32 %add, i32* %arrayidx
				br label %L2

				L2:
				%2 = phi i32 [ 2, %for.body ], [ %add, %L1 ]
				%arrayidx7 = getelementptr inbounds i32, i32* %b, i64 %i
				%3 = load i32, i32* %arrayidx7
				%mul9 = mul nsw i32 %3, %3
				%add11 = add nsw i32 %2, %mul9
				store i32 %add11, i32* %arrayidx
				br label %L3

				L3:
				%4 = phi i32 [ 3, %for.body ], [ %add11, %L2 ]
				%arrayidx13 = getelementptr inbounds i32, i32* %c, i64 %i
				%5 = load i32, i32* %arrayidx13
				%mul14 = mul nsw i32 %5, %4
				%add16 = add nsw i32 %mul14, %4
				store i32 %add16, i32* %arrayidx
				br label %L4

				L4:
				%6 = phi i32 [ 4, %for.body ], [ %add16, %L3 ]
				%arrayidx17 = getelementptr inbounds i32, i32* %c, i64 %i
				%7 = load i32, i32* %arrayidx17
				%mul19 = mul nsw i32 %7, %7
				%add21 = add nsw i32 %6, %mul19
				store i32 %add21, i32* %arrayidx
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body

				for.end:
				ret void
				}

				define void @switch_VF1_UF2(i32* noalias %a, i32* noalias %b, i32* noalias %c, i64 %N) #0 {
				; CHECK-LABEL: @switch_VF1_UF2(
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE6:%.*]] ]
				; CHECK-NEXT: [[INDUCTION4:%.*]] = or i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <2 x i32>*
				; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4
				; CHECK-NEXT: [[TMP3:%.*]] = icmp eq <2 x i32> [[TMP2]], <i32 3, i32 3>
				; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <2 x i32> [[TMP2]], <i32 2, i32 2>
				; CHECK-NEXT: [[TMP5:%.*]] = mul nsw <2 x i32> [[TMP2]], <i32 3, i32 3>
				; CHECK-NEXT: [[TMP6:%.*]] = select <2 x i1> [[TMP4]], <2 x i32> <i32 2, i32 2>, <2 x i32> [[TMP5]]
				; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
				; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_LOAD_CONTINUE:%.*]], label [[PRED_LOAD_IF:%.*]]
				; CHECK: pred.load.if:
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP8]], align 4
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; CHECK: pred.load.continue:
				; CHECK-NEXT: [[TMP10:%.*]] = phi i32 [ poison, %vector.body ], [ [[TMP9]], [[PRED_LOAD_IF]] ]
				; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1
				; CHECK-NEXT: br i1 [[TMP11]], label [[PRED_LOAD_CONTINUE6]], label [[PRED_LOAD_IF5:%.*]]
				; CHECK: pred.load.if5:
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION4]]
				; CHECK-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; CHECK: pred.load.continue6:
				; CHECK-NEXT: [[TMP14:%.*]] = phi i32 [ poison, [[PRED_LOAD_CONTINUE]] ], [ [[TMP13]], [[PRED_LOAD_IF5]] ]
				; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> poison, i32 [[TMP10]], i32 0
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP14]], i32 1
				; CHECK-NEXT: [[TMP17:%.*]] = mul nsw <2 x i32> [[TMP16]], <i32 3, i32 3>
				; CHECK-NEXT: [[TMP18:%.*]] = add nsw <2 x i32> [[TMP17]], [[TMP6]]
				; CHECK-NEXT: [[TMP19:%.*]] = select <2 x i1> [[TMP3]], <2 x i32> <i32 3, i32 3>, <2 x i32> [[TMP18]]
				; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <2 x i32>*
				; CHECK-NEXT: [[TMP22:%.*]] = load <2 x i32>, <2 x i32>* [[TMP21]], align 4
				; CHECK-NEXT: [[TMP23:%.*]] = shl nsw <2 x i32> [[TMP22]], <i32 2, i32 2>
				; CHECK-NEXT: [[TMP24:%.*]] = add nsw <2 x i32> [[TMP23]], [[TMP19]]
				; CHECK-NEXT: [[TMP25:%.*]] = bitcast i32* [[TMP0]] to <2 x i32>*
				; CHECK-NEXT: store <2 x i32> [[TMP24]], <2 x i32>* [[TMP25]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP26:%.*]] = icmp eq i64 [[INDEX_NEXT]], {{.*}}
				; CHECK-NEXT: br i1 [[TMP26]], label [[MIDDLE_BLOCK:%.*]], label %vector.body, !llvm.loop [[LOOP4:![0-9]+]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INC:%.*]], [[L3:%.*]] ], [ {{.*}}, %for.body.preheader ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[I]]
				; CHECK-NEXT: [[TMP27:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[SWITCH:%.*]] = icmp eq i32 [[TMP27]], 3
				; CHECK-NEXT: br i1 [[SWITCH]], label [[L3]], label [[FOR_BODY_SWITCH:%.*]]

				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %inc, %L3 ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx
				%switch = icmp eq i32 %0, 3
				br i1 %switch, label %L3, label %for.body.switch

				for.body.switch:
				%switch1 = icmp eq i32 %0, 2
				br i1 %switch1, label %L2, label %for.body.switch2

				for.body.switch2:
				%add = mul nsw i32 %0, 3
				store i32 %add, i32* %arrayidx
				br label %L2

				L2:
				%1 = phi i32 [ %add, %for.body.switch2 ], [ %0, %for.body.switch ]
				%arrayidx5 = getelementptr inbounds i32, i32* %b, i64 %i
				%2 = load i32, i32* %arrayidx5
				%mul6 = mul nsw i32 %2, 3
				%add8 = add nsw i32 %1, %mul6
				store i32 %add8, i32* %arrayidx
				br label %L3

				L3:
				%3 = phi i32 [ %0, %for.body ], [ %add8, %L2 ]
				%arrayidx9 = getelementptr inbounds i32, i32* %c, i64 %i
				%4 = load i32, i32* %arrayidx9
				%mul10 = shl nsw i32 %4, 2
				%add12 = add nsw i32 %3, %mul10
				store i32 %add12, i32* %arrayidx
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret void
				}

				; This loop will not vectorize due to unsafe FP ops, ensure the switch statement is created again in for.body
				define float @switch_no_vectorize(i32* noalias %a, i32* noalias %b, i32* noalias %c, float %val, i64 %N) {
				; CHECK-LABEL: @switch_no_vectorize(
				; CHECK-NOT: vector.body
				; CHECK: for.body:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INC:%.*]], [[L3:%.*]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[SUM_033:%.*]] = phi float [ [[CONV20:%.*]], [[L3]] ], [ 2.000000e+00, [[ENTRY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[I]]
				; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: switch i32 [[TMP0]], label [[FOR_BODY_SWITCH2:%.*]] [
				; CHECK-NEXT: i32 3, label [[L3]]
				; CHECK-NEXT: i32 2, label [[L2:%.*]]
				; CHECK-NEXT: ]

				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %inc, %L3 ], [ 0, %entry ]
				%sum.033 = phi float [ %conv20, %L3 ], [ 2.000000e+00, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx
				switch i32 %0, label %L1 [
				i32 3, label %L3
				i32 2, label %L2
				]

				L1:
				%conv = sitofp i32 %0 to float
				%conv4 = fpext float %conv to double
				%add = fadd double %conv4, 1.000000e+00
				%conv5 = fpext float %sum.033 to double
				%mul = fmul double %add, %conv5
				%conv6 = fptrunc double %mul to float
				br label %L2

				L2:
				%sum.1 = phi float [ %conv6, %L1 ], [ %sum.033, %for.body ]
				%arrayidx7 = getelementptr inbounds i32, i32* %b, i64 %i
				%1 = load i32, i32* %arrayidx7
				%conv8 = sitofp i32 %1 to float
				%conv9 = fpext float %conv8 to double
				%add10 = fadd double %conv9, 2.000000e+00
				%conv11 = fpext float %sum.1 to double
				%mul12 = fmul double %add10, %conv11
				%conv13 = fptrunc double %mul12 to float
				br label %L3

				L3:
				%sum.2 = phi float [ %conv13, %L2 ], [ %sum.033, %for.body ]
				%arrayidx14 = getelementptr inbounds i32, i32* %c, i64 %i
				%2 = load i32, i32* %arrayidx14
				%conv15 = sitofp i32 %2 to float
				%conv16 = fpext float %conv15 to double
				%add17 = fadd double %conv16, 3.000000e+00
				%conv18 = fpext float %sum.2 to double
				%mul19 = fmul double %add17, %conv18
				%conv20 = fptrunc double %mul19 to float
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body

				for.end:
				ret float %conv20
				}

				!0 = distinct !{!0, !1, !2, !3, !4}
				!1 = !{!"llvm.loop.vectorize.width", i32 1}
				!2 = !{!"llvm.loop.interleave.count", i32 2}
				!3 = !{!"llvm.loop.vectorize.enable", i1 true}
				!4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
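
For readers who prefer source over IR, the `@switch` test above can be read as a loop whose switch falls through from the default case into cases 2, 3 and 4. The sketch below is an assumption: it is a hand-written C++ reconstruction from the IR, not the source the test was generated from, and the function names `loopWithSwitch` and `loopIfConverted` are hypothetical. The second function shows the same computation with the control flow replaced by per-element conditions, which is roughly the role the masks and selects play in the expected `vector.body`.

```cpp
#include <cassert>
#include <cstddef>

// Scalar loop with a fall-through switch, mirroring the shape of @switch.
// Assumes a, b and c do not alias (the IR marks them noalias).
void loopWithSwitch(int *a, const int *b, const int *c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    switch (a[i]) {
    default: // block L1
      a[i] = b[i] * a[i] + a[i];
      [[fallthrough]];
    case 2: // block L2
      a[i] = a[i] + b[i] * b[i];
      [[fallthrough]];
    case 3: // block L3
      a[i] = c[i] * a[i] + a[i];
      [[fallthrough]];
    case 4: // block L4
      a[i] = a[i] + c[i] * c[i];
    }
  }
}

// The same computation with the switch expressed as equality compares whose
// results select between the old and new value at each stage.
void loopIfConverted(int *a, const int *b, const int *c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const int v = a[i];
    const bool is2 = v == 2, is3 = v == 3, is4 = v == 4;
    const bool runL1 = !is2 && !is3 && !is4; // default case only
    const bool runL2 = runL1 || is2;         // reached from L1 or case 2
    const bool runL3 = runL2 || is3;         // reached from L2 or case 3
    int x = v;
    x = runL1 ? b[i] * x + x : x;
    x = runL2 ? x + b[i] * b[i] : x;
    x = runL3 ? c[i] * x + x : x;
    x = x + c[i] * c[i]; // L4 runs on every path
    a[i] = x;
  }
}
```

Unlike the masked loads in the IR, the scalar sketch reads `b[i]` and `c[i]` unconditionally, so it assumes all three arrays are valid for `n` elements.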

llvm/test/Transforms/LoopVectorize/remove-switches.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -O3 -loop-vectorize -pass-remarks-analysis=loop-vectorize -S 2>%t | FileCheck %s
				; RUN: cat %t | FileCheck %s -check-prefix=CHECK-REMARKS

				; We should not vectorize this loop since we do not have masked loads and stores
				; CHECK-REMARKS: remark: <unknown>:0:0: the cost-model indicates that vectorization is not beneficial
				define void @switch_cost(i32* noalias %a, i32* noalias readonly %b, i32* noalias readonly %c, i64 %N) #0 {
				; CHECK-LABEL: @switch_cost(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-NOT: vector.body
				; CHECK: for.body:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INC:%.*]], [[L4:%.*]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[I]]
				; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: switch i32 [[TMP0]], label [[FOR_BODY_SWITCH5:%.*]] [
				; CHECK-NEXT: i32 4, label [[FOR_BODY_L4_CRIT_EDGE:%.*]]
				; CHECK-NEXT: i32 2, label [[FOR_BODY_L2_CRIT_EDGE:%.*]]
				; CHECK-NEXT: i32 3, label [[L3:%.*]]
				; CHECK-NEXT: ]

				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %inc, %L4 ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx
				switch i32 %0, label %L1 [
				i32 4, label %L4
				i32 2, label %L2
				i32 3, label %L3
				]

				L1:
				%arrayidx5 = getelementptr inbounds i32, i32* %b, i64 %i
				%1 = load i32, i32* %arrayidx5
				%mul = mul nsw i32 %1, %0
				%add = add nsw i32 %mul, %0
				store i32 %add, i32* %arrayidx
				br label %L2

				L2:
				%2 = phi i32 [ 2, %for.body ], [ %add, %L1 ]
				%arrayidx7 = getelementptr inbounds i32, i32* %b, i64 %i
				%3 = load i32, i32* %arrayidx7
				%mul9 = mul nsw i32 %3, %3
				%add11 = add nsw i32 %2, %mul9
				store i32 %add11, i32* %arrayidx
				br label %L3

				L3:
				%4 = phi i32 [ 3, %for.body ], [ %add11, %L2 ]
				%arrayidx13 = getelementptr inbounds i32, i32* %c, i64 %i
				%5 = load i32, i32* %arrayidx13
				%mul14 = mul nsw i32 %5, %4
				%add16 = add nsw i32 %mul14, %4
				store i32 %add16, i32* %arrayidx
				br label %L4

				L4:
				%6 = phi i32 [ 4, %for.body ], [ %add16, %L3 ]
				%arrayidx17 = getelementptr inbounds i32, i32* %c, i64 %i
				%7 = load i32, i32* %arrayidx17
				%mul19 = mul nsw i32 %7, %7
				%add21 = add nsw i32 %6, %mul19
				store i32 %add21, i32* %arrayidx
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body

				for.end:
				ret void
				}

				define void @switch(i32* noalias %a, i32* noalias %b, i64 %N) {
				; CHECK-LABEL: @switch(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP14:%.*]] = icmp sgt i64 [[N:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP14]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
				; CHECK: for.body.preheader:
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER4:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -4
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <4 x i32>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
				; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], <i32 3, i32 3, i32 3, i32 3>
				; CHECK-NEXT: [[TMP3:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], <i32 2, i32 2, i32 2, i32 2>
				; CHECK-NEXT: [[DOTOP:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> <i32 9, i32 9, i32 9, i32 9>, <4 x i32> <i32 16, i32 16, i32 16, i32 16>
				; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> <i32 7, i32 7, i32 7, i32 7>, <4 x i32> [[DOTOP]]
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*
				; CHECK-NEXT: [[WIDE_LOAD3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP6]], align 4
				; CHECK-NEXT: [[TMP7:%.*]] = mul nsw <4 x i32> [[WIDE_LOAD3]], [[TMP4]]
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*
				; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_PREHEADER4]]
				; CHECK: for.body.preheader4:
				; CHECK-NEXT: [[I_015_PH:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.cond.cleanup.loopexit:
				; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
				; CHECK: for.cond.cleanup:
				; CHECK-NEXT: ret void
				; CHECK: for.body:
				; CHECK-NEXT: [[I_015:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_BODY]] ], [ [[I_015_PH]], [[FOR_BODY_PREHEADER4]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[I_015]]
				; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[SWITCH:%.*]] = icmp eq i32 [[TMP10]], 3
				; CHECK-NEXT: [[SWITCH1:%.*]] = icmp eq i32 [[TMP10]], 2
				; CHECK-NEXT: [[R_0_OP:%.*]] = select i1 [[SWITCH1]], i32 9, i32 16
				; CHECK-NEXT: [[ADD4:%.*]] = select i1 [[SWITCH]], i32 7, i32 [[R_0_OP]]
				; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[I_015]]
				; CHECK-NEXT: [[TMP11:%.*]] = load i32, i32* [[ARRAYIDX5]], align 4
				; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP11]], [[ADD4]]
				; CHECK-NEXT: store i32 [[MUL]], i32* [[ARRAYIDX5]], align 4
				; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_015]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], [[N]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]

				entry:
				%cmp14 = icmp sgt i64 %N, 0
				br i1 %cmp14, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				br label %for.body

				for.cond.cleanup.loopexit: ; preds = %L3
				br label %for.cond.cleanup

				for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
				ret void

				for.body: ; preds = %for.body.preheader, %L3
				%i.015 = phi i64 [ %inc, %L3 ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i.015
				%0 = load i32, i32* %arrayidx
				switch i32 %0, label %L1 [
				i32 3, label %L3
				i32 2, label %L2
				]

				L1: ; preds = %for.body
				br label %L2

				L2: ; preds = %for.body, %L1
				%r.0 = phi i32 [ 12, %L1 ], [ 5, %for.body ]
				br label %L3

				L3: ; preds = %for.body, %L2
				%r.1 = phi i32 [ %r.0, %L2 ], [ 3, %for.body ]
				%add4 = add nuw nsw i32 %r.1, 4
				%arrayidx5 = getelementptr inbounds i32, i32* %b, i64 %i.015
				%1 = load i32, i32* %arrayidx5
				%mul = mul nsw i32 %1, %add4
				store i32 %mul, i32* %arrayidx5
				%inc = add nuw nsw i64 %i.015, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body, !llvm.loop !0
				}

				define void @switch_VF1_UF2(i32* noalias %a, i32* noalias readonly %b, i32* noalias readonly %c, i64 %N) {
				; CHECK-LABEL: @switch_VF1_UF2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 2
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -2
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE6:%.*]] ]
				; CHECK-NEXT: [[INDUCTION4:%.*]] = or i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDUCTION4]]
				; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[TMP0]], align 4
				; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4
				; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i32 [[TMP2]], 3
				; CHECK-NEXT: [[DOTNOT8:%.*]] = icmp eq i32 [[TMP3]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[TMP2]], 2
				; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[TMP3]], 2
				; CHECK-NEXT: [[TMP6:%.*]] = mul nsw i32 [[TMP2]], 3
				; CHECK-NEXT: [[TMP7:%.*]] = mul nsw i32 [[TMP3]], 3
				; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP4]], i32 2, i32 [[TMP6]]
				; CHECK-NEXT: [[TMP9:%.*]] = select i1 [[TMP5]], i32 2, i32 [[TMP7]]
				; CHECK-NEXT: br i1 [[DOTNOT]], label [[PRED_LOAD_CONTINUE:%.*]], label [[PRED_LOAD_IF:%.*]]
				; CHECK: pred.load.if:
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP10]], align 4
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; CHECK: pred.load.continue:
				; CHECK-NEXT: [[TMP12:%.*]] = phi i32 [ poison, [[VECTOR_BODY]] ], [ [[TMP11]], [[PRED_LOAD_IF]] ]
				; CHECK-NEXT: br i1 [[DOTNOT8]], label [[PRED_LOAD_CONTINUE6]], label [[PRED_LOAD_IF5:%.*]]
				; CHECK: pred.load.if5:
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDUCTION4]]
				; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP13]], align 4
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; CHECK: pred.load.continue6:
				; CHECK-NEXT: [[TMP15:%.*]] = phi i32 [ poison, [[PRED_LOAD_CONTINUE]] ], [ [[TMP14]], [[PRED_LOAD_IF5]] ]
				; CHECK-NEXT: [[TMP16:%.*]] = mul nsw i32 [[TMP12]], 3
				; CHECK-NEXT: [[TMP17:%.*]] = mul nsw i32 [[TMP15]], 3
				; CHECK-NEXT: [[TMP18:%.*]] = add nsw i32 [[TMP16]], [[TMP8]]
				; CHECK-NEXT: [[TMP19:%.*]] = add nsw i32 [[TMP17]], [[TMP9]]
				; CHECK-NEXT: [[PREDPHI:%.*]] = select i1 [[DOTNOT]], i32 3, i32 [[TMP18]]
				; CHECK-NEXT: [[PREDPHI7:%.*]] = select i1 [[DOTNOT8]], i32 3, i32 [[TMP19]]
				; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[INDUCTION4]]
				; CHECK-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP20]], align 4
				; CHECK-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP21]], align 4
				; CHECK-NEXT: [[TMP24:%.*]] = shl nsw i32 [[TMP22]], 2
				; CHECK-NEXT: [[TMP25:%.*]] = shl nsw i32 [[TMP23]], 2
				; CHECK-NEXT: [[TMP26:%.*]] = add nsw i32 [[TMP24]], [[PREDPHI]]
				; CHECK-NEXT: [[TMP27:%.*]] = add nsw i32 [[TMP25]], [[PREDPHI7]]
				; CHECK-NEXT: store i32 [[TMP26]], i32* [[TMP0]], align 4
				; CHECK-NEXT: store i32 [[TMP27]], i32* [[TMP1]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INC:%.*]], [[L3:%.*]] ], [ {{.*}}, [[FOR_BODY_PREHEADER]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[I]]
				; CHECK-NEXT: [[TMP29:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[SWITCH:%.*]] = icmp eq i32 [[TMP29]], 3
				; CHECK-NEXT: br i1 [[SWITCH]], label [[L3]], label [[FOR_BODY_SWITCH:%.*]]

				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %inc, %L3 ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx
				%switch = icmp eq i32 %0, 3
				br i1 %switch, label %L3, label %for.body.switch

				for.body.switch:
				%switch1 = icmp eq i32 %0, 2
				br i1 %switch1, label %L2, label %for.body.switch2

				for.body.switch2:
				%add = mul nsw i32 %0, 3
				store i32 %add, i32* %arrayidx
				br label %L2

				L2:
				%1 = phi i32 [ %add, %for.body.switch2 ], [ %0, %for.body.switch ]
				%arrayidx5 = getelementptr inbounds i32, i32* %b, i64 %i
				%2 = load i32, i32* %arrayidx5
				%mul6 = mul nsw i32 %2, 3
				%add8 = add nsw i32 %1, %mul6
				store i32 %add8, i32* %arrayidx
				br label %L3

				L3:
				%3 = phi i32 [ %0, %for.body ], [ %add8, %L2 ]
				%arrayidx9 = getelementptr inbounds i32, i32* %c, i64 %i
				%4 = load i32, i32* %arrayidx9
				%mul10 = shl nsw i32 %4, 2
				%add12 = add nsw i32 %3, %mul10
				store i32 %add12, i32* %arrayidx
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !1

				for.end:
				ret void
				}

				; This loop will not vectorize due to unsafe FP ops. Ensure the switch statement is created again in for.body.
				define float @switch_no_vectorize(i32* noalias %a, i32* noalias readonly %b, i32* noalias readonly %c, float %val, i64 %N) {
				; CHECK-LABEL: @switch_no_vectorize(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK-NOT: vector.body:
				; CHECK: for.body:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INC:%.*]], [[L3:%.*]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[SUM_033:%.*]] = phi float [ [[CONV20:%.*]], [[L3]] ], [ 2.000000e+00, [[ENTRY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[I]]
				; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: switch i32 [[TMP0]], label [[FOR_BODY_SWITCH2:%.*]] [
				; CHECK-NEXT: i32 3, label [[L3]]
				; CHECK-NEXT: i32 2, label [[L2:%.*]]
				; CHECK-NEXT: ]

				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %inc, %L3 ], [ 0, %entry ]
				%sum.033 = phi float [ %conv20, %L3 ], [ 2.000000e+00, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx
				switch i32 %0, label %L1 [
				i32 3, label %L3
				i32 2, label %L2
				]

				L1:
				%conv = sitofp i32 %0 to float
				%conv4 = fpext float %conv to double
				%add = fadd double %conv4, 1.000000e+00
				%conv5 = fpext float %sum.033 to double
				%mul = fmul double %add, %conv5
				%conv6 = fptrunc double %mul to float
				br label %L2

				L2:
				%sum.1 = phi float [ %conv6, %L1 ], [ %sum.033, %for.body ]
				%arrayidx7 = getelementptr inbounds i32, i32* %b, i64 %i
				%1 = load i32, i32* %arrayidx7
				%conv8 = sitofp i32 %1 to float
				%conv9 = fpext float %conv8 to double
				%add10 = fadd double %conv9, 2.000000e+00
				%conv11 = fpext float %sum.1 to double
				%mul12 = fmul double %add10, %conv11
				%conv13 = fptrunc double %mul12 to float
				br label %L3

				L3:
				%sum.2 = phi float [ %conv13, %L2 ], [ %sum.033, %for.body ]
				%arrayidx14 = getelementptr inbounds i32, i32* %c, i64 %i
				%2 = load i32, i32* %arrayidx14
				%conv15 = sitofp i32 %2 to float
				%conv16 = fpext float %conv15 to double
				%add17 = fadd double %conv16, 3.000000e+00
				%conv18 = fpext float %sum.2 to double
				%mul19 = fmul double %add17, %conv18
				%conv20 = fptrunc double %mul19 to float
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body

				for.end:
				ret float %conv20
				}

				!0 = distinct !{!0, !2, !4, !6}
				!1 = distinct !{!1, !3, !5, !6}
				!2 = !{!"llvm.loop.vectorize.width", i32 4}
				!3 = !{!"llvm.loop.vectorize.width", i32 1}
				!4 = !{!"llvm.loop.interleave.count", i32 1}
				!5 = !{!"llvm.loop.interleave.count", i32 2}
				!6 = !{!"llvm.loop.vectorize.enable", i1 true}
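
The comment on @switch_no_vectorize attributes the missing vector.body to unsafe FP ops. As an illustration only (this is not part of the patch, and the helper names are hypothetical), the Python sketch below shows why reordering a floating-point product reduction is not value-preserving in general, which is one plausible reason the vectorizer refuses such a loop without fast-math flags. It compares a strict left-to-right product with a two-lane accumulation of the kind a 2-wide vectorized reduction would perform.

```python
import math


def product_in_order(values):
    """Multiply left to right, the way a scalar loop accumulates."""
    acc = 1.0
    for v in values:
        acc *= v
    return acc


def product_two_lanes(values):
    """Accumulate even and odd elements in separate lanes, then combine.

    This mimics the reassociation a 2-wide vectorized reduction performs.
    """
    lanes = [1.0, 1.0]
    for i, v in enumerate(values):
        lanes[i % 2] *= v
    return lanes[0] * lanes[1]


# Illustrative inputs chosen so that the in-order product overflows
# at the second step while the reassociated product stays finite.
values = [1e308, 10.0, 0.1, 1.0]
in_order = product_in_order(values)
two_lanes = product_two_lanes(values)
print(in_order, two_lanes)
```

The two results differ (one overflows to infinity, the other stays finite), so a transformation that reorders the multiplications changes observable behaviour. The actual loop in the test also routes values through fpext and fptrunc, which this sketch does not model.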

llvm/test/Transforms/SimplifyCFG/nomerge.ll

	; RUN: opt < %s -O1 -S | FileCheck %s			; RUN: opt < %s -O1 -remove-switch-blocks=false -S | FileCheck %s

	; The attribute nomerge prevents the 3 bar() calls from being sunk/hoisted into			; The attribute nomerge prevents the 3 bar() calls from being sunk/hoisted into
	; one inside a function. Check that there are still 3 tail calls.			; one inside a function. Check that there are still 3 tail calls.

	; Test case for preventing sinking			; Test case for preventing sinking
	; CHECK-LABEL: define void @sink			; CHECK-LABEL: define void @sink
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: tail call void @bar()			; CHECK-NEXT: tail call void @bar()

llvm/test/Transforms/SimplifyCFG/remove-switches.ll

This file was added.

				; RUN: opt < %s -simplifycfg -switch-removal-threshold=4 -S | FileCheck %s

				define void @unswitch(i32* nocapture %a, i32* nocapture readonly %b, i32* nocapture readonly %c, i64 %N){
				; CHECK-LABEL: @unswitch(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INC:%.*]], [[L4:%.*]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[I]]
				; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[SWITCH:%.*]] = icmp eq i32 [[TMP0]], 4
				; CHECK-NEXT: br i1 [[SWITCH]], label [[L4]], label [[FOR_BODY_SWITCH:%.*]], !prof !0
				; CHECK: for.body.switch:
				; CHECK-NEXT: [[SWITCH1:%.*]] = icmp eq i32 [[TMP0]], 2
				; CHECK-NEXT: br i1 [[SWITCH1]], label [[L2:%.*]], label [[FOR_BODY_SWITCH2:%.*]], !prof !1
				; CHECK: for.body.switch2:
				; CHECK-NEXT: [[SWITCH3:%.*]] = icmp eq i32 [[TMP0]], 3
				; CHECK-NEXT: br i1 [[SWITCH3]], label [[L3:%.*]], label [[FOR_BODY_SWITCH4:%.*]], !prof !2
				; CHECK: for.body.switch4:
				; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[I]]
				; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[ARRAYIDX5]], align 4
				; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP1]], [[TMP0]]
				; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[MUL]], [[TMP0]]
				; CHECK-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: br label [[L2]]
				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %inc, %L4 ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx
				switch i32 %0, label %L1 [
				i32 4, label %L4
				i32 2, label %L2
				i32 3, label %L3
				], !prof !0

				L1:
				%arrayidx5 = getelementptr inbounds i32, i32* %b, i64 %i
				%1 = load i32, i32* %arrayidx5
				%mul = mul nsw i32 %1, %0
				%add = add nsw i32 %mul, %0
				store i32 %add, i32* %arrayidx
				br label %L2

				L2:
				%2 = phi i32 [ %0, %for.body ], [ %add, %L1 ]
				%arrayidx7 = getelementptr inbounds i32, i32* %b, i64 %i
				%3 = load i32, i32* %arrayidx7, align 4
				%mul9 = mul nsw i32 %3, %3
				%add11 = add nsw i32 %2, %mul9
				store i32 %add11, i32* %arrayidx
				br label %L3

				L3:
				%4 = phi i32 [ %0, %for.body ], [ %add11, %L2 ]
				%arrayidx13 = getelementptr inbounds i32, i32* %c, i64 %i
				%5 = load i32, i32* %arrayidx13
				%mul14 = mul nsw i32 %5, %4
				%add16 = add nsw i32 %mul14, %4
				store i32 %add16, i32* %arrayidx
				br label %L4

				L4:
				%6 = phi i32 [ %0, %for.body ], [ %add16, %L3 ]
				%arrayidx17 = getelementptr inbounds i32, i32* %c, i64 %i
				%7 = load i32, i32* %arrayidx17
				%mul19 = mul nsw i32 %7, %7
				%add21 = add nsw i32 %6, %mul19
				store i32 %add21, i32* %arrayidx
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body

				for.end:
				ret void
				}

				; This test should not replace the switch statement as multiple cases have the same destination block
				define dso_local void @switch2(i32* nocapture %a, i32* nocapture readonly %b, i32* nocapture readonly %c, i64 %N) {
				; CHECK-LABEL: @switch2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INC:%.*]], [[L3:%.*]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[I]]
				; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: switch i32 [[TMP0]], label [[L1:%.*]] [
				; CHECK-NEXT: i32 4, label [[L3]]
				; CHECK-NEXT: i32 2, label [[L2:%.*]]
				; CHECK-NEXT: i32 3, label [[L3]]
				; CHECK-NEXT: ]
				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %inc, %L3 ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx
				switch i32 %0, label %L1 [
				i32 4, label %L3
				i32 2, label %L2
				i32 3, label %L3
				]

				L1:
				%arrayidx5 = getelementptr inbounds i32, i32* %b, i64 %i
				%1 = load i32, i32* %arrayidx5
				%mul = mul nsw i32 %1, %0
				%add = add nsw i32 %mul, %0
				store i32 %add, i32* %arrayidx
				br label %L2

				L2:
				%2 = phi i32 [ %0, %for.body ], [ %add, %L1 ]
				%arrayidx7 = getelementptr inbounds i32, i32* %b, i64 %i
				%3 = load i32, i32* %arrayidx7
				%mul9 = mul nsw i32 %3, %3
				%add11 = add nsw i32 %2, %mul9
				store i32 %add11, i32* %arrayidx
				br label %L3

				L3:
				%4 = phi i32 [ %0, %for.body ], [ %0, %for.body ], [ %add11, %L2 ]
				%arrayidx13 = getelementptr inbounds i32, i32* %c, i64 %i
				%5 = load i32, i32* %arrayidx13
				%mul14 = mul nsw i32 %5, %4
				%add16 = add nsw i32 %mul14, %4
				store i32 %add16, i32* %arrayidx
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %N
				br i1 %exitcond.not, label %for.end, label %for.body

				for.end:
				ret void
				}

				!0 = !{!"branch_weights", i32 15, i32 5, i32 10, i32 2}
				; CHECK: !0 = !{!"branch_weights", i32 5, i32 27}
				; CHECK: !1 = !{!"branch_weights", i32 10, i32 17}
				; CHECK: !2 = !{!"branch_weights", i32 2, i32 15}
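
The two tests in this file rely on two rules: a switch is only replaced when every destination block is unique, and the `branch_weights` on the switch are redistributed over the resulting chain of compares. The Python sketch below is an illustration of both rules, not the patch's implementation. The function names are hypothetical, and the weight formula is an assumption that happens to reproduce the `!prof` values in the CHECK lines above (the first weight on a switch belongs to the default destination).

```python
def is_simple_switch(case_dests, default_dest):
    """Return True when the default and every case target distinct blocks."""
    dests = list(case_dests) + [default_dest]
    return len(set(dests)) == len(dests)


def chain_branch_weights(default_weight, case_weights):
    """Split switch weights across a chain of compare-and-branch blocks.

    Each compare gets a (taken, not_taken) pair: taken is that case's
    weight, not_taken is the sum of the cases still to be tested plus
    the default weight.
    """
    remaining = default_weight + sum(case_weights)
    pairs = []
    for weight in case_weights:
        remaining -= weight
        pairs.append((weight, remaining))
    return pairs


# @unswitch: cases 4, 2, 3 go to L4, L2, L3 and the default goes to L1.
unswitch_ok = is_simple_switch(["L4", "L2", "L3"], "L1")

# @switch2: cases 4 and 3 both go to L3, so the switch is left alone.
switch2_ok = is_simple_switch(["L3", "L2", "L3"], "L1")

# !0 = branch_weights 15 (default), 5 (case 4), 10 (case 2), 2 (case 3).
weights = chain_branch_weights(15, [5, 10, 2])
print(unswitch_ok, switch2_ok, weights)
```

Under these assumptions the first compare (against 4) gets weights 5 and 27, the second (against 2) gets 10 and 17, and the third (against 3) gets 2 and 15, matching the three expected `branch_weights` nodes.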


Unit Tests: Failed


Revision Contents

Diff 366637

clang/test/Frontend/optimization-remark-analysis.c

llvm/include/llvm/Transforms/Utils/SimplifyCFGOptions.h

llvm/lib/Passes/PassBuilder.cpp

llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

llvm/test/Other/new-pm-defaults.ll

llvm/test/Other/new-pm-lto-defaults.ll

llvm/test/Other/new-pm-thinlto-defaults.ll

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

llvm/test/Transforms/LoopVectorize/AArch64/sve-remove-switches.ll

llvm/test/Transforms/LoopVectorize/remove-switches.ll

llvm/test/Transforms/SimplifyCFG/nomerge.ll

llvm/test/Transforms/SimplifyCFG/remove-switches.ll
