This is an archive of the discontinued LLVM Phabricator instance.

Differential D119661

[LV] Support chained phis as incoming values for first-order recurs.
ClosedPublic

Authored by fhahn on Feb 13 2022, 3:41 AM.

Details

Reviewers

Ayal
spatel
gilr

Commits

rGb8709a9d03f8: [LV] Support fixed order recurrences.

Summary

If the incoming previous value of a first-order recurrence is a phi in
the header, go through incoming values from the latch until we find a
non-phi value. Use this as the new Previous. All uses in the header
will be dominated by the original phi, but need to be moved after
the non-phi previous value.
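
As a rough illustration of the walk described in the summary, here is a minimal standalone C++ sketch. `Node`, `IsHeaderPhi`, `FromLatch`, and `findNonPhiPrevious` are hypothetical stand-ins and not LLVM classes or the patch's code. The visited set plays a role similar in spirit to the `SeenPhis` set in the diff: it makes the walk give up instead of looping forever if the phis feed each other.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for an IR value; not an LLVM class.
struct Node {
  bool IsHeaderPhi;      // true if this value is a phi in the loop header
  const Node *FromLatch; // value arriving along the backedge (phis only)
};

// Follow latch-incoming values through header phis until a non-phi value is
// reached. Returns nullptr if a phi is visited twice (a cycle) or if the
// chain runs out of values.
inline const Node *findNonPhiPrevious(const Node *Phi) {
  std::set<const Node *> SeenPhis;
  const Node *Previous = Phi->FromLatch;
  while (Previous && Previous->IsHeaderPhi) {
    if (!SeenPhis.insert(Previous).second)
      return nullptr; // already visited: bail out rather than loop forever
    Previous = Previous->FromLatch;
  }
  return Previous;
}
```

For a chain `P3 <- P2 <- P1 <- Load`, calling `findNonPhiPrevious(&P3)` would be expected to return the address of `Load` under this model.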

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed (9 failed)

Time	Test
60,090 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
60,090 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
60,080 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vloxseg.c
60,070 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vluxseg.c
60,350 ms	x64 debian > Clang.Driver::aarch64-cpus.c

Event Timeline

fhahn created this revision. · Feb 13 2022, 3:41 AM

Herald added a subscriber: hiraditya. · Feb 13 2022, 3:41 AM

fhahn requested review of this revision. · Feb 13 2022, 3:41 AM

Herald added a project: Restricted Project. · Feb 13 2022, 3:41 AM

Harbormaster completed remote builds in B149266: Diff 408249. · Feb 13 2022, 4:24 AM

david-arm added a subscriber: david-arm. · Feb 14 2022, 1:41 AM

david-arm added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8882	This change looks like it might potentially affect reductions too, since now we're potentially not recording the recipe for reductions. Should there be a reduction test for this too, or an assert that `Ingredient2Recipe[Inc]` is always non-null for reductions?
8884	nit: Should this just be `!Ingredient2Recipe[Inc]` as I thought LLVM convention wasn't to compare ==/!= with nullptr?
9268	nit: Whitespace change.

Address comments, make sure we record all first-order recurrence and reduction phi recipes, add assert.

fhahn marked 3 inline comments as done. · Feb 17 2022, 7:43 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8882	I don't think the change should impact reductions, as the code here only skips recording `Inc` if it is not already marked for recording. I added an assertion that if we already have a recipe recorded at this point, it must be a reduction or first-order recurrence phi.
8884	I think it would be best to use find() so as not to insert new entries. Updated.
9268	Removed, thanks!
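
The point about `find()` versus `operator[]` can be shown with a small standalone sketch. Here `std::map` stands in for the real container, and `RecipeMap`, `isRecordedBracket`, and `isRecordedFind` are hypothetical names for illustration only. The behaviour shown is that of `std::map`, where `operator[]` inserts a default-constructed value for a missing key.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the Ingredient2Recipe mapping.
using RecipeMap = std::map<std::string, const char *>;

// Lookup via operator[]: a missing key is inserted with a null value as a
// side effect of the query.
inline bool isRecordedBracket(RecipeMap &M, const std::string &Key) {
  return M[Key] != nullptr;
}

// Lookup via find(): the map is never modified.
inline bool isRecordedFind(const RecipeMap &M, const std::string &Key) {
  auto It = M.find(Key);
  return It != M.end() && It->second != nullptr;
}
```

Both helpers report "not recorded" for an absent key, but only the `operator[]` variant leaves a new null entry behind, which is why the query-only form is the safer choice when the set of keys itself carries meaning.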

Harbormaster completed remote builds in B150241: Diff 409651. · Feb 17 2022, 8:04 AM

Need to make some changes before the next round of reviews.

bsmith added a subscriber: bsmith. · Feb 28 2022, 3:00 AM

Allen added a subscriber: Allen. · Mar 6 2022, 7:35 PM

Herald added a project: Restricted Project. · Mar 6 2022, 7:35 PM

TKaipeng added a subscriber: TKaipeng. · Mar 13 2022, 8:03 PM

TKaipeng added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8869	Should we record recipes for induction phis? It's possible to be the incoming value for later phis.

peterwaller-arm added a subscriber: peterwaller-arm. · Mar 17 2022, 8:15 AM

Is there any update on this change? It provides some nice improvements to various workloads we have been looking at and would be good to get in.

In D119661#3459006, @bsmith wrote:

Is there any update on this change? It provides some nice improvements to various workloads we have been looking at and would be good to get in.

I'm planning to pick this up again fairly soon. There's some other work ongoing to break things up a bit and model codegen for entry & exit values explicitly. I think it would be good to make the transitions for first-order recurrence first and then update the patch.

In D119661#3477766, @fhahn wrote:

I'm planning to pick this up again fairly soon. There's some other work ongoing to break things up a bit and model codegen for entry & exit values explicitly. I think it would be good to make the transitions for first-order recurrence first and then update the patch.

Hi, sorry to prod this again, is there any update on this, specifically which reviews/issues is this waiting on? We're very keen to push this forwards as it addresses some important issues for us, we are willing to help with development/patches in any areas that need it.

I'm aka

In D119661#3599174, @bsmith wrote:

Hi, sorry to prod this again, is there any update on this, specifically which reviews/issues is this waiting on? We're very keen to push this forwards as it addresses some important issues for us, we are willing to help with development/patches in any areas that need it.

I should be able to wrap up testing for an update by the end of this week.

fhahn mentioned this in D129339: [SingleSource] Add initial vectorizer tests with recurrences. · Jul 7 2022, 6:21 PM

Hi Florian, any chance of this one landing in time for LLVM 15?

In D119661#3667981, @peterwaller-arm wrote:

Hi Florian, any chance of this one landing in time for LLVM 15?

Unfortunately this seems unlikely, as I was out sick for most of last week and probably some of this week too.

fhahn mentioned this in rT1846f600f7db: [SingleSource] Add initial vectorizer tests with recurrences. · Jul 27 2022, 8:18 AM

fhahn mentioned this in rG6e1ba62d0dd2: [LV] Add additional tests with multiple chained recurrences. · Aug 1 2022, 2:01 AM

fhahn mentioned this in rGff5ae948a723: [LV] Add variation of test cases with order of phis flipped. · Aug 1 2022, 3:38 AM

Ping.

Rebased the patch and also updated it to remember any header phis. Corresponding tests have been added in ff5ae948a7230 & 6e1ba62d0dd. I have also added additional runtime testing to llvm-test-suite in rT1846f600f7db.

Note that the current handling is geared towards chains for first order recurrences. This means we widen each first-order recurrence separately and use the incoming value of the previous recurrence in the chain for the current one.

This should be different to modeling a whole chain as a 2nd or higher order recurrence, for which we may be able to generate a single recurrence phi for the whole chain. The approach in the current patch focuses on the case where each phi in the chain has other users, so we would need to generate recurrence phis for each phi in the chain anyways. Dedicated higher-order recurrence support can be added as follow-up, but I expect the gains in practice to be limited.
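
To make the chain semantics concrete, here is a scalar C++ model of two chained recurrence phis. It is only an illustration of what the phis compute, not of the vectorizer; `ChainResult`, `runChain`, and the initial values are hypothetical. The second phi takes the first phi's value across the backedge, so after two iterations it holds the non-phi value from two iterations earlier.

```cpp
#include <cassert>
#include <vector>

struct ChainResult {
  std::vector<int> For1; // value of the first recurrence phi per iteration
  std::vector<int> For2; // value of the chained second phi per iteration
};

// Scalar model of:
//   for1 = phi [ Init1, preheader ], [ x[i], latch ]
//   for2 = phi [ Init2, preheader ], [ for1, latch ]
inline ChainResult runChain(const std::vector<int> &X, int Init1, int Init2) {
  ChainResult R;
  int For1 = Init1, For2 = Init2;
  for (int V : X) {
    R.For1.push_back(For1);
    R.For2.push_back(For2);
    For2 = For1; // second phi receives the first phi's value
    For1 = V;    // first phi receives the non-phi value
  }
  return R;
}
```

Because both `For1` and `For2` are observable per iteration in this model, each phi needs its own recurrence, which matches the case the comment above says the patch focuses on.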

fhahn marked an inline comment as done. · Aug 1 2022, 3:53 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8869	Yes, that's a good point, thanks! Should be fixed in the latest version and I added some extra tests.

Harbormaster completed remote builds in B178545: Diff 448982. · Aug 1 2022, 5:36 AM

ping :)

In D119661#3690631, @fhahn wrote:

Ping.

Rebased the patch and also updated it to remember any header phis. Corresponding tests have been added in ff5ae948a7230 & 6e1ba62d0dd. I have also added additional runtime testing to llvm-test-suite in rT1846f600f7db.

Note that the current handling is geared towards chains for first order recurrences. This means we widen each first-order recurrence separately and use the incoming value of the previous recurrence in the chain for the current one.

This should be different to modeling a whole chain as a 2nd or higher order recurrence, for which we may be able to generate a single recurrence phi for the whole chain. The approach in the current patch focuses on the case where each phi in the chain has other users, so we would need to generate recurrence phis for each phi in the chain anyways. Dedicated higher-order recurrence support can be added as follow-up, but I expect the gains in practice to be limited.

Nice reuse of chained FOR's!
Conceptually by looking past previous header phi's, Legal->isFirstOrderRecurrence(Phi) may be more accurately renamed Legal->isFixedOrderRecurrence(Phi) - also potentially recording its associated non-header-phi "Previous", even if the recipe implements this pattern using multiple single-element FOR splices.

Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8896	Should the assert be retained - what happens with a header `Phi` that is not an induction, reduction or FOR?
8912	I.e., is a VPHeaderPHIRecipe? Have a verifier assert that all header phi's are assigned VPHeaderPHIRecipes if desired instead of complicating the logic here?

In D119661#3722786, @Ayal wrote:

Nice reuse of chained FOR's!
Conceptually by looking past previous header phi's, Legal->isFirstOrderRecurrence(Phi) may be more accurately renamed Legal->isFixedOrderRecurrence(Phi) - also potentially recording its associated non-header-phi "Previous", even if the recipe implements this pattern using multiple single-element FOR splices.

Thanks for taking a look!

I think renaming would also require updating the code in IVDescriptors, which does the actual recurrence identification. Please let me know if you would like me to rename it across LVLegality and IVDescriptors.

Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.

Unfortunately it doesn't look like instcombine will be able to fold those shuffles, e.g.: https://llvm.godbolt.org/z/a8YoM6ev9

Herald added subscribers: vkmr, rogfer01. · Aug 15 2022, 6:57 AM

fhahn marked an inline comment as done. · Aug 15 2022, 7:00 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8896	Yes it should be retained, I restored the original assert.
8912	I updated the code to just check `isa<VPHeaderPHIRecipe>`. This also surfaced the fact that `VPHeaderPHIRecipe::classof` was missing checks for `VPWidenPointerInduction`. This is also fixed. I kept the assertion for now, as this allows us to verify the property of the `Ingredient2Recipe` mapping.
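
The `classof` issue mentioned above can be illustrated with a miniature of LLVM-style RTTI. Everything here (`Kind`, `Recipe`, `HeaderPhiBuggy`, `HeaderPhiFixed`, `isaSketch`) is hypothetical and much simpler than the real VPlan classes; the sketch only shows why an `isa<>`-style check silently returns false when `classof` omits one of the kinds it should cover.

```cpp
#include <cassert>

// Hypothetical recipe kinds; not the real VPlan enumeration.
enum class Kind { Widen, ReductionPhi, RecurrencePhi, PointerInduction };

struct Recipe {
  Kind K;
};

// classof that forgets one header-phi kind.
struct HeaderPhiBuggy {
  static bool classof(const Recipe *R) {
    return R->K == Kind::ReductionPhi || R->K == Kind::RecurrencePhi;
  }
};

// classof that covers every header-phi kind in this sketch.
struct HeaderPhiFixed {
  static bool classof(const Recipe *R) {
    return R->K == Kind::ReductionPhi || R->K == Kind::RecurrencePhi ||
           R->K == Kind::PointerInduction;
  }
};

// Simplified isa<>: the answer is only as good as To::classof.
template <typename To> bool isaSketch(const Recipe *R) {
  return To::classof(R);
}
```

With a recipe of kind `PointerInduction`, `isaSketch<HeaderPhiBuggy>` reports false while `isaSketch<HeaderPhiFixed>` reports true, which is the kind of mismatch that switching to a plain `isa<>` check can surface.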

Harbormaster completed remote builds in B181272: Diff 452656. · Aug 15 2022, 8:12 AM

In D119661#3723058, @fhahn wrote:

In D119661#3722786, @Ayal wrote:

Nice reuse of chained FOR's!
Conceptually by looking past previous header phi's, Legal->isFirstOrderRecurrence(Phi) may be more accurately renamed Legal->isFixedOrderRecurrence(Phi) - also potentially recording its associated non-header-phi "Previous", even if the recipe implements this pattern using multiple single-element FOR splices.

Thanks for taking a look!

I think renaming would also require updating the code in IVDescriptors, which does the actual recurrence identification. Please let me know if you would like me to rename it across LVLegality and IVDescriptors.

Yeah, it seems First is no longer accurate for both LVLegality and IVDescriptors (but is for the recipes). Is there any benefit in caching the non-header-phi Previous in the IVDescriptor, given the iteration-by-iteration implementation?

Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.

Unfortunately it doesn't look like instcombine will be able to fold those shuffles, e.g.: https://llvm.godbolt.org/z/a8YoM6ev9

Worth leaving a TODO somewhere?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8912	Good catch re: classof. Wonder about this property we're verifying here - the IR is traversed top-down and is currently visiting the loop header phi's, so any `Inc` across the backedge which is already recorded must be a header phi (verify that?), which in turn should be mapped to a header phi recipe?
9265	Note that FOR's do not form cyclic dependences (unlike inductions and reduction), hence this loop must terminate.

fhahn mentioned this in D131989: [VPlan] Verify that header only contains header phi recipes. · Aug 16 2022, 12:54 PM

Addressed latest comments, thanks!

In D119661#3726864, @Ayal wrote:

In D119661#3723058, @fhahn wrote:

In D119661#3722786, @Ayal wrote:

Nice reuse of chained FOR's!

Yeah, it seems First is no longer accurate for both LVLegality and IVDescriptors (but is for the recipes). Is there any benefit in caching the non-header-phi Previous in the IVDescriptor, given the iteration-by-iteration implementation?

Updated the naming. I've not added the field for the non-header previous yet, because its uses are currently minimal.

Wonder if some instCombine pattern may subsequently fold the multiple shuffles (and the multiple incoming vectors each holding a single element) to optimize higher-order recurrence where some phi's feed only other phi's w/o other users.

Unfortunately it doesn't look like instcombine will be able to fold those shuffles, e.g.: https://llvm.godbolt.org/z/a8YoM6ev9

Worth leaving a TODO somewhere?

I added a TODO at the point where we construct VPFirstOrderRecurrencePHIRecipe.

Harbormaster completed remote builds in B181606: Diff 453103. · Aug 16 2022, 1:47 PM

This LGTM, thanks!
Subject may also reflect that this patch extends First to Fixed.
Adding minor comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
889 ↗	(On Diff #453103)	Should this be checking if the non-phi previous dominates all users of the recurrence?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8912	Still unclear why this assert is important here.

This revision is now accepted and ready to land. · Aug 17 2022, 1:49 PM

In D119661#3730086, @Ayal wrote:

This LGTM, thanks!
Subject may also reflect that this patch extends First to Fixed.
Adding minor comments.

Thanks, I'll update the description in the committed version.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
889 ↗	(On Diff #453103)	I think the current code handles this, because the non-phi incoming value will be part of a fixed-order recurrence we are checking here.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8912	I just realized that I never submitted this comment. I think the assertion doesn't add much, so I removed it in the committed version. Below is my original response; if it is worth adding the assertion, that could be done as a follow-up :) The main goal is to provide extra restrictions here. When creating header phis, the only incoming values that should be possible are header phis. All other recipes can only be recorded/added later. It might be overcautious and I could remove it as well. Good catch re: classof. Moved the change with extra verification to D131989.
9265	Thanks, I added a comment.

This revision was landed with ongoing or failed builds. · Aug 18 2022, 11:16 AM

Closed by commit rGb8709a9d03f8: [LV] Support fixed order recurrences. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn marked an inline comment as done.

fhahn added a commit: rGb8709a9d03f8: [LV] Support fixed order recurrences.

Hello - We are seeing some pretty large regressions from this, I think because of the cost model and MVE not having a very good way at the moment of lowering the shuffles that get produced. MVE doesn't have a 'ext' / 'vector.splice' instruction, and the cost model seems to currently be modelled as a v1 extract. Do you have any objections to changing the cost model to use a SK_Splice?

I've put up an adjustment in D132308.
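
For readers unfamiliar with the splice being discussed, the following C++ sketch emulates it in software for one widened recurrence. This is an assumption-laden illustration, not the vectorizer's output or the cost model: `VF` is fixed to 4 for the example, and `Vec`, `spliceLast`, and `widenedRecurrence` are hypothetical names. Each vector iteration combines the last lane of the previous vector with the first `VF - 1` lanes of the current one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // illustrative vector width (an assumption)
using Vec = std::array<int, VF>;

// Last lane of Prev followed by the first VF-1 lanes of Cur.
inline Vec spliceLast(const Vec &Prev, const Vec &Cur) {
  Vec Out{};
  Out[0] = Prev[VF - 1];
  for (std::size_t I = 1; I < VF; ++I)
    Out[I] = Cur[I - 1];
  return Out;
}

// For every scalar iteration i, produce the value X had in iteration i-1
// (Init for i == 0), processing VF lanes at a time. Any tail shorter than
// VF is ignored in this sketch.
inline std::vector<int> widenedRecurrence(const std::vector<int> &X, int Init) {
  std::vector<int> Out;
  Vec Prev{};
  Prev[VF - 1] = Init; // only the last lane of the start vector is used
  for (std::size_t Base = 0; Base + VF <= X.size(); Base += VF) {
    Vec Cur;
    for (std::size_t I = 0; I < VF; ++I)
      Cur[I] = X[Base + I];
    Vec Rec = spliceLast(Prev, Cur);
    Out.insert(Out.end(), Rec.begin(), Rec.end());
    Prev = Cur;
  }
  return Out;
}
```

The per-iteration work is a whole-vector shuffle of two inputs rather than a single-lane extract, which gives some intuition for why the comment above asks about the cost being modelled as a splice.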

fhahn mentioned this in rG7743badafa68: [VPlan] Verify that header only contains header phi recipes. · Aug 27 2022, 2:06 PM

bgraur added a subscriber: bgraur. · Sep 23 2022, 8:12 AM

Herald added a subscriber: pcwang-thead. · Sep 23 2022, 8:12 AM

Heads-up: we have found this revision to be the culprit for an LTO compilation crash.

Here's the stack trace:

F0000 00:00:1663945422.368954    9165 logging.cc:48] assert.h assertion failed at third_party/llvm/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9135 in VPlanPtr llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(VFRange &, SmallPtrSetImpl<Instruction *> &, const MapVector<Instruction *, Instruction *> &): VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid"
*** Check failure stack trace: ***
    @     0x55e5fef6ba84  absl::log_internal::LogMessage::SendToLog()
    @     0x55e5fef6b85d  absl::log_internal::LogMessage::Flush()
    @     0x55e5fef6bdc9  absl::log_internal::LogMessageFatal::~LogMessageFatal()
    @     0x55e5fef5c244  __assert_fail
    @     0x55e5fe3b66f1  llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes()
    @     0x55e5fe3a92cc  llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes()
    @     0x55e5fe3a8838  llvm::LoopVectorizationPlanner::plan()
    @     0x55e5fe3bfa8d  llvm::LoopVectorizePass::processLoop()
    @     0x55e5fe3c4a34  llvm::LoopVectorizePass::runImpl()
    @     0x55e5fe3c542f  llvm::LoopVectorizePass::run()
    @     0x55e5fd37a472  llvm::detail::PassModel<>::run()
    @     0x55e5fec617db  llvm::PassManager<>::run()
    @     0x55e5fa1d5312  llvm::detail::PassModel<>::run()
    @     0x55e5fec6432d  llvm::ModuleToFunctionPassAdaptor::run()
    @     0x55e5fa1d75d2  llvm::detail::PassModel<>::run()
    @     0x55e5fec60bbb  llvm::PassManager<>::run()
    @     0x55e5fa70498e  llvm::lto::opt()
    @     0x55e5fa7071c5  llvm::lto::thinBackend()::$_3::operator()()
    @     0x55e5fa7070ad  llvm::lto::thinBackend()
    @     0x55e5fa1c3d06  clang::EmitBackendOutput()
    @     0x55e5fa1bfb18  clang::CodeGenAction::ExecuteAction()
    @     0x55e5fadcd323  clang::FrontendAction::Execute()
    @     0x55e5fad42b5f  clang::CompilerInstance::ExecuteAction()
    @     0x55e5f9db0f5e  clang::ExecuteCompilerInvocation()
    @     0x55e5f9da4ca2  cc1_main()
    @     0x55e5f9da2489  ExecuteCC1Tool()
    @     0x55e5faeed1f7  llvm::function_ref<>::callback_fn<>()
    @     0x55e5fedce4ff  llvm::CrashRecoveryContext::RunSafely()
    @     0x55e5faeeca22  clang::driver::CC1Command::Execute()
    @     0x55e5faeae7a7  clang::driver::Compilation::ExecuteCommand()
    @     0x55e5faeaead0  clang::driver::Compilation::ExecuteJobs()
    @     0x55e5faecc02f  clang::driver::Driver::ExecuteCompilation()
    @     0x55e5f9da1b1b  clang_main()
    @     0x7f6b56d77633  __libc_start_main
    @     0x55e5f9d9ebaa  _start

We're working on a reduced test case.

@fhahn could you take a look?

$ cat /tmp/c.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define void @gen_interp(ptr %0) {
  br label %2

2:                                                ; preds = %2, %1
  %3 = phi double [ %9, %2 ], [ 0.000000e+00, %1 ]
  %4 = phi double [ %11, %2 ], [ 0.000000e+00, %1 ]
  %5 = phi double [ %4, %2 ], [ 0.000000e+00, %1 ]
  %6 = phi i64 [ %10, %2 ], [ 0, %1 ]
  %7 = fdiv double %5, %3
  %8 = fdiv double 0.000000e+00, %3
  %9 = load double, ptr null, align 8
  %10 = add nuw nsw i64 %6, 1
  %11 = load double, ptr null, align 8
  store double %8, ptr %0, align 8
  %12 = icmp eq i64 %10, 0
  br i1 %12, label %13, label %2

13:                                               ; preds = %2
  ret void
}
$ ./build/rel/bin/opt -passes='loop-vectorize' -disable-output /tmp/c.ll
Use before def!
opt: ../../llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9136: VPlanPtr llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(VFRange &, SmallPtrSetImpl<Instruction *> &, const MapVector<Instruction *, Instruction *> &): Assertion `VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid"' failed.

In D119661#3812135, @aeubanks wrote:

$ cat /tmp/c.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define void @gen_interp(ptr %0) {
  br label %2

2:                                                ; preds = %2, %1
  %3 = phi double [ %9, %2 ], [ 0.000000e+00, %1 ]
  %4 = phi double [ %11, %2 ], [ 0.000000e+00, %1 ]
  %5 = phi double [ %4, %2 ], [ 0.000000e+00, %1 ]
  %6 = phi i64 [ %10, %2 ], [ 0, %1 ]
  %7 = fdiv double %5, %3
  %8 = fdiv double 0.000000e+00, %3
  %9 = load double, ptr null, align 8
  %10 = add nuw nsw i64 %6, 1
  %11 = load double, ptr null, align 8
  store double %8, ptr %0, align 8
  %12 = icmp eq i64 %10, 0
  br i1 %12, label %13, label %2

13:                                               ; preds = %2
  ret void
}
$ ./build/rel/bin/opt -passes='loop-vectorize' -disable-output /tmp/c.ll
Use before def!
opt: ../../llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9136: VPlanPtr llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(VFRange &, SmallPtrSetImpl<Instruction *> &, const MapVector<Instruction *, Instruction *> &): Assertion `VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid"' failed.

Thanks, this should be fixed by D134083

Revision Contents

Path	Size
llvm/lib/Analysis/IVDescriptors.cpp	14 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	11 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains.ll	103 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll	230 lines

Diff 408249

llvm/lib/Analysis/IVDescriptors.cpp

 bool RecurrenceDescriptor::isFirstOrderRecurrence(
 [...]
   // Ensure the phi node's incoming blocks are the loop preheader and latch.
   if (Phi->getBasicBlockIndex(Preheader) < 0 ||
       Phi->getBasicBlockIndex(Latch) < 0)
     return false;

   // Get the previous value. The previous value comes from the latch edge while
   // the initial value comes form the preheader edge.
   auto *Previous = dyn_cast<Instruction>(Phi->getIncomingValueForBlock(Latch));

+  // If Previous is a phi in the header, go through incoming values from the
+  // latch until we find a non-phi value. Use this as the new Previous, all uses
+  // in the header will be dominated by the original phi, but need to be moved
+  // after the non-phi previous value.
+  SmallPtrSet<PHINode *, 4> SeenPhis;
+  while (auto *PrevPhi = dyn_cast_or_null<PHINode>(Previous)) {
+    if (PrevPhi->getParent() != Phi->getParent())
+      return false;
+    if (!SeenPhis.insert(PrevPhi).second)
+      return false;
+    Previous = dyn_cast<Instruction>(PrevPhi->getIncomingValueForBlock(Latch));
+  }
+
   if (!Previous || !TheLoop->contains(Previous) || isa<PHINode>(Previous) ||
       SinkAfter.count(Previous)) // Cannot rely on dominance due to motion.
     return false;

   // Ensure every user of the phi node (recursively) is dominated by the
   // previous value. The dominance requirement ensures the loop vectorizer will
   // not need to vectorize the initial value prior to the first iteration of the
   // loop.
 [...]

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

 [...]
   if (auto Phi = dyn_cast<PHINode>(Instr)) {
     if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, Range)))
       return toVPRecipeResult(Recipe);

     VPHeaderPHIRecipe *PhiRecipe = nullptr;
     if (Legal->isReductionVariable(Phi) || Legal->isFirstOrderRecurrence(Phi)) {
       VPValue *StartV = Operands[0];
       if (Legal->isReductionVariable(Phi)) {
         const RecurrenceDescriptor &RdxDesc =
             Legal->getReductionVars().find(Phi)->second;
         assert(RdxDesc.getRecurrenceStartValue() ==
                Phi->getIncomingValueForBlock(OrigLoop->getLoopPreheader()));
         PhiRecipe = new VPReductionPHIRecipe(Phi, RdxDesc, *StartV,
                                              CM.isInLoopReduction(Phi),
                                              CM.useOrderedReductions(RdxDesc));
       } else {
+        recordRecipeOf(Phi);
         PhiRecipe = new VPFirstOrderRecurrencePHIRecipe(Phi, *StartV);
       }

       // Record the incoming value from the backedge, so we can add the incoming
       // value from the backedge after all recipes have been created.
-      recordRecipeOf(cast<Instruction>(
-          Phi->getIncomingValueForBlock(OrigLoop->getLoopLatch())));
+      auto *Inc = cast<Instruction>(
+          Phi->getIncomingValueForBlock(OrigLoop->getLoopLatch()));
+      if (Ingredient2Recipe[Inc] == nullptr)
+        recordRecipeOf(Inc);
       PhisToFix.push_back(PhiRecipe);
     } else {
       // TODO: record backedge value for remaining pointer induction phis.
       assert(Phi->getType()->isPointerTy() &&
              "only pointer phis should be handled here");
       assert(Legal->getInductionVars().count(Phi) &&
              "Not an induction variable");
       InductionDescriptor II = Legal->getInductionVars().lookup(Phi);
       VPValue *Start = Plan->getOrAddVPValue(II.getStartValue());
       PhiRecipe = new VPWidenPHIRecipe(Phi, Start);
     }

     return toVPRecipeResult(PhiRecipe);
   }

   if (isa<TruncInst>(Instr) &&
       (Recipe = tryToOptimizeInductionTruncate(cast<TruncInst>(Instr), Operands,
                                                Range, *Plan)))
     return toVPRecipeResult(Recipe);

   if (!shouldWiden(Instr, Range))
     return nullptr;

   if (auto GEP = dyn_cast<GetElementPtrInst>(Instr))
     return toVPRecipeResult(new VPWidenGEPRecipe(
         GEP, make_range(Operands.begin(), Operands.end()), OrigLoop));

   if (auto *SI = dyn_cast<SelectInst>(Instr)) {
     bool InvariantCond =
         PSE.getSE()->isLoopInvariant(PSE.getSCEV(SI->getOperand(0)), OrigLoop);
     return toVPRecipeResult(new VPWidenSelectRecipe(
         *SI, make_range(Operands.begin(), Operands.end()), InvariantCond));
   }

   return toVPRecipeResult(tryToWiden(Instr, Operands));
[... 334 lines elided ...]	VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(
// Introduce a recipe to combine the incoming and previous values of a		// Introduce a recipe to combine the incoming and previous values of a
// first-order recurrence.		// first-order recurrence.
for (VPRecipeBase &R : Plan->getEntry()->getEntryBasicBlock()->phis()) {		for (VPRecipeBase &R : Plan->getEntry()->getEntryBasicBlock()->phis()) {
auto *RecurPhi = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R);		auto *RecurPhi = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R);
if (!RecurPhi)		if (!RecurPhi)
continue;		continue;

VPRecipeBase *PrevRecipe = RecurPhi->getBackedgeRecipe();		VPRecipeBase *PrevRecipe = RecurPhi->getBackedgeRecipe();
		while (auto *PrevPhi =
		dyn_cast<VPFirstOrderRecurrencePHIRecipe>(PrevRecipe))
		PrevRecipe = PrevPhi->getBackedgeRecipe();
		Ayal (Done): Note that FORs do not form cyclic dependences (unlike inductions and reductions), hence this loop must terminate.
		fhahn (Author, Done): Thanks, I added a comment.
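The termination argument in this thread can be illustrated with a minimal standalone sketch. `Recipe`, `IsFORPhi`, `Backedge`, and `findNonPhiPrevious` are hypothetical stand-ins introduced only for illustration; they are not LLVM's VPlan classes. The sketch assumes, as Ayal notes, that chained first-order recurrence phis never form a cycle, so following backedge values eventually reaches a non-phi recipe.

```cpp
#include <cassert>

// Hypothetical stand-in for a recipe; NOT the real VPRecipeBase.
struct Recipe {
  bool IsFORPhi = false;      // models "is a VPFirstOrderRecurrencePHIRecipe"
  Recipe *Backedge = nullptr; // models getBackedgeRecipe()
};

// Follow backedge values through chained recurrence phis until the first
// non-phi recipe is reached. Assumes the chain is acyclic, which is what
// guarantees that the loop terminates.
inline Recipe *findNonPhiPrevious(Recipe *RecurPhi) {
  assert(RecurPhi && RecurPhi->IsFORPhi && "expected a recurrence phi");
  Recipe *Prev = RecurPhi->Backedge;
  while (Prev && Prev->IsFORPhi)
    Prev = Prev->Backedge;
  return Prev;
}
```

For a chain such as `for.3 -> for.2 -> for.1 -> load`, every phi in the chain resolves to the same non-phi recipe (the load), which is then used to pick the insert point.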
VPBasicBlock *InsertBlock = PrevRecipe->getParent();		VPBasicBlock *InsertBlock = PrevRecipe->getParent();
auto *Region = GetReplicateRegion(PrevRecipe);		auto *Region = GetReplicateRegion(PrevRecipe);
if (Region)		if (Region)
InsertBlock = cast<VPBasicBlock>(Region->getSingleSuccessor());		InsertBlock = cast<VPBasicBlock>(Region->getSingleSuccessor());
if (Region || PrevRecipe->isPhi())		if (Region || PrevRecipe->isPhi())
Builder.setInsertPoint(InsertBlock, InsertBlock->getFirstNonPhi());		Builder.setInsertPoint(InsertBlock, InsertBlock->getFirstNonPhi());
else		else
Builder.setInsertPoint(InsertBlock, std::next(PrevRecipe->getIterator()));		Builder.setInsertPoint(InsertBlock, std::next(PrevRecipe->getIterator()));

david-arm (Done): nit: Whitespace change.
fhahn (Author, Done): Removed, thanks!
auto *RecurSplice = cast<VPInstruction>(		auto *RecurSplice = cast<VPInstruction>(
Builder.createNaryOp(VPInstruction::FirstOrderRecurrenceSplice,		Builder.createNaryOp(VPInstruction::FirstOrderRecurrenceSplice,
{RecurPhi, RecurPhi->getBackedgeValue()}));		{RecurPhi, RecurPhi->getBackedgeValue()}));

RecurPhi->replaceAllUsesWith(RecurSplice);		RecurPhi->replaceAllUsesWith(RecurSplice);
// Set the first operand of RecurSplice to RecurPhi again, after replacing		// Set the first operand of RecurSplice to RecurPhi again, after replacing
// all users.		// all users.
RecurSplice->setOperand(0, RecurPhi);		RecurSplice->setOperand(0, RecurPhi);
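As a rough model of what the splice computes per vector iteration, the sketch below mirrors the `shufflevector` mask `<3, 4, 5, 6>` that appears in the VF=4 test expectations: the last lane of the previous vector followed by all but the last lane of the current one. `spliceRecurrence` is a hypothetical helper name for illustration, not an LLVM API, and plain `std::vector<int>` stands in for vector values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models shufflevector(Prev, Cur, <VF-1, VF, ..., 2*VF-2>): lane 0 is the
// last lane of Prev, and lanes 1..VF-1 are lanes 0..VF-2 of Cur.
inline std::vector<int> spliceRecurrence(const std::vector<int> &Prev,
                                         const std::vector<int> &Cur) {
  assert(!Prev.empty() && Prev.size() == Cur.size() &&
         "both operands must have the same non-zero width");
  std::vector<int> Result;
  Result.reserve(Cur.size());
  Result.push_back(Prev.back());
  for (std::size_t I = 0; I + 1 < Cur.size(); ++I)
    Result.push_back(Cur[I]);
  return Result;
}
```

With chained recurrences, the result of the first splice becomes the second operand of the next one. That is why the splice for the outer phi has to be inserted after the non-phi previous value rather than after the inner phi.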
[... 1,529 lines elided ...]

llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains.ll

; RUN: opt -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S %s | FileCheck %s		; RUN: opt -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S %s | FileCheck %s

define void @test_chained_first_order_recurrences_1(i16* %ptr) {		define void @test_chained_first_order_recurrences_1(i16* %ptr) {
; CHECK-LABEL: @test_chained_first_order_recurrences_1		; CHECK-LABEL: @test_chained_first_order_recurrences_1
; CHECK-NOT: vector.body:		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
		; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ <i16 poison, i16 poison, i16 poison, i16 22>, %vector.ph ], [ [[WIDE_LOAD:%.*]], %vector.body ]
		; CHECK-NEXT: [[VECTOR_RECUR1:%.*]] = phi <4 x i16> [ <i16 poison, i16 poison, i16 poison, i16 33>, %vector.ph ], [ [[TMP4:%.*]], %vector.body ]
		; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 [[TMP0]]
		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i16, i16* [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i16* [[TMP2]] to <4 x i16>*
		; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP3]], align 2
		; CHECK-NEXT: [[TMP4]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR1]], <4 x i16> [[TMP4]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i16> [[TMP4]], [[TMP5]]
		; CHECK-NEXT: [[TMP7:%.*]] = bitcast i16* [[TMP2]] to <4 x i16>*
		; CHECK-NEXT: store <4 x i16> [[TMP6]], <4 x i16>* [[TMP7]], align 2
		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
		; CHECK-NEXT: br i1 [[TMP8]], label %middle.block, label %vector.body
		; CHECK: middle.block:
		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT2:%.*]] = extractelement <4 x i16> [[TMP4]], i32 3
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI3:%.*]] = extractelement <4 x i16> [[TMP4]], i32 2
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%for.1 = phi i16 [ 22, %entry ], [ %for.1.next, %loop ]		%for.1 = phi i16 [ 22, %entry ], [ %for.1.next, %loop ]
%for.2 = phi i16 [ 33, %entry ], [ %for.1, %loop ]		%for.2 = phi i16 [ 33, %entry ], [ %for.1, %loop ]
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%gep.ptr = getelementptr inbounds i16, i16* %ptr, i64 %iv		%gep.ptr = getelementptr inbounds i16, i16* %ptr, i64 %iv
%for.1.next = load i16, i16* %gep.ptr, align 2		%for.1.next = load i16, i16* %gep.ptr, align 2
%add = add i16 %for.1, %for.2		%add = add i16 %for.1, %for.2
store i16 %add, i16* %gep.ptr		store i16 %add, i16* %gep.ptr
%exitcond.not = icmp eq i64 %iv.next, 1000		%exitcond.not = icmp eq i64 %iv.next, 1000
br i1 %exitcond.not, label %exit, label %loop		br i1 %exitcond.not, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}
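For readers less familiar with the IR, the loop in `@test_chained_first_order_recurrences_1` corresponds roughly to the scalar C++ below. This is a sketch for illustration only: `chainedRecurrenceScalar` is a hypothetical name, the test itself is written directly in IR, and `std::uint16_t` is used here to model the sign-agnostic wrapping `i16` addition.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of the test loop: for.1 carries the value loaded in the
// previous iteration, and for.2 carries the previous value of for.1.
inline void chainedRecurrenceScalar(std::vector<std::uint16_t> &Ptr) {
  std::uint16_t For1 = 22; // initial value of %for.1
  std::uint16_t For2 = 33; // initial value of %for.2
  for (std::size_t IV = 0; IV < Ptr.size(); ++IV) {
    std::uint16_t For1Next = Ptr[IV];                    // %for.1.next = load
    Ptr[IV] = static_cast<std::uint16_t>(For1 + For2);   // %add, then store
    For2 = For1;     // %for.2 takes %for.1 across the backedge
    For1 = For1Next; // %for.1 takes %for.1.next across the backedge
  }
}
```

Here `For2` is a recurrence whose previous value is itself a header phi (`For1`), which is the chained case this patch adds support for.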

define void @test_chained_first_order_recurrences_2(i16* %ptr) {		define void @test_chained_first_order_recurrences_2(i16* %ptr) {
; CHECK-LABEL: @test_chained_first_order_recurrences_2		; CHECK-LABEL: @test_chained_first_order_recurrences_2
; CHECK-NOT: vector.body:		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
		; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ <i16 poison, i16 poison, i16 poison, i16 33>, %vector.ph ], [ [[TMP4:%.*]], %vector.body ]
		; CHECK-NEXT: [[VECTOR_RECUR1:%.*]] = phi <4 x i16> [ <i16 poison, i16 poison, i16 poison, i16 22>, %vector.ph ], [ [[WIDE_LOAD:%.*]], %vector.body ]
		; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 [[TMP0]]
		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i16, i16* [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i16* [[TMP2]] to <4 x i16>*
		; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP3]], align 2
		; CHECK-NEXT: [[TMP4]] = shufflevector <4 x i16> [[VECTOR_RECUR1]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[TMP4]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i16> [[TMP4]], [[TMP5]]
		; CHECK-NEXT: [[TMP7:%.*]] = bitcast i16* [[TMP2]] to <4 x i16>*
		; CHECK-NEXT: store <4 x i16> [[TMP6]], <4 x i16>* [[TMP7]], align 2
		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
		; CHECK-NEXT: br i1 [[TMP8]], label %middle.block, label %vector.body, !llvm.loop [[LOOP4:![0-9]+]]
		; CHECK: middle.block:
		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[TMP4]], i32 3
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[TMP4]], i32 2
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT2:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI3:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%for.2 = phi i16 [ 33, %entry ], [ %for.1, %loop ]		%for.2 = phi i16 [ 33, %entry ], [ %for.1, %loop ]
%for.1 = phi i16 [ 22, %entry ], [ %for.1.next, %loop ]		%for.1 = phi i16 [ 22, %entry ], [ %for.1.next, %loop ]
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%gep.ptr = getelementptr inbounds i16, i16* %ptr, i64 %iv		%gep.ptr = getelementptr inbounds i16, i16* %ptr, i64 %iv
%for.1.next = load i16, i16* %gep.ptr, align 2		%for.1.next = load i16, i16* %gep.ptr, align 2
%add = add i16 %for.1, %for.2		%add = add i16 %for.1, %for.2
store i16 %add, i16* %gep.ptr		store i16 %add, i16* %gep.ptr
%exitcond.not = icmp eq i64 %iv.next, 1000		%exitcond.not = icmp eq i64 %iv.next, 1000
br i1 %exitcond.not, label %exit, label %loop		br i1 %exitcond.not, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

define void @test_chained_first_order_recurrences_3(i16* %ptr) {		define void @test_chained_first_order_recurrences_3(i16* %ptr) {
; CHECK-LABEL: @test_chained_first_order_recurrences_3		; CHECK-LABEL: @test_chained_first_order_recurrences_3
; CHECK-NOT: vector.body:		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
		; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i16> [ <i16 poison, i16 poison, i16 poison, i16 22>, %vector.ph ], [ [[WIDE_LOAD:%.*]], %vector.body ]
		; CHECK-NEXT: [[VECTOR_RECUR1:%.*]] = phi <4 x i16> [ <i16 poison, i16 poison, i16 poison, i16 33>, %vector.ph ], [ [[TMP4:%.*]], %vector.body ]
		; CHECK-NEXT: [[VECTOR_RECUR2:%.*]] = phi <4 x i16> [ <i16 poison, i16 poison, i16 poison, i16 33>, %vector.ph ], [ [[TMP5:%.*]], %vector.body ]
		; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 [[TMP0]]
		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i16, i16* [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i16* [[TMP2]] to <4 x i16>*
		; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x i16>, <4 x i16>* [[TMP3]], align 2
		; CHECK-NEXT: [[TMP4]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-NEXT: [[TMP5]] = shufflevector <4 x i16> [[VECTOR_RECUR1]], <4 x i16> [[TMP4]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR2]], <4 x i16> [[TMP5]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i16> [[TMP4]], [[TMP5]]
		; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i16> [[TMP7]], [[TMP6]]
		; CHECK-NEXT: [[TMP9:%.*]] = bitcast i16* [[TMP2]] to <4 x i16>*
		; CHECK-NEXT: store <4 x i16> [[TMP8]], <4 x i16>* [[TMP9]], align 2
		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
		; CHECK-NEXT: br i1 [[TMP10]], label %middle.block, label %vector.body, !llvm.loop [[LOOP6:![0-9]+]]
		; CHECK: middle.block:
		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT3:%.*]] = extractelement <4 x i16> [[TMP4]], i32 3
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI4:%.*]] = extractelement <4 x i16> [[TMP4]], i32 2
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT7:%.*]] = extractelement <4 x i16> [[TMP5]], i32 3
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI8:%.*]] = extractelement <4 x i16> [[TMP5]], i32 2
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%for.1 = phi i16 [ 22, %entry ], [ %for.1.next, %loop ]		%for.1 = phi i16 [ 22, %entry ], [ %for.1.next, %loop ]
%for.2 = phi i16 [ 33, %entry ], [ %for.1, %loop ]		%for.2 = phi i16 [ 33, %entry ], [ %for.1, %loop ]
%for.3 = phi i16 [ 33, %entry ], [ %for.2, %loop ]		%for.3 = phi i16 [ 33, %entry ], [ %for.2, %loop ]
[... 72 lines elided ...]	loop:
br i1 %exitcond.not, label %exit, label %loop		br i1 %exitcond.not, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

define void @test_chained_first_order_recurrence_sink_users_1(double* %ptr) {		define void @test_chained_first_order_recurrence_sink_users_1(double* %ptr) {
; CHECK-LABEL: @test_chained_first_order_recurrence_sink_users_1		; CHECK-LABEL: @test_chained_first_order_recurrence_sink_users_1
; CHECK-NOT: vector.body:		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
		; CHECK-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x double> [ <double poison, double poison, double poison, double 1.000000e+01>, %vector.ph ], [ [[WIDE_LOAD:%.*]], %vector.body ]
		; CHECK-NEXT: [[VECTOR_RECUR1:%.*]] = phi <4 x double> [ <double poison, double poison, double poison, double 2.000000e+01>, %vector.ph ], [ [[TMP4:%.*]], %vector.body ]
		; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
		; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
		; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds double, double* [[PTR:%.*]], i64 [[TMP0]]
		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, double* [[TMP1]], i32 0
		; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[TMP2]] to <4 x double>*
		; CHECK-NEXT: [[WIDE_LOAD]] = load <4 x double>, <4 x double>* [[TMP3]], align 8
		; CHECK-NEXT: [[TMP4]] = shufflevector <4 x double> [[VECTOR_RECUR]], <4 x double> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[VECTOR_RECUR1]], <4 x double> [[TMP4]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
		; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x double> <double 1.000000e+01, double 1.000000e+01, double 1.000000e+01, double 1.000000e+01>, [[TMP5]]
		; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x double> [[TMP6]], [[TMP4]]
		; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[TMP2]] to <4 x double>*
		; CHECK-NEXT: store <4 x double> [[TMP7]], <4 x double>* [[TMP8]], align 8
		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
		; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 996
		; CHECK-NEXT: br i1 [[TMP9]], label %middle.block, label %vector.body, !llvm.loop [[LOOP10:![0-9]+]]
		; CHECK: middle.block:
		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 999, 996
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x double> [[WIDE_LOAD]], i32 3
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x double> [[WIDE_LOAD]], i32 2
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT2:%.*]] = extractelement <4 x double> [[TMP4]], i32 3
		; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI3:%.*]] = extractelement <4 x double> [[TMP4]], i32 2
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%for.1 = phi double [ 10.0, %entry ], [ %for.1.next, %loop ]		%for.1 = phi double [ 10.0, %entry ], [ %for.1.next, %loop ]
%for.2 = phi double [ 20.0, %entry ], [ %for.1, %loop ]		%for.2 = phi double [ 20.0, %entry ], [ %for.1, %loop ]
%iv = phi i64 [ 1, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 1, %entry ], [ %iv.next, %loop ]
[... 12 lines elided ...]

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll


	[... 1,804 lines elided ...]
	;			;
	;			;
	define i32 @PR27246() {			define i32 @PR27246() {
	; CHECK-LABEL: @PR27246(			; CHECK-LABEL: @PR27246(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; CHECK: for.cond1.preheader:			; CHECK: for.cond1.preheader:
	; CHECK-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]			; CHECK-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]
	; CHECK-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1:%.*]], [[FOR_COND_CLEANUP3]] ]			; CHECK-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[I_016]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[I_016]], 2147483644
				; CHECK-NEXT: [[IND_END:%.*]] = and i32 [[I_016]], 3
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[I_016]], i64 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
				; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[DOTSPLAT]], <i32 0, i32 -1, i32 -2, i32 -3>
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 -4, i32 -4, i32 -4, i32 -4>
				; CHECK-NEXT: [[TMP0:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP0]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[I_016]], [[N_VEC]]
				; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[VEC_IND]], i64 3
				; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[VEC_IND]], i64 2
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP3]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[E_015]], [[FOR_COND1_PREHEADER]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[I_016]], [[FOR_COND1_PREHEADER]] ], [ [[IND_END]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[FOR_COND1:%.*]]			; CHECK-NEXT: br label [[FOR_COND1:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret i32 [[E_1]]			; CHECK-NEXT: ret i32 [[E_1_LCSSA]]
	; CHECK: for.cond1:			; CHECK: for.cond1:
	; CHECK-NEXT: [[E_1]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[E_015]], [[FOR_COND1_PREHEADER]] ]			; CHECK-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]			; CHECK-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1			; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1
	; CHECK-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1			; CHECK-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]], !llvm.loop [[LOOP19:![0-9]+]]
	; CHECK: for.cond.cleanup3:			; CHECK: for.cond.cleanup3:
				; CHECK-NEXT: [[E_1_LCSSA]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_COND1]] ], [ [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]
	;			;
	; UNROLL-LABEL: @PR27246(			; UNROLL-LABEL: @PR27246(
	; UNROLL-NEXT: entry:			; UNROLL-NEXT: entry:
	; UNROLL-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; UNROLL-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; UNROLL: for.cond1.preheader:			; UNROLL: for.cond1.preheader:
	; UNROLL-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]			; UNROLL-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]
	; UNROLL-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1:%.*]], [[FOR_COND_CLEANUP3]] ]			; UNROLL-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]
				; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[I_016]], 8
				; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; UNROLL: vector.ph:
				; UNROLL-NEXT: [[N_VEC:%.*]] = and i32 [[I_016]], 2147483640
				; UNROLL-NEXT: [[IND_END:%.*]] = and i32 [[I_016]], 7
				; UNROLL-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[I_016]], i64 0
				; UNROLL-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
				; UNROLL-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[DOTSPLAT]], <i32 0, i32 -1, i32 -2, i32 -3>
				; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]
				; UNROLL: vector.body:
				; UNROLL-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; UNROLL-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; UNROLL-NEXT: [[STEP_ADD:%.*]] = add <4 x i32> [[VEC_IND]], <i32 poison, i32 poison, i32 -4, i32 -4>
				; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
				; UNROLL-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 -8, i32 -8, i32 -8, i32 -8>
				; UNROLL-NEXT: [[TMP0:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; UNROLL-NEXT: br i1 [[TMP0]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
				; UNROLL: middle.block:
				; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[I_016]], [[N_VEC]]
				; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[STEP_ADD]], i64 3
				; UNROLL-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[STEP_ADD]], i64 2
				; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP3]], label [[SCALAR_PH]]
				; UNROLL: scalar.ph:
				; UNROLL-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[E_015]], [[FOR_COND1_PREHEADER]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
				; UNROLL-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[I_016]], [[FOR_COND1_PREHEADER]] ], [ [[IND_END]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NEXT: br label [[FOR_COND1:%.*]]			; UNROLL-NEXT: br label [[FOR_COND1:%.*]]
	; UNROLL: for.cond.cleanup:			; UNROLL: for.cond.cleanup:
	; UNROLL-NEXT: ret i32 [[E_1]]			; UNROLL-NEXT: ret i32 [[E_1_LCSSA]]
	; UNROLL: for.cond1:			; UNROLL: for.cond1:
	; UNROLL-NEXT: [[E_1]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[E_015]], [[FOR_COND1_PREHEADER]] ]			; UNROLL-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
	; UNROLL-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]			; UNROLL-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; UNROLL-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1			; UNROLL-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1
	; UNROLL-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1			; UNROLL-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1
	; UNROLL-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]]			; UNROLL-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]], !llvm.loop [[LOOP19:![0-9]+]]
	; UNROLL: for.cond.cleanup3:			; UNROLL: for.cond.cleanup3:
				; UNROLL-NEXT: [[E_1_LCSSA]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_COND1]] ], [ [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1			; UNROLL-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1
	; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49			; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49
	; UNROLL-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]			; UNROLL-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]
	;			;
	; UNROLL-NO-IC-LABEL: @PR27246(			; UNROLL-NO-IC-LABEL: @PR27246(
	; UNROLL-NO-IC-NEXT: entry:			; UNROLL-NO-IC-NEXT: entry:
	; UNROLL-NO-IC-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; UNROLL-NO-IC-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; UNROLL-NO-IC: for.cond1.preheader:			; UNROLL-NO-IC: for.cond1.preheader:
	; UNROLL-NO-IC-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]			; UNROLL-NO-IC-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]
	; UNROLL-NO-IC-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]			; UNROLL-NO-IC-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]
				; UNROLL-NO-IC-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[I_016]], 8
				; UNROLL-NO-IC-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; UNROLL-NO-IC: vector.ph:
				; UNROLL-NO-IC-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[I_016]], 8
				; UNROLL-NO-IC-NEXT: [[N_VEC:%.*]] = sub i32 [[I_016]], [[N_MOD_VF]]
				; UNROLL-NO-IC-NEXT: [[IND_END:%.*]] = sub i32 [[I_016]], [[N_VEC]]
				; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[E_015]], i32 3
				; UNROLL-NO-IC-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[I_016]], i32 0
				; UNROLL-NO-IC-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
				; UNROLL-NO-IC-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[DOTSPLAT]], <i32 0, i32 -1, i32 -2, i32 -3>
				; UNROLL-NO-IC-NEXT: br label [[VECTOR_BODY:%.*]]
				; UNROLL-NO-IC: vector.body:
				; UNROLL-NO-IC-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[STEP_ADD:%.*]], [[VECTOR_BODY]] ]
				; UNROLL-NO-IC-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; UNROLL-NO-IC-NEXT: [[STEP_ADD]] = add <4 x i32> [[VEC_IND]], <i32 -4, i32 -4, i32 -4, i32 -4>
				; UNROLL-NO-IC-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[I_016]], [[INDEX]]
				; UNROLL-NO-IC-NEXT: [[TMP0:%.*]] = add i32 [[OFFSET_IDX]], 0
				; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = add i32 [[OFFSET_IDX]], -1
				; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = add i32 [[OFFSET_IDX]], -2
				; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = add i32 [[OFFSET_IDX]], -3
				; UNROLL-NO-IC-NEXT: [[TMP4:%.*]] = add i32 [[OFFSET_IDX]], -4
				; UNROLL-NO-IC-NEXT: [[TMP5:%.*]] = add i32 [[OFFSET_IDX]], -5
				; UNROLL-NO-IC-NEXT: [[TMP6:%.*]] = add i32 [[OFFSET_IDX]], -6
				; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = add i32 [[OFFSET_IDX]], -7
				; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[VEC_IND]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[VEC_IND]], <4 x i32> [[STEP_ADD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
				; UNROLL-NO-IC-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[STEP_ADD]], <i32 -4, i32 -4, i32 -4, i32 -4>
				; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; UNROLL-NO-IC-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
				; UNROLL-NO-IC: middle.block:
				; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[I_016]], [[N_VEC]]
				; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[STEP_ADD]], i32 3
				; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[STEP_ADD]], i32 2
				; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP3]], label [[SCALAR_PH]]
				; UNROLL-NO-IC: scalar.ph:
				; UNROLL-NO-IC-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[E_015]], [[FOR_COND1_PREHEADER]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
				; UNROLL-NO-IC-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]
	; UNROLL-NO-IC-NEXT: br label [[FOR_COND1:%.*]]			; UNROLL-NO-IC-NEXT: br label [[FOR_COND1:%.*]]
	; UNROLL-NO-IC: for.cond.cleanup:			; UNROLL-NO-IC: for.cond.cleanup:
	; UNROLL-NO-IC-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]			; UNROLL-NO-IC-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]
	; UNROLL-NO-IC-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]			; UNROLL-NO-IC-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]
	; UNROLL-NO-IC: for.cond1:			; UNROLL-NO-IC: for.cond1:
	; UNROLL-NO-IC-NEXT: [[E_1:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[E_015]], [[FOR_COND1_PREHEADER]] ]			; UNROLL-NO-IC-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
	; UNROLL-NO-IC-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]			; UNROLL-NO-IC-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; UNROLL-NO-IC-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1			; UNROLL-NO-IC-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1
	; UNROLL-NO-IC-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1			; UNROLL-NO-IC-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1
	; UNROLL-NO-IC-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]]			; UNROLL-NO-IC-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]], !llvm.loop [[LOOP19:![0-9]+]]
	; UNROLL-NO-IC: for.cond.cleanup3:			; UNROLL-NO-IC: for.cond.cleanup3:
	; UNROLL-NO-IC-NEXT: [[E_1_LCSSA]] = phi i32 [ [[E_1]], [[FOR_COND1]] ]			; UNROLL-NO-IC-NEXT: [[E_1_LCSSA]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_COND1]] ], [ [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-IC-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1			; UNROLL-NO-IC-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1
	; UNROLL-NO-IC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49			; UNROLL-NO-IC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49
	; UNROLL-NO-IC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]			; UNROLL-NO-IC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]
	;			;
	; UNROLL-NO-VF-LABEL: @PR27246(			; UNROLL-NO-VF-LABEL: @PR27246(
	; UNROLL-NO-VF-NEXT: entry:			; UNROLL-NO-VF-NEXT: entry:
	; UNROLL-NO-VF-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; UNROLL-NO-VF: for.cond1.preheader:			; UNROLL-NO-VF: for.cond1.preheader:
	; UNROLL-NO-VF-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]			; UNROLL-NO-VF-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]
	; UNROLL-NO-VF-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]			; UNROLL-NO-VF-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]
				; UNROLL-NO-VF-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[I_016]], 2
				; UNROLL-NO-VF-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; UNROLL-NO-VF: vector.ph:
				; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[I_016]], 2
				; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i32 [[I_016]], [[N_MOD_VF]]
				; UNROLL-NO-VF-NEXT: [[IND_END:%.*]] = sub i32 [[I_016]], [[N_VEC]]
				; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]
				; UNROLL-NO-VF: vector.body:
				; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ [[E_015]], [[VECTOR_PH]] ], [ [[INDUCTION1:%.*]], [[VECTOR_BODY]] ]
				; UNROLL-NO-VF-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[I_016]], [[INDEX]]
				; UNROLL-NO-VF-NEXT: [[INDUCTION:%.*]] = add i32 [[OFFSET_IDX]], 0
				; UNROLL-NO-VF-NEXT: [[INDUCTION1]] = add i32 [[OFFSET_IDX]], -1
				; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
				; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; UNROLL-NO-VF-NEXT: br i1 [[TMP0]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
				; UNROLL-NO-VF: middle.block:
				; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[I_016]], [[N_VEC]]
				; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP3]], label [[SCALAR_PH]]
				; UNROLL-NO-VF: scalar.ph:
				; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[E_015]], [[FOR_COND1_PREHEADER]] ], [ [[INDUCTION1]], [[MIDDLE_BLOCK]] ]
				; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]
	; UNROLL-NO-VF-NEXT: br label [[FOR_COND1:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_COND1:%.*]]
	; UNROLL-NO-VF: for.cond.cleanup:			; UNROLL-NO-VF: for.cond.cleanup:
	; UNROLL-NO-VF-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]			; UNROLL-NO-VF-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]
	; UNROLL-NO-VF-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]			; UNROLL-NO-VF-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]
	; UNROLL-NO-VF: for.cond1:			; UNROLL-NO-VF: for.cond1:
	; UNROLL-NO-VF-NEXT: [[E_1:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[E_015]], [[FOR_COND1_PREHEADER]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]			; UNROLL-NO-VF-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1			; UNROLL-NO-VF-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1
	; UNROLL-NO-VF-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1			; UNROLL-NO-VF-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1
	; UNROLL-NO-VF-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]], !llvm.loop [[LOOP18:![0-9]+]]
	; UNROLL-NO-VF: for.cond.cleanup3:			; UNROLL-NO-VF: for.cond.cleanup3:
	; UNROLL-NO-VF-NEXT: [[E_1_LCSSA]] = phi i32 [ [[E_1]], [[FOR_COND1]] ]			; UNROLL-NO-VF-NEXT: [[E_1_LCSSA]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_COND1]] ], [ [[INDUCTION]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1			; UNROLL-NO-VF-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]
	;			;
	; SINK-AFTER-LABEL: @PR27246(			; SINK-AFTER-LABEL: @PR27246(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; SINK-AFTER-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; SINK-AFTER: for.cond1.preheader:			; SINK-AFTER: for.cond1.preheader:
	; SINK-AFTER-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]			; SINK-AFTER-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]
	; SINK-AFTER-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]			; SINK-AFTER-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]
				; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[I_016]], 4
				; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; SINK-AFTER: vector.ph:
				; SINK-AFTER-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[I_016]], 4
				; SINK-AFTER-NEXT: [[N_VEC:%.*]] = sub i32 [[I_016]], [[N_MOD_VF]]
				; SINK-AFTER-NEXT: [[IND_END:%.*]] = sub i32 [[I_016]], [[N_VEC]]
				; SINK-AFTER-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[E_015]], i32 3
				; SINK-AFTER-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[I_016]], i32 0
				; SINK-AFTER-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
				; SINK-AFTER-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[DOTSPLAT]], <i32 0, i32 -1, i32 -2, i32 -3>
				; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]
				; SINK-AFTER: vector.body:
				; SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[VEC_IND:%.*]], [[VECTOR_BODY]] ]
				; SINK-AFTER-NEXT: [[VEC_IND]] = phi <4 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; SINK-AFTER-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[I_016]], [[INDEX]]
				; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i32 [[OFFSET_IDX]], 0
				; SINK-AFTER-NEXT: [[TMP1:%.*]] = add i32 [[OFFSET_IDX]], -1
				; SINK-AFTER-NEXT: [[TMP2:%.*]] = add i32 [[OFFSET_IDX]], -2
				; SINK-AFTER-NEXT: [[TMP3:%.*]] = add i32 [[OFFSET_IDX]], -3
				; SINK-AFTER-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[VEC_IND]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				; SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
				; SINK-AFTER-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 -4, i32 -4, i32 -4, i32 -4>
				; SINK-AFTER-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; SINK-AFTER-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
				; SINK-AFTER: middle.block:
				; SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[I_016]], [[N_VEC]]
				; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[VEC_IND]], i32 3
				; SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[VEC_IND]], i32 2
				; SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP3]], label [[SCALAR_PH]]
				; SINK-AFTER: scalar.ph:
				; SINK-AFTER-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[E_015]], [[FOR_COND1_PREHEADER]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
				; SINK-AFTER-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]
	; SINK-AFTER-NEXT: br label [[FOR_COND1:%.*]]			; SINK-AFTER-NEXT: br label [[FOR_COND1:%.*]]
	; SINK-AFTER: for.cond.cleanup:			; SINK-AFTER: for.cond.cleanup:
	; SINK-AFTER-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]			; SINK-AFTER-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]
	; SINK-AFTER-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]			; SINK-AFTER-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]
	; SINK-AFTER: for.cond1:			; SINK-AFTER: for.cond1:
	; SINK-AFTER-NEXT: [[E_1:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[E_015]], [[FOR_COND1_PREHEADER]] ]			; SINK-AFTER-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
	; SINK-AFTER-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]			; SINK-AFTER-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; SINK-AFTER-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1			; SINK-AFTER-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1
	; SINK-AFTER-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1			; SINK-AFTER-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1
	; SINK-AFTER-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]]			; SINK-AFTER-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]], !llvm.loop [[LOOP19:![0-9]+]]
	; SINK-AFTER: for.cond.cleanup3:			; SINK-AFTER: for.cond.cleanup3:
	; SINK-AFTER-NEXT: [[E_1_LCSSA]] = phi i32 [ [[E_1]], [[FOR_COND1]] ]			; SINK-AFTER-NEXT: [[E_1_LCSSA]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_COND1]] ], [ [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]
	; SINK-AFTER-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1			; SINK-AFTER-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1
	; SINK-AFTER-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49			; SINK-AFTER-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49
	; SINK-AFTER-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]			; SINK-AFTER-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]
	;			;
	; NO-SINK-AFTER-LABEL: @PR27246(			; NO-SINK-AFTER-LABEL: @PR27246(
	; NO-SINK-AFTER-NEXT: entry:			; NO-SINK-AFTER-NEXT: entry:
	; NO-SINK-AFTER-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; NO-SINK-AFTER-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; NO-SINK-AFTER: for.cond1.preheader:			; NO-SINK-AFTER: for.cond1.preheader:
	; NO-SINK-AFTER-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]			; NO-SINK-AFTER-NEXT: [[I_016:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND_CLEANUP3:%.*]] ]
	; NO-SINK-AFTER-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]			; NO-SINK-AFTER-NEXT: [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]
				; NO-SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[I_016]], 4
				; NO-SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; NO-SINK-AFTER: vector.ph:
				; NO-SINK-AFTER-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[I_016]], 4
				; NO-SINK-AFTER-NEXT: [[N_VEC:%.*]] = sub i32 [[I_016]], [[N_MOD_VF]]
				; NO-SINK-AFTER-NEXT: [[IND_END:%.*]] = sub i32 [[I_016]], [[N_VEC]]
				; NO-SINK-AFTER-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i32> poison, i32 [[E_015]], i32 3
				; NO-SINK-AFTER-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[I_016]], i32 0
				; NO-SINK-AFTER-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i32> [[DOTSPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
				; NO-SINK-AFTER-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[DOTSPLAT]], <i32 0, i32 -1, i32 -2, i32 -3>
				; NO-SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]
				; NO-SINK-AFTER: vector.body:
				; NO-SINK-AFTER-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; NO-SINK-AFTER-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ [[VECTOR_RECUR_INIT]], [[VECTOR_PH]] ], [ [[VEC_IND:%.*]], [[VECTOR_BODY]] ]
				; NO-SINK-AFTER-NEXT: [[VEC_IND]] = phi <4 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; NO-SINK-AFTER-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[I_016]], [[INDEX]]
				; NO-SINK-AFTER-NEXT: [[TMP0:%.*]] = add i32 [[OFFSET_IDX]], 0
				; NO-SINK-AFTER-NEXT: [[TMP1:%.*]] = add i32 [[OFFSET_IDX]], -1
				; NO-SINK-AFTER-NEXT: [[TMP2:%.*]] = add i32 [[OFFSET_IDX]], -2
				; NO-SINK-AFTER-NEXT: [[TMP3:%.*]] = add i32 [[OFFSET_IDX]], -3
				; NO-SINK-AFTER-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[VEC_IND]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				; NO-SINK-AFTER-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
				; NO-SINK-AFTER-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 -4, i32 -4, i32 -4, i32 -4>
				; NO-SINK-AFTER-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; NO-SINK-AFTER-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
				; NO-SINK-AFTER: middle.block:
				; NO-SINK-AFTER-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[I_016]], [[N_VEC]]
				; NO-SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[VEC_IND]], i32 3
				; NO-SINK-AFTER-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[VEC_IND]], i32 2
				; NO-SINK-AFTER-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP3]], label [[SCALAR_PH]]
				; NO-SINK-AFTER: scalar.ph:
				; NO-SINK-AFTER-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[E_015]], [[FOR_COND1_PREHEADER]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
				; NO-SINK-AFTER-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]
	; NO-SINK-AFTER-NEXT: br label [[FOR_COND1:%.*]]			; NO-SINK-AFTER-NEXT: br label [[FOR_COND1:%.*]]
	; NO-SINK-AFTER: for.cond.cleanup:			; NO-SINK-AFTER: for.cond.cleanup:
	; NO-SINK-AFTER-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]			; NO-SINK-AFTER-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]
	; NO-SINK-AFTER-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]			; NO-SINK-AFTER-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]
	; NO-SINK-AFTER: for.cond1:			; NO-SINK-AFTER: for.cond1:
	; NO-SINK-AFTER-NEXT: [[E_1:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[E_015]], [[FOR_COND1_PREHEADER]] ]			; NO-SINK-AFTER-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
	; NO-SINK-AFTER-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]			; NO-SINK-AFTER-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; NO-SINK-AFTER-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1			; NO-SINK-AFTER-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1
	; NO-SINK-AFTER-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1			; NO-SINK-AFTER-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1
	; NO-SINK-AFTER-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]]			; NO-SINK-AFTER-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]], !llvm.loop [[LOOP19:![0-9]+]]
	; NO-SINK-AFTER: for.cond.cleanup3:			; NO-SINK-AFTER: for.cond.cleanup3:
	; NO-SINK-AFTER-NEXT: [[E_1_LCSSA]] = phi i32 [ [[E_1]], [[FOR_COND1]] ]			; NO-SINK-AFTER-NEXT: [[E_1_LCSSA]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_COND1]] ], [ [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]
	; NO-SINK-AFTER-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1			; NO-SINK-AFTER-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1
	; NO-SINK-AFTER-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49			; NO-SINK-AFTER-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49
	; NO-SINK-AFTER-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]			; NO-SINK-AFTER-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]
	;			;
	entry:			entry:
	br label %for.cond1.preheader			br label %for.cond1.preheader

	for.cond1.preheader:			for.cond1.preheader:
