This is an archive of the discontinued LLVM Phabricator instance.


Differential D108114

[LoopPeel] Peel if it turns invariant loads dereferenceable.
Closed, Public

Authored by fhahn on Aug 16 2021, 3:36 AM.


Details

Reviewers

reames
efriedma
skatkov
mkazantsev

Commits

rGcd0ba9dc58c5: [LoopPeel] Peel if it turns invariant loads dereferenceable.

Summary

This patch adds a new cost heuristic that allows peeling a single
iteration off multi-exit read-only loops, if they contain loop-invariant
loads that dominate the latch. If all non-latch exits are terminated
with unreachable, the invariant loads in the loop are guaranteed to be
dereferenceable, enabling hoisting/CSE'ing them.

This enables vectorization of loops with certain runtime-checks, like
multiple calls to std::vector::at.
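To make the heuristic concrete, here is a source-level C++ sketch (not the IR transform itself) of what peeling one iteration buys for a loop with two std::vector::at-style checks. VecRep, sizeOf, boundsFailure, sumAt and sumAtPeeled are hypothetical names. The vectors are reached through pointers, so nothing proves their fields can be read safely before the loop, and std::abort stands in for an unreachable-terminated exit block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a vector's begin/end pointers. It is passed by
// pointer, so nothing proves that reading its fields before the loop is safe.
struct VecRep {
  const long *Begin;
  const long *End;
};

// Failure path of an at()-style bounds check. It never returns, mirroring an
// exit block terminated by `unreachable`.
[[noreturn]] inline void boundsFailure() { std::abort(); }

inline std::size_t sizeOf(const VecRep *V) {
  return static_cast<std::size_t>(V->End - V->Begin);
}

// Original shape: every iteration re-reads the loop-invariant fields of A
// and B; the reads of B's fields sit behind A's bounds check.
long sumAt(const VecRep *A, const VecRep *B, std::size_t N) {
  long Sum = 0;
  for (std::size_t I = 0; I < N; ++I) {
    if (I >= sizeOf(A))
      boundsFailure();
    if (I >= sizeOf(B))
      boundsFailure();
    Sum += A->Begin[I] + B->Begin[I];
  }
  return Sum;
}

// Shape after peeling the first iteration. Every non-latch exit aborts, so
// reaching the remainder loop means the peeled iteration already read all of
// A's and B's fields without faulting; the remainder can read them once,
// before the loop, and its checks compare against loop-invariant values.
long sumAtPeeled(const VecRep *A, const VecRep *B, std::size_t N) {
  long Sum = 0;
  if (N == 0)
    return Sum;
  // Peeled iteration (I == 0), identical to the loop body.
  if (sizeOf(A) == 0)
    boundsFailure();
  if (sizeOf(B) == 0)
    boundsFailure();
  Sum += A->Begin[0] + B->Begin[0];
  // Reads that are now known to be safe, hoisted out of the remainder loop.
  const std::size_t SizeA = sizeOf(A), SizeB = sizeOf(B);
  const long *BeginA = A->Begin, *BeginB = B->Begin;
  for (std::size_t I = 1; I < N; ++I) {
    if (I >= SizeA)
      boundsFailure();
    if (I >= SizeB)
      boundsFailure();
    Sum += BeginA[I] + BeginB[I];
  }
  return Sum;
}
```

Both versions compute the same result for in-bounds inputs; for example, with A holding {1, 2, 3, 4}, B holding {10, 20, 30, 40, 50} and N = 4, both return 110.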

This should give a 20-30% improvement in the Geekbench5/HDR score.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Aug 16 2021, 3:36 AM

Herald added subscribers: jfb, zzheng, hiraditya. Aug 16 2021, 3:36 AM

fhahn requested review of this revision. Aug 16 2021, 3:36 AM

Herald added a project: Restricted Project. Aug 16 2021, 3:36 AM

Harbormaster completed remote builds in B119674: Diff 366580. Aug 16 2021, 3:36 AM

fhahn added a parent revision: D108108: [LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks. Aug 16 2021, 3:56 AM

Before reviewing the patch, a high level question. Fair warning, I am a bit worried about cost heuristic changes here, they tend to be delicate.

Why do we need to peel this? LICM is generally good at using speculation to hoist, and trivial unswitching is good at versioning the conditions to remove them. I'd expect to see the motivating case already handled by some combination of existing transforms. (You might have to iterate them a few times. So is this working around a pass ordering issue?)

llvm/lib/Transforms/Utils/LoopPeel.cpp
206	You don't appear to be requiring the load controls an exit. It's much less obvious that "just" removing a load is worthwhile. Consider a huge loop with one potentially invariant load.

In D108114#2949657, @reames wrote:

> Before reviewing the patch, a high level question. Fair warning, I am a bit worried about cost heuristic changes here, they tend to be delicate.

> Why do we need to peel this? LICM is generally good at using speculation to hoist, and trivial unswitching is good at versioning the conditions to remove them. I'd expect to see the motivating case already handled by some combination of existing transforms. (You might have to iterate them a few times. So is this working around a pass ordering issue?)

Do you have any pointers to the kind of speculation LICM could be doing? I could not spot anything applicable after an initial look.

I think the motivating case could indeed be handled by a set of existing passes, which was my first try. I think it would look roughly like the following:

1. Run LICM to hoist out the invariant loads feeding the branch in the header.
2. Run IndVars to turn the check fed by the hoisted loads into an invariant check.
3. Unswitch.
4. Run LICM to hoist the now-unconditional loads feeding the second branch in the unswitched loop.
5. Run IndVars to turn the second branch condition into an invariant.
6. Unswitch.

(There's a small caveat, which I just noticed: this pass order does not catch the specific test case in peel-multiple-unreachable-exits-for-vectorization.ll, only the unreduced motivating std::vector::at example.)

The problem with that particular order is that it clashes with the existing order, which runs LICM and Unswitch in one loop pass manager, followed by IndVars and others in a separate loop pass manager. The passes are split across different loop pass managers because we run function passes (InstCombine and SimplifyCFG) between them. Adjusting the pipeline seems like it would be quite a big shakeup; peeling off the first iteration seemed like a less invasive change and less work for the optimizations overall. Note that we would have to run LICM, IndVars and Unswitch once for each std::vector::at call/runtime check.

What do you think? I think eliminating the separation between the 2 loop pass managers would be beneficial in its own right, but SimplifyCFG and InstCombine seem like a substantial gap to bridge.

In D108114#2950373, @fhahn wrote:

> I think eliminating the separation between the 2 loop pass managers would be beneficial in its own right, but SimplifyCFG and InstCombine seems like a substantial gap to bridge.

There's also a FIXME in PassBuilder.cpp about that.

In D108114#2950373, @fhahn wrote:

> Do you have any pointers at the kind of speculation LICM could be doing? I could not spot anything that would be applicable after an initial look.

LICM does two forms of legality reasoning for trap safety. Option 1 is proving that the load is guaranteed to execute. Option 2 is proving that the hoisted load can't fault at the new location. Looking at your test case (peel-multiple-unreachable-exits-for-vectorization.ll), option 1 does work (when iterated, as you note). Option 2 would require deref facts on the arguments A and B of @sum_2_at_with_int_conversion. I don't see any reason why clang shouldn't be able to emit the deref info on those args. (Assuming this is lowering a C++ reference to a std::vector<>, at least.) If it can, LICM should be able to speculate the loads for start and end from each vector outside the loop.
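A deliberately simplified C++ model can make the two legality routes above, plus the route the patch adds via peeling, concrete. It assumes a read-only loop body laid out as a straight line from header to latch, and it ignores the patch's other conditions (for example, that the load must feed an exit condition). Op, Inst, Hoist and classifyLoad are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Deliberately simplified, hypothetical model: the loop body is a straight
// line of instructions from the header to the latch. An EarlyExit is a
// conditional exit out of the loop (e.g. a failed bounds check).
enum class Op { Load, EarlyExit };

struct Inst {
  Op Kind;
  bool PtrIsLoopInvariant = false;      // only meaningful for loads
  bool PtrKnownDereferenceable = false; // only meaningful for loads
};

enum class Hoist { GuaranteedToExecute, Dereferenceable, AfterPeeling, No };

// Classify how (if at all) the load at index Idx could be read before the loop.
Hoist classifyLoad(const std::vector<Inst> &Body, std::size_t Idx,
                   bool AllExitsUnreachable) {
  const Inst &L = Body[Idx];
  if (L.Kind != Op::Load || !L.PtrIsLoopInvariant)
    return Hoist::No;
  bool ExitBefore = false;
  for (std::size_t I = 0; I < Idx; ++I)
    if (Body[I].Kind == Op::EarlyExit)
      ExitBefore = true;
  // Option 1: no exit precedes the load, so it runs whenever the loop is entered.
  if (!ExitBefore)
    return Hoist::GuaranteedToExecute;
  // Option 2: the pointer is known dereferenceable, so speculating cannot fault.
  if (L.PtrKnownDereferenceable)
    return Hoist::Dereferenceable;
  // Peeling: if every early exit ends in unreachable, reaching the remainder
  // loop implies the peeled iteration executed this load without faulting.
  if (AllExitsUnreachable)
    return Hoist::AfterPeeling;
  return Hoist::No;
}
```

In this model a load in the header is covered by option 1, which matches the patch skipping header loads; a load behind the first check only becomes hoistable through option 2 or, when all other exits are unreachable, through peeling.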

> What do you think? I think eliminating the separation between the 2 loop pass managers would be beneficial in its own right, but SimplifyCFG and InstCombine seems like a substantial gap to bridge.

I agree with your reasoning all the way through.

We'd previously explored the idea of adding loop versions of InstCombine and SimplifyCFG. (See LoopSimplifyCFG, I don't remember where we stood on instcombine.)

One mid-term idea would be to keep the split loop pass managers, but have the second one contain the same passes as the first. (Essentially, we'd have one unrolled iteration of the pass loop, with the instcombine and simplifycfg in place.) Though, actually, I don't remember: do we even iterate loop passes to a fixed point if there are changes being made to the loop? (All of my context on this involves an out-of-tree pass order which solves this with brute-force repetition.)

In D108114#2951160, @reames wrote:

> Option 2 would require deref facts on the arguments A and B of @sum_2_at_with_int_conversion. I don't see any reason why clang shouldn't be able to emit the deref info on those args? (Assuming this is lowering a c++ reference to a std::vector<> at least.)

I realize that the problem description missed a key bit of information: in the motivating case the vectors are passed through pointers, not references, so dereferenceability is unfortunately not guaranteed.

> Though, actually, I don't remember, do we even iterate loop passes to fixed point if there are changes being made to the loop? (All of my context on this involves an out of tree pass order which solves this with brute force repetition.)

AFAIK we do not iterate loop passes to a fixed point; I think we only run them once for each loop (and for newly created loops). Let me take a look at what re-composing the existing passes would look like.

In D108114#2955727, @fhahn wrote:

> I realize that the problem description missed a key bit of information: the motivating case has vectors passed through pointers, not references so deref is not guaranteed unfortunately.

Shouldn't pointers to std::vectors still be deref_or_null? If so, we should be able to prove non-null here.

> AFAIK we do not iterate loop passes to a fixed point until no changes are made, I think we only execute them for each loop once (and new loops). Let me take a look and see on how re-composing the existing passes would look like.

Honestly, I feel like we really should do either a) bounded iteration, or b) unroll the bounded iteration by hand a small handful of times. The whole premise of the new PM was to allow conditional pass execution, this is a case where doing so is clearly "worth it".
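Bounded iteration of a pass sequence can be sketched as below. This is a generic illustration, not the new pass manager's API: Pass and runToBoundedFixedPoint are hypothetical names, and a pass is modelled as a callable that reports whether it changed the IR.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model, not the LLVM pass manager API: a "pass" transforms some
// IR state and reports whether it changed anything.
template <typename IR> using Pass = std::function<bool(IR &)>;

// Re-run the whole pipeline until a round makes no change or MaxRounds rounds
// have run. Returns the number of rounds executed.
template <typename IR>
unsigned runToBoundedFixedPoint(IR &State, const std::vector<Pass<IR>> &Pipeline,
                                unsigned MaxRounds) {
  unsigned Rounds = 0;
  bool Changed = true;
  while (Changed && Rounds < MaxRounds) {
    Changed = false;
    for (const Pass<IR> &P : Pipeline)
      if (P(State))
        Changed = true; // keep running the remaining passes in this round
    ++Rounds;
  }
  return Rounds;
}
```

For example, with one toy pass standing in for "LICM + IndVars" (make the next runtime check invariant) and one standing in for unswitching (remove an invariant check), a loop with two checks loses both after two rounds, and a third round confirms the fixed point. With MaxRounds = 1 the driver reduces to the run-once behaviour described above, leaving one check behind.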

mkazantsev added inline comments. Aug 29 2021, 10:08 PM

llvm/lib/Transforms/Utils/LoopPeel.cpp
170	Returns... ? Is that supposed to be boolean?
182	Can it be a single-exit loop where latch doesn't exit?
207	Does this actually lead to the effect you are aiming for? Some complexly computed pointer (e.g. the result of a chain of GEPs) may be proven loop-invariant by SCEV, but how will other passes figure out that this load is from a dereferenceable pointer?

mnadeem added a subscriber: mnadeem. Sep 1 2021, 4:41 PM

Pending action by author, please request review once ready for more discussion. Just getting this off my active review queue.

This revision now requires changes to proceed. Sep 10 2021, 11:55 AM

Bring the patch up to date again, address reviewer comments!

In D108114#2969919, @reames wrote:

> Shouldn't points to std::vectors still be deref_or_null? If so, we should be able to prove non-null here.

I think I am missing something here. I tried to put together a reduced example (https://clang.godbolt.org/z/jfqKzzK5x). I don't see how deref_or_null on %A/%B would help here: we do not check whether %B is null before executing the first load, %B.start = load i64*, i64** %B.gep.start.

> Honestly, I feel like we really should do either a) bounded iteration, or b) unroll the bounded iteration by hand a small handful of times. The whole premise of the new PM was to allow conditional pass execution, this is a case where doing so is clearly "worth it".

I went back and experimented with different pass orderings. It looks like another problem is that we cannot compute the backedge-taken count for the loop, because we cannot compute the exit count for the exit that depends on the load in the loop. There might be a way around that issue, but even then, running indvars, licm and simple-loop-unswitch multiple times would still be a major shakeup of the pipeline that I'd like to avoid for now, if possible.

Harbormaster completed remote builds in B127612: Diff 377986. Oct 7 2021, 1:06 PM

LGTM, but please make sure Philip is also OK.

llvm/lib/Transforms/Utils/LoopPeel.cpp
184	Can we reuse logic from D110922? OK if it's done in a follow-up, just consider this.

Shouldn't there be a check somewhere whether the load is already dereferenceable, in which case we presumably don't want to peel it?

llvm/lib/Transforms/Utils/LoopPeel.cpp
177	This variable is initialized to false and then never changed?

(just submitting responses to comments I forgot to send yesterday)

llvm/lib/Transforms/Utils/LoopPeel.cpp
170	It returns the number of iterations to peel off (added a comment). At the moment it's either 0 or 1, and the main reason it is not a bool is to assign the result directly to `DesiredPeelCount`.
182	Yes, the latch can also be non-exiting. Dropped the assert.
206	Thanks for the suggestion! I updated the code to track which values/instructions are users of such loads. Peeling is now limited to cases where an exit condition uses such a load (transitively).
207	Good point, I changed it to use the weaker `Loop::isLoopInvariant`.
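The transitive-use tracking described in the reply to line 206 can be sketched on a toy def-use graph. The sketch follows the shape of the patch's single forward sweep (record the users of candidate loads, propagate to users of anything already recorded, then check the exiting terminators), but ToyLoop and exitDependsOnCandidateLoad are hypothetical, and instructions are plain indices in program order. The final check uses std::any_of, in the spirit of the any_of/contains follow-up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy IR (hypothetical): instruction I is an index in program order and
// Users[I] lists the instructions that use its result.
struct ToyLoop {
  std::vector<std::vector<std::size_t>> Users;
  // Invariant loads that are not known to be dereferenceable.
  std::unordered_set<std::size_t> CandidateLoads;
  // Terminators of the blocks that can leave the loop.
  std::vector<std::size_t> ExitTerminators;
};

// One forward sweep: record the users of candidate loads and of anything
// already recorded. In this toy a single sweep is complete because every
// definition precedes its users (no cycles through phis).
bool exitDependsOnCandidateLoad(const ToyLoop &L) {
  std::unordered_set<std::size_t> LoadUsers;
  for (std::size_t I = 0; I < L.Users.size(); ++I)
    if (LoadUsers.count(I) || L.CandidateLoads.count(I))
      for (std::size_t U : L.Users[I])
        LoadUsers.insert(U);
  // Peel only if some exit condition (transitively) uses such a load.
  return std::any_of(L.ExitTerminators.begin(), L.ExitTerminators.end(),
                     [&](std::size_t T) { return LoadUsers.count(T) != 0; });
}
```

For example, with a candidate load (0) feeding a compare (1) that feeds an exiting branch (2), the function returns true; if the only exiting branch depends on an unrelated value, it returns false, mirroring the "huge loop with one potentially invariant load" case raised in review.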

fhahn mentioned this in rG2cc7013b0ef4: [LoopPeel] Add tests where peeling turns invar accesses dereferenceable. Oct 8 2021, 2:19 AM

In D108114#3050318, @nikic wrote:

> Shouldn't there be a check somewhere whether the load is already dereferenceable, in which case we presumably don't want to peel it?

I added a check using isDereferenceablePointer to catch some cases where the load is guaranteed to be dereferenceable. This is still quite weak, but in practice I think most dereferenceable loads should be moved out of the loop before we check for peeling (especially given the current restriction to read-only loops).

fhahn marked 4 inline comments as done. Oct 8 2021, 2:23 AM

fhahn added inline comments.

llvm/lib/Transforms/Utils/LoopPeel.cpp
177	Thanks, that's a leftover from earlier. Removed
184	I just saw that D110922 got reverted again. I'll submit a follow-up once the patch has settled.

Harbormaster completed remote builds in B127708: Diff 378137. Oct 8 2021, 3:08 AM

dmakogon added a subscriber: dmakogon. Oct 8 2021, 4:14 AM

dmakogon added inline comments.

llvm/lib/Transforms/Utils/LoopPeel.cpp
176	Seems like there's no need for the SE now
184	D110922 is relanded now

Remove unused SE argument, use IsBlockFollowedByDeoptOrUnreachable

Harbormaster completed remote builds in B127742: Diff 378190. Oct 8 2021, 8:24 AM

@reames do the latest update / the recent responses address your concerns?

In D108114#3055669, @fhahn wrote:

> @reames do the latest update / the recent responses address your concerns?

Yep, you addressed the major concerns. I think you could stand to wordsmith the function comment and submit message a bit to be clearer about the new heuristic, but the code addresses my concerns.

Note that I have not done a detailed code review as the patch was already LGTMed by others. I skimmed and didn't spot anything huge, but it was only a skim.

nikic added inline comments. Oct 11 2021, 12:54 PM

llvm/lib/Transforms/Utils/LoopPeel.cpp
225	nit: `LoadUsers.contains(Exiting->getTerminator())`?

This revision was not accepted when it landed; it landed in state Needs Review. Oct 12 2021, 3:42 AM

Closed by commit rGcd0ba9dc58c5: [LoopPeel] Peel if it turns invariant loads dereferenceable. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rGcd0ba9dc58c5: [LoopPeel] Peel if it turns invariant loads dereferenceable.

fhahn mentioned this in rG40d85f16c45e: [LoopPeel] Use any_of & contains instead of for & find. Oct 12 2021, 4:18 AM

In D108114#3056044, @reames wrote:

> Yep, you addressed the major concerns. I think you could stand to wordsmith the function comment and submit message a bit to be clearer about the new heuristic, but the code addresses my concerns.

Thanks for confirming. I tried to adjust the message of the commit to make it more in line with the committed code.

llvm/lib/Transforms/Utils/LoopPeel.cpp
225	Thanks, I unfortunately forgot to add this to the committed version, but I pushed a follow up which also uses `any_of` instead of the explicit loop: 40d85f16c45e

fhahn mentioned this in D135451: [TTI] New PPC target hook enableUncondDivisionSpeculation. Oct 11 2022, 8:21 AM

Revision Contents

- llvm/include/llvm/Transforms/Utils/LoopPeel.h (4 lines)
- llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (2 lines)
- llvm/lib/Transforms/Utils/LoopPeel.cpp (68 lines)
- llvm/test/Transforms/LoopUnroll/peel-to-turn-invariant-accesses-dereferenceable.ll (126 lines)
- llvm/test/Transforms/PhaseOrdering/AArch64/peel-multiple-unreachable-exits-for-vectorization.ll (189 lines)

Diff 378954

llvm/include/llvm/Transforms/Utils/LoopPeel.h

	... 26 unchanged lines not shown ...
	gatherPeelingPreferences(Loop *L, ScalarEvolution &SE,			gatherPeelingPreferences(Loop *L, ScalarEvolution &SE,
	const TargetTransformInfo &TTI,			const TargetTransformInfo &TTI,
	Optional<bool> UserAllowPeeling,			Optional<bool> UserAllowPeeling,
	Optional<bool> UserAllowProfileBasedPeeling,			Optional<bool> UserAllowProfileBasedPeeling,
	bool UnrollingSpecficValues = false);			bool UnrollingSpecficValues = false);

	void computePeelCount(Loop *L, unsigned LoopSize,			void computePeelCount(Loop *L, unsigned LoopSize,
	TargetTransformInfo::PeelingPreferences &PP,			TargetTransformInfo::PeelingPreferences &PP,
	unsigned &TripCount, ScalarEvolution &SE,			unsigned &TripCount, DominatorTree &DT,
	unsigned Threshold = UINT_MAX);			ScalarEvolution &SE, unsigned Threshold = UINT_MAX);

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_TRANSFORMS_UTILS_LOOPPEEL_H			#endif // LLVM_TRANSFORMS_UTILS_LOOPPEEL_H

llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp

... 986 unchanged lines not shown (context: `if (UnrollFactor) {`) ...
TripCount = FullUnrollTripCount;		TripCount = FullUnrollTripCount;
TripMultiple = UP.UpperBound ? 1 : TripMultiple;		TripMultiple = UP.UpperBound ? 1 : TripMultiple;
return ExplicitUnroll;		return ExplicitUnroll;
} else {		} else {
UP.Count = FullUnrollTripCount;		UP.Count = FullUnrollTripCount;
}		}

// 4th priority is loop peeling.		// 4th priority is loop peeling.
computePeelCount(L, LoopSize, PP, TripCount, SE, UP.Threshold);		computePeelCount(L, LoopSize, PP, TripCount, DT, SE, UP.Threshold);
if (PP.PeelCount) {		if (PP.PeelCount) {
UP.Runtime = false;		UP.Runtime = false;
UP.Count = 1;		UP.Count = 1;
return ExplicitUnroll;		return ExplicitUnroll;
}		}

// Before starting partial unrolling, set up.partial to true,		// Before starting partial unrolling, set up.partial to true,
// if user explicitly asked for unrolling		// if user explicitly asked for unrolling
... 638 unchanged lines not shown ...

llvm/lib/Transforms/Utils/LoopPeel.cpp

//===- LoopPeel.cpp -------------------------------------------------------===//		//===- LoopPeel.cpp -------------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Loop Peeling Utilities.		// Loop Peeling Utilities.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Utils/LoopPeel.h"		#include "llvm/Transforms/Utils/LoopPeel.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
		#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopIterator.h"		#include "llvm/Analysis/LoopIterator.h"
#include "llvm/Analysis/ScalarEvolution.h"		#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
... 135 unchanged lines not shown (context: `static unsigned calculateIterationsToInvariance(`) ...
}		}

// If we found that this Phi lies in an invariant chain, update the map.		// If we found that this Phi lies in an invariant chain, update the map.
if (ToInvariance != InfiniteIterationsToInvariance)		if (ToInvariance != InfiniteIterationsToInvariance)
IterationsToInvariance[Phi] = ToInvariance;		IterationsToInvariance[Phi] = ToInvariance;
return ToInvariance;		return ToInvariance;
}		}

		// Try to find any invariant memory reads that will become dereferenceable in
		// the remainder loop after peeling. The load must also be used (transitively)
		// by an exit condition. Returns the number of iterations to peel off (at the
		// moment either 0 or 1).
		static unsigned peelToTurnInvariantLoadsDerefencebale(Loop &L,
		                                                      DominatorTree &DT) {
		  // Skip loops with a single exiting block, because there should be no benefit
		  // for the heuristic below.
		  if (L.getExitingBlock())
		    return 0;

		  // All non-latch exit blocks must have an UnreachableInst terminator.
		  // Otherwise the heuristic below may not be profitable.
		  SmallVector<BasicBlock *, 4> Exits;
		  L.getUniqueNonLatchExitBlocks(Exits);
		  if (any_of(Exits, [](const BasicBlock *BB) {
		        return !isa<UnreachableInst>(BB->getTerminator());
		      }))
		    return 0;

		  // Now look for invariant loads that dominate the latch and are not known to
		  // be dereferenceable. If there are such loads and no writes, they will become
		  // dereferenceable in the loop if the first iteration is peeled off. Also
		  // collect the set of instructions controlled by such loads. Only peel if an
		  // exit condition uses (transitively) such a load.
		  BasicBlock *Header = L.getHeader();
		  BasicBlock *Latch = L.getLoopLatch();
		  SmallPtrSet<Value *, 8> LoadUsers;
		  const DataLayout &DL = L.getHeader()->getModule()->getDataLayout();
		  for (BasicBlock *BB : L.blocks()) {
		    for (Instruction &I : *BB) {
		      if (I.mayWriteToMemory())
		        return 0;

		      auto Iter = LoadUsers.find(&I);
		      if (Iter != LoadUsers.end()) {
		        for (Value *U : I.users())
		          LoadUsers.insert(U);
		      }
		      // Do not look for reads in the header; they can already be hoisted
		      // without peeling.
		      if (BB == Header)
		        continue;
		      if (auto *LI = dyn_cast<LoadInst>(&I)) {
		        Value *Ptr = LI->getPointerOperand();
		        if (DT.dominates(BB, Latch) && L.isLoopInvariant(Ptr) &&
		            !isDereferenceablePointer(Ptr, LI->getType(), DL, LI, &DT))
		          for (Value *U : I.users())
		            LoadUsers.insert(U);
		      }
		    }
		  }
		  SmallVector<BasicBlock *> ExitingBlocks;
		  L.getExitingBlocks(ExitingBlocks);
		  for (BasicBlock *Exiting : ExitingBlocks)
		    if (LoadUsers.find(Exiting->getTerminator()) != LoadUsers.end())
		      return 1;
		  return 0;
		}

// Return the number of iterations to peel off that make conditions in the		// Return the number of iterations to peel off that make conditions in the
// body true/false. For example, if we peel 2 iterations off the loop below,		// body true/false. For example, if we peel 2 iterations off the loop below,
// the condition i < 2 can be evaluated at compile time.		// the condition i < 2 can be evaluated at compile time.
// for (i = 0; i < n; i++)		// for (i = 0; i < n; i++)
// if (i < 2)		// if (i < 2)
// ..		// ..
// else		// else
// ..		// ..
... 99 unchanged lines not shown (context: `static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,`) ...
}		}

return DesiredPeelCount;		return DesiredPeelCount;
}		}

// Return the number of iterations we want to peel off.		// Return the number of iterations we want to peel off.
void llvm::computePeelCount(Loop *L, unsigned LoopSize,		void llvm::computePeelCount(Loop *L, unsigned LoopSize,
TargetTransformInfo::PeelingPreferences &PP,		TargetTransformInfo::PeelingPreferences &PP,
unsigned &TripCount, ScalarEvolution &SE,		unsigned &TripCount, DominatorTree &DT,
unsigned Threshold) {		ScalarEvolution &SE, unsigned Threshold) {
assert(LoopSize > 0 && "Zero loop size is not allowed!");		assert(LoopSize > 0 && "Zero loop size is not allowed!");
// Save the PP.PeelCount value set by the target in		// Save the PP.PeelCount value set by the target in
// TTI.getPeelingPreferences or by the flag -unroll-peel-count.		// TTI.getPeelingPreferences or by the flag -unroll-peel-count.
unsigned TargetPeelCount = PP.PeelCount;		unsigned TargetPeelCount = PP.PeelCount;
PP.PeelCount = 0;		PP.PeelCount = 0;
if (!canPeel(L))		if (!canPeel(L))
return;		return;

... 50 unchanged lines not shown (context: `if (2 * LoopSize <= Threshold && UnrollPeelMaxCount > 0) {`) ...

// Pay respect to limitations implied by loop size and the max peel count.		// Pay respect to limitations implied by loop size and the max peel count.
unsigned MaxPeelCount = UnrollPeelMaxCount;		unsigned MaxPeelCount = UnrollPeelMaxCount;
MaxPeelCount = std::min(MaxPeelCount, Threshold / LoopSize - 1);		MaxPeelCount = std::min(MaxPeelCount, Threshold / LoopSize - 1);

DesiredPeelCount = std::max(DesiredPeelCount,		DesiredPeelCount = std::max(DesiredPeelCount,
countToEliminateCompares(*L, MaxPeelCount, SE));		countToEliminateCompares(*L, MaxPeelCount, SE));

		if (DesiredPeelCount == 0)
		DesiredPeelCount = peelToTurnInvariantLoadsDerefencebale(*L, DT);

if (DesiredPeelCount > 0) {		if (DesiredPeelCount > 0) {
DesiredPeelCount = std::min(DesiredPeelCount, MaxPeelCount);		DesiredPeelCount = std::min(DesiredPeelCount, MaxPeelCount);
// Consider max peel count limitation.		// Consider max peel count limitation.
assert(DesiredPeelCount > 0 && "Wrong loop size estimation?");		assert(DesiredPeelCount > 0 && "Wrong loop size estimation?");
if (DesiredPeelCount + AlreadyPeeled <= UnrollPeelMaxCount) {		if (DesiredPeelCount + AlreadyPeeled <= UnrollPeelMaxCount) {
LLVM_DEBUG(dbgs() << "Peel " << DesiredPeelCount		LLVM_DEBUG(dbgs() << "Peel " << DesiredPeelCount
<< " iteration(s) to turn"		<< " iteration(s) to turn"
<< " some Phis into invariants.\n");		<< " some Phis into invariants.\n");
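In the hunk above, the invariant-load heuristic is only consulted when no earlier heuristic asked for peeling, and its result is still clamped by the size budget and `UnrollPeelMaxCount`. The following is a minimal, hypothetical model of that decision order in plain C++ (not LLVM code). `modelPeelCount`, `DesiredSoFar` and `CountForDerefLoads` are illustrative names: the first parameter stands in for `DesiredPeelCount` after the earlier heuristics such as `countToEliminateCompares`, the second for the result of `peelToTurnInvariantLoadsDerefencebale`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the decision order in the computePeelCount hunk.
// DesiredSoFar: DesiredPeelCount after the earlier heuristics.
// CountForDerefLoads: stand-in for peelToTurnInvariantLoadsDerefencebale,
// which the summary describes as peeling a single iteration (0 or 1 here).
unsigned modelPeelCount(unsigned LoopSize, unsigned Threshold,
                        unsigned UnrollPeelMaxCount, unsigned AlreadyPeeled,
                        unsigned DesiredSoFar, unsigned CountForDerefLoads) {
  assert(LoopSize > 0 && "Zero loop size is not allowed!");
  // Peeling is only considered if an extra copy of the body fits the budget.
  if (2 * LoopSize > Threshold || UnrollPeelMaxCount == 0)
    return 0;
  unsigned MaxPeelCount =
      std::min(UnrollPeelMaxCount, Threshold / LoopSize - 1);

  unsigned Desired = DesiredSoFar;
  // The new heuristic only acts as a fallback.
  if (Desired == 0)
    Desired = CountForDerefLoads;
  if (Desired == 0)
    return 0;

  // Clamp to the size-derived limit (at least 1 because 2 * LoopSize <= Threshold).
  Desired = std::min(Desired, MaxPeelCount);
  // Respect the global cap, counting iterations peeled by earlier runs.
  if (Desired + AlreadyPeeled > UnrollPeelMaxCount)
    return 0;
  return Desired;
}
```

Because the fallback only fires when `DesiredSoFar` is 0, in this model the new heuristic never changes a count that another heuristic already chose.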

llvm/test/Transforms/LoopUnroll/peel-to-turn-invariant-accesses-dereferenceable.ll
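To make the first test in this file concrete, here is a hedged source-level analogue of `@peel_readonly_to_make_loads_derefenceable` in C++. The names `fail`, `sumOriginal` and `sumPeeled` are hypothetical; `fail()` stands in for the `@foo()` + `unreachable` exit block. After one iteration is peeled, the load of `*inv` has already executed before the remaining loop is entered, so the copy inside the loop can reuse that value (the summary's "hoisting/CSE'ing"). The invariant checks are kept here; later passes may or may not fold them.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for the @foo() + unreachable exit: never returns.
[[noreturn]] void fail() { std::abort(); }

// Before peeling: the load of *inv is re-executed on every iteration.
int sumOriginal(const int *ptr, const int *inv, bool c1) {
  assert(ptr && inv);
  int sum = 0;
  for (int iv = 1;; ++iv) {
    if (!c1)
      fail();
    int i = *inv; // loop-invariant load, not in the header
    if (!(i < 2))
      fail();
    sum += ptr[iv];
    if (!(iv < 1000)) // the latch compares the old iv, as in the IR
      return sum;
  }
}

// After peeling the first iteration (iv == 1).
int sumPeeled(const int *ptr, const int *inv, bool c1) {
  assert(ptr && inv);
  if (!c1)
    fail();
  int i = *inv; // executed before the remaining loop is entered
  if (!(i < 2))
    fail();
  int sum = ptr[1];
  // 1 < 1000, so the peeled latch always continues into the loop.
  for (int iv = 2;; ++iv) {
    if (!c1)
      fail();
    // The reload of *inv is replaced by the value loaded above; this is
    // valid because the loop never writes memory.
    if (!(i < 2))
      fail();
    sum += ptr[iv];
    if (!(iv < 1000))
      return sum;
  }
}
```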

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -loop-unroll -S %s | FileCheck %s			; RUN: opt -loop-unroll -S %s | FileCheck %s

	declare void @foo()			declare void @foo()

	define i32 @peel_readonly_to_make_loads_derefenceable(i32* %ptr, i32 %N, i32* %inv, i1 %c.1) {			define i32 @peel_readonly_to_make_loads_derefenceable(i32* %ptr, i32 %N, i32* %inv, i1 %c.1) {
	; CHECK-LABEL: @peel_readonly_to_make_loads_derefenceable(			; CHECK-LABEL: @peel_readonly_to_make_loads_derefenceable(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[LOOP_HEADER_PEEL_BEGIN:%.*]]
				; CHECK: loop.header.peel.begin:
				; CHECK-NEXT: br label [[LOOP_HEADER_PEEL:%.*]]
				; CHECK: loop.header.peel:
				; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN_PEEL:%.*]], label [[UNREACHABLE_EXIT:%.*]]
				; CHECK: then.peel:
				; CHECK-NEXT: [[I_PEEL:%.*]] = load i32, i32* [[INV:%.*]], align 4
				; CHECK-NEXT: [[C_2_PEEL:%.*]] = icmp ult i32 [[I_PEEL]], 2
				; CHECK-NEXT: br i1 [[C_2_PEEL]], label [[LOOP_LATCH_PEEL:%.*]], label [[UNREACHABLE_EXIT]]
				; CHECK: loop.latch.peel:
				; CHECK-NEXT: [[GEP_PEEL:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i32 1
				; CHECK-NEXT: [[LV_PEEL:%.*]] = load i32, i32* [[GEP_PEEL]], align 4
				; CHECK-NEXT: [[SUM_NEXT_PEEL:%.*]] = add i32 0, [[LV_PEEL]]
				; CHECK-NEXT: [[IV_NEXT_PEEL:%.*]] = add nuw nsw i32 1, 1
				; CHECK-NEXT: [[C_3_PEEL:%.*]] = icmp ult i32 1, 1000
				; CHECK-NEXT: br i1 [[C_3_PEEL]], label [[LOOP_HEADER_PEEL_NEXT:%.*]], label [[EXIT:%.*]]
				; CHECK: loop.header.peel.next:
				; CHECK-NEXT: br label [[LOOP_HEADER_PEEL_NEXT1:%.*]]
				; CHECK: loop.header.peel.next1:
				; CHECK-NEXT: br label [[ENTRY_PEEL_NEWPH:%.*]]
				; CHECK: entry.peel.newph:
				; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
				; CHECK: loop.header:
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT_PEEL]], [[ENTRY_PEEL_NEWPH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
				; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ [[SUM_NEXT_PEEL]], [[ENTRY_PEEL_NEWPH]] ], [ [[SUM_NEXT:%.*]], [[LOOP_LATCH]] ]
				; CHECK-NEXT: br i1 [[C_1]], label [[THEN:%.*]], label [[UNREACHABLE_EXIT_LOOPEXIT:%.*]]
				; CHECK: then:
				; CHECK-NEXT: [[I:%.*]] = load i32, i32* [[INV]], align 4
				; CHECK-NEXT: [[C_2:%.*]] = icmp ult i32 [[I]], 2
				; CHECK-NEXT: br i1 [[C_2]], label [[LOOP_LATCH]], label [[UNREACHABLE_EXIT_LOOPEXIT]]
				; CHECK: loop.latch:
				; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[PTR]], i32 [[IV]]
				; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[GEP]], align 4
				; CHECK-NEXT: [[SUM_NEXT]] = add i32 [[SUM]], [[LV]]
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
				; CHECK-NEXT: [[C_3:%.*]] = icmp ult i32 [[IV]], 1000
				; CHECK-NEXT: br i1 [[C_3]], label [[LOOP_HEADER]], label [[EXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: exit.loopexit:
				; CHECK-NEXT: [[SUM_NEXT_LCSSA_PH:%.*]] = phi i32 [ [[SUM_NEXT]], [[LOOP_LATCH]] ]
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[SUM_NEXT_LCSSA:%.*]] = phi i32 [ [[SUM_NEXT_PEEL]], [[LOOP_LATCH_PEEL]] ], [ [[SUM_NEXT_LCSSA_PH]], [[EXIT_LOOPEXIT]] ]
				; CHECK-NEXT: ret i32 [[SUM_NEXT_LCSSA]]
				; CHECK: unreachable.exit.loopexit:
				; CHECK-NEXT: br label [[UNREACHABLE_EXIT]]
				; CHECK: unreachable.exit:
				; CHECK-NEXT: call void @foo()
				; CHECK-NEXT: unreachable
				;
				entry:
				br label %loop.header

				loop.header:
				%iv = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]
				%sum = phi i32 [ 0, %entry ], [ %sum.next, %loop.latch ]
				br i1 %c.1, label %then, label %unreachable.exit

				then:
				%i = load i32, i32* %inv
				%c.2 = icmp ult i32 %i, 2
				br i1 %c.2, label %loop.latch, label %unreachable.exit

				loop.latch:
				%gep = getelementptr i32, i32* %ptr, i32 %iv
				%lv = load i32, i32* %gep
				%sum.next = add i32 %sum, %lv
				%iv.next = add nuw nsw i32 %iv, 1
				%c.3 = icmp ult i32 %iv, 1000
				br i1 %c.3, label %loop.header, label %exit

				exit:
				ret i32 %sum.next

				unreachable.exit:
				call void @foo()
				unreachable
				}

				define i32 @peel_readonly_to_make_loads_derefenceable_exits_lead_to_unreachable(i32* %ptr, i32 %N, i32* %inv, i1 %c.1) {
				; CHECK-LABEL: @peel_readonly_to_make_loads_derefenceable_exits_lead_to_unreachable(
				; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
	; CHECK: loop.header:			; CHECK: loop.header:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[SUM_NEXT:%.*]], [[LOOP_LATCH]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[SUM_NEXT:%.*]], [[LOOP_LATCH]] ]
	; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[UNREACHABLE_EXIT:%.*]]			; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[EXIT_2:%.*]]
	; CHECK: then:			; CHECK: then:
	; CHECK-NEXT: [[I:%.*]] = load i32, i32* [[INV:%.*]], align 4			; CHECK-NEXT: [[I:%.*]] = load i32, i32* [[INV:%.*]], align 4
	; CHECK-NEXT: [[C_2:%.*]] = icmp ult i32 [[I]], 2			; CHECK-NEXT: [[C_2:%.*]] = icmp ult i32 [[I]], 2
	; CHECK-NEXT: br i1 [[C_2]], label [[LOOP_LATCH]], label [[UNREACHABLE_EXIT]]			; CHECK-NEXT: br i1 [[C_2]], label [[THEN_2:%.*]], label [[EXIT_2]]
				; CHECK: then.2:
				; CHECK-NEXT: [[C_4:%.*]] = icmp ult i32 [[I]], 4
				; CHECK-NEXT: br i1 [[C_4]], label [[LOOP_LATCH]], label [[EXIT_3:%.*]]
	; CHECK: loop.latch:			; CHECK: loop.latch:
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i32 [[IV]]			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i32 [[IV]]
	; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[GEP]], align 4			; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[GEP]], align 4
	; CHECK-NEXT: [[SUM_NEXT]] = add i32 [[SUM]], [[LV]]			; CHECK-NEXT: [[SUM_NEXT]] = add i32 [[SUM]], [[LV]]
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
	; CHECK-NEXT: [[C_3:%.*]] = icmp ult i32 [[IV]], 1000			; CHECK-NEXT: [[C_3:%.*]] = icmp ult i32 [[IV]], 1000
	; CHECK-NEXT: br i1 [[C_3]], label [[LOOP_HEADER]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[C_3]], label [[LOOP_HEADER]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[SUM_NEXT_LCSSA:%.*]] = phi i32 [ [[SUM_NEXT]], [[LOOP_LATCH]] ]			; CHECK-NEXT: [[SUM_NEXT_LCSSA:%.*]] = phi i32 [ [[SUM_NEXT]], [[LOOP_LATCH]] ]
	; CHECK-NEXT: ret i32 [[SUM_NEXT_LCSSA]]			; CHECK-NEXT: ret i32 [[SUM_NEXT_LCSSA]]
	; CHECK: unreachable.exit:			; CHECK: exit.2:
				; CHECK-NEXT: br label [[UNREACHABLE_BB:%.*]]
				; CHECK: exit.3:
				; CHECK-NEXT: br label [[UNREACHABLE_BB]]
				; CHECK: unreachable.bb:
	; CHECK-NEXT: call void @foo()			; CHECK-NEXT: call void @foo()
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br label %loop.header			br label %loop.header

	loop.header:			loop.header:
	%iv = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]			%iv = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]
	%sum = phi i32 [ 0, %entry ], [ %sum.next, %loop.latch ]			%sum = phi i32 [ 0, %entry ], [ %sum.next, %loop.latch ]
	br i1 %c.1, label %then, label %unreachable.exit			br i1 %c.1, label %then, label %exit.2

	then:			then:
	%i = load i32, i32* %inv			%i = load i32, i32* %inv
	%c.2 = icmp ult i32 %i, 2			%c.2 = icmp ult i32 %i, 2
	br i1 %c.2, label %loop.latch, label %unreachable.exit			br i1 %c.2, label %then.2, label %exit.2

				then.2:
				%c.4 = icmp ult i32 %i, 4
				br i1 %c.4, label %loop.latch, label %exit.3

	loop.latch:			loop.latch:
	%gep = getelementptr i32, i32* %ptr, i32 %iv			%gep = getelementptr i32, i32* %ptr, i32 %iv
	%lv = load i32, i32* %gep			%lv = load i32, i32* %gep
	%sum.next = add i32 %sum, %lv			%sum.next = add i32 %sum, %lv
	%iv.next = add nuw nsw i32 %iv, 1			%iv.next = add nuw nsw i32 %iv, 1
	%c.3 = icmp ult i32 %iv, 1000			%c.3 = icmp ult i32 %iv, 1000
	br i1 %c.3, label %loop.header, label %exit			br i1 %c.3, label %loop.header, label %exit

	exit:			exit:
	ret i32 %sum.next			ret i32 %sum.next

	unreachable.exit:			exit.2:
				br label %unreachable.bb

				exit.3:
				br label %unreachable.bb

				unreachable.bb:
	call void @foo()			call void @foo()
	unreachable			unreachable
	}			}

	define i32 @do_not_peel_readonly_load_in_header(i32* %ptr, i32 %N, i32* %inv, i1 %c.1) {			define i32 @do_not_peel_readonly_load_in_header(i32* %ptr, i32 %N, i32* %inv, i1 %c.1) {
	; CHECK-LABEL: @do_not_peel_readonly_load_in_header(			; CHECK-LABEL: @do_not_peel_readonly_load_in_header(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]

	unreachable.exit:			unreachable.exit:
	call void @foo()			call void @foo()
	unreachable			unreachable
	}			}

	declare i32 @llvm.experimental.deoptimize.i32(...)			declare i32 @llvm.experimental.deoptimize.i32(...)

	define i32 @do_not_peel_with_deopt_exit(i32* %ptr, i32 %N, i32* %inv, i1 %c.1) {			define i32 @peel_with_deopt_exit(i32* %ptr, i32 %N, i32* %inv, i1 %c.1) {
	; CHECK-LABEL: @do_not_peel_with_deopt_exit(			; CHECK-LABEL: @peel_with_deopt_exit(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
	; CHECK: loop.header:			; CHECK: loop.header:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[SUM_NEXT:%.*]], [[LOOP_LATCH]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[SUM_NEXT:%.*]], [[LOOP_LATCH]] ]
	; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[UNREACHABLE_EXIT:%.*]]			; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[DEOPT_EXIT:%.*]]
	; CHECK: then:			; CHECK: then:
	; CHECK-NEXT: [[I:%.*]] = load i32, i32* [[INV:%.*]], align 4			; CHECK-NEXT: [[I:%.*]] = load i32, i32* [[INV:%.*]], align 4
	; CHECK-NEXT: [[C_2:%.*]] = icmp ult i32 [[I]], 2			; CHECK-NEXT: [[C_2:%.*]] = icmp ult i32 [[I]], 2
	; CHECK-NEXT: br i1 [[C_2]], label [[LOOP_LATCH]], label [[UNREACHABLE_EXIT]]			; CHECK-NEXT: br i1 [[C_2]], label [[LOOP_LATCH]], label [[DEOPT_EXIT]]
	; CHECK: loop.latch:			; CHECK: loop.latch:
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i32 [[IV]]			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i32 [[IV]]
	; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[GEP]], align 4			; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[GEP]], align 4
	; CHECK-NEXT: [[SUM_NEXT]] = add i32 [[SUM]], [[LV]]			; CHECK-NEXT: [[SUM_NEXT]] = add i32 [[SUM]], [[LV]]
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
	; CHECK-NEXT: [[C_3:%.*]] = icmp ult i32 [[IV]], 1000			; CHECK-NEXT: [[C_3:%.*]] = icmp ult i32 [[IV]], 1000
	; CHECK-NEXT: br i1 [[C_3]], label [[LOOP_HEADER]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[C_3]], label [[LOOP_HEADER]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[SUM_NEXT_LCSSA:%.*]] = phi i32 [ [[SUM_NEXT]], [[LOOP_LATCH]] ]			; CHECK-NEXT: [[SUM_NEXT_LCSSA:%.*]] = phi i32 [ [[SUM_NEXT]], [[LOOP_LATCH]] ]
	; CHECK-NEXT: ret i32 [[SUM_NEXT_LCSSA]]			; CHECK-NEXT: ret i32 [[SUM_NEXT_LCSSA]]
	; CHECK: unreachable.exit:			; CHECK: deopt.exit:
	; CHECK-NEXT: [[SUM_LCSSA:%.*]] = phi i32 [ [[SUM]], [[THEN]] ], [ [[SUM]], [[LOOP_HEADER]] ]			; CHECK-NEXT: [[SUM_LCSSA:%.*]] = phi i32 [ [[SUM]], [[THEN]] ], [ [[SUM]], [[LOOP_HEADER]] ]
	; CHECK-NEXT: [[RVAL:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 [[SUM_LCSSA]]) ]			; CHECK-NEXT: [[RVAL:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 [[SUM_LCSSA]]) ]
	; CHECK-NEXT: ret i32 [[RVAL]]			; CHECK-NEXT: ret i32 [[RVAL]]
	;			;
	entry:			entry:
	br label %loop.header			br label %loop.header

	loop.header:			loop.header:
	%iv = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]			%iv = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]
	%sum = phi i32 [ 0, %entry ], [ %sum.next, %loop.latch ]			%sum = phi i32 [ 0, %entry ], [ %sum.next, %loop.latch ]
	br i1 %c.1, label %then, label %unreachable.exit			br i1 %c.1, label %then, label %deopt.exit

	then:			then:
	%i = load i32, i32* %inv			%i = load i32, i32* %inv
	%c.2 = icmp ult i32 %i, 2			%c.2 = icmp ult i32 %i, 2
	br i1 %c.2, label %loop.latch, label %unreachable.exit			br i1 %c.2, label %loop.latch, label %deopt.exit

	loop.latch:			loop.latch:
	%gep = getelementptr i32, i32* %ptr, i32 %iv			%gep = getelementptr i32, i32* %ptr, i32 %iv
	%lv = load i32, i32* %gep			%lv = load i32, i32* %gep
	%sum.next = add i32 %sum, %lv			%sum.next = add i32 %sum, %lv
	%iv.next = add nuw nsw i32 %iv, 1			%iv.next = add nuw nsw i32 %iv, 1
	%c.3 = icmp ult i32 %iv, 1000			%c.3 = icmp ult i32 %iv, 1000
	br i1 %c.3, label %loop.header, label %exit			br i1 %c.3, label %loop.header, label %exit

	exit:			exit:
	ret i32 %sum.next			ret i32 %sum.next

	unreachable.exit:			deopt.exit:
	%rval = call i32(...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %sum) ]			%rval = call i32(...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %sum) ]
	ret i32 %rval			ret i32 %rval
	}			}

	define i32 @do_not_peel_when_header_exiting(i32* %ptr, i32 %N, i32* %inv) {			define i32 @do_not_peel_when_header_exiting(i32* %ptr, i32 %N, i32* %inv) {
	; CHECK-LABEL: @do_not_peel_when_header_exiting(			; CHECK-LABEL: @do_not_peel_when_header_exiting(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]

llvm/test/Transforms/PhaseOrdering/AArch64/peel-multiple-unreachable-exits-for-vectorization.ll
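The comment in this test refers to loops with several `std::vector::at` calls. A hedged sketch of that kind of C++ source follows; the name `sum2At` is hypothetical, and the IR in the test is a hand-reduced model whose exact bounds and helper (`@at_with_int_conversion`) differ from this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sums A[I] + B[I] for I in [0, N) using bounds-checked access.
// Each at() call performs its own range check and throws
// std::out_of_range on failure; in the reduced IR test those failure
// paths are modeled as a call to @error() followed by unreachable.
std::int64_t sum2At(const std::vector<std::int64_t> &A,
                    const std::vector<std::int64_t> &B, std::size_t N) {
  std::int64_t Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += A.at(I) + B.at(I);
  return Sum;
}
```

In the IR before this patch, the loads of `B`'s start/end fields sit after `A`'s range check, which can leave the loop, so they are not guaranteed to execute. The updated CHECK lines show them moved into the peeled iteration (`START_I2_PEEL`, `END_I4_PEEL`) and reused by the vectorized loop.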

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -O2 -mtriple=arm64-apple-ios -S %s | FileCheck %s			; RUN: opt -O2 -mtriple=arm64-apple-ios -S %s | FileCheck %s

	%vec = type { i64*, i64* }			%vec = type { i64*, i64* }

	; Test to ensure a loop with multiple loads guarded by runtime-checks (like			; Test to ensure a loop with multiple loads guarded by runtime-checks (like
	; from multiple calls to C++'s std::vector::at) can be vectorized after			; from multiple calls to C++'s std::vector::at) can be vectorized after
	; hoisting the runtime checks out of the loop.			; hoisting the runtime checks out of the loop.

	define i64 @sum_2_at_with_int_conversion(%vec* %A, %vec* %B, i64 %N) {			define i64 @sum_2_at_with_int_conversion(%vec* %A, %vec* %B, i64 %N) {
	; CHECK-LABEL: @sum_2_at_with_int_conversion(			; CHECK-LABEL: @sum_2_at_with_int_conversion(
	; CHECK-NEXT: entry:			; CHECK-NEXT: at_with_int_conversion.exit12.peel:
	; CHECK-NEXT: [[GEP_START_I:%.*]] = getelementptr [[VEC:%.*]], %vec* [[A:%.*]], i64 0, i32 0			; CHECK-NEXT: [[GEP_START_I:%.*]] = getelementptr [[VEC:%.*]], %vec* [[A:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[START_I:%.*]] = load i64*, i64** [[GEP_START_I]], align 8			; CHECK-NEXT: [[START_I:%.*]] = load i64*, i64** [[GEP_START_I]], align 8
	; CHECK-NEXT: [[GEP_END_I:%.*]] = getelementptr [[VEC]], %vec* [[A]], i64 0, i32 1			; CHECK-NEXT: [[GEP_END_I:%.*]] = getelementptr [[VEC]], %vec* [[A]], i64 0, i32 1
	; CHECK-NEXT: [[END_I:%.*]] = load i64*, i64** [[GEP_END_I]], align 8			; CHECK-NEXT: [[END_I:%.*]] = load i64*, i64** [[GEP_END_I]], align 8
	; CHECK-NEXT: [[START_INT_I:%.*]] = ptrtoint i64* [[START_I]] to i64			; CHECK-NEXT: [[START_INT_I:%.*]] = ptrtoint i64* [[START_I]] to i64
	; CHECK-NEXT: [[END_INT_I:%.*]] = ptrtoint i64* [[END_I]] to i64			; CHECK-NEXT: [[END_INT_I:%.*]] = ptrtoint i64* [[END_I]] to i64
	; CHECK-NEXT: [[SUB_I:%.*]] = sub i64 [[END_INT_I]], [[START_INT_I]]			; CHECK-NEXT: [[SUB_I:%.*]] = sub i64 [[END_INT_I]], [[START_INT_I]]
	; CHECK-NEXT: [[GEP_START_I1:%.*]] = getelementptr [[VEC]], %vec* [[B:%.*]], i64 0, i32 0			; CHECK-NEXT: [[GEP_END_I3:%.*]] = getelementptr [[VEC]], %vec* [[B:%.*]], i64 0, i32 1
	; CHECK-NEXT: [[GEP_END_I3:%.*]] = getelementptr [[VEC]], %vec* [[B]], i64 0, i32 1			; CHECK-NEXT: [[GEP_START_I1:%.*]] = getelementptr [[VEC]], %vec* [[B]], i64 0, i32 0
				; CHECK-NEXT: [[START_I2_PEEL:%.*]] = load i64*, i64** [[GEP_START_I1]], align 8
				; CHECK-NEXT: [[END_I4_PEEL:%.*]] = load i64*, i64** [[GEP_END_I3]], align 8
				; CHECK-NEXT: [[START_INT_I5_PEEL:%.*]] = ptrtoint i64* [[START_I2_PEEL]] to i64
				; CHECK-NEXT: [[END_INT_I6_PEEL:%.*]] = ptrtoint i64* [[END_I4_PEEL]] to i64
				; CHECK-NEXT: [[SUB_I7_PEEL:%.*]] = sub i64 [[END_INT_I6_PEEL]], [[START_INT_I5_PEEL]]
				; CHECK-NEXT: [[LV_I_PEEL:%.*]] = load i64, i64* [[START_I]], align 4
				; CHECK-NEXT: [[LV_I10_PEEL:%.*]] = load i64, i64* [[START_I2_PEEL]], align 4
				; CHECK-NEXT: [[SUM_NEXT_PEEL:%.*]] = add i64 [[LV_I_PEEL]], [[LV_I10_PEEL]]
				; CHECK-NEXT: [[C_PEEL:%.*]] = icmp sgt i64 [[N:%.*]], 0
				; CHECK-NEXT: br i1 [[C_PEEL]], label [[LOOP_PREHEADER:%.*]], label [[EXIT:%.*]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: [[UMIN:%.*]] = call i64 @llvm.umin.i64(i64 [[SUB_I7_PEEL]], i64 [[SUB_I]])
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], -1
				; CHECK-NEXT: [[UMIN16:%.*]] = call i64 @llvm.umin.i64(i64 [[UMIN]], i64 [[TMP0]])
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[UMIN16]], 1
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], 5
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[LOOP_PREHEADER22:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = and i64 [[TMP1]], 3
				; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
				; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP2]], i64 4, i64 [[N_MOD_VF]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP1]], [[TMP3]]
				; CHECK-NEXT: [[IND_END:%.*]] = add i64 [[N_VEC]], 1
				; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> <i64 poison, i64 0>, i64 [[SUM_NEXT_PEEL]], i32 0
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <2 x i64> [ [[TMP4]], [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_PHI18:%.*]] = phi <2 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[OFFSET_IDX:%.*]] = or i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i64, i64* [[START_I]], i64 [[OFFSET_IDX]]
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i64>, <2 x i64>* [[TMP6]], align 4
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i64, i64* [[TMP5]], i64 2
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast i64* [[TMP7]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD19:%.*]] = load <2 x i64>, <2 x i64>* [[TMP8]], align 4
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr i64, i64* [[START_I2_PEEL]], i64 [[OFFSET_IDX]]
				; CHECK-NEXT: [[TMP10:%.*]] = bitcast i64* [[TMP9]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD20:%.*]] = load <2 x i64>, <2 x i64>* [[TMP10]], align 4
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i64, i64* [[TMP9]], i64 2
				; CHECK-NEXT: [[TMP12:%.*]] = bitcast i64* [[TMP11]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD21:%.*]] = load <2 x i64>, <2 x i64>* [[TMP12]], align 4
				; CHECK-NEXT: [[TMP13:%.*]] = add <2 x i64> [[WIDE_LOAD]], [[VEC_PHI]]
				; CHECK-NEXT: [[TMP14:%.*]] = add <2 x i64> [[WIDE_LOAD19]], [[VEC_PHI18]]
				; CHECK-NEXT: [[TMP15]] = add <2 x i64> [[TMP13]], [[WIDE_LOAD20]]
				; CHECK-NEXT: [[TMP16]] = add <2 x i64> [[TMP14]], [[WIDE_LOAD21]]
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[BIN_RDX:%.*]] = add <2 x i64> [[TMP16]], [[TMP15]]
				; CHECK-NEXT: [[TMP18:%.*]] = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> [[BIN_RDX]])
				; CHECK-NEXT: br label [[LOOP_PREHEADER22]]
				; CHECK: loop.preheader22:
				; CHECK-NEXT: [[IV_PH:%.*]] = phi i64 [ 1, [[LOOP_PREHEADER]] ], [ [[IND_END]], [[MIDDLE_BLOCK]] ]
				; CHECK-NEXT: [[SUM_PH:%.*]] = phi i64 [ [[SUM_NEXT_PEEL]], [[LOOP_PREHEADER]] ], [ [[TMP18]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[AT_WITH_INT_CONVERSION_EXIT12:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[AT_WITH_INT_CONVERSION_EXIT12:%.*]] ], [ [[IV_PH]], [[LOOP_PREHEADER22]] ]
	; CHECK-NEXT: [[SUM:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[SUM_NEXT:%.*]], [[AT_WITH_INT_CONVERSION_EXIT12]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i64 [ [[SUM_NEXT:%.*]], [[AT_WITH_INT_CONVERSION_EXIT12]] ], [ [[SUM_PH]], [[LOOP_PREHEADER22]] ]
	; CHECK-NEXT: [[INRANGE_I:%.*]] = icmp ult i64 [[SUB_I]], [[IV]]			; CHECK-NEXT: [[INRANGE_I:%.*]] = icmp ult i64 [[SUB_I]], [[IV]]
	; CHECK-NEXT: br i1 [[INRANGE_I]], label [[ERROR_I:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT:%.*]]			; CHECK-NEXT: br i1 [[INRANGE_I]], label [[ERROR_I:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT:%.*]]
	; CHECK: error.i:			; CHECK: error.i:
	; CHECK-NEXT: tail call void @error()			; CHECK-NEXT: tail call void @error()
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: at_with_int_conversion.exit:			; CHECK: at_with_int_conversion.exit:
	; CHECK-NEXT: [[START_I2:%.*]] = load i64*, i64** [[GEP_START_I1]], align 8			; CHECK-NEXT: [[INRANGE_I8:%.*]] = icmp ult i64 [[SUB_I7_PEEL]], [[IV]]
	; CHECK-NEXT: [[END_I4:%.*]] = load i64*, i64** [[GEP_END_I3]], align 8
	; CHECK-NEXT: [[START_INT_I5:%.*]] = ptrtoint i64* [[START_I2]] to i64
	; CHECK-NEXT: [[END_INT_I6:%.*]] = ptrtoint i64* [[END_I4]] to i64
	; CHECK-NEXT: [[SUB_I7:%.*]] = sub i64 [[END_INT_I6]], [[START_INT_I5]]
	; CHECK-NEXT: [[INRANGE_I8:%.*]] = icmp ult i64 [[SUB_I7]], [[IV]]
	; CHECK-NEXT: br i1 [[INRANGE_I8]], label [[ERROR_I11:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT12]]			; CHECK-NEXT: br i1 [[INRANGE_I8]], label [[ERROR_I11:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT12]]
	; CHECK: error.i11:			; CHECK: error.i11:
	; CHECK-NEXT: tail call void @error()			; CHECK-NEXT: tail call void @error()
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: at_with_int_conversion.exit12:			; CHECK: at_with_int_conversion.exit12:
	; CHECK-NEXT: [[GEP_IDX_I:%.*]] = getelementptr i64, i64* [[START_I]], i64 [[IV]]			; CHECK-NEXT: [[GEP_IDX_I:%.*]] = getelementptr i64, i64* [[START_I]], i64 [[IV]]
	; CHECK-NEXT: [[LV_I:%.*]] = load i64, i64* [[GEP_IDX_I]], align 4			; CHECK-NEXT: [[LV_I:%.*]] = load i64, i64* [[GEP_IDX_I]], align 4
	; CHECK-NEXT: [[GEP_IDX_I9:%.*]] = getelementptr i64, i64* [[START_I2]], i64 [[IV]]			; CHECK-NEXT: [[GEP_IDX_I9:%.*]] = getelementptr i64, i64* [[START_I2_PEEL]], i64 [[IV]]
	; CHECK-NEXT: [[LV_I10:%.*]] = load i64, i64* [[GEP_IDX_I9]], align 4			; CHECK-NEXT: [[LV_I10:%.*]] = load i64, i64* [[GEP_IDX_I9]], align 4
	; CHECK-NEXT: [[ADD:%.*]] = add i64 [[LV_I]], [[SUM]]			; CHECK-NEXT: [[ADD:%.*]] = add i64 [[LV_I]], [[SUM]]
	; CHECK-NEXT: [[SUM_NEXT]] = add i64 [[ADD]], [[LV_I10]]			; CHECK-NEXT: [[SUM_NEXT]] = add i64 [[ADD]], [[LV_I10]]
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[C:%.*]] = icmp slt i64 [[IV]], [[N:%.*]]			; CHECK-NEXT: [[C:%.*]] = icmp slt i64 [[IV]], [[N]]
	; CHECK-NEXT: br i1 [[C]], label [[LOOP]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[C]], label [[LOOP]], label [[EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i64 [[SUM_NEXT]]			; CHECK-NEXT: [[SUM_NEXT_LCSSA:%.*]] = phi i64 [ [[SUM_NEXT_PEEL]], [[AT_WITH_INT_CONVERSION_EXIT12_PEEL:%.*]] ], [ [[SUM_NEXT]], [[AT_WITH_INT_CONVERSION_EXIT12]] ]
				; CHECK-NEXT: ret i64 [[SUM_NEXT_LCSSA]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
	%sum = phi i64 [ 0, %entry ], [ %sum.next, %loop ]			%sum = phi i64 [ 0, %entry ], [ %sum.next, %loop ]
	%a = call i64 @at_with_int_conversion(%vec* %A, i64 %iv)			%a = call i64 @at_with_int_conversion(%vec* %A, i64 %iv)
	%b = call i64 @at_with_int_conversion(%vec* %B, i64 %iv)			%b = call i64 @at_with_int_conversion(%vec* %B, i64 %iv)
	%add = add i64 %a, %b			%add = add i64 %a, %b
	%sum.next = add i64 %sum, %add			%sum.next = add i64 %sum, %add
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%c = icmp slt i64 %iv, %N			%c = icmp slt i64 %iv, %N
	br i1 %c, label %loop, label %exit			br i1 %c, label %loop, label %exit

	exit:			exit:
	ret i64 %sum.next			ret i64 %sum.next
	}			}

	define i64 @sum_3_at_with_int_conversion(%vec* %A, %vec* %B, %vec* %C, i64 %N) {			define i64 @sum_3_at_with_int_conversion(%vec* %A, %vec* %B, %vec* %C, i64 %N) {
	; CHECK-LABEL: @sum_3_at_with_int_conversion(			; CHECK-LABEL: @sum_3_at_with_int_conversion(
	; CHECK-NEXT: entry:			; CHECK-NEXT: at_with_int_conversion.exit24.peel:
	; CHECK-NEXT: [[GEP_START_I:%.*]] = getelementptr [[VEC:%.*]], %vec* [[A:%.*]], i64 0, i32 0			; CHECK-NEXT: [[GEP_START_I:%.*]] = getelementptr [[VEC:%.*]], %vec* [[A:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[START_I:%.*]] = load i64*, i64** [[GEP_START_I]], align 8			; CHECK-NEXT: [[START_I:%.*]] = load i64*, i64** [[GEP_START_I]], align 8
	; CHECK-NEXT: [[GEP_END_I:%.*]] = getelementptr [[VEC]], %vec* [[A]], i64 0, i32 1			; CHECK-NEXT: [[GEP_END_I:%.*]] = getelementptr [[VEC]], %vec* [[A]], i64 0, i32 1
	; CHECK-NEXT: [[END_I:%.*]] = load i64*, i64** [[GEP_END_I]], align 8			; CHECK-NEXT: [[END_I:%.*]] = load i64*, i64** [[GEP_END_I]], align 8
	; CHECK-NEXT: [[START_INT_I:%.*]] = ptrtoint i64* [[START_I]] to i64			; CHECK-NEXT: [[START_INT_I:%.*]] = ptrtoint i64* [[START_I]] to i64
	; CHECK-NEXT: [[END_INT_I:%.*]] = ptrtoint i64* [[END_I]] to i64			; CHECK-NEXT: [[END_INT_I:%.*]] = ptrtoint i64* [[END_I]] to i64
	; CHECK-NEXT: [[SUB_I:%.*]] = sub i64 [[END_INT_I]], [[START_INT_I]]			; CHECK-NEXT: [[SUB_I:%.*]] = sub i64 [[END_INT_I]], [[START_INT_I]]
	; CHECK-NEXT: [[GEP_START_I1:%.*]] = getelementptr [[VEC]], %vec* [[B:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[GEP_END_I3:%.*]] = getelementptr [[VEC]], %vec* [[B]], i64 0, i32 1
	; CHECK-NEXT: [[GEP_START_I13:%.*]] = getelementptr [[VEC]], %vec* [[C:%.*]], i64 0, i32 0			; CHECK-NEXT: [[GEP_START_I13:%.*]] = getelementptr [[VEC]], %vec* [[C:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[GEP_END_I15:%.*]] = getelementptr [[VEC]], %vec* [[C]], i64 0, i32 1			; CHECK-NEXT: [[GEP_END_I15:%.*]] = getelementptr [[VEC]], %vec* [[C]], i64 0, i32 1
				; CHECK-NEXT: [[GEP_END_I3:%.*]] = getelementptr [[VEC]], %vec* [[B:%.*]], i64 0, i32 1
				; CHECK-NEXT: [[GEP_START_I1:%.*]] = getelementptr [[VEC]], %vec* [[B]], i64 0, i32 0
				; CHECK-NEXT: [[LV_I_PEEL:%.*]] = load i64, i64* [[START_I]], align 4
				; CHECK-NEXT: [[START_I2_PEEL:%.*]] = load i64*, i64** [[GEP_START_I1]], align 8
				; CHECK-NEXT: [[END_I4_PEEL:%.*]] = load i64*, i64** [[GEP_END_I3]], align 8
				; CHECK-NEXT: [[START_INT_I5_PEEL:%.*]] = ptrtoint i64* [[START_I2_PEEL]] to i64
				; CHECK-NEXT: [[END_INT_I6_PEEL:%.*]] = ptrtoint i64* [[END_I4_PEEL]] to i64
				; CHECK-NEXT: [[SUB_I7_PEEL:%.*]] = sub i64 [[END_INT_I6_PEEL]], [[START_INT_I5_PEEL]]
				; CHECK-NEXT: [[START_I14_PEEL:%.*]] = load i64*, i64** [[GEP_START_I13]], align 8
				; CHECK-NEXT: [[END_I16_PEEL:%.*]] = load i64*, i64** [[GEP_END_I15]], align 8
				; CHECK-NEXT: [[START_INT_I17_PEEL:%.*]] = ptrtoint i64* [[START_I14_PEEL]] to i64
				; CHECK-NEXT: [[END_INT_I18_PEEL:%.*]] = ptrtoint i64* [[END_I16_PEEL]] to i64
				; CHECK-NEXT: [[SUB_I19_PEEL:%.*]] = sub i64 [[END_INT_I18_PEEL]], [[START_INT_I17_PEEL]]
				; CHECK-NEXT: [[LV_I10_PEEL:%.*]] = load i64, i64* [[START_I2_PEEL]], align 4
				; CHECK-NEXT: [[LV_I22_PEEL:%.*]] = load i64, i64* [[START_I14_PEEL]], align 4
				; CHECK-NEXT: [[ADD_2_PEEL:%.*]] = add i64 [[LV_I_PEEL]], [[LV_I10_PEEL]]
				; CHECK-NEXT: [[SUM_NEXT_PEEL:%.*]] = add i64 [[ADD_2_PEEL]], [[LV_I22_PEEL]]
				; CHECK-NEXT: [[COND_PEEL:%.*]] = icmp sgt i64 [[N:%.*]], 0
				; CHECK-NEXT: br i1 [[COND_PEEL]], label [[LOOP_PREHEADER:%.*]], label [[EXIT:%.*]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: [[UMIN:%.*]] = call i64 @llvm.umin.i64(i64 [[SUB_I19_PEEL]], i64 [[SUB_I7_PEEL]])
				; CHECK-NEXT: [[UMIN28:%.*]] = call i64 @llvm.umin.i64(i64 [[UMIN]], i64 [[SUB_I]])
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], -1
				; CHECK-NEXT: [[UMIN29:%.*]] = call i64 @llvm.umin.i64(i64 [[UMIN28]], i64 [[TMP0]])
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[UMIN29]], 1
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], 5
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[LOOP_PREHEADER37:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = and i64 [[TMP1]], 3
				; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
				; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP2]], i64 4, i64 [[N_MOD_VF]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP1]], [[TMP3]]
				; CHECK-NEXT: [[IND_END:%.*]] = add i64 [[N_VEC]], 1
				; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> <i64 poison, i64 0>, i64 [[SUM_NEXT_PEEL]], i32 0
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <2 x i64> [ [[TMP4]], [[VECTOR_PH]] ], [ [[TMP21:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_PHI31:%.*]] = phi <2 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP22:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[OFFSET_IDX:%.*]] = or i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i64, i64* [[START_I]], i64 [[OFFSET_IDX]]
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i64>, <2 x i64>* [[TMP6]], align 4
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i64, i64* [[TMP5]], i64 2
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast i64* [[TMP7]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD32:%.*]] = load <2 x i64>, <2 x i64>* [[TMP8]], align 4
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr i64, i64* [[START_I2_PEEL]], i64 [[OFFSET_IDX]]
				; CHECK-NEXT: [[TMP10:%.*]] = bitcast i64* [[TMP9]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD33:%.*]] = load <2 x i64>, <2 x i64>* [[TMP10]], align 4
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i64, i64* [[TMP9]], i64 2
				; CHECK-NEXT: [[TMP12:%.*]] = bitcast i64* [[TMP11]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD34:%.*]] = load <2 x i64>, <2 x i64>* [[TMP12]], align 4
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i64, i64* [[START_I14_PEEL]], i64 [[OFFSET_IDX]]
				; CHECK-NEXT: [[TMP14:%.*]] = bitcast i64* [[TMP13]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD35:%.*]] = load <2 x i64>, <2 x i64>* [[TMP14]], align 4
				; CHECK-NEXT: [[TMP15:%.*]] = getelementptr i64, i64* [[TMP13]], i64 2
				; CHECK-NEXT: [[TMP16:%.*]] = bitcast i64* [[TMP15]] to <2 x i64>*
				; CHECK-NEXT: [[WIDE_LOAD36:%.*]] = load <2 x i64>, <2 x i64>* [[TMP16]], align 4
				; CHECK-NEXT: [[TMP17:%.*]] = add <2 x i64> [[WIDE_LOAD]], [[VEC_PHI]]
				; CHECK-NEXT: [[TMP18:%.*]] = add <2 x i64> [[WIDE_LOAD32]], [[VEC_PHI31]]
				; CHECK-NEXT: [[TMP19:%.*]] = add <2 x i64> [[TMP17]], [[WIDE_LOAD33]]
				; CHECK-NEXT: [[TMP20:%.*]] = add <2 x i64> [[TMP18]], [[WIDE_LOAD34]]
				; CHECK-NEXT: [[TMP21]] = add <2 x i64> [[TMP19]], [[WIDE_LOAD35]]
				; CHECK-NEXT: [[TMP22]] = add <2 x i64> [[TMP20]], [[WIDE_LOAD36]]
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP23:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP23]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[BIN_RDX:%.*]] = add <2 x i64> [[TMP22]], [[TMP21]]
				; CHECK-NEXT: [[TMP24:%.*]] = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> [[BIN_RDX]])
				; CHECK-NEXT: br label [[LOOP_PREHEADER37]]
				; CHECK: loop.preheader37:
				; CHECK-NEXT: [[IV_PH:%.*]] = phi i64 [ 1, [[LOOP_PREHEADER]] ], [ [[IND_END]], [[MIDDLE_BLOCK]] ]
				; CHECK-NEXT: [[SUM_PH:%.*]] = phi i64 [ [[SUM_NEXT_PEEL]], [[LOOP_PREHEADER]] ], [ [[TMP24]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[AT_WITH_INT_CONVERSION_EXIT24:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[AT_WITH_INT_CONVERSION_EXIT24:%.*]] ], [ [[IV_PH]], [[LOOP_PREHEADER37]] ]
	; CHECK-NEXT: [[SUM:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[SUM_NEXT:%.*]], [[AT_WITH_INT_CONVERSION_EXIT24]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i64 [ [[SUM_NEXT:%.*]], [[AT_WITH_INT_CONVERSION_EXIT24]] ], [ [[SUM_PH]], [[LOOP_PREHEADER37]] ]
	; CHECK-NEXT: [[INRANGE_I:%.*]] = icmp ult i64 [[SUB_I]], [[IV]]			; CHECK-NEXT: [[INRANGE_I:%.*]] = icmp ult i64 [[SUB_I]], [[IV]]
	; CHECK-NEXT: br i1 [[INRANGE_I]], label [[ERROR_I:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT:%.*]]			; CHECK-NEXT: br i1 [[INRANGE_I]], label [[ERROR_I:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT:%.*]]
	; CHECK: error.i:			; CHECK: error.i:
	; CHECK-NEXT: tail call void @error()			; CHECK-NEXT: tail call void @error()
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: at_with_int_conversion.exit:			; CHECK: at_with_int_conversion.exit:
	; CHECK-NEXT: [[GEP_IDX_I:%.*]] = getelementptr i64, i64* [[START_I]], i64 [[IV]]			; CHECK-NEXT: [[GEP_IDX_I:%.*]] = getelementptr i64, i64* [[START_I]], i64 [[IV]]
	; CHECK-NEXT: [[LV_I:%.*]] = load i64, i64* [[GEP_IDX_I]], align 4			; CHECK-NEXT: [[LV_I:%.*]] = load i64, i64* [[GEP_IDX_I]], align 4
	; CHECK-NEXT: [[START_I2:%.*]] = load i64*, i64** [[GEP_START_I1]], align 8			; CHECK-NEXT: [[INRANGE_I8:%.*]] = icmp ult i64 [[SUB_I7_PEEL]], [[IV]]
	; CHECK-NEXT: [[END_I4:%.*]] = load i64*, i64** [[GEP_END_I3]], align 8
	; CHECK-NEXT: [[START_INT_I5:%.*]] = ptrtoint i64* [[START_I2]] to i64
	; CHECK-NEXT: [[END_INT_I6:%.*]] = ptrtoint i64* [[END_I4]] to i64
	; CHECK-NEXT: [[SUB_I7:%.*]] = sub i64 [[END_INT_I6]], [[START_INT_I5]]
	; CHECK-NEXT: [[INRANGE_I8:%.*]] = icmp ult i64 [[SUB_I7]], [[IV]]
	; CHECK-NEXT: br i1 [[INRANGE_I8]], label [[ERROR_I11:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT12:%.*]]			; CHECK-NEXT: br i1 [[INRANGE_I8]], label [[ERROR_I11:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT12:%.*]]
	; CHECK: error.i11:			; CHECK: error.i11:
	; CHECK-NEXT: tail call void @error()			; CHECK-NEXT: tail call void @error()
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: at_with_int_conversion.exit12:			; CHECK: at_with_int_conversion.exit12:
	; CHECK-NEXT: [[START_I14:%.*]] = load i64*, i64** [[GEP_START_I13]], align 8			; CHECK-NEXT: [[INRANGE_I20:%.*]] = icmp ult i64 [[SUB_I19_PEEL]], [[IV]]
	; CHECK-NEXT: [[END_I16:%.*]] = load i64*, i64** [[GEP_END_I15]], align 8
	; CHECK-NEXT: [[START_INT_I17:%.*]] = ptrtoint i64* [[START_I14]] to i64
	; CHECK-NEXT: [[END_INT_I18:%.*]] = ptrtoint i64* [[END_I16]] to i64
	; CHECK-NEXT: [[SUB_I19:%.*]] = sub i64 [[END_INT_I18]], [[START_INT_I17]]
	; CHECK-NEXT: [[INRANGE_I20:%.*]] = icmp ult i64 [[SUB_I19]], [[IV]]
	; CHECK-NEXT: br i1 [[INRANGE_I20]], label [[ERROR_I23:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT24]]			; CHECK-NEXT: br i1 [[INRANGE_I20]], label [[ERROR_I23:%.*]], label [[AT_WITH_INT_CONVERSION_EXIT24]]
	; CHECK: error.i23:			; CHECK: error.i23:
	; CHECK-NEXT: tail call void @error()			; CHECK-NEXT: tail call void @error()
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: at_with_int_conversion.exit24:			; CHECK: at_with_int_conversion.exit24:
	; CHECK-NEXT: [[GEP_IDX_I9:%.*]] = getelementptr i64, i64* [[START_I2]], i64 [[IV]]			; CHECK-NEXT: [[GEP_IDX_I9:%.*]] = getelementptr i64, i64* [[START_I2_PEEL]], i64 [[IV]]
	; CHECK-NEXT: [[LV_I10:%.*]] = load i64, i64* [[GEP_IDX_I9]], align 4			; CHECK-NEXT: [[LV_I10:%.*]] = load i64, i64* [[GEP_IDX_I9]], align 4
	; CHECK-NEXT: [[GEP_IDX_I21:%.*]] = getelementptr i64, i64* [[START_I14]], i64 [[IV]]			; CHECK-NEXT: [[GEP_IDX_I21:%.*]] = getelementptr i64, i64* [[START_I14_PEEL]], i64 [[IV]]
	; CHECK-NEXT: [[LV_I22:%.*]] = load i64, i64* [[GEP_IDX_I21]], align 4			; CHECK-NEXT: [[LV_I22:%.*]] = load i64, i64* [[GEP_IDX_I21]], align 4
	; CHECK-NEXT: [[ADD_1:%.*]] = add i64 [[LV_I]], [[SUM]]			; CHECK-NEXT: [[ADD_1:%.*]] = add i64 [[LV_I]], [[SUM]]
	; CHECK-NEXT: [[ADD_2:%.*]] = add i64 [[ADD_1]], [[LV_I10]]			; CHECK-NEXT: [[ADD_2:%.*]] = add i64 [[ADD_1]], [[LV_I10]]
	; CHECK-NEXT: [[SUM_NEXT]] = add i64 [[ADD_2]], [[LV_I22]]			; CHECK-NEXT: [[SUM_NEXT]] = add i64 [[ADD_2]], [[LV_I22]]
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[IV]], [[N:%.*]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[IV]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[LOOP]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[COND]], label [[LOOP]], label [[EXIT]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i64 [[SUM_NEXT]]			; CHECK-NEXT: [[SUM_NEXT_LCSSA:%.*]] = phi i64 [ [[SUM_NEXT_PEEL]], [[AT_WITH_INT_CONVERSION_EXIT24_PEEL:%.*]] ], [ [[SUM_NEXT]], [[AT_WITH_INT_CONVERSION_EXIT24]] ]
				; CHECK-NEXT: ret i64 [[SUM_NEXT_LCSSA]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
	%sum = phi i64 [ 0, %entry ], [ %sum.next, %loop ]			%sum = phi i64 [ 0, %entry ], [ %sum.next, %loop ]
	%a = call i64 @at_with_int_conversion(%vec* %A, i64 %iv)			%a = call i64 @at_with_int_conversion(%vec* %A, i64 %iv)
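The C++ sketch below shows, at the source level, what the new CHECK lines encode. It is illustrative only and rests on some assumptions. `Vec`, `on_error`, `at_checked`, `sum_original` and `sum_peeled` are hypothetical names. The bounds check is a simplified `std::vector::at`-style element check, not the test's exact `icmp ult` on a pointer difference. The real transformation is done by LoopPeel on LLVM IR, not in source.

The sketch shows one effect. Once the peeled first iteration finishes without taking an unreachable-terminated error exit, the invariant start/end loads for every vector are known to be dereferenceable. The remaining iterations can therefore reuse those values, as the `[[START_I2_PEEL]]` and `[[SUB_I7_PEEL]]` operands do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the test's %vec: a begin/end pointer pair.
struct Vec {
  const int64_t* start;
  const int64_t* end;
};

// Hypothetical stand-in for @error(): it never returns, which models the
// unreachable-terminated non-latch exits.
[[noreturn]] void on_error() { std::abort(); }

// Simplified std::vector::at-style access (an assumption, not the test's exact
// check). Every call reads v->start and v->end: loop-invariant loads.
int64_t at_checked(const Vec* v, int64_t i) {
  const uint64_t size = static_cast<uint64_t>(v->end - v->start);
  if (static_cast<uint64_t>(i) >= size) on_error();
  return v->start[i];
}

// Original shape: the body runs for iv = 0, 1, ... and the latch exits once
// iv < n is false, so at least one iteration always runs.
int64_t sum_original(const Vec* A, const Vec* B, const Vec* C, int64_t n) {
  int64_t sum = 0;
  for (int64_t iv = 0;; ++iv) {
    sum += at_checked(A, iv);
    sum += at_checked(B, iv);
    sum += at_checked(C, iv);
    if (!(iv < n)) break;
  }
  return sum;
}

// After peeling one iteration: iteration 0 performs the loads in the original
// order (B's fields are read only after A's check has passed, and so on).
int64_t sum_peeled(const Vec* A, const Vec* B, const Vec* C, int64_t n) {
  const int64_t* a = A->start;
  const uint64_t a_size = static_cast<uint64_t>(A->end - a);
  if (a_size == 0) on_error();
  int64_t sum = a[0];
  const int64_t* b = B->start;
  const uint64_t b_size = static_cast<uint64_t>(B->end - b);
  if (b_size == 0) on_error();
  sum += b[0];
  const int64_t* c = C->start;
  const uint64_t c_size = static_cast<uint64_t>(C->end - c);
  if (c_size == 0) on_error();
  sum += c[0];
  if (n <= 0) return sum;  // latch check for iv == 0

  // Reaching here means no error exit was taken, so all six invariant values
  // were loaded successfully; the remaining iterations reuse them instead of
  // re-loading through A, B and C.
  for (int64_t iv = 1;; ++iv) {
    const uint64_t u = static_cast<uint64_t>(iv);
    if (u >= a_size) on_error();
    sum += a[iv];
    if (u >= b_size) on_error();
    sum += b[iv];
    if (u >= c_size) on_error();
    sum += c[iv];
    if (!(iv < n)) break;
  }
  return sum;
}
```

For in-bounds inputs both functions return the same sum. For example, with three-element vectors and `n = 2`, both visit indices 0, 1 and 2.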

Revision Contents

Diff 378954

llvm/include/llvm/Transforms/Utils/LoopPeel.h

llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp

llvm/lib/Transforms/Utils/LoopPeel.cpp

llvm/test/Transforms/LoopUnroll/peel-to-turn-invariant-accesses-dereferenceable.ll

llvm/test/Transforms/PhaseOrdering/AArch64/peel-multiple-unreachable-exits-for-vectorization.ll
