This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnroll] adjust for new `convergent` semantics
Needs Review · Public

Authored by sameerds on Jun 26 2023, 12:52 AM.


Details

Reviewers

nhaehnle
foad
arsenm
efriedma
jdoerfert

Summary

Adjust the unrolling check for the new semantics:

There is no restriction on loops whose convergent operations are controlled inside the loop: their behavior with respect to cross-thread communication is (partially) implementation-defined, and loop unrolling is part of the implementation, so such loops can be unrolled without additional convergence constraints.

Fix unrolling with loop heart intrinsics: in a typical loop with a loop heart in the header that uses a token from outside the loop, duplicating the intrinsic would introduce multiple static uses of a convergence control token in a cycle that does not contain its definition.
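The cycle problem can be made concrete with a small model. The sketch below is a hypothetical, LLVM-free simplification. It assumes the static rule from the ConvergentOperations document can be approximated as "a cycle that contains two uses of a convergence token must also contain the token's definition". Blocks are plain integers, and `ToyToken`, `cycleViolatesTokenRule`, and `unrollLoopUses` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy CFG model: basic blocks are plain integers (hypothetical numbering).
struct ToyToken {
  int defBlock;               // block that defines the token, e.g. the preheader
  std::vector<int> useBlocks; // block of each static use (e.g. a loop heart)
};

// Simplified rule: a cycle holding two or more static uses of a token must
// also hold the token's definition.
bool cycleViolatesTokenRule(const std::set<int> &cycle, const ToyToken &tok) {
  if (cycle.count(tok.defBlock))
    return false;
  int usesInCycle = 0;
  for (int b : tok.useBlocks)
    if (cycle.count(b))
      ++usesInCycle;
  return usesInCycle >= 2;
}

// Naive unrolling by `count`: every loop block is cloned, and copy i of block
// b gets id b + i * idStride. Uses inside the loop are cloned with their
// block; uses outside the loop are kept once. The definition is assumed to
// be outside the loop, so it is not cloned.
ToyToken unrollLoopUses(const ToyToken &tok, const std::set<int> &loop,
                        int count, int idStride, std::set<int> &unrolledLoop) {
  ToyToken out{tok.defBlock, {}};
  unrolledLoop.clear();
  for (int u : tok.useBlocks)
    if (!loop.count(u))
      out.useBlocks.push_back(u);
  for (int i = 0; i < count; ++i) {
    for (int b : loop)
      unrolledLoop.insert(b + i * idStride);
    for (int u : tok.useBlocks)
      if (loop.count(u))
        out.useBlocks.push_back(u + i * idStride);
  }
  return out;
}
```

Unrolling the one-block loop {1}, whose header holds the heart and whose token is defined in block 0, by a factor of 2 (stride 10) yields the cycle {1, 11} with a use in each block and no definition, so the check fires. In this diff, `ApproximateLoopSize` detects this situation (a heart whose token is defined outside the loop), reports it through the `Heart` out-parameter, and classifies the loop as `LoopConvergenceKind::Some`.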

Spell out the setup of UnrollLoopOptions to reduce the potential for
confusion caused by very long struct initializers.
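As a rough illustration of the difference, the sketch below mirrors the `UnrollLoopOptions` fields visible in this diff in a hypothetical standalone struct (`ToyUnrollLoopOptions`, with a stub `ToyInstruction` in place of `llvm::Instruction`). The field values are arbitrary. The spelled-out form names every field where it is set, while the positional form depends on getting the order right.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Instruction.
struct ToyInstruction {};

// Standalone mirror of the UnrollLoopOptions fields shown in this diff.
struct ToyUnrollLoopOptions {
  unsigned Count;
  bool Force;
  bool Runtime;
  bool AllowExpensiveTripCount;
  bool UnrollRemainder;
  bool ForgetAllSCEV;
  const ToyInstruction *Heart = nullptr;
};

// Positional form: values are matched to fields purely by order.
ToyUnrollLoopOptions makePositional(unsigned Count,
                                    const ToyInstruction *Heart) {
  return {Count, false, true, false, false, false, Heart};
}

// Spelled-out form: each field is named at the point it is set.
ToyUnrollLoopOptions makeSpelledOut(unsigned Count,
                                    const ToyInstruction *Heart) {
  ToyUnrollLoopOptions ULO;
  ULO.Count = Count;
  ULO.Force = false;
  ULO.Runtime = true;
  ULO.AllowExpensiveTripCount = false;
  ULO.UnrollRemainder = false;
  ULO.ForgetAllSCEV = false;
  ULO.Heart = Heart;
  return ULO;
}
```

With the positional initializer, swapping two adjacent `bool` values still compiles, which is the kind of mistake the spelled-out setup makes easier to spot in review.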

Original implementation [D85605] by Nicolai Haehnle <nicolai.haehnle@amd.com>.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,060 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

sameerds created this revision. Jun 26 2023, 12:52 AM

Herald added a project: Restricted Project. Jun 26 2023, 12:52 AM

Herald added subscribers: zzheng, hiraditya.

sameerds requested review of this revision. Jun 26 2023, 12:52 AM

Herald added a project: Restricted Project. Jun 26 2023, 12:52 AM

Herald added a subscriber: llvm-commits.

sameerds added reviewers: nhaehnle, foad, arsenm, efriedma, jdoerfert. Jun 26 2023, 12:54 AM

sameerds added a subscriber: Restricted Project.

Herald added subscribers: StephenFan, wdng. Jun 26 2023, 12:54 AM

sameerds added a parent revision: D152431: [Inliner] Handle convergence control when inlining a call. Jun 26 2023, 12:55 AM

Harbormaster completed remote builds in B241104: Diff 534453. Jun 26 2023, 1:02 AM

This looks quite high risk. How is it tested beyond this lit test, specifically with code that runs on the GPU? It's really easy to get broken transforms to pass a lit test where the checks were generated from the implementation.

Also, could you point reviewers to the documentation on the heart intrinsic system and how it relates to the current modeling? I know there's a slide deck or two somewhere, but I'm hoping for something more formal.

In D153744#4448078, @JonChesterfield wrote:

This looks quite high risk. How is it tested beyond this lit test, specifically with code that runs on the GPU? It's really easy to get broken transforms to pass a lit test where the checks were generated from the implementation.

Also, could you point reviewers to the documentation on the heart intrinsic system and how it relates to the current modeling? I know there's a slide deck or two somewhere, but I'm hoping for something more formal.

Thanks for taking an interest, Jon. There's a reason why the intrinsics have the word "experimental" in their name. Risk is limited to any flow that actually decides to use them, and no one is being exhorted to actually run this stuff on their GPU at this point. As for documentation, there's definitely been a LOT more activity than just a "slide deck or two", both inside AMD as well as outside. Please do refer to the "stack" of reviews that this Phab review belongs to.

Also, there's this, for links to all historical stuff:
https://discourse.llvm.org/t/rfc-introduce-convergence-control-intrinsics/69613

In D153744#4448078, @JonChesterfield wrote:

This looks quite high risk. How is it tested beyond this lit test, specifically with code that runs on the GPU? It's really easy to get broken transforms to pass a lit test where the checks were generated from the implementation.

The fun part is things are

llvm/lib/Transforms/Utils/LoopUnroll.cpp
472–497	This is a lot for one LLVM_DEBUG, move this all to a function?

In D153744#4448146, @sameerds wrote:
As for documentation, there's definitely been a LOT more activity than just a "slide deck or two", both inside AMD as well as outside. Please do refer to the "stack" of reviews that this Phab review belongs to.

Also, there's this, for links to all historical stuff:
https://discourse.llvm.org/t/rfc-introduce-convergence-control-intrinsics/69613

It's been a multi-year thing with multiple authors, I doubt trawling the history will provide anyone with an accurate idea of what we're currently trying to build.

This stack is revising how control flow is modelled in LLVM for GPUs. The failure modes are substantial. Could part of the implementation effort be directed to writing down what the overall algorithm is intended to be, somewhere that is visible outside of AMD? Hopefully that's a direct adaptation of whatever design document you are working from internally.

In D153744#4448501, @JonChesterfield wrote:

It's been a multi-year thing with multiple authors, I doubt trawling the history will provide anyone with an accurate idea of what we're currently trying to build.

This stack is revising how control flow is modelled in LLVM for GPUs. The failure modes are substantial. Could part of the implementation effort be directed to writing down what the overall algorithm is intended to be, somewhere that is visible outside of AMD? Hopefully that's a direct adaptation of whatever design document you are working from internally.

This review points to earlier "parent" reviews. In particular:

D147116: [RFC] Introduce convergence control intrinsics

That's the very first review in this stack.

arsenm added inline comments. Jun 26 2023, 11:15 AM

llvm/lib/Analysis/CodeMetrics.cpp
114–138	It might be worth introducing a ConvergenceControlInst subclass of IntrinsicInst

rebase

For what it's worth, the formal semantics of convergence tokens and intrinsics landed in LLVM. This patch relaxes the constraints on loop unrolling in the presence of convergence tokens as defined in that spec:

https://llvm.org/docs/ConvergentOperations.html

Harbormaster completed remote builds in B245805: Diff 540971. Jul 17 2023, 7:35 AM

rebased
addressed the comment about having too much code inside LLVM_DEBUG

sameerds marked 2 inline comments as done. Jul 18 2023, 9:27 AM

sameerds added inline comments.

llvm/lib/Analysis/CodeMetrics.cpp
114–138	I suppose so, but it seems too early to attempt this for an experimental feature!
llvm/lib/Transforms/Utils/LoopUnroll.cpp
472–497	Fixed by moving it out into a function.
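The refactoring being requested can be sketched as follows. `TOY_DEBUG` is a simplified, hypothetical stand-in for `LLVM_DEBUG` (the real macro additionally honors `-debug-only` and compiles away in release builds), and `describeConvergence` is an illustrative helper, not the function added in the patch. Moving the formatting into an ordinary function keeps the macro argument to a single call and lets the logic be read and tested like any other code.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-in for LLVM's LoopConvergenceKind.
enum class ToyConvergenceKind { None, AnchoredInLoop, Some };

// Simplified stand-in for LLVM_DEBUG: runs X only when the flag is set.
static bool ToyDebugFlag = false;
#define TOY_DEBUG(X)                                                           \
  do {                                                                         \
    if (ToyDebugFlag) {                                                        \
      X;                                                                       \
    }                                                                          \
  } while (false)

// Illustrative helper: all formatting lives in a normal, testable function.
std::string describeConvergence(bool NotDuplicatable, ToyConvergenceKind Kind,
                                bool HasOutsideHeart) {
  std::string S = "convergence: ";
  switch (Kind) {
  case ToyConvergenceKind::None:
    S += "none";
    break;
  case ToyConvergenceKind::AnchoredInLoop:
    S += "anchored in loop";
    break;
  case ToyConvergenceKind::Some:
    S += "some";
    break;
  }
  if (HasOutsideHeart)
    S += ", heart controlled from outside";
  if (NotDuplicatable)
    S += ", not duplicatable";
  return S;
}

// The debug statement itself shrinks to a single call.
void reportLoop(bool NotDuplicatable, ToyConvergenceKind Kind,
                bool HasOutsideHeart) {
  TOY_DEBUG(std::cerr << describeConvergence(NotDuplicatable, Kind,
                                             HasOutsideHeart)
                      << "\n");
}
```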

sameerds edited parent revisions, added: D147116: [RFC] Introduce convergence control intrinsics; removed: D152431: [Inliner] Handle convergence control when inlining a call. Jul 18 2023, 9:31 AM

arsenm added inline comments. Jul 18 2023, 10:55 AM

llvm/lib/Analysis/CodeMetrics.cpp
114–138	Part of the experiment is how inconvenient it is, and also there's no better time to have the convenient code than when implementing it. We have ConstrainedFPIntrinsic for "experimental" features too. It's not a lot of work to add (not that it should be part of this patch)
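A minimal sketch of what such a subclass could look like, written against a self-contained toy hierarchy rather than LLVM itself. `ToyInst`, `ToyIntrinsicInst`, `ToyConvergenceControlInst`, `toyIsa`, and `toyDynCast` are hypothetical names that imitate the LLVM convention in which `isa<>` and `dyn_cast<>` dispatch to a static `classof`. The `classof` switch is the same check `isConvergenceCtrlIntr` performs in this diff, which is what the subclass would centralize.

```cpp
#include <cassert>

enum class ToyIntrinsicID {
  NotIntrinsic,
  ConvergenceEntry,
  ConvergenceAnchor,
  ConvergenceLoop,
  Other
};

struct ToyInst {
  ToyIntrinsicID ID;
  explicit ToyInst(ToyIntrinsicID ID) : ID(ID) {}
};

struct ToyIntrinsicInst : ToyInst {
  explicit ToyIntrinsicInst(ToyIntrinsicID ID) : ToyInst(ID) {}
  static bool classof(const ToyInst *I) {
    return I->ID != ToyIntrinsicID::NotIntrinsic;
  }
};

// Hypothetical subclass: matches exactly the three convergence control
// intrinsics (entry, anchor, loop heart).
struct ToyConvergenceControlInst : ToyIntrinsicInst {
  explicit ToyConvergenceControlInst(ToyIntrinsicID ID)
      : ToyIntrinsicInst(ID) {}
  static bool classof(const ToyInst *I) {
    switch (I->ID) {
    case ToyIntrinsicID::ConvergenceEntry:
    case ToyIntrinsicID::ConvergenceAnchor:
    case ToyIntrinsicID::ConvergenceLoop:
      return true;
    default:
      return false;
    }
  }
  bool isLoopHeart() const { return ID == ToyIntrinsicID::ConvergenceLoop; }
};

// Minimal isa<>/dyn_cast<> equivalents. The static_cast is only well-defined
// because the toy creates objects with the matching dynamic type.
template <typename To> bool toyIsa(const ToyInst *I) { return To::classof(I); }

template <typename To> const To *toyDynCast(const ToyInst *I) {
  return toyIsa<To>(I) ? static_cast<const To *>(I) : nullptr;
}
```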

Harbormaster completed remote builds in B246275: Diff 541598. Jul 18 2023, 4:29 PM

sameerds marked 3 inline comments as done. Jul 18 2023, 8:41 PM

sameerds added inline comments.

llvm/lib/Analysis/CodeMetrics.cpp
114–138	Does that mean this change is good to go?

Revision Contents

Path	Size
llvm/include/llvm/Analysis/CodeMetrics.h	9 lines
llvm/include/llvm/Transforms/Utils/UnrollLoop.h	22 lines
llvm/lib/Analysis/CodeMetrics.cpp	62 lines
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp	8 lines
llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp	9 lines
llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp	70 lines
llvm/lib/Transforms/Utils/LoopUnroll.cpp	71 lines
llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp	15 lines
llvm/test/Transforms/LoopUnroll/convergent.controlled.ll	562 lines

Diff 541598

llvm/include/llvm/Analysis/CodeMetrics.h

[14 unchanged lines not shown]
#define LLVM_ANALYSIS_CODEMETRICS_H		#define LLVM_ANALYSIS_CODEMETRICS_H

#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/Support/InstructionCost.h"		#include "llvm/Support/InstructionCost.h"

namespace llvm {		namespace llvm {
class AssumptionCache;		class AssumptionCache;
class BasicBlock;		class BasicBlock;
		class Instruction;
class Loop;		class Loop;
class Function;		class Function;
template <class T> class SmallPtrSetImpl;		template <class T> class SmallPtrSetImpl;
class TargetTransformInfo;		class TargetTransformInfo;
class Value;		class Value;

/// Utility to calculate the size and a few similar metrics for a set		/// Utility to calculate the size and a few similar metrics for a set
/// of basic blocks.		/// of basic blocks.
[9 unchanged lines not shown]	struct CodeMetrics {
///		///
/// True if this function contains one or more indirect branches, or it contains		/// True if this function contains one or more indirect branches, or it contains
/// one or more 'noduplicate' instructions.		/// one or more 'noduplicate' instructions.
bool notDuplicatable = false;		bool notDuplicatable = false;

/// True if this function contains a call to a convergent function.		/// True if this function contains a call to a convergent function.
bool convergent = false;		bool convergent = false;

		/// True if the code contains an uncontrolled convergent operation.
		bool convergentUncontrolled = false;

/// True if this function calls alloca (in the C sense).		/// True if this function calls alloca (in the C sense).
bool usesDynamicAlloca = false;		bool usesDynamicAlloca = false;

/// Code size cost of the analyzed blocks.		/// Code size cost of the analyzed blocks.
InstructionCost NumInsts = 0;		InstructionCost NumInsts = 0;

/// Number of analyzed blocks.		/// Number of analyzed blocks.
unsigned NumBlocks = false;		unsigned NumBlocks = false;

		/// Keeps track of loop heart intrinsics and their convergencectrl token use.
		std::vector<std::pair<const Instruction *, Value *>> convergenceHearts;

/// Keeps track of basic block code size estimates.		/// Keeps track of basic block code size estimates.
DenseMap<const BasicBlock *, InstructionCost> NumBBInsts;		DenseMap<const BasicBlock *, InstructionCost> NumBBInsts;

/// Keep track of the number of calls to 'big' functions.		/// Keep track of the number of calls to 'big' functions.
unsigned NumCalls = false;		unsigned NumCalls = false;

/// The number of calls to internal functions with a single caller.		/// The number of calls to internal functions with a single caller.
///		///
/// These are likely targets for future inlining, likely exposed by		/// These are likely targets for future inlining, likely exposed by
/// interleaved devirtualization.		/// interleaved devirtualization.
unsigned NumInlineCandidates = 0;		unsigned NumInlineCandidates = 0;

/// How many instructions produce vector values.		/// How many instructions produce vector values.
///		///
/// The inliner is more aggressive with inlining vector kernels.		/// The inliner is more aggressive with inlining vector kernels.
unsigned NumVectorInsts = 0;		unsigned NumVectorInsts = 0;

/// How many 'ret' instructions the blocks contain.		/// How many 'ret' instructions the blocks contain.
unsigned NumRets = 0;		unsigned NumRets = 0;

/// Add information about a block to the current state.		/// Add information about a block to the current state.
void analyzeBasicBlock(const BasicBlock *BB, const TargetTransformInfo &TTI,		void analyzeBasicBlock(const BasicBlock *BB, const TargetTransformInfo &TTI,
const SmallPtrSetImpl<const Value *> &EphValues,		const SmallPtrSetImpl<const Value *> &EphValues,
bool PrepareForLTO = false);		bool PrepareForLTO = false, const Loop *L = nullptr);

/// Collect a loop's ephemeral values (those used only by an assume		/// Collect a loop's ephemeral values (those used only by an assume
/// or similar intrinsics in the loop).		/// or similar intrinsics in the loop).
static void collectEphemeralValues(const Loop *L, AssumptionCache *AC,		static void collectEphemeralValues(const Loop *L, AssumptionCache *AC,
SmallPtrSetImpl<const Value *> &EphValues);		SmallPtrSetImpl<const Value *> &EphValues);

/// Collect a functions's ephemeral values (those used only by an		/// Collect a functions's ephemeral values (those used only by an
/// assume or similar intrinsics in the function).		/// assume or similar intrinsics in the function).
static void collectEphemeralValues(const Function *L, AssumptionCache *AC,		static void collectEphemeralValues(const Function *L, AssumptionCache *AC,
SmallPtrSetImpl<const Value *> &EphValues);		SmallPtrSetImpl<const Value *> &EphValues);
};		};

}		}

#endif		#endif

llvm/include/llvm/Transforms/Utils/UnrollLoop.h

[66 unchanged lines not shown]

struct UnrollLoopOptions {		struct UnrollLoopOptions {
unsigned Count;		unsigned Count;
bool Force;		bool Force;
bool Runtime;		bool Runtime;
bool AllowExpensiveTripCount;		bool AllowExpensiveTripCount;
bool UnrollRemainder;		bool UnrollRemainder;
bool ForgetAllSCEV;		bool ForgetAllSCEV;
		const Instruction *Heart = nullptr;
};		};

LoopUnrollResult UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,		LoopUnrollResult UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
ScalarEvolution *SE, DominatorTree *DT,		ScalarEvolution *SE, DominatorTree *DT,
AssumptionCache *AC,		AssumptionCache *AC,
const llvm::TargetTransformInfo *TTI,		const llvm::TargetTransformInfo *TTI,
OptimizationRemarkEmitter *ORE, bool PreserveLCSSA,		OptimizationRemarkEmitter *ORE, bool PreserveLCSSA,
Loop **RemainderLoop = nullptr);		Loop **RemainderLoop = nullptr);
[38 unchanged lines not shown]	TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences(
Loop *L, ScalarEvolution &SE, const TargetTransformInfo &TTI,		Loop *L, ScalarEvolution &SE, const TargetTransformInfo &TTI,
BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI,		BlockFrequencyInfo *BFI, ProfileSummaryInfo *PSI,
llvm::OptimizationRemarkEmitter &ORE, int OptLevel,		llvm::OptimizationRemarkEmitter &ORE, int OptLevel,
std::optional<unsigned> UserThreshold, std::optional<unsigned> UserCount,		std::optional<unsigned> UserThreshold, std::optional<unsigned> UserCount,
std::optional<bool> UserAllowPartial, std::optional<bool> UserRuntime,		std::optional<bool> UserAllowPartial, std::optional<bool> UserRuntime,
std::optional<bool> UserUpperBound,		std::optional<bool> UserUpperBound,
std::optional<unsigned> UserFullUnrollMaxCount);		std::optional<unsigned> UserFullUnrollMaxCount);

InstructionCost ApproximateLoopSize(const Loop *L, unsigned &NumCalls,		enum class LoopConvergenceKind {
bool &NotDuplicatable, bool &Convergent, const TargetTransformInfo &TTI,		// No convergent operations at all.
const SmallPtrSetImpl<const Value *> &EphValues, unsigned BEInsns);		None,

		// All convergent operations are controlled and anchored inside the loop.
		AnchoredInLoop,

		// Some convergent operations, unrolling is possible subject to constraints
		// (no remainder loop).
		Some,
		};

		InstructionCost
		ApproximateLoopSize(const Loop *L, unsigned &NumCalls, bool &NotDuplicatable,
		LoopConvergenceKind &Convergent, const Instruction *&Heart,
		const TargetTransformInfo &TTI,
		const SmallPtrSetImpl<const Value *> &EphValues,
		unsigned BEInsns);

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TRANSFORMS_UTILS_UNROLLLOOP_H		#endif // LLVM_TRANSFORMS_UTILS_UNROLLLOOP_H
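The three `LoopConvergenceKind` values are derived at the end of `ApproximateLoopSize` in LoopUnrollPass.cpp. A simplified, standalone sketch of that decision follows; `ToyLoopMetrics`, `classifyLoop`, and the integer block ids are hypothetical stand-ins for `CodeMetrics`, the real function, and LLVM basic blocks. Unlike the real code, the sketch collects all hearts up front instead of clearing them per block.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Mirror of the LoopConvergenceKind enum added by this patch.
enum class ToyLoopConvergenceKind { None, AnchoredInLoop, Some };

// Hypothetical summary of what CodeMetrics collects for one loop.
struct ToyLoopMetrics {
  bool convergent = false;             // any convergent call seen
  bool convergentUncontrolled = false; // convergent call without a bundle
  // (heart id, block defining the heart's token) for each loop heart seen
  std::vector<std::pair<int, int>> hearts;
};

struct ToyClassification {
  ToyLoopConvergenceKind kind;
  int heart; // -1 when no heart is controlled from outside the loop
};

ToyClassification classifyLoop(const ToyLoopMetrics &M,
                               const std::set<int> &loopBlocks) {
  ToyClassification C{ToyLoopConvergenceKind::None, -1};
  bool controlledByOutside = false;
  for (const auto &h : M.hearts) {
    if (!loopBlocks.count(h.second)) {
      // The patch asserts that at most one such heart exists.
      assert(C.heart == -1 && "loop has multiple relevant hearts");
      controlledByOutside = true;
      C.heart = h.first;
    }
  }
  if (M.convergentUncontrolled || controlledByOutside)
    C.kind = ToyLoopConvergenceKind::Some;
  else if (M.convergent)
    C.kind = ToyLoopConvergenceKind::AnchoredInLoop;
  return C;
}
```

The callers updated in OMPIRBuilder.cpp and LoopUnrollAndJamPass.cpp in this diff still refuse to unroll unless the result is `None`.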

llvm/lib/Analysis/CodeMetrics.cpp

[10 unchanged lines not shown]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Analysis/CodeMetrics.h"		#include "llvm/Analysis/CodeMetrics.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/InstructionCost.h"		#include "llvm/Support/InstructionCost.h"

#define DEBUG_TYPE "code-metrics"		#define DEBUG_TYPE "code-metrics"

using namespace llvm;		using namespace llvm;

static void		static void
[78 unchanged lines not shown]	assert(I->getParent()->getParent() == F &&
"Found assumption for the wrong function!");		"Found assumption for the wrong function!");

if (EphValues.insert(I).second)		if (EphValues.insert(I).second)
appendSpeculatableOperands(I, Visited, Worklist);		appendSpeculatableOperands(I, Visited, Worklist);
}		}

completeEphemeralValues(Visited, Worklist, EphValues);		completeEphemeralValues(Visited, Worklist, EphValues);
}		}

		static bool isConvergenceCtrlIntr(const Instruction &I) {
		auto *II = dyn_cast<IntrinsicInst>(&I);
		if (!II)
		return false;
		switch (II->getIntrinsicID()) {
		case Intrinsic::experimental_convergence_entry:
		case Intrinsic::experimental_convergence_anchor:
		case Intrinsic::experimental_convergence_loop:
		return true;
		}
		return false;
		}

		static bool isUsedOutsideOfLoop(const Instruction &I, const Loop &L) {
		for (const auto *U : I.users()) {
		if (auto *I = dyn_cast<Instruction>(U)) {
		if (!L.contains(I->getParent()))
		return true;
		}
		}
		return false;
		}

/// Fill in the current structure with information gleaned from the specified		/// Fill in the current structure with information gleaned from the specified
/// block.		/// block.
void CodeMetrics::analyzeBasicBlock(		void CodeMetrics::analyzeBasicBlock(
const BasicBlock *BB, const TargetTransformInfo &TTI,		const BasicBlock *BB, const TargetTransformInfo &TTI,
const SmallPtrSetImpl<const Value *> &EphValues, bool PrepareForLTO) {		const SmallPtrSetImpl<const Value *> &EphValues, bool PrepareForLTO,
		const Loop *L) {
++NumBlocks;		++NumBlocks;
InstructionCost NumInstsBeforeThisBB = NumInsts;		InstructionCost NumInstsBeforeThisBB = NumInsts;
for (const Instruction &I : *BB) {		for (const Instruction &I : *BB) {
// Skip ephemeral values.		// Skip ephemeral values.
if (EphValues.count(&I))		if (EphValues.count(&I))
continue;		continue;

// Special handling for calls.		// Special handling for calls.
[31 unchanged lines not shown]	for (const Instruction &I : *BB) {
if (const AllocaInst *AI = dyn_cast<AllocaInst>(&I)) {		if (const AllocaInst *AI = dyn_cast<AllocaInst>(&I)) {
if (!AI->isStaticAlloca())		if (!AI->isStaticAlloca())
this->usesDynamicAlloca = true;		this->usesDynamicAlloca = true;
}		}

if (isa<ExtractElementInst>(I) || I.getType()->isVectorTy())		if (isa<ExtractElementInst>(I) || I.getType()->isVectorTy())
++NumVectorInsts;		++NumVectorInsts;

if (I.getType()->isTokenTy() && I.isUsedOutsideOfBlock(BB))		if (I.getType()->isTokenTy()) {
notDuplicatable = true;		if (L && isConvergenceCtrlIntr(I)) {
		notDuplicatable |= isUsedOutsideOfLoop(I, *L);
		} else {
		notDuplicatable |= I.isUsedOutsideOfBlock(BB);
		}
		}

if (const CallInst *CI = dyn_cast<CallInst>(&I)) {		if (const CallBase *CB = dyn_cast<CallBase>(&I)) {
if (CI->cannotDuplicate())		if (CB->cannotDuplicate())
notDuplicatable = true;		notDuplicatable = true;
if (CI->isConvergent())		if (CB->isConvergent()) {
convergent = true;		convergent = true;
}

if (const InvokeInst *InvI = dyn_cast<InvokeInst>(&I))		auto *intrinsicInst = dyn_cast<IntrinsicInst>(CB);
if (InvI->cannotDuplicate())		auto control = CB->getOperandBundle(LLVMContext::OB_convergencectrl);
notDuplicatable = true;		if (intrinsicInst && intrinsicInst->getIntrinsicID() ==
		Intrinsic::experimental_convergence_loop) {
		assert(control &&
		"invalid IR: loop heart without convergencectrl bundle");
		Value *token = control->Inputs[0].get();
		convergenceHearts.emplace_back(intrinsicInst, token);
		} else if (!control &&
		(!intrinsicInst ||
		intrinsicInst->getIntrinsicID() !=
		Intrinsic::experimental_convergence_anchor)) {
		convergentUncontrolled = true;
		}
		}
		}

NumInsts += TTI.getInstructionCost(&I, TargetTransformInfo::TCK_CodeSize);		NumInsts += TTI.getInstructionCost(&I, TargetTransformInfo::TCK_CodeSize);
}		}

if (isa<ReturnInst>(BB->getTerminator()))		if (isa<ReturnInst>(BB->getTerminator()))
++NumRets;		++NumRets;

// We never want to inline functions that contain an indirectbr. This is		// We never want to inline functions that contain an indirectbr. This is
[16 unchanged lines not shown]
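The per-instruction logic added to `CodeMetrics::analyzeBasicBlock` can be summarized with a toy model. It assumes a flattened instruction record (`ToyInstr`) and a metrics struct (`ToyMetrics`), both hypothetical. A token produced by a convergence control intrinsic blocks duplication only if it is used outside the loop, any other token only if it is used outside its block, a loop heart is recorded, and a convergent call without a `convergencectrl` bundle counts as uncontrolled unless it is an anchor.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical flattened view of one instruction inside a loop.
enum class ToyKind { Other, ConvergenceEntry, ConvergenceAnchor, ConvergenceLoop };

struct ToyInstr {
  ToyKind kind = ToyKind::Other;
  bool producesToken = false;
  bool isConvergentCall = false;
  bool hasConvergenceCtrlBundle = false;
  int block = 0;
  std::vector<int> userBlocks; // blocks of instructions that use this value
};

struct ToyMetrics {
  bool notDuplicatable = false;
  bool convergent = false;
  bool convergentUncontrolled = false;
  int numHearts = 0;
};

static bool isConvergenceCtrl(ToyKind k) { return k != ToyKind::Other; }

void analyzeToy(const ToyInstr &I, const std::set<int> &loopBlocks,
                ToyMetrics &M) {
  if (I.producesToken) {
    bool escapes = false;
    for (int b : I.userBlocks)
      if (isConvergenceCtrl(I.kind) ? !loopBlocks.count(b) : b != I.block)
        escapes = true;
    // Accumulate so an earlier verdict from another instruction is kept.
    M.notDuplicatable |= escapes;
  }
  if (I.isConvergentCall) {
    M.convergent = true;
    if (I.kind == ToyKind::ConvergenceLoop) {
      assert(I.hasConvergenceCtrlBundle && "loop heart without bundle");
      ++M.numHearts;
    } else if (!I.hasConvergenceCtrlBundle &&
               I.kind != ToyKind::ConvergenceAnchor) {
      M.convergentUncontrolled = true;
    }
  }
}
```

Under the old block-local rule, a heart in block 1 whose token is used in block 2 of the same loop would make the loop non-duplicatable; under the loop-based rule it does not.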

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

[3,425 unchanged lines not shown]	for (Instruction &I : *BB) {
if (Alloca->getParent() == &F->getEntryBlock())		if (Alloca->getParent() == &F->getEntryBlock())
EphValues.insert(&I);		EphValues.insert(&I);
}		}
}		}
}		}

unsigned NumInlineCandidates;		unsigned NumInlineCandidates;
bool NotDuplicatable;		bool NotDuplicatable;
bool Convergent;		LoopConvergenceKind Convergent;
		const Instruction *Heart;
InstructionCost LoopSizeIC =		InstructionCost LoopSizeIC =
ApproximateLoopSize(L, NumInlineCandidates, NotDuplicatable, Convergent,		ApproximateLoopSize(L, NumInlineCandidates, NotDuplicatable, Convergent,
TTI, EphValues, UP.BEInsns);		Heart, TTI, EphValues, UP.BEInsns);
LLVM_DEBUG(dbgs() << "Estimated loop size is " << LoopSizeIC << "\n");		LLVM_DEBUG(dbgs() << "Estimated loop size is " << LoopSizeIC << "\n");

// Loop is not unrollable if the loop contains certain instructions.		// Loop is not unrollable if the loop contains certain instructions.
if (NotDuplicatable || Convergent || !LoopSizeIC.isValid()) {		if (NotDuplicatable || !LoopSizeIC.isValid() ||
		Convergent != LoopConvergenceKind::None) {
LLVM_DEBUG(dbgs() << "Loop not considered unrollable\n");		LLVM_DEBUG(dbgs() << "Loop not considered unrollable\n");
return 1;		return 1;
}		}
unsigned LoopSize = *LoopSizeIC.getValue();		unsigned LoopSize = *LoopSizeIC.getValue();

// TODO: Determine trip count of \p CLI if constant, computeUnrollCount might		// TODO: Determine trip count of \p CLI if constant, computeUnrollCount might
// be able to use it.		// be able to use it.
int TripCount = 0;		int TripCount = 0;
[2,703 unchanged lines not shown]

llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp

[314 unchanged lines not shown]	tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,
if (!isSafeToUnrollAndJam(L, SE, DT, DI, *LI)) {		if (!isSafeToUnrollAndJam(L, SE, DT, DI, *LI)) {
LLVM_DEBUG(dbgs() << " Disabled due to not being safe.\n");		LLVM_DEBUG(dbgs() << " Disabled due to not being safe.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

// Approximate the loop size and collect useful info		// Approximate the loop size and collect useful info
unsigned NumInlineCandidates;		unsigned NumInlineCandidates;
bool NotDuplicatable;		bool NotDuplicatable;
bool Convergent;
SmallPtrSet<const Value *, 32> EphValues;		SmallPtrSet<const Value *, 32> EphValues;
CodeMetrics::collectEphemeralValues(L, &AC, EphValues);		CodeMetrics::collectEphemeralValues(L, &AC, EphValues);
Loop *SubLoop = L->getSubLoops()[0];		Loop *SubLoop = L->getSubLoops()[0];
		LoopConvergenceKind Convergent;
		const Instruction *Heart;
InstructionCost InnerLoopSizeIC =		InstructionCost InnerLoopSizeIC =
ApproximateLoopSize(SubLoop, NumInlineCandidates, NotDuplicatable,		ApproximateLoopSize(SubLoop, NumInlineCandidates, NotDuplicatable,
Convergent, TTI, EphValues, UP.BEInsns);		Convergent, Heart, TTI, EphValues, UP.BEInsns);
InstructionCost OuterLoopSizeIC =		InstructionCost OuterLoopSizeIC =
ApproximateLoopSize(L, NumInlineCandidates, NotDuplicatable, Convergent,		ApproximateLoopSize(L, NumInlineCandidates, NotDuplicatable, Convergent,
TTI, EphValues, UP.BEInsns);		Heart, TTI, EphValues, UP.BEInsns);
LLVM_DEBUG(dbgs() << " Outer Loop Size: " << OuterLoopSizeIC << "\n");		LLVM_DEBUG(dbgs() << " Outer Loop Size: " << OuterLoopSizeIC << "\n");
LLVM_DEBUG(dbgs() << " Inner Loop Size: " << InnerLoopSizeIC << "\n");		LLVM_DEBUG(dbgs() << " Inner Loop Size: " << InnerLoopSizeIC << "\n");

if (!InnerLoopSizeIC.isValid() || !OuterLoopSizeIC.isValid()) {		if (!InnerLoopSizeIC.isValid() || !OuterLoopSizeIC.isValid()) {
LLVM_DEBUG(dbgs() << " Not unrolling loop which contains instructions"		LLVM_DEBUG(dbgs() << " Not unrolling loop which contains instructions"
<< " with invalid cost.\n");		<< " with invalid cost.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}
unsigned InnerLoopSize = *InnerLoopSizeIC.getValue();		unsigned InnerLoopSize = *InnerLoopSizeIC.getValue();
unsigned OuterLoopSize = *OuterLoopSizeIC.getValue();		unsigned OuterLoopSize = *OuterLoopSizeIC.getValue();

if (NotDuplicatable) {		if (NotDuplicatable) {
LLVM_DEBUG(dbgs() << " Not unrolling loop which contains non-duplicatable "		LLVM_DEBUG(dbgs() << " Not unrolling loop which contains non-duplicatable "
"instructions.\n");		"instructions.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}
if (NumInlineCandidates != 0) {		if (NumInlineCandidates != 0) {
LLVM_DEBUG(dbgs() << " Not unrolling loop with inlinable calls.\n");		LLVM_DEBUG(dbgs() << " Not unrolling loop with inlinable calls.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}
if (Convergent) {		if (Convergent != LoopConvergenceKind::None) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << " Not unrolling loop with convergent instructions.\n");		dbgs() << " Not unrolling loop with convergent instructions.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

// Save original loop IDs for after the transformation.		// Save original loop IDs for after the transformation.
MDNode *OrigOuterLoopID = L->getLoopID();		MDNode *OrigOuterLoopID = L->getLoopID();
MDNode *OrigSubLoopID = SubLoop->getLoopID();		MDNode *OrigSubLoopID = SubLoop->getLoopID();
[114 unchanged lines not shown]

llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp

[658 unchanged lines not shown]	LLVM_DEBUG(dbgs() << "Analysis finished:\n"
<< "UnrolledCost: " << UnrolledCost << ", "		<< "UnrolledCost: " << UnrolledCost << ", "
<< "RolledDynamicCost: " << RolledDynamicCost << "\n");		<< "RolledDynamicCost: " << RolledDynamicCost << "\n");
return {{unsigned(*UnrolledCost.getValue()),		return {{unsigned(*UnrolledCost.getValue()),
unsigned(*RolledDynamicCost.getValue())}};		unsigned(*RolledDynamicCost.getValue())}};
}		}

/// ApproximateLoopSize - Approximate the size of the loop.		/// ApproximateLoopSize - Approximate the size of the loop.
InstructionCost llvm::ApproximateLoopSize(		InstructionCost llvm::ApproximateLoopSize(
const Loop *L, unsigned &NumCalls, bool &NotDuplicatable, bool &Convergent,		const Loop *L, unsigned &NumCalls, bool &NotDuplicatable,
		LoopConvergenceKind &Convergent, const Instruction *&Heart,
const TargetTransformInfo &TTI,		const TargetTransformInfo &TTI,
const SmallPtrSetImpl<const Value *> &EphValues, unsigned BEInsns) {		const SmallPtrSetImpl<const Value *> &EphValues, unsigned BEInsns) {
CodeMetrics Metrics;		CodeMetrics Metrics;
for (BasicBlock *BB : L->blocks())		bool convergenceControlledByOutside = false;
Metrics.analyzeBasicBlock(BB, TTI, EphValues);
		Heart = nullptr;

		for (BasicBlock *BB : L->blocks()) {
		Metrics.analyzeBasicBlock(BB, TTI, EphValues, /* PrepareForLTO= */ false,
		L);

		for (const auto &heart : Metrics.convergenceHearts) {
		BasicBlock *defBlock = cast<Instruction>(heart.second)->getParent();
		if (!L->contains(defBlock)) {
		convergenceControlledByOutside = true;
		assert(!Heart && "invalid IR: loop has multiple relevant hearts");
		Heart = heart.first;
		}
		}
		Metrics.convergenceHearts.clear();
		}
NumCalls = Metrics.NumInlineCandidates;		NumCalls = Metrics.NumInlineCandidates;
NotDuplicatable = Metrics.notDuplicatable;		NotDuplicatable = Metrics.notDuplicatable;
Convergent = Metrics.convergent;
		if (Metrics.convergentUncontrolled || convergenceControlledByOutside)
		Convergent = LoopConvergenceKind::Some;
		else if (Metrics.convergent)
		Convergent = LoopConvergenceKind::AnchoredInLoop;
		else
		Convergent = LoopConvergenceKind::None;

InstructionCost LoopSize = Metrics.NumInsts;		InstructionCost LoopSize = Metrics.NumInsts;

// Don't allow an estimate of size zero. This would allows unrolling of loops		// Don't allow an estimate of size zero. This would allows unrolling of loops
// with huge iteration counts, which is a compile time problem even if it's		// with huge iteration counts, which is a compile time problem even if it's
// not a problem for code quality. Also, the code using this size may assume		// not a problem for code quality. Also, the code using this size may assume
// that each loop has at least three instructions (likely a conditional		// that each loop has at least three instructions (likely a conditional
// branch, a comparison feeding that branch, and some kind of loop increment		// branch, a comparison feeding that branch, and some kind of loop increment
[490 unchanged lines not shown]	tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
// When automatic unrolling is disabled, do not unroll unless overridden for		// When automatic unrolling is disabled, do not unroll unless overridden for
// this loop.		// this loop.
if (OnlyWhenForced && !(TM & TM_Enable))		if (OnlyWhenForced && !(TM & TM_Enable))
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;

bool OptForSize = L->getHeader()->getParent()->hasOptSize();		bool OptForSize = L->getHeader()->getParent()->hasOptSize();
unsigned NumInlineCandidates;		unsigned NumInlineCandidates;
bool NotDuplicatable;		bool NotDuplicatable;
bool Convergent;
TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(		TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(
L, SE, TTI, BFI, PSI, ORE, OptLevel, ProvidedThreshold, ProvidedCount,		L, SE, TTI, BFI, PSI, ORE, OptLevel, ProvidedThreshold, ProvidedCount,
ProvidedAllowPartial, ProvidedRuntime, ProvidedUpperBound,		ProvidedAllowPartial, ProvidedRuntime, ProvidedUpperBound,
ProvidedFullUnrollMaxCount);		ProvidedFullUnrollMaxCount);
TargetTransformInfo::PeelingPreferences PP = gatherPeelingPreferences(		TargetTransformInfo::PeelingPreferences PP = gatherPeelingPreferences(
L, SE, TTI, ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling, true);		L, SE, TTI, ProvidedAllowPeeling, ProvidedAllowProfileBasedPeeling, true);

// Exit early if unrolling is disabled. For OptForSize, we pick the loop size		// Exit early if unrolling is disabled. For OptForSize, we pick the loop size
// as threshold later on.		// as threshold later on.
if (UP.Threshold == 0 && (!UP.Partial || UP.PartialThreshold == 0) &&		if (UP.Threshold == 0 && (!UP.Partial || UP.PartialThreshold == 0) &&
!OptForSize)		!OptForSize)
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;

SmallPtrSet<const Value *, 32> EphValues;		SmallPtrSet<const Value *, 32> EphValues;
CodeMetrics::collectEphemeralValues(L, &AC, EphValues);		CodeMetrics::collectEphemeralValues(L, &AC, EphValues);

		LoopConvergenceKind Convergent;
		const Instruction *Heart;
InstructionCost LoopSizeIC =		InstructionCost LoopSizeIC =
ApproximateLoopSize(L, NumInlineCandidates, NotDuplicatable, Convergent,		ApproximateLoopSize(L, NumInlineCandidates, NotDuplicatable, Convergent,
TTI, EphValues, UP.BEInsns);		Heart, TTI, EphValues, UP.BEInsns);
LLVM_DEBUG(dbgs() << " Loop Size = " << LoopSizeIC << "\n");		LLVM_DEBUG(dbgs() << " Loop Size = " << LoopSizeIC << "\n");

if (!LoopSizeIC.isValid()) {		if (!LoopSizeIC.isValid()) {
LLVM_DEBUG(dbgs() << " Not unrolling loop which contains instructions"		LLVM_DEBUG(dbgs() << " Not unrolling loop which contains instructions"
<< " with invalid cost.\n");		<< " with invalid cost.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}
unsigned LoopSize = *LoopSizeIC.getValue();		unsigned LoopSize = *LoopSizeIC.getValue();
[... 39 lines not shown ...]	if (ExitingBlock)
TripMultiple = SE.getSmallConstantTripMultiple(L, ExitingBlock);		TripMultiple = SE.getSmallConstantTripMultiple(L, ExitingBlock);
}		}

// If the loop contains a convergent operation, the prelude we'd add		// If the loop contains a convergent operation, the prelude we'd add
// to do the first few instructions before we hit the unrolled loop		// to do the first few instructions before we hit the unrolled loop
// is unsafe -- it adds a control-flow dependency to the convergent		// is unsafe -- it adds a control-flow dependency to the convergent
// operation. Therefore restrict remainder loop (try unrolling without).		// operation. Therefore restrict remainder loop (try unrolling without).
//		//
// TODO: This is quite conservative. In practice, convergent_op()		// TODO: This is still somewhat conservative, as we could allow the remainder
// is likely to be called unconditionally in the loop. In this		// if the trip count is uniform (and we don't have an unnatural heart).
// case, the program would be ill-formed (on most architectures)		bool ConvergentAllowsRuntime = true;
// unless n were the same on all threads in a thread group.		switch (Convergent) {
// Assuming n is the same on all threads, any kind of unrolling is		case LoopConvergenceKind::None:
// safe. But currently llvm's notion of convergence isn't powerful		case LoopConvergenceKind::AnchoredInLoop:
// enough to express this.		break; // no convergence-related restrictions
if (Convergent)		case LoopConvergenceKind::Some:
UP.AllowRemainder = false;		UP.AllowRemainder = false;
		ConvergentAllowsRuntime = false;
		break;
		}

// Try to find the trip count upper bound if we cannot find the exact trip		// Try to find the trip count upper bound if we cannot find the exact trip
// count.		// count.
unsigned MaxTripCount = 0;		unsigned MaxTripCount = 0;
bool MaxOrZero = false;		bool MaxOrZero = false;
if (!TripCount) {		if (!TripCount) {
MaxTripCount = SE.getSmallConstantMaxTripCount(L);		MaxTripCount = SE.getSmallConstantMaxTripCount(L);
MaxOrZero = SE.isBackedgeTakenCountMaxOrZero(L);		MaxOrZero = SE.isBackedgeTakenCountMaxOrZero(L);
}		}

// computeUnrollCount() decides whether it is beneficial to use upper bound to		// computeUnrollCount() decides whether it is beneficial to use upper bound to
// fully unroll the loop.		// fully unroll the loop.
bool UseUpperBound = false;		bool UseUpperBound = false;
bool IsCountSetExplicitly = computeUnrollCount(		bool IsCountSetExplicitly = computeUnrollCount(
L, TTI, DT, LI, &AC, SE, EphValues, &ORE, TripCount, MaxTripCount, MaxOrZero,		L, TTI, DT, LI, &AC, SE, EphValues, &ORE, TripCount, MaxTripCount, MaxOrZero,
TripMultiple, LoopSize, UP, PP, UseUpperBound);		TripMultiple, LoopSize, UP, PP, UseUpperBound);
if (!UP.Count)		if (!UP.Count)
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;

		UP.Runtime &= ConvergentAllowsRuntime;

if (PP.PeelCount) {		if (PP.PeelCount) {
assert(UP.Count == 1 && "Cannot perform peel and unroll in the same step");		assert(UP.Count == 1 && "Cannot perform peel and unroll in the same step");
LLVM_DEBUG(dbgs() << "PEELING loop %" << L->getHeader()->getName()		LLVM_DEBUG(dbgs() << "PEELING loop %" << L->getHeader()->getName()
<< " with iteration count " << PP.PeelCount << "!\n");		<< " with iteration count " << PP.PeelCount << "!\n");
ORE.emit([&]() {		ORE.emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "Peeled", L->getStartLoc(),		return OptimizationRemark(DEBUG_TYPE, "Peeled", L->getStartLoc(),
L->getHeader())		L->getHeader())
<< " peeled loop by " << ore::NV("PeelCount", PP.PeelCount)		<< " peeled loop by " << ore::NV("PeelCount", PP.PeelCount)
[... 26 lines not shown ...]	tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
// computeUnrollCount().		// computeUnrollCount().
UP.Runtime &= TripCount == 0 && TripMultiple % UP.Count != 0;		UP.Runtime &= TripCount == 0 && TripMultiple % UP.Count != 0;

// Save loop properties before it is transformed.		// Save loop properties before it is transformed.
MDNode *OrigLoopID = L->getLoopID();		MDNode *OrigLoopID = L->getLoopID();

// Unroll the loop.		// Unroll the loop.
Loop *RemainderLoop = nullptr;		Loop *RemainderLoop = nullptr;
		UnrollLoopOptions ULO;
		ULO.Count = UP.Count;
		ULO.Force = UP.Force;
		ULO.AllowExpensiveTripCount = UP.AllowExpensiveTripCount;
		ULO.UnrollRemainder = UP.UnrollRemainder;
		ULO.Runtime = UP.Runtime;
		ULO.ForgetAllSCEV = ForgetAllSCEV;
		ULO.Heart = Heart;
LoopUnrollResult UnrollResult = UnrollLoop(		LoopUnrollResult UnrollResult = UnrollLoop(
L,		L, ULO, LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop);
{UP.Count, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount,
UP.UnrollRemainder, ForgetAllSCEV},
LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop);
if (UnrollResult == LoopUnrollResult::Unmodified)		if (UnrollResult == LoopUnrollResult::Unmodified)
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;

if (RemainderLoop) {		if (RemainderLoop) {
std::optional<MDNode *> RemainderLoopID =		std::optional<MDNode *> RemainderLoopID =
makeFollowupLoopID(OrigLoopID, {LLVMLoopUnrollFollowupAll,		makeFollowupLoopID(OrigLoopID, {LLVMLoopUnrollFollowupAll,
LLVMLoopUnrollFollowupRemainder});		LLVMLoopUnrollFollowupRemainder});
if (RemainderLoopID)		if (RemainderLoopID)
[... 334 lines not shown ...]

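The convergence handling that the diff adds to tryToUnrollLoop() can be modelled outside LLVM. The sketch below is a minimal, self-contained C++ approximation, assuming a standalone copy of the diff's LoopConvergenceKind enum and a hypothetical UnrollFlags struct in place of TargetTransformInfo::UnrollingPreferences. The count-halving helper is also an assumption: a simplified stand-in for how a count has to shrink to divide the trip multiple when no remainder is allowed, not the actual computeUnrollCount() code. The sketch applies both flags at once, whereas the diff clears UP.AllowRemainder before computeUnrollCount() and masks UP.Runtime only afterwards.

```cpp
#include <cassert>

// Hypothetical standalone copy of the LoopConvergenceKind values that appear
// in the diff; the real enum lives in LLVM, not here.
enum class LoopConvergenceKind { None, AnchoredInLoop, Some };

// Hypothetical stand-in for the two UnrollingPreferences fields that the
// switch in tryToUnrollLoop() touches.
struct UnrollFlags {
  bool AllowRemainder = true;
  bool Runtime = true;
};

// Models the switch in tryToUnrollLoop(): loops with no convergent operations,
// or whose convergent operations are anchored inside the loop, keep both
// options; any other convergent loop ("Some") loses the remainder loop and
// runtime unrolling.
UnrollFlags applyConvergenceRestrictions(LoopConvergenceKind Kind,
                                         UnrollFlags Flags) {
  switch (Kind) {
  case LoopConvergenceKind::None:
  case LoopConvergenceKind::AnchoredInLoop:
    break; // no convergence-related restrictions
  case LoopConvergenceKind::Some:
    Flags.AllowRemainder = false;
    Flags.Runtime = false;
    break;
  }
  return Flags;
}

// Simplified assumption, not the exact computeUnrollCount() logic: without a
// remainder loop, halve the requested count until it divides the trip
// multiple, so the unrolled loop never needs leftover iterations.
unsigned countWithoutRemainder(unsigned Count, unsigned TripMultiple) {
  assert(TripMultiple != 0 && "a trip multiple is at least 1");
  while (Count != 0 && TripMultiple % Count != 0)
    Count >>= 1;
  return Count;
}
```

Under these assumptions, countWithoutRemainder(16, 24) gives 8, the factor that the @pragma_unroll test in convergent.controlled.ll expects when 16 is requested for a trip multiple of 24.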
llvm/lib/Transforms/Utils/LoopUnroll.cpp

[... 270 lines not shown ...]	for (BasicBlock *BB : L->getBlocks()) {
}		}
// We can't do recursive deletion until we're done iterating, as we might		// We can't do recursive deletion until we're done iterating, as we might
// have a phi which (potentially indirectly) uses instructions later in		// have a phi which (potentially indirectly) uses instructions later in
// the block we're iterating through.		// the block we're iterating through.
RecursivelyDeleteTriviallyDeadInstructions(DeadInsts);		RecursivelyDeleteTriviallyDeadInstructions(DeadInsts);
}		}
}		}

		// Loops containing convergent instructions that are uncontrolled or controlled
		// from outside the loop must have a count that divides their TripMultiple.
		LLVM_ATTRIBUTE_USED
		static bool canHaveUnrollRemainder(const Loop *L) {
		// If the first call in the loop header is the loop intrinsic, then the loop
		// is controlled from outside.
		auto *H = L->getHeader();
		for (auto &I : *H) {
		if (isa<CallBase>(I)) {
		if (auto *intrinsicInst = dyn_cast<IntrinsicInst>(&I)) {
		if (intrinsicInst->getIntrinsicID() ==
		Intrinsic::experimental_convergence_loop)
		return false;
		}
		break;
		}
		}

		// Check for uncontrolled convergent operations.
		for (auto &BB : L->blocks()) {
		for (auto &I : *BB) {
		if (auto *CB = dyn_cast<CallBase>(&I)) {
		if (auto *intrinsicInst = dyn_cast<IntrinsicInst>(CB)) {
		if (intrinsicInst->getIntrinsicID() ==
		Intrinsic::experimental_convergence_anchor) {
		// A call to anchor does not count as uncontrolled. We don't need to
		// check further: a function that contains controlled convergent
		// operations cannot also contain uncontrolled ones.
		return true;
		}
		if (CB->isConvergent() && CB->countOperandBundlesOfType(
		LLVMContext::OB_convergencectrl) == 0) {
		return false;
		}
		}
		}
		}
		}
		return true;
		}

/// Unroll the given loop by Count. The loop must be in LCSSA form. Unrolling		/// Unroll the given loop by Count. The loop must be in LCSSA form. Unrolling
/// can only fail when the loop's latch block is not terminated by a conditional		/// can only fail when the loop's latch block is not terminated by a conditional
/// branch instruction. However, if the trip count (and multiple) are not known,		/// branch instruction. However, if the trip count (and multiple) are not known,
/// loop unrolling will mostly produce more code that is no faster.		/// loop unrolling will mostly produce more code that is no faster.
///		///
/// If Runtime is true then UnrollLoop will try to insert a prologue or		/// If Runtime is true then UnrollLoop will try to insert a prologue or
/// epilogue that ensures the latch has a trip multiple of Count. UnrollLoop		/// epilogue that ensures the latch has a trip multiple of Count. UnrollLoop
/// will not runtime-unroll the loop if computing the run-time trip count will		/// will not runtime-unroll the loop if computing the run-time trip count will
[... 130 lines not shown ...]	LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
// unconditional branch in the unrolled loop in some cases.		// unconditional branch in the unrolled loop in some cases.
bool LatchIsExiting = L->isLoopExiting(LatchBlock);		bool LatchIsExiting = L->isLoopExiting(LatchBlock);
if (!LatchBI || (LatchBI->isConditional() && !LatchIsExiting)) {		if (!LatchBI || (LatchBI->isConditional() && !LatchIsExiting)) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "Can't unroll; a conditional latch must exit the loop");		dbgs() << "Can't unroll; a conditional latch must exit the loop");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

// Loops containing convergent instructions cannot use runtime unrolling,		assert((!ULO.Runtime || canHaveUnrollRemainder(L)) &&
// as the prologue/epilogue may add additional control-dependencies to
// convergent operations.
LLVM_DEBUG(
{
bool HasConvergent = false;
for (auto &BB : L->blocks())
for (auto &I : *BB)
if (auto *CB = dyn_cast<CallBase>(&I))
HasConvergent |= CB->isConvergent();
assert((!HasConvergent || !ULO.Runtime) &&
"Can't runtime unroll if loop contains a convergent operation.");		"Can't runtime unroll if loop contains a convergent operation.");
});

bool EpilogProfitability =		bool EpilogProfitability =
UnrollRuntimeEpilog.getNumOccurrences() ? UnrollRuntimeEpilog		UnrollRuntimeEpilog.getNumOccurrences() ? UnrollRuntimeEpilog
: isEpilogProfitable(L);		: isEpilogProfitable(L);

if (ULO.Runtime &&		if (ULO.Runtime &&
!UnrollRuntimeLoopRemainder(L, ULO.Count, ULO.AllowExpensiveTripCount,		!UnrollRuntimeLoopRemainder(L, ULO.Count, ULO.AllowExpensiveTripCount,
EpilogProfitability, ULO.UnrollRemainder,		EpilogProfitability, ULO.UnrollRemainder,
ULO.ForgetAllSCEV, LI, SE, DT, AC, TTI,		ULO.ForgetAllSCEV, LI, SE, DT, AC, TTI,
PreserveLCSSA, RemainderLoop)) {		PreserveLCSSA, RemainderLoop)) {
if (ULO.Force)		if (ULO.Force)
ULO.Runtime = false;		ULO.Runtime = false;
else {		else {
LLVM_DEBUG(dbgs() << "Won't unroll; remainder loop could not be "		LLVM_DEBUG(dbgs() << "Won't unroll; remainder loop could not be "
"generated when assuming runtime trip count\n");		"generated when assuming runtime trip count\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}
}		}

using namespace ore;		using namespace ore;
// Report the unrolling decision.		// Report the unrolling decision.
if (CompletelyUnroll) {		if (CompletelyUnroll) {
LLVM_DEBUG(dbgs() << "COMPLETELY UNROLLING loop %" << Header->getName()		LLVM_DEBUG(dbgs() << "COMPLETELY UNROLLING loop %" << Header->getName()
<< " with trip count " << ULO.Count << "!\n");		<< " with trip count " << ULO.Count << "!\n");
if (ORE)		if (ORE)
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "FullyUnrolled", L->getStartLoc(),		return OptimizationRemark(DEBUG_TYPE, "FullyUnrolled", L->getStartLoc(),
L->getHeader())		L->getHeader())
<< "completely unrolled loop with "		<< "completely unrolled loop with "
<< NV("UnrollCount", ULO.Count) << " iterations";		<< NV("UnrollCount", ULO.Count) << " iterations";
		arsenm: This is a lot for one LLVM_DEBUG, move this all to a function?
		sameerds (Author): Fixed by moving it out into a function.
});		});
} else {		} else {
LLVM_DEBUG(dbgs() << "UNROLLING loop %" << Header->getName() << " by "		LLVM_DEBUG(dbgs() << "UNROLLING loop %" << Header->getName() << " by "
<< ULO.Count);		<< ULO.Count);
if (ULO.Runtime)		if (ULO.Runtime)
LLVM_DEBUG(dbgs() << " with run-time trip count");		LLVM_DEBUG(dbgs() << " with run-time trip count");
LLVM_DEBUG(dbgs() << "!\n");		LLVM_DEBUG(dbgs() << "!\n");

[... 82 lines not shown ...]	LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
// loop, the associated metadata must be cloned for each iteration.		// loop, the associated metadata must be cloned for each iteration.
SmallVector<MDNode *, 6> LoopLocalNoAliasDeclScopes;		SmallVector<MDNode *, 6> LoopLocalNoAliasDeclScopes;
identifyNoAliasScopesToClone(L->getBlocks(), LoopLocalNoAliasDeclScopes);		identifyNoAliasScopesToClone(L->getBlocks(), LoopLocalNoAliasDeclScopes);

// We place the unrolled iterations immediately after the original loop		// We place the unrolled iterations immediately after the original loop
// latch. This is a reasonable default placement if we don't have block		// latch. This is a reasonable default placement if we don't have block
// frequencies, and if we do, well the layout will be adjusted later.		// frequencies, and if we do, well the layout will be adjusted later.
auto BlockInsertPt = std::next(LatchBlock->getIterator());		auto BlockInsertPt = std::next(LatchBlock->getIterator());

		assert(ULO.Heart == nullptr || ULO.Heart->getParent() == Header);

for (unsigned It = 1; It != ULO.Count; ++It) {		for (unsigned It = 1; It != ULO.Count; ++It) {
SmallVector<BasicBlock *, 8> NewBlocks;		SmallVector<BasicBlock *, 8> NewBlocks;
SmallDenseMap<const Loop *, Loop *, 4> NewLoops;		SmallDenseMap<const Loop *, Loop *, 4> NewLoops;
NewLoops[L] = L;		NewLoops[L] = L;

for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) {		for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) {
ValueToValueMapTy VMap;		ValueToValueMapTy VMap;
BasicBlock *New = CloneBasicBlock(*BB, VMap, "." + Twine(It));		BasicBlock *New = CloneBasicBlock(*BB, VMap, "." + Twine(It));
Header->getParent()->insert(BlockInsertPt, New);		Header->getParent()->insert(BlockInsertPt, New);

assert((*BB != Header || LI->getLoopFor(*BB) == L) &&		assert((*BB != Header || LI->getLoopFor(*BB) == L) &&
"Header should not be in a sub-loop");		"Header should not be in a sub-loop");
// Tell LI about New.		// Tell LI about New.
const Loop *OldLoop = addClonedBlockToLoopInfo(*BB, New, LI, NewLoops);		const Loop *OldLoop = addClonedBlockToLoopInfo(*BB, New, LI, NewLoops);
if (OldLoop)		if (OldLoop)
LoopsToSimplify.insert(NewLoops[OldLoop]);		LoopsToSimplify.insert(NewLoops[OldLoop]);

if (*BB == Header)		if (*BB == Header) {
// Loop over all of the PHI nodes in the block, changing them to use		// Loop over all of the PHI nodes in the block, changing them to use
// the incoming values from the previous block.		// the incoming values from the previous block.
for (PHINode *OrigPHI : OrigPHINode) {		for (PHINode *OrigPHI : OrigPHINode) {
PHINode *NewPHI = cast<PHINode>(VMap[OrigPHI]);		PHINode *NewPHI = cast<PHINode>(VMap[OrigPHI]);
Value *InVal = NewPHI->getIncomingValueForBlock(LatchBlock);		Value *InVal = NewPHI->getIncomingValueForBlock(LatchBlock);
if (Instruction *InValI = dyn_cast<Instruction>(InVal))		if (Instruction *InValI = dyn_cast<Instruction>(InVal))
if (It > 1 && L->contains(InValI))		if (It > 1 && L->contains(InValI))
InVal = LastValueMap[InValI];		InVal = LastValueMap[InValI];
VMap[OrigPHI] = InVal;		VMap[OrigPHI] = InVal;
NewPHI->eraseFromParent();		NewPHI->eraseFromParent();
}		}

		// Eliminate copies of the loop heart intrinsic, if any.
		if (ULO.Heart) {
		auto it = VMap.find(ULO.Heart);
		assert(it != VMap.end());
		Instruction *heartCopy = cast<Instruction>(it->second);
		heartCopy->eraseFromParent();
		VMap.erase(it);
		}
		}

// Update our running map of newest clones		// Update our running map of newest clones
LastValueMap[*BB] = New;		LastValueMap[*BB] = New;
for (ValueToValueMapTy::iterator VI = VMap.begin(), VE = VMap.end();		for (ValueToValueMapTy::iterator VI = VMap.begin(), VE = VMap.end();
VI != VE; ++VI)		VI != VE; ++VI)
LastValueMap[VI->first] = VI->second;		LastValueMap[VI->first] = VI->second;

// Add phi entries for newly created values to all exit blocks.		// Add phi entries for newly created values to all exit blocks.
for (BasicBlock *Succ : successors(*BB)) {		for (BasicBlock *Succ : successors(*BB)) {
[... 343 lines not shown ...]

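The heart-elimination step in UnrollLoop() is easier to see on a toy model. The following C++ sketch is hypothetical throughout: Inst and unrollBody() are made-up names, the loop body is a single block, and cross-iteration PHI updates are ignored. It keeps the original heart in the first iteration, drops the heart from every later copy, and points the cloned convergent calls back at the single original token, so the token is still defined only once inside the loop.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy instruction: result name (empty if none), an opcode
// label, and the names of the values it uses.
struct Inst {
  std::string Result;
  std::string Op;
  std::vector<std::string> Operands;
};

// Hypothetical helper: unrolls a single-block loop body Count times.
// HeartIdx is the index of the loop heart in Body, or -1 if there is none.
// Iteration 0 is the original body; later copies skip the heart and map its
// result back to the original token, mirroring how UnrollLoop() erases the
// cloned heart and its VMap entry. Cross-iteration PHIs are ignored.
std::vector<Inst> unrollBody(const std::vector<Inst> &Body, int HeartIdx,
                             unsigned Count) {
  assert(HeartIdx < static_cast<int>(Body.size()) && "heart out of range");
  std::vector<Inst> Out(Body);
  for (unsigned It = 1; It < Count; ++It) {
    std::map<std::string, std::string> VMap; // original name -> clone name
    for (int I = 0; I < static_cast<int>(Body.size()); ++I) {
      const Inst &Orig = Body[I];
      if (I == HeartIdx) {
        VMap[Orig.Result] = Orig.Result; // keep using the original token
        continue;                        // and emit no second heart
      }
      Inst Copy = Orig;
      if (!Copy.Result.empty()) {
        Copy.Result += "." + std::to_string(It);
        VMap[Orig.Result] = Copy.Result;
      }
      for (std::string &Use : Copy.Operands) {
        auto Found = VMap.find(Use);
        if (Found != VMap.end())
          Use = Found->second;
      }
      Out.push_back(Copy);
    }
  }
  return Out;
}
```

Unrolling a body that holds a convergence.loop heart and one @f call three times yields one heart and three calls that all use tok.loop, the same shape as the @full_unroll CHECK lines. Passing -1 for a loop whose anchor sits inside the body (as in @pragma_unroll_with_remainder) clones the anchor along with each iteration, so each copy gets its own token.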
llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp

[... 987 lines not shown ...]	if (OtherExits.size() > 0) {
// preserve LoopSimplifyForm.		// preserve LoopSimplifyForm.
if (remainderLoop)		if (remainderLoop)
formDedicatedExitBlocks(remainderLoop, DT, LI, nullptr, PreserveLCSSA);		formDedicatedExitBlocks(remainderLoop, DT, LI, nullptr, PreserveLCSSA);
}		}

auto UnrollResult = LoopUnrollResult::Unmodified;		auto UnrollResult = LoopUnrollResult::Unmodified;
if (remainderLoop && UnrollRemainder) {		if (remainderLoop && UnrollRemainder) {
LLVM_DEBUG(dbgs() << "Unrolling remainder loop\n");		LLVM_DEBUG(dbgs() << "Unrolling remainder loop\n");
UnrollResult =		UnrollLoopOptions ULO;
UnrollLoop(remainderLoop,		ULO.Count = Count - 1;
{/*Count*/ Count - 1, /*Force*/ false, /*Runtime*/ false,		ULO.Force = false;
/*AllowExpensiveTripCount*/ false,		ULO.Runtime = false;
/*UnrollRemainder*/ false, ForgetAllSCEV},		ULO.AllowExpensiveTripCount = false;
LI, SE, DT, AC, TTI, /*ORE*/ nullptr, PreserveLCSSA);		ULO.UnrollRemainder = false;
		ULO.ForgetAllSCEV = ForgetAllSCEV;
		UnrollResult = UnrollLoop(remainderLoop, ULO, LI, SE, DT, AC, TTI,
		/*ORE*/ nullptr, PreserveLCSSA);
}		}

if (ResultLoop && UnrollResult != LoopUnrollResult::FullyUnrolled)		if (ResultLoop && UnrollResult != LoopUnrollResult::FullyUnrolled)
*ResultLoop = remainderLoop;		*ResultLoop = remainderLoop;
NumRuntimeUnrolled++;		NumRuntimeUnrolled++;
return true;		return true;
}		}

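The remainder loop that UnrollRuntimeLoopRemainder() creates is what the diff keeps away from loops whose convergent operations are controlled from outside. As a hedged illustration, the C++ sketch below replays the epilogue arithmetic visible in the CHECK lines of @pragma_unroll_with_remainder (xtraiter = n & (Count - 1), unroll_iter = n - xtraiter) for a power-of-two Count. RuntimeUnrollTrace and runtimeUnroll() are hypothetical names, and the real transform has more cases (for example a prologue instead of an epilogue) that this model does not cover.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trace type: which original iteration indices run in the
// unrolled loop and which run in the epilogue (remainder) loop.
struct RuntimeUnrollTrace {
  std::vector<unsigned> MainIters;
  std::vector<unsigned> RemainderIters;
};

// Hypothetical model of epilogue-style runtime unrolling by a power-of-two
// Count, using the arithmetic from the @pragma_unroll_with_remainder CHECK
// lines: xtraiter = n & (Count - 1) and unroll_iter = n - xtraiter.
RuntimeUnrollTrace runtimeUnroll(unsigned TripCount, unsigned Count) {
  assert(Count != 0 && (Count & (Count - 1)) == 0 &&
         "this model only handles power-of-two counts");
  RuntimeUnrollTrace Trace;
  unsigned XtraIter = TripCount & (Count - 1);
  unsigned UnrollIter = TripCount - XtraIter;
  unsigned X = 0; // original induction variable
  // Unrolled loop: Count body copies per trip, with no exit test between
  // them. It is skipped entirely when TripCount < Count (UnrollIter == 0).
  for (unsigned NIter = 0; NIter != UnrollIter; NIter += Count)
    for (unsigned Copy = 0; Copy != Count; ++Copy)
      Trace.MainIters.push_back(X++);
  // Epilogue: the XtraIter leftover iterations, one body copy each.
  for (unsigned I = 0; I != XtraIter; ++I)
    Trace.RemainderIters.push_back(X++);
  return Trace;
}
```

For a trip count of 5 and Count 2, four iterations run in the unrolled loop and the fifth runs in the epilogue. Which static copy of a convergent call an iteration executes therefore depends on n; this is the extra control-flow dependency that leads the diff to clear UP.AllowRemainder and UP.Runtime for LoopConvergenceKind::Some.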
llvm/test/Transforms/LoopUnroll/convergent.controlled.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -passes=loop-unroll -unroll-runtime -unroll-allow-partial -S | FileCheck %s

				declare void @f() convergent
				declare void @g()

				; Although this loop contains a convergent instruction, it should be
				; fully unrolled.
				define i32 @full_unroll() {
				; CHECK-LABEL: @full_unroll(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ANCHOR:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: br label [[L3:%.*]]
				; CHECK: l3:
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[ANCHOR]]) ]
				; CHECK-NEXT: br label [[A:%.*]]
				; CHECK: a:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_1:%.*]]
				; CHECK: a.1:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_2:%.*]]
				; CHECK: a.2:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%anchor = call token @llvm.experimental.convergence.anchor()
				br label %l3

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %a ]
				%tok.loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, 3
				br label %a

				a:
				call void @f() [ "convergencectrl"(token %tok.loop) ]
				br i1 %exitcond, label %exit, label %l3

				exit:
				ret i32 0
				}

				; This loop contains a convergent instruction, but it should be partially
				; unrolled. The unroll count is the largest power of 2 that divides the
				; multiple -- 4, in this case.
				define i32 @runtime_unroll(i32 %n) {
				; CHECK-LABEL: @runtime_unroll(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ANCHOR:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: [[LOOP_CTL:%.*]] = mul nsw i32 [[N:%.*]], 12
				; CHECK-NEXT: br label [[L3:%.*]]
				; CHECK: l3:
				; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC_3:%.*]], [[A_3:%.*]] ]
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[ANCHOR]]) ]
				; CHECK-NEXT: br label [[A:%.*]]
				; CHECK: a:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_1:%.*]]
				; CHECK: a.1:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_2:%.*]]
				; CHECK: a.2:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_3]]
				; CHECK: a.3:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: [[INC_3]] = add nsw i32 [[X_0]], 4
				; CHECK-NEXT: [[EXITCOND_3:%.*]] = icmp eq i32 [[INC_3]], [[LOOP_CTL]]
				; CHECK-NEXT: br i1 [[EXITCOND_3]], label [[EXIT:%.*]], label [[L3]]
				; CHECK: exit:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%anchor = call token @llvm.experimental.convergence.anchor()
				%loop_ctl = mul nsw i32 %n, 12
				br label %l3

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %a ]
				%tok.loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
				br label %a

				a:
				call void @f() [ "convergencectrl"(token %tok.loop) ]
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, %loop_ctl
				br i1 %exitcond, label %exit, label %l3

				exit:
				ret i32 0
				}

				; This loop contains a convergent instruction, so its partial unroll
				; count must divide its trip multiple. This overrides its unroll
				; pragma -- we unroll exactly 8 times, even though 16 is requested.
				define i32 @pragma_unroll(i32 %n) {
				; CHECK-LABEL: @pragma_unroll(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ANCHOR:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: [[LOOP_CTL:%.*]] = mul nsw i32 [[N:%.*]], 24
				; CHECK-NEXT: br label [[L3:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: l3:
				; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC_7:%.*]], [[A_7:%.*]] ]
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[ANCHOR]]) ]
				; CHECK-NEXT: br label [[A:%.*]]
				; CHECK: a:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_1:%.*]]
				; CHECK: a.1:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_2:%.*]]
				; CHECK: a.2:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_3:%.*]]
				; CHECK: a.3:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_4:%.*]]
				; CHECK: a.4:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_5:%.*]]
				; CHECK: a.5:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_6:%.*]]
				; CHECK: a.6:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: br label [[A_7]]
				; CHECK: a.7:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: [[INC_7]] = add nsw i32 [[X_0]], 8
				; CHECK-NEXT: [[EXITCOND_7:%.*]] = icmp eq i32 [[INC_7]], [[LOOP_CTL]]
				; CHECK-NEXT: br i1 [[EXITCOND_7]], label [[EXIT:%.*]], label [[L3]], !llvm.loop [[LOOP2:![0-9]+]]
				; CHECK: exit:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%anchor = call token @llvm.experimental.convergence.anchor()
				%loop_ctl = mul nsw i32 %n, 24
				br label %l3, !llvm.loop !0

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %a ]
				%tok.loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
				br label %a

				a:
				call void @f() [ "convergencectrl"(token %tok.loop) ]
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, %loop_ctl
				br i1 %exitcond, label %exit, label %l3, !llvm.loop !0

				exit:
				ret i32 0
				}

				; This loop contains a convergent instruction. Since the pragma loop unroll
				; count 2 divides trip count 4, the loop unroll should respect the pragma.
				define void @pragma_unroll_divisible_trip_count() {
				; CHECK-LABEL: @pragma_unroll_divisible_trip_count(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ANCHOR:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: br label [[L3:%.*]], !llvm.loop [[LOOP4:![0-9]+]]
				; CHECK: l3:
				; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC_1:%.*]], [[L3]] ]
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[ANCHOR]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: [[INC_1]] = add nuw nsw i32 [[X_0]], 2
				; CHECK-NEXT: [[EXITCOND_1:%.*]] = icmp eq i32 [[INC_1]], 4
				; CHECK-NEXT: br i1 [[EXITCOND_1]], label [[EXIT:%.*]], label [[L3]], !llvm.loop [[LOOP6:![0-9]+]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				entry:
				%anchor = call token @llvm.experimental.convergence.anchor()
				br label %l3, !llvm.loop !1

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
				%tok.loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
				call void @f() [ "convergencectrl"(token %tok.loop) ]
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, 4
				br i1 %exitcond, label %exit, label %l3, !llvm.loop !1

				exit:
				ret void
				}

				; This loop contains a convergent instruction. Since the pragma loop unroll
				; count 2 divides trip multiple 2, the loop unroll should respect the pragma.
				define i32 @pragma_unroll_divisible_trip_multiple(i32 %n) {
				; CHECK-LABEL: @pragma_unroll_divisible_trip_multiple(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ANCHOR:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: [[LOOP_CTL:%.*]] = mul nsw i32 [[N:%.*]], 2
				; CHECK-NEXT: br label [[L3:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l3:
				; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC_1:%.*]], [[L3]] ]
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[ANCHOR]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: [[INC_1]] = add nsw i32 [[X_0]], 2
				; CHECK-NEXT: [[EXITCOND_1:%.*]] = icmp eq i32 [[INC_1]], [[LOOP_CTL]]
				; CHECK-NEXT: br i1 [[EXITCOND_1]], label [[EXIT:%.*]], label [[L3]], !llvm.loop [[LOOP7:![0-9]+]]
				; CHECK: exit:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%anchor = call token @llvm.experimental.convergence.anchor()
				%loop_ctl = mul nsw i32 %n, 2
				br label %l3, !llvm.loop !1

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
				%tok.loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
				call void @f() [ "convergencectrl"(token %tok.loop) ]
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, %loop_ctl
				br i1 %exitcond, label %exit, label %l3, !llvm.loop !1

				exit:
				ret i32 0
				}

				; This loop contains a convergent instruction. Since the pragma loop unroll
				; count 2 is not known to divide the runtime trip count, the loop is not
				; unrolled, because a remainder is forbidden when unrolling a convergent loop.
				define i32 @pragma_unroll_indivisible_runtime_trip_count(i32 %n) {
				; CHECK-LABEL: @pragma_unroll_indivisible_runtime_trip_count(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ANCHOR:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: br label [[L3:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l3:
				; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[L3]] ]
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[ANCHOR]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: [[INC]] = add nsw i32 [[X_0]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[EXIT:%.*]], label [[L3]], !llvm.loop [[LOOP4]]
				; CHECK: exit:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%anchor = call token @llvm.experimental.convergence.anchor()
				br label %l3, !llvm.loop !1

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
				%tok.loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
				call void @f() [ "convergencectrl"(token %tok.loop) ]
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, %n
				br i1 %exitcond, label %exit, label %l3, !llvm.loop !1

				exit:
				ret i32 0
				}

				; This loop contains a convergent instruction. Since the pragma loop unroll
				; count 2 does not divide trip count 5, the loop is not unrolled by 2
				; because a remainder is forbidden when unrolling a convergent loop. Instead, the
				; loop gets fully unrolled.
				define i32 @pragma_unroll_indivisible_trip_count() {
				; CHECK-LABEL: @pragma_unroll_indivisible_trip_count(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ANCHOR:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: br label [[L3:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l3:
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[ANCHOR]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%anchor = call token @llvm.experimental.convergence.anchor()
				br label %l3, !llvm.loop !1

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
				%tok.loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
				call void @f() [ "convergencectrl"(token %tok.loop) ]
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, 5
				br i1 %exitcond, label %exit, label %l3, !llvm.loop !1

				exit:
				ret i32 0
				}

				; This loop contains a convergent instruction that is anchored inside the loop
				; itself. It is unrolled by 2 with remainder, as requested by the loop metadata.
				define i32 @pragma_unroll_with_remainder(i32 %n) {
				; CHECK-LABEL: @pragma_unroll_with_remainder(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = freeze i32 [[N:%.*]]
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[TMP0]], -1
				; CHECK-NEXT: [[XTRAITER:%.*]] = and i32 [[TMP0]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[TMP1]], 1
				; CHECK-NEXT: br i1 [[TMP2]], label [[EXIT_UNR_LCSSA:%.*]], label [[ENTRY_NEW:%.*]]
				; CHECK: entry.new:
				; CHECK-NEXT: [[UNROLL_ITER:%.*]] = sub i32 [[TMP0]], [[XTRAITER]]
				; CHECK-NEXT: br label [[L3:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l3:
				; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY_NEW]] ], [ [[INC_1:%.*]], [[L3]] ]
				; CHECK-NEXT: [[NITER:%.*]] = phi i32 [ 0, [[ENTRY_NEW]] ], [ [[NITER_NEXT_1:%.*]], [[L3]] ]
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: [[TOK_LOOP_1:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP_1]]) ]
				; CHECK-NEXT: [[INC_1]] = add nsw i32 [[X_0]], 2
				; CHECK-NEXT: [[NITER_NEXT_1]] = add i32 [[NITER]], 2
				; CHECK-NEXT: [[NITER_NCMP_1:%.*]] = icmp eq i32 [[NITER_NEXT_1]], [[UNROLL_ITER]]
				; CHECK-NEXT: br i1 [[NITER_NCMP_1]], label [[EXIT_UNR_LCSSA_LOOPEXIT:%.*]], label [[L3]], !llvm.loop [[LOOP8:![0-9]+]]
				; CHECK: exit.unr-lcssa.loopexit:
				; CHECK-NEXT: br label [[EXIT_UNR_LCSSA]]
				; CHECK: exit.unr-lcssa:
				; CHECK-NEXT: [[LCMP_MOD:%.*]] = icmp ne i32 [[XTRAITER]], 0
				; CHECK-NEXT: br i1 [[LCMP_MOD]], label [[L3_EPIL_PREHEADER:%.*]], label [[EXIT:%.*]]
				; CHECK: l3.epil.preheader:
				; CHECK-NEXT: br label [[L3_EPIL:%.*]]
				; CHECK: l3.epil:
				; CHECK-NEXT: [[TOK_LOOP_EPIL:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP_EPIL]]) ]
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				br label %l3, !llvm.loop !1

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
				%tok.loop = call token @llvm.experimental.convergence.anchor()
				call void @f() [ "convergencectrl"(token %tok.loop) ]
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, %n
				br i1 %exitcond, label %exit, label %l3, !llvm.loop !1

				exit:
				ret i32 0
				}

				; Don't unroll a loop that is extended by convergence control: %tok.loop is
				; defined inside the loop but used outside of it, in %exit.
				;
				; We could theoretically duplicate the extension part, but this is not
				; implemented.
				define i32 @extended_loop(i32 %n) {
				; CHECK-LABEL: @extended_loop(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[L3:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l3:
				; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[L3]] ]
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: [[INC]] = add nsw i32 [[X_0]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[EXIT:%.*]], label [[L3]], !llvm.loop [[LOOP4]]
				; CHECK: exit:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_LOOP]]) ]
				; CHECK-NEXT: ret i32 0
				;
				entry:
				br label %l3, !llvm.loop !1

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
				%tok.loop = call token @llvm.experimental.convergence.anchor()
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, %n
				br i1 %exitcond, label %exit, label %l3, !llvm.loop !1

				exit:
				call void @f() [ "convergencectrl"(token %tok.loop) ]
				ret i32 0
				}

				; Inner loop is extended beyond the outer loop. No unrolling possible.

				define i32 @extended_inner_loop_1(i32 %n, i1 %cond) {
				; CHECK-LABEL: @extended_inner_loop_1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[L3:%.*]]
				; CHECK: l3:
				; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[LATCH:%.*]] ]
				; CHECK-NEXT: [[TOK_LOOP:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: [[INC]] = add nsw i32 [[X_0]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 4
				; CHECK-NEXT: br label [[L2:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2:
				; CHECK-NEXT: [[TOK_L2:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2]]) ]
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[L2]], label [[LATCH]], !llvm.loop [[LOOP4]]
				; CHECK: latch:
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[EXIT:%.*]], label [[L3]]
				; CHECK: exit:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2]]) ]
				; CHECK-NEXT: ret i32 0
				;
				entry:
				br label %l3

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %latch ]
				%tok.loop = call token @llvm.experimental.convergence.anchor()
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, 4
				br label %l2, !llvm.loop !1

				l2:
				%tok.l2 = call token @llvm.experimental.convergence.anchor()
				call void @f() [ "convergencectrl"(token %tok.l2) ]
				br i1 %cond, label %l2, label %latch, !llvm.loop !1

				latch:
				br i1 %exitcond, label %exit, label %l3

				exit:
				call void @f() [ "convergencectrl"(token %tok.l2) ]
				ret i32 0
				}

				; Inner loop is extended inside the outer loop. Outer loop is unrolled.

				define i32 @extended_inner_loop_2(i32 %n, i1 %cond) {
				; CHECK-LABEL: @extended_inner_loop_2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[L3:%.*]]
				; CHECK: l3:
				; CHECK-NEXT: br label [[L2:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2:
				; CHECK-NEXT: [[TOK_L2:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2]]) ]
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[L2]], label [[LATCH:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: latch:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2]]) ]
				; CHECK-NEXT: br label [[L2_1:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.1:
				; CHECK-NEXT: [[TOK_L2_1:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_1]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2_1]], label [[LATCH_1:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: latch.1:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_1]]) ]
				; CHECK-NEXT: br label [[L2_2:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.2:
				; CHECK-NEXT: [[TOK_L2_2:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_2]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2_2]], label [[LATCH_2:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: latch.2:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_2]]) ]
				; CHECK-NEXT: br label [[L2_3:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.3:
				; CHECK-NEXT: [[TOK_L2_3:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_3]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2_3]], label [[LATCH_3:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: latch.3:
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_3]]) ]
				; CHECK-NEXT: ret i32 0
				;
				entry:
				br label %l3

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %latch ]
				%tok.loop = call token @llvm.experimental.convergence.anchor()
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, 4
				br label %l2, !llvm.loop !1

				l2:
				%tok.l2 = call token @llvm.experimental.convergence.anchor()
				call void @f() [ "convergencectrl"(token %tok.l2) ]
				br i1 %cond, label %l2, label %latch, !llvm.loop !1

				latch:
				call void @f() [ "convergencectrl"(token %tok.l2) ]
				br i1 %exitcond, label %exit, label %l3

				exit:
				ret i32 0
				}

				; No extension. Both loops unrolled.

				define i32 @unroll_nest(i32 %n, i1 %cond) {
				; CHECK-LABEL: @unroll_nest(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[L3:%.*]]
				; CHECK: l3:
				; CHECK-NEXT: br label [[L2:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2:
				; CHECK-NEXT: [[TOK_L2:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2]]) ]
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[L2_1:%.*]], label [[LATCH:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.1:
				; CHECK-NEXT: [[TOK_L2_1:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_1]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2]], label [[LATCH]], !llvm.loop [[LOOP9:![0-9]+]]
				; CHECK: latch:
				; CHECK-NEXT: br label [[L2_12:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.12:
				; CHECK-NEXT: [[TOK_L2_11:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_11]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2_1_1:%.*]], label [[LATCH_1:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.1.1:
				; CHECK-NEXT: [[TOK_L2_1_1:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_1_1]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2_12]], label [[LATCH_1]], !llvm.loop [[LOOP9]]
				; CHECK: latch.1:
				; CHECK-NEXT: br label [[L2_2:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.2:
				; CHECK-NEXT: [[TOK_L2_2:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_2]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2_1_2:%.*]], label [[LATCH_2:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.1.2:
				; CHECK-NEXT: [[TOK_L2_1_2:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_1_2]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2_2]], label [[LATCH_2]], !llvm.loop [[LOOP9]]
				; CHECK: latch.2:
				; CHECK-NEXT: br label [[L2_3:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.3:
				; CHECK-NEXT: [[TOK_L2_3:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_3]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2_1_3:%.*]], label [[LATCH_3:%.*]], !llvm.loop [[LOOP4]]
				; CHECK: l2.1.3:
				; CHECK-NEXT: [[TOK_L2_1_3:%.*]] = call token @llvm.experimental.convergence.anchor()
				; CHECK-NEXT: call void @f() [ "convergencectrl"(token [[TOK_L2_1_3]]) ]
				; CHECK-NEXT: br i1 [[COND]], label [[L2_3]], label [[LATCH_3]], !llvm.loop [[LOOP9]]
				; CHECK: latch.3:
				; CHECK-NEXT: ret i32 0
				;
				entry:
				br label %l3

				l3:
				%x.0 = phi i32 [ 0, %entry ], [ %inc, %latch ]
				%tok.loop = call token @llvm.experimental.convergence.anchor()
				%inc = add nsw i32 %x.0, 1
				%exitcond = icmp eq i32 %inc, 4
				br label %l2, !llvm.loop !1

				l2:
				%tok.l2 = call token @llvm.experimental.convergence.anchor()
				call void @f() [ "convergencectrl"(token %tok.l2) ]
				br i1 %cond, label %l2, label %latch, !llvm.loop !1

				latch:
				br i1 %exitcond, label %exit, label %l3

				exit:
				ret i32 0
				}

				declare token @llvm.experimental.convergence.anchor()
				declare token @llvm.experimental.convergence.loop()

				!0 = !{!0, !{!"llvm.loop.unroll.count", i32 16}}
				!1 = !{!1, !{!"llvm.loop.unroll.count", i32 2}}
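The extension cases and the remainder arithmetic exercised by the tests above can be modelled in a few lines. The following is a minimal C++ sketch under stated assumptions, not LLVM's implementation: `Loop`, `Token`, `isExtendedBy`, and `bodyRunsWithRemainder` are hypothetical names, a loop is modelled as a set of block names, and a loop is assumed to be "extended" when a token defined inside it has a use outside it. The remainder function only replays the count-2 split visible in the CHECK lines of `@pragma_unroll_with_remainder` (`XTRAITER = N & 1`, `UNROLL_ITER = N - XTRAITER`, main loop skipped when `N - 1 ult 1`).

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical model (not LLVM APIs): a loop is the set of its block names,
// and a convergence token is described by its defining block and use blocks.
using Loop = std::set<std::string>;

struct Token {
  std::string DefBlock;
  std::vector<std::string> UseBlocks;
};

// Assumed definition: a loop is extended by a token that is defined inside
// the loop but used in at least one block outside of it.
bool isExtendedBy(const Loop &L, const Token &T) {
  if (L.count(T.DefBlock) == 0)
    return false;
  for (const std::string &Use : T.UseBlocks)
    if (L.count(Use) == 0)
      return true;
  return false;
}

// Replays the count-2 runtime unrolling from @pragma_unroll_with_remainder
// and returns how many times the body (@f) runs for trip count N.
uint64_t bodyRunsWithRemainder(uint32_t N) {
  uint32_t XtraIter = N & 1u;          // XTRAITER: iterations left for the epilogue
  uint32_t UnrollIter = N - XtraIter;  // UNROLL_ITER: iterations covered by the unrolled loop
  uint64_t Runs = 0;
  // TMP2 = icmp ult (N - 1), 1: true only for N == 1, which skips the main loop.
  if (!(static_cast<uint32_t>(N - 1u) < 1u)) {
    uint32_t NIter = 0;
    do {
      Runs += 2;   // two copies of the body per unrolled iteration
      NIter += 2;  // NITER_NEXT_1 = add NITER, 2
    } while (NIter != UnrollIter);
  }
  return Runs + XtraIter;  // the epilogue runs the body at most once
}
```

With the block names from `@extended_inner_loop_1`, `%tok.l2` (defined in `l2`, used in `exit`) extends both `l2` and the outer loop `{l3, l2, latch}`. In `@extended_inner_loop_2` the extra use is in `latch`, so under this model only the inner loop is extended, matching the test comment that the outer loop is still unrolled.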

Revision Contents

Diff 541598

llvm/include/llvm/Analysis/CodeMetrics.h

llvm/include/llvm/Transforms/Utils/UnrollLoop.h

llvm/lib/Analysis/CodeMetrics.cpp

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp

llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp

llvm/lib/Transforms/Utils/LoopUnroll.cpp

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp

llvm/test/Transforms/LoopUnroll/convergent.controlled.ll
