This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Check for HardwareLoop Latch ExitBlock
Closed, Public

Authored by samparker on Jun 14 2019, 8:37 AM.

Details

Reviewers

hfinkel
markus
JonChesterfield
dmgreen
SjoerdMeijer

Commits

rG1bd3d00e7e5a: [CodeGen] Check for HardwareLoop Latch ExitBlock
rL363556: [CodeGen] Check for HardwareLoop Latch ExitBlock

Summary

The HardwareLoops pass finds exit blocks with a scevable exit count. If the target specifies to update the loop counter in a register, through a phi, we need to ensure that the exit block is a latch so that we can insert the phi with the correct value for the incoming edge.
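
The latch requirement described in the summary can be sketched with a toy model. This is not the pass itself: `ToyLoop` and `pickExitingBlock` are hypothetical names invented for illustration, and the loop is reduced to lists of block names rather than an `llvm::Loop`. The only behaviour it is meant to show is that a non-latch exiting block is skipped when the counter is carried through a phi.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a natural loop; this is not llvm::Loop.
struct ToyLoop {
  std::vector<std::string> ExitingBlocks; // blocks with an edge out of the loop
  std::vector<std::string> Latches;       // blocks with a back edge to the header

  bool isLatch(const std::string &BB) const {
    return std::find(Latches.begin(), Latches.end(), BB) != Latches.end();
  }
};

// Return the first exiting block that may host the hardware-loop branch,
// or an empty string if every candidate is rejected.
std::string pickExitingBlock(const ToyLoop &L, bool CounterInReg) {
  assert(!L.Latches.empty() && "a loop has at least one latch");
  for (const std::string &BB : L.ExitingBlocks) {
    // With the counter carried through a phi, the decremented value must
    // arrive on a back edge, so a non-latch exiting block is skipped.
    if (CounterInReg && !L.isLatch(BB))
      continue;
    return BB;
  }
  return std::string();
}
```

In a rotated loop the exiting block is also the latch, so it is accepted either way. In a loop that exits from its header and branches back unconditionally from a separate block, the candidate is rejected only when `CounterInReg` is set.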

Diff Detail

Repository: rL LLVM

Event Timeline

samparker created this revision. Jun 14 2019, 8:37 AM

Herald added a project: Restricted Project. Jun 14 2019, 8:37 AM

Herald added a subscriber: javed.absar.

lebedev.ri resigned from this revision. Jun 14 2019, 4:13 PM

dmgreen added inline comments. Jun 15 2019, 1:14 AM

lib/CodeGen/HardwareLoops.cpp, line 247 (on Diff #204773): Maybe "RequiresLoopExitingLatch" as a more descriptive name?

lib/Target/ARM/ARMTargetTransformInfo.cpp, line 706 (on Diff #204773): Is it worth just checking for the LoopExitingLatch here, as opposed to adding a parameter for it? (Put another way, why would one be a parameter and the other be something for the backend to figure out?)

Cheers Dave. I've removed the ExitBlock check in the ARM backend. Also, currently we'd need CounterInReg == RequiresLoopLatchExit so I've just switched to querying CounterInReg instead of introducing a new parameter. If, later, there's another target which has a separate requirement, then we can re-add the option.

Am I correct in saying that this loop will find the first exiting block that dominates all the latches in the loop (with some other conditions, like a loop-invariant, non-zero exit count), and that this block must now also be a latch?

But the loop may have other exits and other latches?

The "other exits" sounds like it should be OK for arm low overhead loops. At least I can't think of a reason right now why they are not OK. For vector tail predicated loops I imagine that you would need to ensure that a "LCTP" is placed on the other exits, to ensure the tail predicate doesn't pollute extra instructions after the loop.

I'm not sure about the "other latches" part. Is that something that's OK? I imagine it doesn't come up very often.

Actually, looking at it, because we can compute the BETC (backedge-taken count), we will only have a single latch. Ignore me about that bit.

For scalar low overhead loops, this LGTM. Vector loops we are looking into later, as far as I understand.
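
To make the "other latches" question concrete: a latch is a block inside the loop that branches back to the header, so the latches are the in-loop predecessors of the header. The sketch below is only an illustration; `findLatches` and the map-based CFG are hypothetical helpers, not LLVM APIs. The block names in the usage note come from the `multi_latch` test in this revision.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Block = std::string;

// Collect the latches of a loop: predecessors of the header that are
// themselves members of the loop. Preds maps a block to its predecessors.
std::vector<Block> findLatches(const std::map<Block, std::vector<Block>> &Preds,
                               const std::set<Block> &LoopBlocks,
                               const Block &Header) {
  std::vector<Block> Latches;
  auto It = Preds.find(Header);
  assert(It != Preds.end() && "header must have predecessors");
  for (const Block &P : It->second)
    if (LoopBlocks.count(P)) // predecessors outside the loop are entry edges
      Latches.push_back(P);
  return Latches;
}
```

For `multi_latch`, the header's predecessors are `entry`, `latch.0` and `latch.1`, and the last two are in the loop, so the sketch reports two latches. That is the shape the test expects the pass to leave untouched.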

This revision is now accepted and ready to land. Jun 17 2019, 4:57 AM

Closed by commit rL363556: [CodeGen] Check for HardwareLoop Latch ExitBlock (authored by sam_parker). Jun 17 2019, 6:36 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                               Size
llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h             10 lines
llvm/trunk/lib/CodeGen/HardwareLoops.cpp                           16 lines
llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp               4 lines
llvm/trunk/test/Transforms/HardwareLoops/ARM/structure.ll          76 lines
llvm/trunk/test/Transforms/HardwareLoops/unconditional-latch.ll    46 lines

Diff 205063

llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h

public:
};		};

/// Get target-customized preferences for the generic loop unrolling		/// Get target-customized preferences for the generic loop unrolling
/// transformation. The caller will initialize UP with the current		/// transformation. The caller will initialize UP with the current
/// target-independent defaults.		/// target-independent defaults.
void getUnrollingPreferences(Loop *L, ScalarEvolution &,		void getUnrollingPreferences(Loop *L, ScalarEvolution &,
UnrollingPreferences &UP) const;		UnrollingPreferences &UP) const;

/// Attributes of a target dependent hardware loop. Here, the term 'element'		/// Attributes of a target dependent hardware loop.
/// describes the work performed by an IR loop that has not been vectorized
/// by the compiler.
struct HardwareLoopInfo {		struct HardwareLoopInfo {
HardwareLoopInfo() = delete;		HardwareLoopInfo() = delete;
HardwareLoopInfo(Loop *L) : L(L) { }		HardwareLoopInfo(Loop *L) : L(L) { }
Loop *L = nullptr;		Loop *L = nullptr;
BasicBlock *ExitBlock = nullptr;		BasicBlock *ExitBlock = nullptr;
BranchInst *ExitBranch = nullptr;		BranchInst *ExitBranch = nullptr;
const SCEV *ExitCount = nullptr;		const SCEV *ExitCount = nullptr;
IntegerType *CountType = nullptr;		IntegerType *CountType = nullptr;
Value *LoopDecrement = nullptr; // The maximum number of elements		Value *LoopDecrement = nullptr; // Decrement the loop counter by this
// processed in the loop body.		// value in every iteration.
bool IsNestingLegal = false; // Can a hardware loop be a parent to		bool IsNestingLegal = false; // Can a hardware loop be a parent to
// another hardware loop.		// another hardware loop?
bool CounterInReg = false; // Should loop counter be updated in		bool CounterInReg = false; // Should loop counter be updated in
// the loop via a phi?		// the loop via a phi?
};		};

/// Query the target whether it would be profitable to convert the given loop		/// Query the target whether it would be profitable to convert the given loop
/// into a hardware loop.		/// into a hardware loop.
bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE,		bool isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE,
AssumptionCache &AC,		AssumptionCache &AC,
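
The `CounterInReg` and `LoopDecrement` fields above describe a counter that lives in a register and is fed back through a phi. A rough software model of that data flow, assuming the exit is taken from the latch, might look like the following. `runCountedLoop` is a hypothetical name, and the model assumes a non-zero trip count that is a multiple of the decrement.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a loop whose counter is kept in a register.
// Counter plays the role of the header phi: its first value comes from the
// preheader and every later value comes from the latch.
uint32_t runCountedLoop(uint32_t TripCount, uint32_t Decrement = 1) {
  assert(Decrement != 0 && TripCount != 0 && TripCount % Decrement == 0);
  uint32_t Counter = TripCount; // phi: incoming value from the preheader
  uint32_t Iterations = 0;
  while (true) {
    ++Iterations;                        // loop body
    uint32_t Next = Counter - Decrement; // decrement, computed in the latch
    if (Next == 0)
      break;                             // exit taken from the latch
    Counter = Next;                      // phi: incoming value from the latch
  }
  return Iterations;
}
```

Because the decremented value is produced in the block that also branches back to the header, the phi has a well-defined incoming value on the back edge. If the exit test sat in a different block from the latch, the model would need to say which value reaches the header, which is the situation the new check avoids.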

llvm/trunk/lib/CodeGen/HardwareLoops.cpp

bool HardwareLoops::TryConvertLoop(TTI::HardwareLoopInfo &HWLoopInfo) {
Loop *L = HWLoopInfo.L;		Loop *L = HWLoopInfo.L;
LLVM_DEBUG(dbgs() << "HWLoops: Try to convert profitable loop: " << *L);		LLVM_DEBUG(dbgs() << "HWLoops: Try to convert profitable loop: " << *L);

SmallVector<BasicBlock*, 4> ExitingBlocks;		SmallVector<BasicBlock*, 4> ExitingBlocks;
L->getExitingBlocks(ExitingBlocks);		L->getExitingBlocks(ExitingBlocks);

for (SmallVectorImpl<BasicBlock *>::iterator I = ExitingBlocks.begin(),		for (SmallVectorImpl<BasicBlock *>::iterator I = ExitingBlocks.begin(),
IE = ExitingBlocks.end(); I != IE; ++I) {		IE = ExitingBlocks.end(); I != IE; ++I) {
const SCEV *EC = SE->getExitCount(L, *I);		BasicBlock *BB = *I;

		// If we pass the updated counter back through a phi, we need to know
		// which latch the updated value will be coming from.
		if (!L->isLoopLatch(BB)) {
		if ((ForceHardwareLoopPHI.getNumOccurrences() && ForceHardwareLoopPHI) ||
		HWLoopInfo.CounterInReg)
		continue;
		}

		const SCEV *EC = SE->getExitCount(L, BB);
if (isa<SCEVCouldNotCompute>(EC))		if (isa<SCEVCouldNotCompute>(EC))
continue;		continue;
if (const SCEVConstant *ConstEC = dyn_cast<SCEVConstant>(EC)) {		if (const SCEVConstant *ConstEC = dyn_cast<SCEVConstant>(EC)) {
if (ConstEC->getValue()->isZero())		if (ConstEC->getValue()->isZero())
continue;		continue;
} else if (!SE->isLoopInvariant(EC, L))		} else if (!SE->isLoopInvariant(EC, L))
continue;		continue;

if (SE->getTypeSizeInBits(EC->getType()) >		if (SE->getTypeSizeInBits(EC->getType()) >
HWLoopInfo.CountType->getBitWidth())		HWLoopInfo.CountType->getBitWidth())
continue;		continue;

// If this exiting block is contained in a nested loop, it is not eligible		// If this exiting block is contained in a nested loop, it is not eligible
// for insertion of the branch-and-decrement since the inner loop would		// for insertion of the branch-and-decrement since the inner loop would
// end up messing up the value in the CTR.		// end up messing up the value in the CTR.
if (!HWLoopInfo.IsNestingLegal && LI->getLoopFor(*I) != L &&		if (!HWLoopInfo.IsNestingLegal && LI->getLoopFor(BB) != L &&
!ForceNestedLoop)		!ForceNestedLoop)
continue;		continue;

// We now have a loop-invariant count of loop iterations (which is not the		// We now have a loop-invariant count of loop iterations (which is not the
// constant zero) for which we know that this loop will not exit via this		// constant zero) for which we know that this loop will not exit via this
// existing block.		// existing block.

// We need to make sure that this block will run on every loop iteration.		// We need to make sure that this block will run on every loop iteration.
(10 lines elided)	for (pred_iterator PI = pred_begin(L->getHeader()),
break;		break;
}		}
}		}

if (NotAlways)		if (NotAlways)
continue;		continue;

// Make sure this blocks ends with a conditional branch.		// Make sure this blocks ends with a conditional branch.
Instruction *TI = (*I)->getTerminator();		Instruction *TI = BB->getTerminator();
if (!TI)		if (!TI)
continue;		continue;

if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {		if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
if (!BI->isConditional())		if (!BI->isConditional())
continue;		continue;

HWLoopInfo.ExitBranch = BI;		HWLoopInfo.ExitBranch = BI;

llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp

bool ARMTTIImpl::isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE,
AssumptionCache &AC,		AssumptionCache &AC,
TargetLibraryInfo *LibInfo,		TargetLibraryInfo *LibInfo,
TTI::HardwareLoopInfo &HWLoopInfo) {		TTI::HardwareLoopInfo &HWLoopInfo) {
// Low-overhead branches are only supported in the 'low-overhead branch'		// Low-overhead branches are only supported in the 'low-overhead branch'
// extension of v8.1-m.		// extension of v8.1-m.
if (!ST->hasLOB() || DisableLowOverheadLoops)		if (!ST->hasLOB() || DisableLowOverheadLoops)
return false;		return false;

// For now, for simplicity, only support loops with one exit block.
if (!L->getExitBlock())
return false;

if (!SE.hasLoopInvariantBackedgeTakenCount(L))		if (!SE.hasLoopInvariantBackedgeTakenCount(L))
return false;		return false;

const SCEV *BackedgeTakenCount = SE.getBackedgeTakenCount(L);		const SCEV *BackedgeTakenCount = SE.getBackedgeTakenCount(L);
if (isa<SCEVCouldNotCompute>(BackedgeTakenCount))		if (isa<SCEVCouldNotCompute>(BackedgeTakenCount))
return false;		return false;

const SCEV *TripCountSCEV =		const SCEV *TripCountSCEV =

llvm/trunk/test/Transforms/HardwareLoops/ARM/structure.ll

while.cond1.while.end_crit_edge.us:
%inc6.us = add nuw i32 %i.021.us, 1		%inc6.us = add nuw i32 %i.021.us, 1
%exitcond23 = icmp eq i32 %inc6.us, %N		%exitcond23 = icmp eq i32 %inc6.us, %N
br i1 %exitcond23, label %while.end7, label %while.cond1.preheader.us		br i1 %exitcond23, label %while.end7, label %while.cond1.preheader.us

while.end7:		while.end7:
ret void		ret void
}		}

		; CHECK-LABEL: not_rotated
		; CHECK-NOT: call void @llvm.set.loop.iterations
		; CHECK-NOT: call i32 @llvm.loop.decrement.i32
		define void @not_rotated(i32, i16* nocapture, i16 signext) {
		br label %4

		4:
		%5 = phi i32 [ 0, %3 ], [ %19, %18 ]
		%6 = icmp eq i32 %5, %0
		br i1 %6, label %20, label %7

		7:
		%8 = mul i32 %5, %0
		br label %9

		9:
		%10 = phi i32 [ %17, %12 ], [ 0, %7 ]
		%11 = icmp eq i32 %10, %0
		br i1 %11, label %18, label %12

		12:
		%13 = add i32 %10, %8
		%14 = getelementptr inbounds i16, i16* %1, i32 %13
		%15 = load i16, i16* %14, align 2
		%16 = add i16 %15, %2
		store i16 %16, i16* %14, align 2
		%17 = add i32 %10, 1
		br label %9

		18:
		%19 = add i32 %5, 1
		br label %4

		20:
		ret void
		}

		; CHECK-LABEL: multi_latch
		; CHECK-NOT: call void @llvm.set.loop.iterations
		; CHECK-NOT: call i32 @llvm.loop.decrement
		define void @multi_latch(i32* %a, i32* %b, i32 %N) {
		entry:
		%half = lshr i32 %N, 1
		br label %header

		header:
		%iv = phi i32 [ 0, %entry ], [ %count.next, %latch.0 ], [ %count.next, %latch.1 ]
		%cmp = icmp ult i32 %iv, %half
		%addr.a = getelementptr i32, i32* %a, i32 %iv
		%addr.b = getelementptr i32, i32* %b, i32 %iv
		br i1 %cmp, label %if.then, label %if.else

		if.then:
		store i32 %iv, i32* %addr.a
		br label %latch.0

		if.else:
		store i32 %iv, i32* %addr.b
		br label %latch.0

		latch.0:
		%count.next = add nuw i32 %iv, 1
		%cmp.1 = icmp ult i32 %count.next, %half
		br i1 %cmp.1, label %header, label %latch.1

		latch.1:
		%ld = load i32, i32* %addr.a
		store i32 %ld, i32* %addr.b
		%cmp.2 = icmp ult i32 %count.next, %N
		br i1 %cmp.2, label %header, label %exit

		exit:
		ret void
		}


declare void @llvm.set.loop.iterations.i32(i32) #0		declare void @llvm.set.loop.iterations.i32(i32) #0
declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #0		declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #0

llvm/trunk/test/Transforms/HardwareLoops/unconditional-latch.ll

				; RUN: opt -force-hardware-loops=true -hardware-loop-decrement=1 -hardware-loop-counter-bitwidth=32 -hardware-loops -S %s -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-ALLOW
				; RUN: opt -force-hardware-loops=true -hardware-loop-decrement=1 -hardware-loop-counter-bitwidth=32 -force-hardware-loop-phi=true -hardware-loops -S %s -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-LATCH

				; CHECK-LABEL: not_rotated
				; CHECK-LATCH-NOT: call void @llvm.set.loop.iterations
				; CHECK-LATCH-NOT: call i1 @llvm.loop.decrement

				; CHECK-ALLOW: call void @llvm.set.loop.iterations.i32(i32 %4)
				; CHECK-ALLOW: br label %10

				; CHECK-ALLOW: [[CMP:%[^ ]+]] = call i1 @llvm.loop.decrement.i32(i32 1)
				; CHECK-ALLOW: br i1 [[CMP]], label %13, label %19

				define void @not_rotated(i32, i16* nocapture, i16 signext) {
				br label %4

				4:
				%5 = phi i32 [ 0, %3 ], [ %19, %18 ]
				%6 = icmp eq i32 %5, %0
				br i1 %6, label %20, label %7

				7:
				%8 = mul i32 %5, %0
				br label %9

				9:
				%10 = phi i32 [ %17, %12 ], [ 0, %7 ]
				%11 = icmp eq i32 %10, %0
				br i1 %11, label %18, label %12

				12:
				%13 = add i32 %10, %8
				%14 = getelementptr inbounds i16, i16* %1, i32 %13
				%15 = load i16, i16* %14, align 2
				%16 = add i16 %15, %2
				store i16 %16, i16* %14, align 2
				%17 = add i32 %10, 1
				br label %9

				18:
				%19 = add i32 %5, 1
				br label %4

				20:
				ret void
				}