This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Allow v6m runtime loop unrolling
Closed, Public

Authored by dmgreen on Mar 30 2021, 6:30 AM.


Details

Reviewers

NickGuy
SjoerdMeijer
efriedma
simon_tatham
samparker

Commits

rGda98177cda16: [ARM] Allow v6m runtime loop unrolling

Summary

This removes the restriction that runtime loop unrolling is only enabled on Thumb2 targets, allowing it for Thumb1-only cores as well. The existing T2 heuristics are used (for the time being) to control when and how unrolling is performed.
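To make the change concrete, here is a minimal, self-contained C++ model of the early-return checks visible in the ARMTargetTransformInfo.cpp diff below. `TargetModel`, `UnrollPath` and `classify` are hypothetical names invented for this sketch, not LLVM APIs; the real hook queries `ST->isMClass()`, `hasOptSize()` and, before this revision, `ST->isThumb2()`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the subtarget and function queries made at the
// top of ARMTTIImpl::getUnrollingPreferences. These are NOT LLVM types.
struct TargetModel {
  bool isMClass;   // M-profile core (e.g. Cortex-M0, M23, M33, M7)
  bool isThumb2;   // Thumb-2 available; false for v6-M and v8-M Baseline
  bool hasOptSize; // function is optimized for size (-Os / -Oz)
};

enum class UnrollPath {
  GenericDefaults, // defer to BasicTTIImplBase::getUnrollingPreferences
  EarlyReturn,     // return before the ARM partial/runtime settings are applied
  ArmHeuristics    // continue into the existing Thumb-2 tuned heuristics
};

// keepThumb2Check == true models the code before this revision;
// keepThumb2Check == false models it after the Thumb-2 check was removed.
UnrollPath classify(const TargetModel &t, bool keepThumb2Check) {
  if (!t.isMClass)
    return UnrollPath::GenericDefaults;
  if (t.hasOptSize)
    return UnrollPath::EarlyReturn; // unrolling disabled for Os/Oz
  if (keepThumb2Check && !t.isThumb2)
    return UnrollPath::EarlyReturn; // the check this revision deletes
  return UnrollPath::ArmHeuristics;
}
```

With `keepThumb2Check` set, a Thumb1-only M-profile core such as the Cortex-M23 used in the test never reaches the tuned heuristics; without it, that core takes the same path Thumb-2 M-profile cores already used.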

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Mar 30 2021, 6:30 AM

Herald added subscribers: danielkiss, zzheng, hiraditya, kristof.beyls. Mar 30 2021, 6:30 AM

dmgreen requested review of this revision. Mar 30 2021, 6:30 AM

Herald added a project: Restricted Project. Mar 30 2021, 6:30 AM

Just a query on the context of this work: this wasn't enabled originally because of some regressions. How does that look now? Does this work rely on some fixes to address that, or has the picture changed?

Harbormaster completed remote builds in B96301: Diff 334117. Mar 30 2021, 7:06 AM

I'm not sure exactly why T1 unrolling wasn't enabled in the past. I think it was causing more trouble than it was worth and, not being a focus at the time, was dropped fairly early. The extra tuning done for T2 since then would also have helped T1 avoid regressions.

As with any change like this, some things get better and a few get worse. In general the performance looks good, though (I would not have suggested it otherwise!). I've made a few minor changes elsewhere, but they were fairly generic, not v6m-specific. There is probably more we could get out of it by tuning it in places, and I was weighing whether to try to tune that now or to get this in and work from there. The geomeans of all the benchmarks I've run look healthy, though.
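The benchmark summary above relies on geometric means, under which a few regressions can be outweighed by broader gains. A minimal sketch of that aggregation in C++, assuming each benchmark is reduced to a speedup ratio (baseline time divided by new time); `geomean` is a hypothetical helper and no figures from this review are reproduced.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark speedup ratios (baseline time / new time).
// A ratio above 1.0 means that benchmark got faster. Returns 1.0 (no change)
// for an empty input.
double geomean(const std::vector<double> &ratios) {
  if (ratios.empty())
    return 1.0;
  double logSum = 0.0;
  for (double r : ratios)
    logSum += std::log(r); // summing logs avoids overflowing a long product
  return std::exp(logSum / static_cast<double>(ratios.size()));
}
```

For example, one benchmark running twice as fast and another at half speed nets out to a geomean of 1.0, i.e. no overall change.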

Nice one, thanks.

This revision is now accepted and ready to land. Apr 1 2021, 2:39 AM

Closed by commit rGda98177cda16: [ARM] Allow v6m runtime loop unrolling (authored by dmgreen). Apr 1 2021, 1:22 PM

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGda98177cda16: [ARM] Allow v6m runtime loop unrolling.

Revision Contents

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp: 4 lines
llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll: 186 lines

Diff 334809

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

(2,134 unchanged lines not shown)
   if (!ST->isMClass())
     return BasicTTIImplBase::getUnrollingPreferences(L, SE, UP);
 
   // Disable loop unrolling for Oz and Os.
   UP.OptSizeThreshold = 0;
   UP.PartialOptSizeThreshold = 0;
   if (L->getHeader()->getParent()->hasOptSize())
     return;
 
-  // Only enable on Thumb-2 targets.
-  if (!ST->isThumb2())
-    return;
-
   SmallVector<BasicBlock*, 4> ExitingBlocks;
   L->getExitingBlocks(ExitingBlocks);
   LLVM_DEBUG(dbgs() << "Loop has:\n"
                     << "Blocks: " << L->getNumBlocks() << "\n"
                     << "Exit blocks: " << ExitingBlocks.size() << "\n");
 
   // Only allow another exit other than the latch. This acts as an early exit
   // as it mirrors the profitability calculation of the runtime unroller.
(76 unchanged lines not shown)
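The comment at the end of the hunk describes an exit-count limit: apart from the latch, the loop may have one other exiting block. The statement that enforces it falls in the elided lines and is not reproduced here; the following is only a hypothetical C++ model of the rule as the comment states it, and `exitCountAllowsUnrolling` is an invented name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of "only allow another exit other than the latch":
// the latch counts as one exiting block and at most one more is permitted.
// This is NOT the LLVM code; the real check is in the elided part of the hook.
bool exitCountAllowsUnrolling(std::size_t numExitingBlocks) {
  const std::size_t latchExits = 1;    // the loop latch itself
  const std::size_t maxOtherExits = 1; // one additional early exit
  return numExitingBlocks <= latchExits + maxOtherExits;
}
```

Under this model, and since Thumb1 targets now reach the same code after this revision, a v6-M loop with three or more exiting blocks would be turned away exactly as a Thumb-2 one is.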

llvm/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll

-; RUN: opt -mtriple=armv7 -mcpu=cortex-a57 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-A
-; RUN: opt -mtriple=thumbv7 -mcpu=cortex-a57 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-A
-; RUN: opt -mtriple=thumbv7 -mcpu=cortex-a72 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-A
-; RUN: opt -mtriple=thumbv8m -mcpu=cortex-m23 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-T1
-; RUN: opt -mtriple=thumbv8m.main -mcpu=cortex-m33 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-T2
-; RUN: opt -mtriple=thumbv7em -mcpu=cortex-m7 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-T2
+; RUN: opt -mtriple=armv7 -mcpu=cortex-a57 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-NOUNROLL
+; RUN: opt -mtriple=thumbv7 -mcpu=cortex-a57 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-NOUNROLL
+; RUN: opt -mtriple=thumbv7 -mcpu=cortex-a72 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-NOUNROLL
+; RUN: opt -mtriple=thumbv8m -mcpu=cortex-m23 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL
+; RUN: opt -mtriple=thumbv8m.main -mcpu=cortex-m33 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL
+; RUN: opt -mtriple=thumbv7em -mcpu=cortex-m7 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL

 ; CHECK-LABEL: partial
 define arm_aapcs_vfpcc void @partial(i32* nocapture %C, i32* nocapture readonly %A, i32* nocapture readonly %B) local_unnamed_addr #0 {
 entry:
   br label %for.body
 
 ; CHECK-LABEL: for.body
 for.body:
 
-; CHECK-UNROLL-A: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV2:%[a-z.0-9]+]], %for.body ]
-; CHECK-UNROLL-A: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
-; CHECK-UNROLL-A: [[IV2]] = add nuw nsw i32 [[IV1]], 1
-; CHECK-UNROLL-A: [[CMP:%[a-z.0-9]+]] = icmp eq i32 [[IV2]], 1024
-; CHECK-UNROLL-A: br i1 [[CMP]], label [[END:%[a-z.]+]], label %for.body
+; CHECK-NOUNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV2:%[a-z.0-9]+]], %for.body ]
+; CHECK-NOUNROLL: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
+; CHECK-NOUNROLL: [[IV2]] = add nuw nsw i32 [[IV1]], 1
+; CHECK-NOUNROLL: [[CMP:%[a-z.0-9]+]] = icmp eq i32 [[IV2]], 1024
+; CHECK-NOUNROLL: br i1 [[CMP]], label [[END:%[a-z.]+]], label %for.body
 
-; CHECK-UNROLL-T1: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV1:%[a-z.0-9]+]], %for.body ]
-; CHECK-UNROLL-T1: [[IV1]] = add nuw nsw i32 [[IV0]], 1
-; CHECK-UNROLL-T1: [[CMP:%[a-z.0-9]+]] = icmp eq i32 [[IV1]], 1024
-; CHECK-UNROLL-T1: br i1 [[CMP]], label [[END:%[a-z.]+]], label %for.body
-
-; CHECK-UNROLL-T2: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV16:%[a-z.0-9]+]], %for.body ]
-; CHECK-UNROLL-T2: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
-; CHECK-UNROLL-T2: [[IV2:%[a-z.0-9]+]] = add nuw nsw i32 [[IV1]], 1
-; CHECK-UNROLL-T2: [[IV3:%[a-z.0-9]+]] = add nuw nsw i32 [[IV2]], 1
-; CHECK-UNROLL-T2: [[IV4:%[a-z.0-9]+]] = add nuw nsw i32 [[IV3]], 1
-; CHECK-UNROLL-T2: [[IV5:%[a-z.0-9]+]] = add nuw nsw i32 [[IV4]], 1
-; CHECK-UNROLL-T2: [[IV6:%[a-z.0-9]+]] = add nuw nsw i32 [[IV5]], 1
-; CHECK-UNROLL-T2: [[IV7:%[a-z.0-9]+]] = add nuw nsw i32 [[IV6]], 1
-; CHECK-UNROLL-T2: [[IV8:%[a-z.0-9]+]] = add nuw nsw i32 [[IV7]], 1
-; CHECK-UNROLL-T2: [[IV9:%[a-z.0-9]+]] = add nuw nsw i32 [[IV8]], 1
-; CHECK-UNROLL-T2: [[IV10:%[a-z.0-9]+]] = add nuw nsw i32 [[IV9]], 1
-; CHECK-UNROLL-T2: [[IV11:%[a-z.0-9]+]] = add nuw nsw i32 [[IV10]], 1
-; CHECK-UNROLL-T2: [[IV12:%[a-z.0-9]+]] = add nuw nsw i32 [[IV11]], 1
-; CHECK-UNROLL-T2: [[IV13:%[a-z.0-9]+]] = add nuw nsw i32 [[IV12]], 1
-; CHECK-UNROLL-T2: [[IV14:%[a-z.0-9]+]] = add nuw nsw i32 [[IV13]], 1
-; CHECK-UNROLL-T2: [[IV15:%[a-z.0-9]+]] = add nuw nsw i32 [[IV14]], 1
-; CHECK-UNROLL-T2: [[IV16]] = add nuw nsw i32 [[IV15]], 1
-; CHECK-UNROLL-T2: [[CMP:%[a-z.0-9]+]] = icmp eq i32 [[IV16]], 1024
-; CHECK-UNROLL-T2: br i1 [[CMP]], label [[END:%[a-z.]+]], label %for.body
+; CHECK-UNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV16:%[a-z.0-9]+]], %for.body ]
+; CHECK-UNROLL: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
+; CHECK-UNROLL: [[IV2:%[a-z.0-9]+]] = add nuw nsw i32 [[IV1]], 1
+; CHECK-UNROLL: [[IV3:%[a-z.0-9]+]] = add nuw nsw i32 [[IV2]], 1
+; CHECK-UNROLL: [[IV4:%[a-z.0-9]+]] = add nuw nsw i32 [[IV3]], 1
+; CHECK-UNROLL: [[IV5:%[a-z.0-9]+]] = add nuw nsw i32 [[IV4]], 1
+; CHECK-UNROLL: [[IV6:%[a-z.0-9]+]] = add nuw nsw i32 [[IV5]], 1
+; CHECK-UNROLL: [[IV7:%[a-z.0-9]+]] = add nuw nsw i32 [[IV6]], 1
+; CHECK-UNROLL: [[IV8:%[a-z.0-9]+]] = add nuw nsw i32 [[IV7]], 1
+; CHECK-UNROLL: [[IV9:%[a-z.0-9]+]] = add nuw nsw i32 [[IV8]], 1
+; CHECK-UNROLL: [[IV10:%[a-z.0-9]+]] = add nuw nsw i32 [[IV9]], 1
+; CHECK-UNROLL: [[IV11:%[a-z.0-9]+]] = add nuw nsw i32 [[IV10]], 1
+; CHECK-UNROLL: [[IV12:%[a-z.0-9]+]] = add nuw nsw i32 [[IV11]], 1
+; CHECK-UNROLL: [[IV13:%[a-z.0-9]+]] = add nuw nsw i32 [[IV12]], 1
+; CHECK-UNROLL: [[IV14:%[a-z.0-9]+]] = add nuw nsw i32 [[IV13]], 1
+; CHECK-UNROLL: [[IV15:%[a-z.0-9]+]] = add nuw nsw i32 [[IV14]], 1
+; CHECK-UNROLL: [[IV16]] = add nuw nsw i32 [[IV15]], 1
+; CHECK-UNROLL: [[CMP:%[a-z.0-9]+]] = icmp eq i32 [[IV16]], 1024
+; CHECK-UNROLL: br i1 [[CMP]], label [[END:%[a-z.]+]], label %for.body
 
   %i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
   %arrayidx = getelementptr inbounds i32, i32* %A, i32 %i.08
   %0 = load i32, i32* %arrayidx, align 4
   %arrayidx1 = getelementptr inbounds i32, i32* %B, i32 %i.08
   %1 = load i32, i32* %arrayidx1, align 4
   %mul = mul nsw i32 %1, %0
   %arrayidx2 = getelementptr inbounds i32, i32* %C, i32 %i.08
(9 unchanged lines not shown)
 ; CHECK-LABEL: runtime
 define arm_aapcs_vfpcc void @runtime(i32* nocapture %C, i32* nocapture readonly %A, i32* nocapture readonly %B, i32 %N) local_unnamed_addr #0 {
 entry:
   %cmp8 = icmp eq i32 %N, 0
   br i1 %cmp8, label %for.cond.cleanup, label %for.body
 
 ; CHECK-LABEL: for.body
 for.body:
-; CHECK-UNROLL-A: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z.0-9]+]] ], [ [[IV2:%[a-z.0-9]+]], %for.body ]
-; CHECK-UNROLL-A: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
-; CHECK-UNROLL-A: [[IV2]] = add nuw i32 [[IV1]], 1
-; CHECK-UNROLL-A: br
+; CHECK-NOUNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z.0-9]+]] ], [ [[IV2:%[a-z.0-9]+]], %for.body ]
+; CHECK-NOUNROLL: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
+; CHECK-NOUNROLL: [[IV2]] = add nuw i32 [[IV1]], 1
+; CHECK-NOUNROLL: br
 
-; CHECK-UNROLL-T1: %i.09 = phi i32 [ %inc, %for.body ], [ 0
-; CHECK-UNROLL-T1: %inc = add nuw i32 %i.09, 1
-; CHECK-UNROLL-T1: %exitcond = icmp eq i32 %inc, %N
-; CHECK-UNROLL-T1: br
-
-; CHECK-UNROLL-T2: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z.0-9]+]] ], [ [[IV4:%[a-z.0-9]+]], %for.body ]
-; CHECK-UNROLL-T2: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
-; CHECK-UNROLL-T2: [[IV2:%[a-z.0-9]+]] = add nuw nsw i32 [[IV1]], 1
-; CHECK-UNROLL-T2: [[IV3:%[a-z.0-9]+]] = add nuw nsw i32 [[IV2]], 1
-; CHECK-UNROLL-T2: [[IV4]] = add nuw i32 [[IV3]], 1
-; CHECK-UNROLL-T2: br
+; CHECK-UNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z.0-9]+]] ], [ [[IV4:%[a-z.0-9]+]], %for.body ]
+; CHECK-UNROLL: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
+; CHECK-UNROLL: [[IV2:%[a-z.0-9]+]] = add nuw nsw i32 [[IV1]], 1
+; CHECK-UNROLL: [[IV3:%[a-z.0-9]+]] = add nuw nsw i32 [[IV2]], 1
+; CHECK-UNROLL: [[IV4]] = add nuw i32 [[IV3]], 1
+; CHECK-UNROLL: br
 
-; CHECK-UNROLL-T2: for.body.epil:
-; CHECK-UNROLL-T2: for.body.epil.1:
-; CHECK-UNROLL-T2: for.body.epil.2:
+; CHECK-UNROLL: for.body.epil:
+; CHECK-UNROLL: for.body.epil.1:
+; CHECK-UNROLL: for.body.epil.2:
 
   %i.09 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
   %arrayidx = getelementptr inbounds i32, i32* %A, i32 %i.09
   %0 = load i32, i32* %arrayidx, align 4
   %arrayidx1 = getelementptr inbounds i32, i32* %B, i32 %i.09
   %1 = load i32, i32* %arrayidx1, align 4
   %mul = mul nsw i32 %1, %0
   %arrayidx2 = getelementptr inbounds i32, i32* %C, i32 %i.09
(22 unchanged lines not shown)

 for.cond.cleanup3:
   %inc11 = add nuw i32 %h.026, 1
   %exitcond27 = icmp eq i32 %inc11, %N
   br i1 %exitcond27, label %for.cond.cleanup, label %for.body4.lr.ph
 
 ; CHECK-LABEL: for.body4
 for.body4:
-; CHECK-UNROLL-T1: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z0-9.]+]] ], [ [[IV1:%[a-z.0-9]+]], %for.body4 ]
-; CHECK-UNROLL-T1: [[IV1]] = add nuw i32 [[IV0]], 1
-; CHECK-UNROLL-T1: br
+; CHECK-NOUNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z0-9.]+]] ], [ [[IV1:%[a-z.0-9]+]], %for.body4 ]
+; CHECK-NOUNROLL: [[IV1]] = add nuw i32 [[IV0]], 1
+; CHECK-NOUNROLL: br
 
-; CHECK-UNROLL-T2: for.body4.epil:
-; CHECK-UNROLL-T2: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z0-9.]+]] ], [ [[IV4:%[a-z.0-9]+]], %for.body4 ]
-; CHECK-UNROLL-T2: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
-; CHECK-UNROLL-T2: [[IV2:%[a-z.0-9]+]] = add nuw nsw i32 [[IV1]], 1
-; CHECK-UNROLL-T2: [[IV3:%[a-z.0-9]+]] = add nuw nsw i32 [[IV2]], 1
-; CHECK-UNROLL-T2: [[IV4]] = add nuw i32 [[IV3]], 1
-; CHECK-UNROLL-T2: br
-; CHECK-UNROLL-T2: for.body4.epil.1:
-; CHECK-UNROLL-T2: for.body4.epil.2:
+; CHECK-UNROLL: for.body4.epil:
+; CHECK-UNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, [[PRE:%[a-z0-9.]+]] ], [ [[IV4:%[a-z.0-9]+]], %for.body4 ]
+; CHECK-UNROLL: [[IV1:%[a-z.0-9]+]] = add nuw nsw i32 [[IV0]], 1
+; CHECK-UNROLL: [[IV2:%[a-z.0-9]+]] = add nuw nsw i32 [[IV1]], 1
+; CHECK-UNROLL: [[IV3:%[a-z.0-9]+]] = add nuw nsw i32 [[IV2]], 1
+; CHECK-UNROLL: [[IV4]] = add nuw i32 [[IV3]], 1
+; CHECK-UNROLL: br
+; CHECK-UNROLL: for.body4.epil.1:
+; CHECK-UNROLL: for.body4.epil.2:
 
   %w.024 = phi i32 [ 0, %for.body4.lr.ph ], [ %inc, %for.body4 ]
   %add = add i32 %w.024, %mul
   %arrayidx = getelementptr inbounds i16, i16* %A, i32 %add
   %0 = load i16, i16* %arrayidx, align 2
   %conv = sext i16 %0 to i32
   %arrayidx5 = getelementptr inbounds i16, i16* %B, i32 %w.024
   %1 = load i16, i16* %arrayidx5, align 2
(13 unchanged lines not shown)
 entry:
   br label %for.body
 
 for.cond.cleanup:
   ret void
 
 ; CHECK-LABEL: for.body
 for.body:
-; CHECK-UNROLL-A: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV1:%[a-z.0-9]+]], %for.body ]
-; CHECK-UNROLL-A: [[IV1]] = add nuw nsw i32 [[IV0]], 1
-; CHECK-UNROLL-A: icmp eq i32 [[IV1]], 1024
-; CHECK-UNROLL-A: br
+; CHECK-NOUNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV1:%[a-z.0-9]+]], %for.body ]
+; CHECK-NOUNROLL: [[IV1]] = add nuw nsw i32 [[IV0]], 1
+; CHECK-NOUNROLL: icmp eq i32 [[IV1]], 1024
+; CHECK-NOUNROLL: br
 
-; CHECK-UNROLL-T1: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV1:%[a-z.0-9]+]], %for.body ]
-; CHECK-UNROLL-T1: [[IV1]] = add nuw nsw i32 [[IV0]], 1
-; CHECK-UNROLL-T1: icmp eq i32 [[IV1]], 1024
-; CHECK-UNROLL-T1: br
+; CHECK-UNROLL: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV1:%[a-z.0-9]+]], %for.body ]
+; CHECK-UNROLL: [[IV1]] = add nuw nsw i32 [[IV0]], 1
+; CHECK-UNROLL: icmp eq i32 [[IV1]], 1024
+; CHECK-UNROLL: br
 
-; CHECK-UNROLL-T2: [[IV0:%[a-z.0-9]+]] = phi i32 [ 0, %entry ], [ [[IV1:%[a-z.0-9]+]], %for.body ]
-; CHECK-UNROLL-T2: [[IV1]] = add nuw nsw i32 [[IV0]], 1
-; CHECK-UNROLL-T2: icmp eq i32 [[IV1]], 1024
-; CHECK-UNROLL-T2: br
-
   %i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
   %arrayidx = getelementptr inbounds i32, i32* %A, i32 %i.08
   %0 = load i32, i32* %arrayidx, align 4
   %arrayidx1 = getelementptr inbounds i32, i32* %B, i32 %i.08
   %1 = load i32, i32* %arrayidx1, align 4
   %call = tail call arm_aapcs_vfpcc i32 @some_func(i32 %0, i32 %1) #3
   %arrayidx2 = getelementptr inbounds i32, i32* %C, i32 %i.08
   store i32 %call, i32* %arrayidx2, align 4
   %inc = add nuw nsw i32 %i.08, 1
   %exitcond = icmp eq i32 %inc, 1024
   br i1 %exitcond, label %for.cond.cleanup, label %for.body
 }

 ; CHECK-LABEL: iterate_inc
-; CHECK-UNROLL-A: %n.addr.04 = phi %struct.Node* [ %1, %while.body ], [ %n, %while.body.preheader ]
-; CHECK-UNROLL-A: %tobool = icmp eq %struct.Node* %1, null
-; CHECK-UNROLL-A: br i1 %tobool
-; CHECK-UNROLL-A-NOT: load
+; CHECK-NOUNROLL: %n.addr.04 = phi %struct.Node* [ %1, %while.body ], [ %n, %while.body.preheader ]
+; CHECK-NOUNROLL: %tobool = icmp eq %struct.Node* %1, null
+; CHECK-NOUNROLL: br i1 %tobool
+; CHECK-NOUNROLL-NOT: load
 
-; CHECK-UNROLL-T1: %n.addr.04 = phi %struct.Node* [ %1, %while.body ], [ %n, %while.body.preheader ]
-; CHECK-UNROLL-T1: %tobool = icmp eq %struct.Node* %1, null
-; CHECK-UNROLL-T1: br i1 %tobool
-; CHECK-UNROLL-T1-NOT: load
-
-; CHECK-UNROLL-T2: [[CMP0:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR0:%[a-z.0-9]+]], null
-; CHECK-UNROLL-T2: br i1 [[CMP0]], label [[END:%[a-z.0-9]+]]
-; CHECK-UNROLL-T2: [[CMP1:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR1:%[a-z.0-9]+]], null
-; CHECK-UNROLL-T2: br i1 [[CMP1]], label [[END]]
-; CHECK-UNROLL-T2: [[CMP2:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR2:%[a-z.0-9]+]], null
-; CHECK-UNROLL-T2: br i1 [[CMP2]], label [[END]]
-; CHECK-UNROLL-T2: [[CMP3:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR3:%[a-z.0-9]+]], null
-; CHECK-UNROLL-T2: br i1 [[CMP3]], label [[END]]
-; CHECK-UNROLL-T2: [[CMP4:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR4:%[a-z.0-9]+]], null
-; CHECK-UNROLL-T2: br i1 [[CMP4]], label [[END]]
-; CHECK-UNROLL-T2-NOT: load
+; CHECK-UNROLL: [[CMP0:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR0:%[a-z.0-9]+]], null
+; CHECK-UNROLL: br i1 [[CMP0]], label [[END:%[a-z.0-9]+]]
+; CHECK-UNROLL: [[CMP1:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR1:%[a-z.0-9]+]], null
+; CHECK-UNROLL: br i1 [[CMP1]], label [[END]]
+; CHECK-UNROLL: [[CMP2:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR2:%[a-z.0-9]+]], null
+; CHECK-UNROLL: br i1 [[CMP2]], label [[END]]
+; CHECK-UNROLL: [[CMP3:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR3:%[a-z.0-9]+]], null
+; CHECK-UNROLL: br i1 [[CMP3]], label [[END]]
+; CHECK-UNROLL: [[CMP4:%[a-z.0-9]+]] = icmp eq %struct.Node* [[VAR4:%[a-z.0-9]+]], null
+; CHECK-UNROLL: br i1 [[CMP4]], label [[END]]
+; CHECK-UNROLL-NOT: load
 
 %struct.Node = type { %struct.Node*, i32 }
 
 define arm_aapcscc void @iterate_inc(%struct.Node* %n) local_unnamed_addr #0 {
 entry:
   %tobool3 = icmp eq %struct.Node* %n, null
   br i1 %tobool3, label %while.end, label %while.body.preheader
 
(19 unchanged lines not shown)
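The CHECK-UNROLL lines for `@runtime` expect the loop body to be replicated four times (the induction variable stepped from [[IV0]] to [[IV4]]) followed by remainder blocks `for.body.epil`, `.epil.1` and `.epil.2` for up to three leftover iterations. As a rough source-level illustration of that shape, assuming the loop computes `C[i] = A[i] * B[i]` for `i < N` as the visible IR suggests, runtime unrolling by 4 with an epilogue can be sketched in C++ as follows. The function names are hypothetical and this is not the code LoopUnroll generates. For `@partial` the trip count is the constant 1024, which the factor of 16 divides exactly (64 trips), so no epilogue is needed there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference form of the loop: C[i] = A[i] * B[i] for every i < N.
// Assumes B has at least A.size() elements.
std::vector<int32_t> mulRef(const std::vector<int32_t> &A,
                            const std::vector<int32_t> &B) {
  const std::size_t N = A.size();
  std::vector<int32_t> C(N);
  for (std::size_t i = 0; i < N; ++i)
    C[i] = A[i] * B[i];
  return C;
}

// Hypothetical source-level picture of runtime unrolling by 4: the main loop
// runs N / 4 times, handling four elements per trip, and an epilogue handles
// the N % 4 (0 to 3) leftover elements afterwards.
std::vector<int32_t> mulUnrolled4(const std::vector<int32_t> &A,
                                  const std::vector<int32_t> &B) {
  const std::size_t N = A.size();
  std::vector<int32_t> C(N);
  const std::size_t mainEnd = N - N % 4; // largest multiple of 4 <= N
  std::size_t i = 0;
  for (; i < mainEnd; i += 4) {
    C[i] = A[i] * B[i];
    C[i + 1] = A[i + 1] * B[i + 1];
    C[i + 2] = A[i + 2] * B[i + 2];
    C[i + 3] = A[i + 3] * B[i + 3];
  }
  for (; i < N; ++i) // epilogue: at most three iterations
    C[i] = A[i] * B[i];
  return C;
}
```

The separate `.epil.1` and `.epil.2` block names suggest the remainder was itself fully unrolled in the real output; the sketch keeps it as a simple loop for clarity.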