This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Don't force bases of foldable formulae to the final type.
Closed, Public

Authored by ebevhan on Jan 16 2018, 7:01 AM.


Details

Reviewers

atrick
qcolombet
sanjoy

Commits

rG6d06976e74dc: [LSR] Don't force bases of foldable formulae to the final type.
rL323946: [LSR] Don't force bases of foldable formulae to the final type.

Summary

Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.

For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as the pointer and the integer as the index.
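
To make the 'reverse' GEP concrete, here is a toy C++ model of the two expansion strategies. It is not LLVM code: the `Val` struct, the `emitAddress` helper, and the `%base.p` / `%scaled.i` value names are hypothetical and exist only for illustration. The sketch assumes at least one operand is pointer-typed and simply records IR-like strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an IR value: a name plus whether it is pointer-typed.
struct Val {
  std::string name;
  bool isPtr;
};

// Appends IR-like text to `out` for the address `base + scaled`.
// forceBaseToPtr == true mimics expanding the base with the final (pointer)
// type; false mimics expanding it with its natural type (the nullptr case).
// Hypothetical helper, not an LLVM API.
inline Val emitAddress(Val base, Val scaled, bool forceBaseToPtr,
                       std::vector<std::string> &out) {
  assert((base.isPtr || scaled.isPtr) && "toy model needs one pointer operand");
  if (forceBaseToPtr && !base.isPtr) {
    out.push_back("%base.p = inttoptr i64 " + base.name + " to i8*");
    base = Val{"%base.p", true};
  }
  // A GEP takes one pointer operand; if the base already claimed that role,
  // the real pointer has to be demoted to an integer index.
  if (base.isPtr && scaled.isPtr) {
    out.push_back("%scaled.i = ptrtoint i8* " + scaled.name + " to i64");
    scaled = Val{"%scaled.i", false};
  }
  const Val &ptr = base.isPtr ? base : scaled;
  const Val &idx = base.isPtr ? scaled : base;
  out.push_back("%addr = getelementptr i8, i8* " + ptr.name + ", i64 " +
                idx.name);
  return Val{"%addr", true};
}
```

With an integer base `%n` and a pointer `%p`, the forced variant records an inttoptr, a ptrtoint, and a GEP whose pointer operand is really the integer. The natural-type variant records a single GEP on `%p` indexed by `%n`.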

Diff Detail

Repository: rL LLVM

Event Timeline

ebevhan created this revision. Jan 16 2018, 7:01 AM

Hi,

Could you upload the patch with the full context?

I'd like to check the type is consistent with the surrounding code.

Cheers,
-Quentin

Included the complete context. Sorry for missing that!

qcolombet added inline comments. Jan 26 2018, 3:28 PM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
4996 ↗	(On Diff #130837)	I don't think this is what we want; FullV is supposed to have the same type as F. That said, I would have expected your test case to be recognized by the check on line 4948. Could you check why this is not happening?

Out of curiosity, do you actually see codegen differences with that patch?

lib/Transforms/Scalar/LoopStrengthReduce.cpp
5012 ↗	(On Diff #130837)	I believe we would have the same problem here.
5022 ↗	(On Diff #130837)	And here.

Basically, I am fine either way (with or without the cast), but we would need to be consistent.

In D42103#989687, @qcolombet wrote:

Out of curiosity, do you actually see codegen differences with that patch?

The only difference in the test case is that the base and scale registers on the memory operation are swapped.

In our downstream target we see fairly large differences because our pointers and integers go in different registers, so it's inefficient to do a GEP like this where the base and scale are 'swapped'.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
4996 ↗	(On Diff #130837)	In the test case in this patch (and probably also in the local test case where I originally found this), the check on line 4948 does trigger. Ty is i64 and OpTy is i8*. Since the effective SCEV type for both of these is i64, Ty is set to i8* and the base (which is originally an integer) is converted into a pointer when it's expanded.
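
The type selection described in this comment can be sketched with a small toy model. This is not the LSR source: `ToyType`, `effectiveBits`, and `chooseExpansionType` are hypothetical names, the 64-bit pointer width is an assumption, and the fallback branch (keeping the formula type when the sizes differ) is also an assumption rather than something stated in this review.

```cpp
#include <string>

// Toy type: a printable name, a pointer flag, and an integer bit width
// (ignored for pointers). Hypothetical, not llvm::Type.
struct ToyType {
  std::string name;
  bool isPtr;
  unsigned bits;
};

// Mimics the idea of an "effective SCEV type": pointers are treated as
// integers of pointer width. The 64-bit default is an assumption.
inline unsigned effectiveBits(const ToyType &t, unsigned ptrBits = 64) {
  return t.isPtr ? ptrBits : t.bits;
}

// If the formula type and the operand type have the same effective type,
// expand straight to the operand type; otherwise keep the formula type.
inline ToyType chooseExpansionType(const ToyType &formulaTy,
                                   const ToyType &operandTy) {
  if (effectiveBits(formulaTy) == effectiveBits(operandTy))
    return operandTy;
  return formulaTy;
}
```

In this model, a formula type of i64 paired with a pointer operand type such as i8* resolves to the pointer type, which is why an integer base ends up being expanded as a pointer when that type is passed to the expander.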

Alright, LGTM then.
Please also patch the two other places I mentioned, assuming you can come up with a test case for them.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
4996 ↗	(On Diff #130837)	Thanks for double checking.

This revision is now accepted and ready to land. Jan 29 2018, 4:15 PM

I replaced the other two instances of Ty with nullptr, and only observed a difference in one AMDGPU test case (preserve-addrspace-assert.ll), likely caused by the change on line 5022. Before, GEPs for double* and i32* were produced, but with the patch, GEPs for the containing struct are produced instead. This is probably because the expander was being forced to emit the pre-offset expressions as the final types (which were double* and i32*) but with the patch it will emit GEPs that take the struct pointer instead.

It's not really clear if this is better or not, and I can't really say that I have a test case that represents this change specifically. Should I upload a diff with the fixes to the test case or leave this patch as is?

Let's leave the patch as is.

Rebased. Another test case was affected, but as far as I can tell, the only difference is the elimination of two bitcasts.

LGTM

Closed by commit rL323946: [LSR] Don't force bases of foldable formulae to the final type. (authored by uabelho). Jan 31 2018, 10:40 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                   Size
llvm/trunk/lib/Transforms/Scalar/LoopStrengthReduce.cpp               2 lines
llvm/trunk/test/Transforms/LoopStrengthReduce/X86/macro-fuse-cmp.ll   56 lines
llvm/trunk/test/Transforms/LoopStrengthReduce/X86/nested-loop.ll      20 lines

Diff 132329

llvm/trunk/lib/Transforms/Scalar/LoopStrengthReduce.cpp

(4,987 lines hidden; enclosing scope: if (F.Scale != 0) {)

     } else {
       // Otherwise just expand the scaled register and an explicit scale,
       // which is expected to be matched as part of the address.

       // Flush the operand list to suppress SCEVExpander hoisting address modes.
       // Unless the addressing mode will not be folded.
       if (!Ops.empty() && LU.Kind == LSRUse::Address &&
           isAMCompletelyFolded(TTI, LU, F)) {
-        Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty);
+        Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), nullptr);
         Ops.clear();
         Ops.push_back(SE.getUnknown(FullV));
       }
       ScaledS = SE.getUnknown(Rewriter.expandCodeFor(ScaledS, nullptr));
       if (F.Scale != 1)
         ScaledS =
             SE.getMulExpr(ScaledS, SE.getConstant(ScaledS->getType(), F.Scale));
       Ops.push_back(ScaledS);

(514 lines hidden)

llvm/trunk/test/Transforms/LoopStrengthReduce/X86/macro-fuse-cmp.ll

(10 lines hidden)

 ; PR35681 - https://bugs.llvm.org/show_bug.cgi?id=35681
 ; FIXME: If a CPU can macro-fuse a compare and branch, then we discount that
 ; cost in LSR and avoid generating large offsets in each memory access.
 ; This reduces code size and may improve decode throughput.

 define void @maxArray(double* noalias nocapture %x, double* noalias nocapture readonly %y) {
 ; JAG-LABEL: @maxArray(
 ; JAG-NEXT: entry:
-; JAG-NEXT: [[Y1:%.*]] = bitcast double* [[Y:%.*]] to <2 x double>*
-; JAG-NEXT: [[X4:%.*]] = bitcast double* [[X:%.*]] to <2 x double>*
-; JAG-NEXT: [[X45:%.*]] = bitcast <2 x double>* [[X4]] to i8*
-; JAG-NEXT: [[Y12:%.*]] = bitcast <2 x double>* [[Y1]] to i8*
+; JAG-NEXT: [[Y1:%.*]] = bitcast double* [[Y:%.*]] to i8*
+; JAG-NEXT: [[X3:%.*]] = bitcast double* [[X:%.*]] to i8*
 ; JAG-NEXT: br label [[VECTOR_BODY:%.*]]
 ; JAG: vector.body:
 ; JAG-NEXT: [[LSR_IV:%.*]] = phi i64 [ [[LSR_IV_NEXT:%.*]], [[VECTOR_BODY]] ], [ -524288, [[ENTRY:%.*]] ]
-; JAG-NEXT: [[UGLYGEP9:%.*]] = getelementptr i8, i8* [[X45]], i64 [[LSR_IV]]
-; JAG-NEXT: [[UGLYGEP910:%.*]] = bitcast i8* [[UGLYGEP9]] to <2 x double>*
-; JAG-NEXT: [[SCEVGEP11:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP910]], i64 32768
-; JAG-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, i8* [[Y12]], i64 [[LSR_IV]]
-; JAG-NEXT: [[UGLYGEP3:%.*]] = bitcast i8* [[UGLYGEP]] to <2 x double>*
-; JAG-NEXT: [[SCEVGEP:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP3]], i64 32768
-; JAG-NEXT: [[XVAL:%.*]] = load <2 x double>, <2 x double>* [[SCEVGEP11]], align 8
+; JAG-NEXT: [[UGLYGEP7:%.*]] = getelementptr i8, i8* [[X3]], i64 [[LSR_IV]]
+; JAG-NEXT: [[UGLYGEP78:%.*]] = bitcast i8* [[UGLYGEP7]] to <2 x double>*
+; JAG-NEXT: [[SCEVGEP9:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP78]], i64 32768
+; JAG-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, i8* [[Y1]], i64 [[LSR_IV]]
+; JAG-NEXT: [[UGLYGEP2:%.*]] = bitcast i8* [[UGLYGEP]] to <2 x double>*
+; JAG-NEXT: [[SCEVGEP:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP2]], i64 32768
+; JAG-NEXT: [[XVAL:%.*]] = load <2 x double>, <2 x double>* [[SCEVGEP9]], align 8
 ; JAG-NEXT: [[YVAL:%.*]] = load <2 x double>, <2 x double>* [[SCEVGEP]], align 8
 ; JAG-NEXT: [[CMP:%.*]] = fcmp ogt <2 x double> [[YVAL]], [[XVAL]]
 ; JAG-NEXT: [[MAX:%.*]] = select <2 x i1> [[CMP]], <2 x double> [[YVAL]], <2 x double> [[XVAL]]
-; JAG-NEXT: [[UGLYGEP6:%.*]] = getelementptr i8, i8* [[X45]], i64 [[LSR_IV]]
-; JAG-NEXT: [[UGLYGEP67:%.*]] = bitcast i8* [[UGLYGEP6]] to <2 x double>*
-; JAG-NEXT: [[SCEVGEP8:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP67]], i64 32768
-; JAG-NEXT: store <2 x double> [[MAX]], <2 x double>* [[SCEVGEP8]], align 8
+; JAG-NEXT: [[UGLYGEP4:%.*]] = getelementptr i8, i8* [[X3]], i64 [[LSR_IV]]
+; JAG-NEXT: [[UGLYGEP45:%.*]] = bitcast i8* [[UGLYGEP4]] to <2 x double>*
+; JAG-NEXT: [[SCEVGEP6:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP45]], i64 32768
+; JAG-NEXT: store <2 x double> [[MAX]], <2 x double>* [[SCEVGEP6]], align 8
 ; JAG-NEXT: [[LSR_IV_NEXT]] = add nsw i64 [[LSR_IV]], 16
 ; JAG-NEXT: [[DONE:%.*]] = icmp eq i64 [[LSR_IV_NEXT]], 0
 ; JAG-NEXT: br i1 [[DONE]], label [[EXIT:%.*]], label [[VECTOR_BODY]]
 ; JAG: exit:
 ; JAG-NEXT: ret void
 ;
 ; HSW-LABEL: @maxArray(
 ; HSW-NEXT: entry:
-; HSW-NEXT: [[Y1:%.*]] = bitcast double* [[Y:%.*]] to <2 x double>*
-; HSW-NEXT: [[X4:%.*]] = bitcast double* [[X:%.*]] to <2 x double>*
-; HSW-NEXT: [[X45:%.*]] = bitcast <2 x double>* [[X4]] to i8*
-; HSW-NEXT: [[Y12:%.*]] = bitcast <2 x double>* [[Y1]] to i8*
+; HSW-NEXT: [[Y1:%.*]] = bitcast double* [[Y:%.*]] to i8*
+; HSW-NEXT: [[X3:%.*]] = bitcast double* [[X:%.*]] to i8*
 ; HSW-NEXT: br label [[VECTOR_BODY:%.*]]
 ; HSW: vector.body:
 ; HSW-NEXT: [[LSR_IV:%.*]] = phi i64 [ [[LSR_IV_NEXT:%.*]], [[VECTOR_BODY]] ], [ -524288, [[ENTRY:%.*]] ]
-; HSW-NEXT: [[UGLYGEP9:%.*]] = getelementptr i8, i8* [[X45]], i64 [[LSR_IV]]
-; HSW-NEXT: [[UGLYGEP910:%.*]] = bitcast i8* [[UGLYGEP9]] to <2 x double>*
-; HSW-NEXT: [[SCEVGEP11:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP910]], i64 32768
-; HSW-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, i8* [[Y12]], i64 [[LSR_IV]]
-; HSW-NEXT: [[UGLYGEP3:%.*]] = bitcast i8* [[UGLYGEP]] to <2 x double>*
-; HSW-NEXT: [[SCEVGEP:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP3]], i64 32768
-; HSW-NEXT: [[XVAL:%.*]] = load <2 x double>, <2 x double>* [[SCEVGEP11]], align 8
+; HSW-NEXT: [[UGLYGEP7:%.*]] = getelementptr i8, i8* [[X3]], i64 [[LSR_IV]]
+; HSW-NEXT: [[UGLYGEP78:%.*]] = bitcast i8* [[UGLYGEP7]] to <2 x double>*
+; HSW-NEXT: [[SCEVGEP9:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP78]], i64 32768
+; HSW-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, i8* [[Y1]], i64 [[LSR_IV]]
+; HSW-NEXT: [[UGLYGEP2:%.*]] = bitcast i8* [[UGLYGEP]] to <2 x double>*
+; HSW-NEXT: [[SCEVGEP:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP2]], i64 32768
+; HSW-NEXT: [[XVAL:%.*]] = load <2 x double>, <2 x double>* [[SCEVGEP9]], align 8
 ; HSW-NEXT: [[YVAL:%.*]] = load <2 x double>, <2 x double>* [[SCEVGEP]], align 8
 ; HSW-NEXT: [[CMP:%.*]] = fcmp ogt <2 x double> [[YVAL]], [[XVAL]]
 ; HSW-NEXT: [[MAX:%.*]] = select <2 x i1> [[CMP]], <2 x double> [[YVAL]], <2 x double> [[XVAL]]
-; HSW-NEXT: [[UGLYGEP6:%.*]] = getelementptr i8, i8* [[X45]], i64 [[LSR_IV]]
-; HSW-NEXT: [[UGLYGEP67:%.*]] = bitcast i8* [[UGLYGEP6]] to <2 x double>*
-; HSW-NEXT: [[SCEVGEP8:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP67]], i64 32768
-; HSW-NEXT: store <2 x double> [[MAX]], <2 x double>* [[SCEVGEP8]], align 8
+; HSW-NEXT: [[UGLYGEP4:%.*]] = getelementptr i8, i8* [[X3]], i64 [[LSR_IV]]
+; HSW-NEXT: [[UGLYGEP45:%.*]] = bitcast i8* [[UGLYGEP4]] to <2 x double>*
+; HSW-NEXT: [[SCEVGEP6:%.*]] = getelementptr <2 x double>, <2 x double>* [[UGLYGEP45]], i64 32768
+; HSW-NEXT: store <2 x double> [[MAX]], <2 x double>* [[SCEVGEP6]], align 8
 ; HSW-NEXT: [[LSR_IV_NEXT]] = add nsw i64 [[LSR_IV]], 16
 ; HSW-NEXT: [[DONE:%.*]] = icmp eq i64 [[LSR_IV_NEXT]], 0
 ; HSW-NEXT: br i1 [[DONE]], label [[EXIT:%.*]], label [[VECTOR_BODY]]
 ; HSW: exit:
 ; HSW-NEXT: ret void
 ;
 ; BASE-LABEL: maxArray:
 ; BASE: # %bb.0: # %entry

(50 lines hidden)

llvm/trunk/test/Transforms/LoopStrengthReduce/X86/nested-loop.ll

(9 lines hidden)

 define void @foo(i32 %size, i32 %nsteps, i32 %hsize, i32* %lined, i8* %maxarray) {
 ; CHECK-LABEL: @foo(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[CMP215:%.*]] = icmp sgt i32 [[SIZE:%.*]], 1
 ; CHECK-NEXT: [[T0:%.*]] = zext i32 [[SIZE]] to i64
 ; CHECK-NEXT: [[T1:%.*]] = sext i32 [[NSTEPS:%.*]] to i64
 ; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[T0]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = inttoptr i64 [[TMP0]] to i8*
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[LSR_IV1:%.*]] = phi i64 [ [[LSR_IV_NEXT2:%.*]], [[FOR_INC:%.*]] ], [ 1, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: [[INDVARS_IV2:%.*]] = phi i64 [ [[INDVARS_IV_NEXT3:%.*]], [[FOR_INC]] ], [ 0, [[ENTRY]] ]
-; CHECK-NEXT: [[LSR_IV13:%.*]] = inttoptr i64 [[LSR_IV1]] to i8*
 ; CHECK-NEXT: br i1 [[CMP215]], label [[FOR_BODY2_PREHEADER:%.*]], label [[FOR_INC]]
 ; CHECK: for.body2.preheader:
 ; CHECK-NEXT: br label [[FOR_BODY2:%.*]]
 ; CHECK: for.body2:
-; CHECK-NEXT: [[LSR_IV4:%.*]] = phi i8* [ [[SCEVGEP:%.*]], [[FOR_BODY2]] ], [ [[MAXARRAY:%.*]], [[FOR_BODY2_PREHEADER]] ]
+; CHECK-NEXT: [[LSR_IV3:%.*]] = phi i8* [ [[SCEVGEP:%.*]], [[FOR_BODY2]] ], [ [[MAXARRAY:%.*]], [[FOR_BODY2_PREHEADER]] ]
 ; CHECK-NEXT: [[LSR_IV:%.*]] = phi i64 [ [[LSR_IV_NEXT:%.*]], [[FOR_BODY2]] ], [ [[TMP0]], [[FOR_BODY2_PREHEADER]] ]
-; CHECK-NEXT: [[LSR_IV45:%.*]] = ptrtoint i8* [[LSR_IV4]] to i64
-; CHECK-NEXT: [[SCEVGEP8:%.*]] = getelementptr i8, i8* [[LSR_IV4]], i64 1
-; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[SCEVGEP8]], align 1
-; CHECK-NEXT: [[SCEVGEP7:%.*]] = getelementptr i8, i8* [[TMP1]], i64 [[LSR_IV45]]
-; CHECK-NEXT: [[V2:%.*]] = load i8, i8* [[SCEVGEP7]], align 1
+; CHECK-NEXT: [[SCEVGEP6:%.*]] = getelementptr i8, i8* [[LSR_IV3]], i64 1
+; CHECK-NEXT: [[V1:%.*]] = load i8, i8* [[SCEVGEP6]], align 1
+; CHECK-NEXT: [[SCEVGEP5:%.*]] = getelementptr i8, i8* [[LSR_IV3]], i64 [[TMP0]]
+; CHECK-NEXT: [[V2:%.*]] = load i8, i8* [[SCEVGEP5]], align 1
 ; CHECK-NEXT: [[TMPV:%.*]] = xor i8 [[V1]], [[V2]]
-; CHECK-NEXT: [[SCEVGEP6:%.*]] = getelementptr i8, i8* [[LSR_IV13]], i64 [[LSR_IV45]]
-; CHECK-NEXT: store i8 [[TMPV]], i8* [[SCEVGEP6]], align 1
+; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i8, i8* [[LSR_IV3]], i64 [[LSR_IV1]]
+; CHECK-NEXT: store i8 [[TMPV]], i8* [[SCEVGEP4]], align 1
 ; CHECK-NEXT: [[LSR_IV_NEXT]] = add i64 [[LSR_IV]], -1
-; CHECK-NEXT: [[SCEVGEP]] = getelementptr i8, i8* [[LSR_IV4]], i64 1
+; CHECK-NEXT: [[SCEVGEP]] = getelementptr i8, i8* [[LSR_IV3]], i64 1
 ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[LSR_IV_NEXT]], 0
 ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY2]], label [[FOR_INC_LOOPEXIT:%.*]]
 ; CHECK: for.inc.loopexit:
 ; CHECK-NEXT: br label [[FOR_INC]]
 ; CHECK: for.inc:
 ; CHECK-NEXT: [[INDVARS_IV_NEXT3]] = add nuw nsw i64 [[INDVARS_IV2]], 1
 ; CHECK-NEXT: [[LSR_IV_NEXT2]] = add nuw nsw i64 [[LSR_IV1]], [[T0]]
 ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT3]], [[T1]]

(41 lines hidden)

 for.inc: ; preds = %for.inc.loopexit, %for.body
 %indvars.iv.next3 = add nuw nsw i64 %indvars.iv2, 1
 %cmp = icmp slt i64 %indvars.iv.next3, %t1
 br i1 %cmp, label %for.body, label %for.end.loopexit

 for.end.loopexit: ; preds = %for.inc
 ret void
 }