This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Attempt to increase the accuracy of LSR's setup cost
ClosedPublic

Authored by dmgreen on Feb 28 2019, 3:09 AM.


Details

Reviewers

kparzysz
bcahoon
samparker
qcolombet
gilr

Commits

rGffc922ec35f8: [LSR] Attempt to increase the accuracy of LSR's setup cost
rL355597: [LSR] Attempt to increase the accuracy of LSR's setup cost

Summary

In some loops, we end up generating loop induction variables that look like:
{(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
{(zext i16 (%i0 * %i1) to i32),+,-1}
i.e. we count up from -limit to 0, rather than simply counting down from limit to 0. This is because the scores, as LSR calculates them, are the same, and the second is filtered out rather than the first. We end up with a redundant SUB from 0 in the code.
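To make the difference concrete, here is a minimal C++ sketch of the two counting schemes. It is not LSR output, and the function names are made up for illustration. It assumes a non-zero limit, as the loop in the new ARM test is guarded by a zero check. Both loops run the same number of iterations, but the count-up form needs an extra negation before the loop, which corresponds to the redundant SUB from 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: count up from -limit to 0.
// The start value needs an extra instruction (0 - limit) in the preheader.
inline uint32_t tripsCountingUp(uint32_t limit) {
  int64_t iv = -static_cast<int64_t>(limit); // extra setup: negate the limit
  uint32_t trips = 0;
  do {
    ++trips;
    ++iv; // step of +1
  } while (iv != 0);
  return trips;
}

// Hypothetical illustration: count down from limit to 0.
// The limit is used directly as the start value, so no negation is needed.
inline uint32_t tripsCountingDown(uint32_t limit) {
  int64_t iv = static_cast<int64_t>(limit);
  uint32_t trips = 0;
  do {
    ++trips;
    --iv; // step of -1
  } while (iv != 0);
  return trips;
}
```

Calling either function with the same non-zero limit gives the same trip count; only the setup before the loop differs.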

This patch tries to make the calculation of the setup cost a little more thorough, recursing into the scev members to better approximate the setup required. The cost function for comparing LSR costs is:

return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
       std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                C2.ScaleCost, C2.ImmCost, C2.SetupCost);

So this will only alter results if none of the other variables turn out to be different.
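As a small illustration of why SetupCost acts only as a tie-breaker, the sketch below uses a hypothetical ToyCost struct (a stand-in, not LSR's actual Cost class) with the same field order as the comparison above. std::tie compares lexicographically, so a later field is consulted only when every earlier field is equal.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical stand-in for LSR's cost record; only the fields named in the
// comparison quoted in the summary, in the same order.
struct ToyCost {
  unsigned NumRegs;
  unsigned AddRecCost;
  unsigned NumIVMuls;
  unsigned NumBaseAdds;
  unsigned ScaleCost;
  unsigned ImmCost;
  unsigned SetupCost;
};

// Lexicographic comparison: SetupCost is the last field, so it decides the
// result only if all six earlier fields compare equal.
inline bool isLess(const ToyCost &C1, const ToyCost &C2) {
  return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                  C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
         std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                  C2.ScaleCost, C2.ImmCost, C2.SetupCost);
}
```

With two costs that differ only in SetupCost, the lower SetupCost wins. A cost with fewer registers wins regardless of how large its SetupCost is.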

I've run benchmarks and codesize on ARM and AArch64, which showed minor improvements in performance and some codesize improvements.

However, this does seem to alter some of the Hexagon tests, in ways that I'm not 100% sure are "better". I think swp-carried has too many undefs for it to calculate the costs correctly. swp-epilog-phi5.ll now has a "loop0" and a "loop1", which may mean it's not pipelined any more? And two-combinations-bug.ll isn't showing the same behaviour as the bug is trying to test, so I've turned this extra cost calculation off there. My understanding is that because we are only altering the SetupCost, the last in the list of compared variables, this shouldn't really be making the loops worse in most cases.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Feb 28 2019, 3:09 AM

Herald added subscribers: jdoerfert, javed.absar, kristof.beyls. Feb 28 2019, 3:09 AM

Hi David,

The Hexagon related test changes look good to me. Neither of these test changes indicate that your patch causes any problems. In swp-carried-1.ll, the hardware loop is no longer generated. In this case, I should just change the test to use a real value for the initial loop induction variable, %v4, instead of undef, say 0 (I can do that later). In swp-epilog-phi5.ll, the compiler is now generating an extra, innermost, hardware loop, so that's why the test changes from loop0 to loop1. The loop1/endloop1 instructions represent an outer hardware loop. This is another test that I should fix since subtle changes in the generated code cause the compiler to create/not create a hardware loop. Thanks for letting us know about the changes to the Hexagon tests. Let me know if you have any further questions.

Thanks,
Brendon

Hi Dave,

Is the lsr-setupcost test new? I don't see it in my working directory.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1224 ↗	(On Diff #188685)	Shouldn't we have this before the first recursive call?
1303 ↗	(On Diff #188685)	Use a lambda instead?

> The Hexagon related test changes look good to me. Neither of these test changes indicate that your patch causes any problems. In swp-carried-1.ll, the hardware loop is no longer generated. In this case, I should just change the test to use a real value for the initial loop induction variable, %v4, instead of undef, say 0 (I can do that later). In swp-epilog-phi5.ll, the compiler is now generating an extra, innermost, hardware loop, so that's why the test changes from loop0 to loop1. The loop1/endloop1 instructions represent an outer hardware loop. This is another test that I should fix since subtle changes in the generated code cause the compiler to create/not create a hardware loop.

Thanks. I didn't realise there could be nested hardware loops. Good to see they are not worse. Are you OK with me leaving them as they are here, and you fixing them as you think is best? I had a go at fixing swp-carried-1.ll, but wasn't sure how best to keep testing the old behaviour without the -lsr-recursive-setupcost=0 option, even after removing the undefs. It was using a scev like {(128 + (-1 * undef)),+,-1} before.

> Is the lsr-setupcost test new? I don't see it in my working directory.

Yep, that's not in tree yet. I was trying to show the differences caused by the patch. (It would have been clearer if I had explained that.)

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1224 ↗	(On Diff #188685)	It was trying to replicate the behaviour from before this patch, so allowing the recursion into the addrec. EnableRecursiveSetupCost may not have been the best name for that exactly. I'll update the code to match the name better.
1303 ↗	(On Diff #188685)	It needs to be recursive (and recursive into the lambda in std::accumulate). I got tired of trying to make that work and pulled it out into a separate function.
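For illustration, the shape of such a separate recursive function can be sketched on a toy expression tree. Expr, Kind, make and toySetupCost are hypothetical stand-ins rather than the real SCEV classes, and the sketch assumes the recursive mode is enabled. A named function can simply call itself from inside the lambda passed to std::accumulate, which is awkward to do with a lambda that has no name to refer to.

```cpp
#include <cassert>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for SCEV node kinds.
enum class Kind { Unknown, Constant, AddRec, Cast, NAry };

struct Expr {
  Kind K;
  // For AddRec, Ops[0] is the start value; for Cast, Ops[0] is the operand.
  std::vector<std::shared_ptr<Expr>> Ops;
};
using ExprPtr = std::shared_ptr<Expr>;

inline ExprPtr make(Kind K, std::vector<ExprPtr> Ops = {}) {
  return std::make_shared<Expr>(Expr{K, std::move(Ops)});
}

// Toy version of a recursive setup cost: leaves cost 1, an add recurrence
// costs whatever its start value costs, a cast costs whatever its operand
// costs, and an n-ary expression sums the costs of its operands.
inline unsigned toySetupCost(const ExprPtr &E) {
  switch (E->K) {
  case Kind::Unknown:
  case Kind::Constant:
    return 1;
  case Kind::AddRec:
  case Kind::Cast:
    return toySetupCost(E->Ops[0]);
  case Kind::NAry:
    // The lambda recurses through the named function.
    return std::accumulate(E->Ops.begin(), E->Ops.end(), 0u,
                           [](unsigned Sum, const ExprPtr &Op) {
                             return Sum + toySetupCost(Op);
                           });
  }
  return 0;
}
```

Under these toy rules, an add recurrence starting at (-1 * zext(a * b)) costs 3 (the constant plus the two unknowns), while one starting at zext(a * b) costs 2, so the count-down form from the summary would be preferred when everything else ties.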

Switch up the position of EnableRecursiveSetupCost.

samparker added inline comments. Mar 5 2019, 12:05 AM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1224 ↗	(On Diff #188685)	Fair enough, then maybe we should still check for a constant start for the AddRec even when recursion isn't enabled?

> Thanks. I didn't realise there could be nested hardware loops. Good to see they are not worse. Are you OK with me leaving them as they are here, and you fixing them as you think is best?

Yes. I'm ok with leaving them the way that you have changed them. I'll update the tests later.

Thanks,
Brendon

dmgreen added inline comments. Mar 7 2019, 4:01 AM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1224 ↗	(On Diff #188685)	Not sure. This version here is probably cleanest going forward. The earlier version is more like the old behaviour, but not identical. I don't think it's worth replicating the old behaviour exactly if it's going to look ugly with explicit checks of AddRec starts, as opposed to just relying on the recursion. Unless we see regressions come up from this, which I don't think we should. But I don't have a strong opinion. Let me know what you think.

I agree this looks cleaner. LGTM.

This revision is now accepted and ready to land. Mar 7 2019, 4:20 AM

Thanks Sam. And thanks Brendon!

Closed by commit rGffc922ec35f8: [LSR] Attempt to increase the accuracy of LSR's setup cost (authored by dmgreen). Mar 7 2019, 5:46 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Mar 7 2019, 5:46 AM

Herald added a subscriber: hiraditya.

Revision Contents

Path                                                              Size
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp                 31 lines
llvm/test/CodeGen/ARM/lsr-setupcost.ll                            100 lines
llvm/test/CodeGen/Hexagon/swp-carried-1.ll                        2 lines
llvm/test/CodeGen/Hexagon/swp-epilog-phi5.ll                      4 lines
llvm/test/Transforms/LoopStrengthReduce/two-combinations-bug.ll   2 lines

Diff 189701

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

[109 lines not shown]
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include <algorithm>
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
 #include <cstdlib>
 #include <iterator>
 #include <limits>
+#include <numeric>
 #include <map>
 #include <utility>

 using namespace llvm;

 #define DEBUG_TYPE "loop-reduce"

 /// MaxIVUsers is an arbitrary threshold that provides an early opportunity for
[32 lines not shown; context: static cl::opt<bool> EnableBackedgeIndexing(]
     "lsr-backedge-indexing", cl::Hidden, cl::init(true),
     cl::desc("Enable the generation of cross iteration indexed memops"));

 static cl::opt<unsigned> ComplexityLimit(
     "lsr-complexity-limit", cl::Hidden,
     cl::init(std::numeric_limits<uint16_t>::max()),
     cl::desc("LSR search space complexity limit"));

+static cl::opt<bool> EnableRecursiveSetupCost(
+    "lsr-recursive-setupcost", cl::Hidden, cl::init(true),
+    cl::desc("Enable more thorough lsr setup cost calculation"));
+
 #ifndef NDEBUG
 // Stress test IV chain generation.
 static cl::opt<bool> StressIVChain(
     "stress-ivchain", cl::Hidden, cl::init(false),
     cl::desc("Stress test LSR IV chains"));
 #else
 static bool StressIVChain = false;
 #endif
[1,032 lines not shown]
 } // end anonymous namespace

 static bool isAMCompletelyFolded(const TargetTransformInfo &TTI,
                                  LSRUse::KindType Kind, MemAccessTy AccessTy,
                                  GlobalValue *BaseGV, int64_t BaseOffset,
                                  bool HasBaseReg, int64_t Scale,
                                  Instruction *Fixup = nullptr);

+static unsigned getSetupCost(const SCEV *Reg) {
+  if (isa<SCEVUnknown>(Reg) || isa<SCEVConstant>(Reg))
+    return 1;
+  if (!EnableRecursiveSetupCost)
+    return 0;
+  if (const auto *S = dyn_cast<SCEVAddRecExpr>(Reg))
+    return getSetupCost(S->getStart());
+  if (auto S = dyn_cast<SCEVCastExpr>(Reg))
+    return getSetupCost(S->getOperand());
+  if (auto S = dyn_cast<SCEVNAryExpr>(Reg))
+    return std::accumulate(S->op_begin(), S->op_end(), 0,
+                           [](unsigned i, const SCEV *Reg) {
+                             return i + getSetupCost(Reg);
+                           });
+  if (auto S = dyn_cast<SCEVUDivExpr>(Reg))
+    return getSetupCost(S->getLHS()) + getSetupCost(S->getRHS());
+  return 0;
+}
+
 /// Tally up interesting quantities from the given register.
 void Cost::RateRegister(const Formula &F, const SCEV *Reg,
                         SmallPtrSetImpl<const SCEV *> &Regs,
                         const Loop *L,
                         ScalarEvolution &SE, DominatorTree &DT,
                         const TargetTransformInfo &TTI) {
   if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Reg)) {
     // If this is an addrec for another loop, it should be an invariant
[49 lines not shown; context: if (!AR->isAffine() || !isa<SCEVConstant>(AR->getOperand(1))) {]
           return;
       }
     }
   }
   ++C.NumRegs;

   // Rough heuristic; favor registers which don't require extra setup
   // instructions in the preheader.
-  if (!isa<SCEVUnknown>(Reg) &&
-      !isa<SCEVConstant>(Reg) &&
-      !(isa<SCEVAddRecExpr>(Reg) &&
-        (isa<SCEVUnknown>(cast<SCEVAddRecExpr>(Reg)->getStart()) ||
-         isa<SCEVConstant>(cast<SCEVAddRecExpr>(Reg)->getStart()))))
-    ++C.SetupCost;
+  C.SetupCost += getSetupCost(Reg);

   C.NumIVMuls += isa<SCEVMulExpr>(Reg) &&
                  SE.hasComputableLoopEvolution(Reg, L);
 }

 /// Record this register in the set. If we haven't seen it before, rate
 /// it. Optional LoserRegs provides a way to declare any formula that refers to
 /// one of those regs an instant loser.
[4,399 lines not shown]

llvm/test/CodeGen/ARM/lsr-setupcost.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -mtriple=thumbv6m-none-eabi -loop-reduce %s -S -o - | FileCheck %s

target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

%struct.arm_matrix_instance_q15 = type { i16, i16, i16* }

define i32 @arm_mat_add_q15(%struct.arm_matrix_instance_q15* nocapture readonly %pSrcA, %struct.arm_matrix_instance_q15* nocapture readonly %pSrcB, %struct.arm_matrix_instance_q15* nocapture readonly %pDst) {
; CHECK-LABEL: @arm_mat_add_q15(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[NUMROWS:%.*]] = getelementptr inbounds [[STRUCT_ARM_MATRIX_INSTANCE_Q15:%.*]], %struct.arm_matrix_instance_q15* [[PSRCA:%.*]], i32 0, i32 0
; CHECK-NEXT:    [[I0:%.*]] = load i16, i16* [[NUMROWS]], align 4
; CHECK-NEXT:    [[NUMCOLS:%.*]] = getelementptr inbounds [[STRUCT_ARM_MATRIX_INSTANCE_Q15]], %struct.arm_matrix_instance_q15* [[PSRCA]], i32 0, i32 1
; CHECK-NEXT:    [[I1:%.*]] = load i16, i16* [[NUMCOLS]], align 2
; CHECK-NEXT:    [[MUL:%.*]] = mul i16 [[I1]], [[I0]]
; CHECK-NEXT:    [[CMP22:%.*]] = icmp eq i16 [[MUL]], 0
; CHECK-NEXT:    br i1 [[CMP22]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]
; CHECK:       while.body.preheader:
; CHECK-NEXT:    [[CONV5:%.*]] = zext i16 [[MUL]] to i32
; CHECK-NEXT:    [[PDATA2:%.*]] = getelementptr inbounds [[STRUCT_ARM_MATRIX_INSTANCE_Q15]], %struct.arm_matrix_instance_q15* [[PDST:%.*]], i32 0, i32 2
; CHECK-NEXT:    [[I2:%.*]] = load i16*, i16** [[PDATA2]], align 4
; CHECK-NEXT:    [[PDATA1:%.*]] = getelementptr inbounds [[STRUCT_ARM_MATRIX_INSTANCE_Q15]], %struct.arm_matrix_instance_q15* [[PSRCB:%.*]], i32 0, i32 2
; CHECK-NEXT:    [[I3:%.*]] = load i16*, i16** [[PDATA1]], align 4
; CHECK-NEXT:    [[PDATA:%.*]] = getelementptr inbounds [[STRUCT_ARM_MATRIX_INSTANCE_Q15]], %struct.arm_matrix_instance_q15* [[PSRCA]], i32 0, i32 2
; CHECK-NEXT:    [[I4:%.*]] = load i16*, i16** [[PDATA]], align 4
; CHECK-NEXT:    br label [[WHILE_BODY:%.*]]
; CHECK:       while.body:
; CHECK-NEXT:    [[PINA_026:%.*]] = phi i16* [ [[INCDEC_PTR:%.*]], [[WHILE_BODY]] ], [ [[I4]], [[WHILE_BODY_PREHEADER]] ]
; CHECK-NEXT:    [[BLKCNT_025:%.*]] = phi i32 [ [[DEC:%.*]], [[WHILE_BODY]] ], [ [[CONV5]], [[WHILE_BODY_PREHEADER]] ]
; CHECK-NEXT:    [[PINB_024:%.*]] = phi i16* [ [[INCDEC_PTR8:%.*]], [[WHILE_BODY]] ], [ [[I3]], [[WHILE_BODY_PREHEADER]] ]
; CHECK-NEXT:    [[POUT_023:%.*]] = phi i16* [ [[INCDEC_PTR11:%.*]], [[WHILE_BODY]] ], [ [[I2]], [[WHILE_BODY_PREHEADER]] ]
; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds i16, i16* [[PINA_026]], i32 1
; CHECK-NEXT:    [[I5:%.*]] = load i16, i16* [[PINA_026]], align 2
; CHECK-NEXT:    [[CONV7:%.*]] = sext i16 [[I5]] to i32
; CHECK-NEXT:    [[INCDEC_PTR8]] = getelementptr inbounds i16, i16* [[PINB_024]], i32 1
; CHECK-NEXT:    [[I6:%.*]] = load i16, i16* [[PINB_024]], align 2
; CHECK-NEXT:    [[CONV9:%.*]] = sext i16 [[I6]] to i32
; CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[CONV9]], [[CONV7]]
; CHECK-NEXT:    [[I7:%.*]] = icmp sgt i32 [[ADD]], -32768
; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = select i1 [[I7]], i32 [[ADD]], i32 -32768
; CHECK-NEXT:    [[I8:%.*]] = icmp slt i32 [[SPEC_SELECT_I]], 32767
; CHECK-NEXT:    [[CALL21:%.*]] = select i1 [[I8]], i32 [[SPEC_SELECT_I]], i32 32767
; CHECK-NEXT:    [[CONV10:%.*]] = trunc i32 [[CALL21]] to i16
; CHECK-NEXT:    [[INCDEC_PTR11]] = getelementptr inbounds i16, i16* [[POUT_023]], i32 1
; CHECK-NEXT:    store i16 [[CONV10]], i16* [[POUT_023]], align 2
; CHECK-NEXT:    [[DEC]] = add nsw i32 [[BLKCNT_025]], -1
; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[DEC]], 0
; CHECK-NEXT:    br i1 [[CMP]], label [[WHILE_END_LOOPEXIT:%.*]], label [[WHILE_BODY]]
; CHECK:       while.end.loopexit:
; CHECK-NEXT:    br label [[WHILE_END]]
; CHECK:       while.end:
; CHECK-NEXT:    ret i32 0
;
entry:
  %numRows = getelementptr inbounds %struct.arm_matrix_instance_q15, %struct.arm_matrix_instance_q15* %pSrcA, i32 0, i32 0
  %i0 = load i16, i16* %numRows, align 4
  %numCols = getelementptr inbounds %struct.arm_matrix_instance_q15, %struct.arm_matrix_instance_q15* %pSrcA, i32 0, i32 1
  %i1 = load i16, i16* %numCols, align 2
  %mul = mul i16 %i1, %i0
  %cmp22 = icmp eq i16 %mul, 0
  br i1 %cmp22, label %while.end, label %while.body.preheader

while.body.preheader: ; preds = %entry
  %conv5 = zext i16 %mul to i32
  %pData2 = getelementptr inbounds %struct.arm_matrix_instance_q15, %struct.arm_matrix_instance_q15* %pDst, i32 0, i32 2
  %i2 = load i16*, i16** %pData2, align 4
  %pData1 = getelementptr inbounds %struct.arm_matrix_instance_q15, %struct.arm_matrix_instance_q15* %pSrcB, i32 0, i32 2
  %i3 = load i16*, i16** %pData1, align 4
  %pData = getelementptr inbounds %struct.arm_matrix_instance_q15, %struct.arm_matrix_instance_q15* %pSrcA, i32 0, i32 2
  %i4 = load i16*, i16** %pData, align 4
  br label %while.body

while.body: ; preds = %while.body.preheader, %while.body
  %pInA.026 = phi i16* [ %incdec.ptr, %while.body ], [ %i4, %while.body.preheader ]
  %blkCnt.025 = phi i32 [ %dec, %while.body ], [ %conv5, %while.body.preheader ]
  %pInB.024 = phi i16* [ %incdec.ptr8, %while.body ], [ %i3, %while.body.preheader ]
  %pOut.023 = phi i16* [ %incdec.ptr11, %while.body ], [ %i2, %while.body.preheader ]
  %incdec.ptr = getelementptr inbounds i16, i16* %pInA.026, i32 1
  %i5 = load i16, i16* %pInA.026, align 2
  %conv7 = sext i16 %i5 to i32
  %incdec.ptr8 = getelementptr inbounds i16, i16* %pInB.024, i32 1
  %i6 = load i16, i16* %pInB.024, align 2
  %conv9 = sext i16 %i6 to i32
  %add = add nsw i32 %conv9, %conv7
  %i7 = icmp sgt i32 %add, -32768
  %spec.select.i = select i1 %i7, i32 %add, i32 -32768
  %i8 = icmp slt i32 %spec.select.i, 32767
  %call21 = select i1 %i8, i32 %spec.select.i, i32 32767
  %conv10 = trunc i32 %call21 to i16
  %incdec.ptr11 = getelementptr inbounds i16, i16* %pOut.023, i32 1
  store i16 %conv10, i16* %pOut.023, align 2
  %dec = add nsw i32 %blkCnt.025, -1
  %cmp = icmp eq i32 %dec, 0
  br i1 %cmp, label %while.end, label %while.body

while.end: ; preds = %while.body, %entry
  ret i32 0
}

llvm/test/CodeGen/Hexagon/swp-carried-1.ll

-; RUN: llc -march=hexagon -rdf-opt=0 -disable-hexagon-misched -hexagon-initial-cfg-cleanup=0 < %s | FileCheck %s
+; RUN: llc -march=hexagon -rdf-opt=0 -disable-hexagon-misched -hexagon-initial-cfg-cleanup=0 -lsr-recursive-setupcost=0 < %s | FileCheck %s

 ; Test that we generate the correct code when a loop carried value
 ; is scheduled one stage earlier than it's use. The code in
 ; isLoopCarried was returning false in this case, and the generated
 ; code was missing an copy.

 ; CHECK: loop0(.LBB0_[[LOOP:.*]],
 ; CHECK: .LBB0_[[LOOP]]:
[53 lines not shown]

llvm/test/CodeGen/Hexagon/swp-epilog-phi5.ll

 ; RUN: llc -march=hexagon < %s | FileCheck %s

 ; Test that we use the correct name in an epilog phi for a phi value
 ; that is defined for the last time in the kernel. Previously, we
 ; used the value from kernel loop definition, but we really need
 ; to use the value from the Phi in the kernel instead.

 ; In this test case, the second loop is pipelined, block b5.

-; CHECK: loop0
+; CHECK: loop1
 ; CHECK: [[REG0:r([0-9]+)]] += mpyi
 ; CHECK: [[REG2:r([0-9]+)]] = add([[REG1:r([0-9]+)]],add([[REG0]],#8
-; CHECK: endloop0
+; CHECK: endloop1

 %s.0 = type { %s.1, %s.4, %s.7, i8, i8, i32, %s.8, i32, i32, i32, i8, i8, i32, i32, double, i8, i8, i8, i8, i8, i8, i8, i8, i32, i8, i8, i8, i32, i32, i32, i32, i32, i32, i8, i32, i32, i32, i32, i32, [64 x i32], [4 x %s.9], [4 x %s.10], [4 x %s.10], i32, %s.23, i8, i8, [16 x i8], [16 x i8], [16 x i8], i32, i8, i8, i8, i8, i16, i16, i8, i8, i8, %s.11, i32, i32, i32, i32, i8, i32, [4 x %s.23], i32, i32, i32, [10 x i32], i32, i32, i32, i32, i32, %s.12, %s.13, %s.14, %s.15, %s.16, %s.17, %s.18, %s.19, %s.20, %s.21, %s.22 }
 %s.1 = type { void (%s.2), void (%s.2, i32), void (%s.2), void (%s.2, i8), void (%s.2), i32, %s.3, i32, i32, i8, i32, i8*, i32, i32 }
 %s.2 = type { %s.1, %s.4, %s.7, i8, i8, i32 }
 %s.3 = type { [8 x i32], [48 x i8] }
 %s.4 = type { i8* (%s.2, i32, i32), i8* (%s.2, i32, i32), i8** (%s.2, i32, i32, i32), [64 x i16]** (%s.2, i32, i32, i32), %s.5* (%s.2, i32, i8, i32, i32, i32), %s.6* (%s.2, i32, i8, i32, i32, i32), {}, i8* (%s.2, %s.5, i32, i32, i8), [64 x i16]* (%s.2, %s.6, i32, i32, i8), void (%s.2, i32), {}, i32, i32 }
 %s.5 = type opaque
 %s.6 = type opaque
[163 lines not shown]

llvm/test/Transforms/LoopStrengthReduce/two-combinations-bug.ll

-; RUN: opt < %s -loop-reduce -S | FileCheck %s
+; RUN: opt < %s -loop-reduce -lsr-recursive-setupcost=0 -S | FileCheck %s

 ; This test is adapted from the n-body test of the LLVM test-suite: A bug in
 ; r345114 caused LSR to generate incorrect code. The test verifies that the
 ; induction variable generated for the inner loop depends on the induction
 ; variable of the outer loop.

 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
[46 lines not shown]