This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/trunk/
lib/Transforms/Scalar/
StraightLineStrengthReduce.cpp
test/Transforms/StraightLineStrengthReduce/
X86/
lit.local.cfg
no-slsr.ll
slsr-gep.ll
slsr-mul.ll
slsr.ll

Differential D7459

[SLSR] handle candidate form &B[i * S]
ClosedPublic

Authored by jingyue on Feb 5 2015, 10:57 PM.

Details

Reviewers

eliben
atrick
meheff
hfinkel

Commits

rG177a81578fbc: [SLSR] handle candidate form &B[i * S]
rL233286: [SLSR] handle candidate form &B[i * S]

Summary

This patch enhances SLSR to handle another candidate form, &B[i * S]. If
we find two candidates

S1: X = &B[i * S]
S2: Y = &B[i' * S]

and S1 dominates S2, we can replace S2 with

Y = &X[(i' - i) * S]
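
The rewrite in the summary can be checked with ordinary pointer arithmetic. The following is a minimal sketch, not code from the patch: `addrFromBase` and `addrFromBasis` are hypothetical helper names, and a plain `int` array stands in for B.

```cpp
#include <cassert>
#include <cstddef>

// Original form: every address is computed from the base B with its own
// multiply, i.e. &B[I * S].
inline int *addrFromBase(int *B, std::ptrdiff_t I, std::ptrdiff_t S) {
  return &B[I * S];
}

// Rewritten form: Y is computed from the dominating candidate X = &B[I * S]
// as &X[(IPrime - I) * S]. In SLSR both indices are constants, so
// (IPrime - I) is a constant and the multiply can fold to S, 2 * S, a shift,
// and so on.
inline int *addrFromBasis(int *X, std::ptrdiff_t I, std::ptrdiff_t IPrime,
                          std::ptrdiff_t S) {
  return &X[(IPrime - I) * S];
}
```

With i = 1, i' = 3 and S = 5, X is &B[5] and the rewritten Y is &X[10], which is the same address as &B[15].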

Diff Detail

Repository: rL LLVM

Event Timeline

jingyue updated this revision to Diff 19461.Feb 5 2015, 10:57 PM

jingyue retitled this revision to [SLSR] handle candidate form &B[i * S].

jingyue updated this object.

jingyue edited the test plan for this revision.

jingyue added reviewers: hfinkel, atrick, meheff, eliben.

jingyue added a subscriber: Unknown Object (MLST).

hfinkel added inline comments.Feb 6 2015, 11:24 AM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
242 ↗	(On Diff #19461)	Shouldn't you test whether or not B isa GlobalVariable? Seems like, if it is, you should set BaseGV to that, and HasBaseReg = false, and otherwise keep the current set of arguments.
317 ↗	(On Diff #19461)	I don't think this is the right approach; we're inventing ScalarEvolution all over again. SE also already handles the shift case you mention in the TODO comment below. If I take your test/Transforms/StraightLineStrengthReduce/X86/no-slsr.ll, for example, and run:

  opt -analyze -scalar-evolution

we get:

  Printing analysis 'Scalar Evolution Analysis' for function 'slsr_gep':
  Classifying expressions for: @slsr_gep
    %p0 = getelementptr inbounds i32* %input, i64 0 --> %input
    %v0 = load i32* %p0 --> %v0
    %p1 = getelementptr inbounds i32* %input, i64 %s --> ((4 * %s)<nsw> + %input)<nsw>
    %v1 = load i32* %p1 --> %v1
    %s2 = mul nsw i64 %s, 2 --> (2 * %s)
    %p2 = getelementptr inbounds i32* %input, i64 %s2 --> ((8 * %s) + %input)<nsw>
    %v2 = load i32* %p2 --> %v2
    %1 = add i32 %v0, %v1 --> (%v1 + %v0)
    %2 = add i32 %1, %v2 --> (%v2 + %v1 + %v0)
  Determining loop execution counts for: @slsr_gep

And you can see, it nicely decomposes the GEPs into arithmetic expressions for you.

Andy, please shout if you disagree, and I apologize for not thinking about this before, but I'm inclined to recommend that we rework this pass so that all cases are handled by SE before we get too far. LSR, of course, uses SE, and I think this pass should too for many of the same reasons (even though there is no loop handling here, we need all of the same arithmetic normalization logic, which is fairly significant).

Also, I think you should consider handling the constant-offset case in GEPs because that will let us get cases where the last index is a structure index, like B[i*J].x, which is fairly common. You just need to make sure such an offset gets passed to isLegalAddressingMode (BaseOffset) above.

jingyue added inline comments.Feb 6 2015, 3:25 PM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
317 ↗	(On Diff #19461)	Hi Hal, Thanks for pointing this out! I originally thought SCEV was for loop optimizations and didn't quite think along that line. I'll try to replace BaseExpr or even most of the Candidate class with SCEV. I need to run experiments to see how well this approach works, but in general, I am aligned with you that reusing SCEV is a better approach. Jingyue

Good observation, Hal. SCEV is there for you to use, so you might as well. I agree that we don't want to reinvent the abstraction over arithmetic expressions in each pass. I will mention two issues:

(1) SCEV may end up recursively evaluating more of the expression than you need, which could be expensive. In the first patch, you had a very simple pattern match, so SCEV didn't seem worthwhile. As you try to cover more patterns, SCEV makes more sense.

(2) SCEVExpander is fraught with peril. If you can use SCEV to help discover candidates, but leave your rewriteCandidateWithBases() code as-is, then you have nothing to worry about. If you want to use SCEVExpander then you need to be very careful about the expressions you generate.

Take 2 towards handling GEPs in SLSR

replace BaseExpr with SCEV
discover more patterns of GEP that can be SLSR'ed

Hi Andrew and Hal,

Sorry about the long delay in reworking this patch. I hope it is not too cache-cold :)

Two major changes against the previous patch are:

Replacing BaseExpr with SCEV. As we discussed before, we shouldn't reinvent another abstraction of arithmetic expressions. Reusing SCEV turns out to save lots of code!
The discovery phase of the current patch tries to match GEPs in the form of &B[..][i * S][..] and make it a candidate (char *)&B[..][0][..] + (i * element_size) * S. The old patch only works when i * S is the last index.
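
The candidate form described above, (char *)&B[..][0][..] + (i * element_size) * S, can be illustrated with a three-dimensional array. This is a sketch under stated assumptions rather than code from the patch: the dimensions `D1` and `D2`, the aliases `Row` and `Plane`, and both function names are made up for illustration, and element_size is taken to be the size of the type indexed by the middle subscript.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t D1 = 16, D2 = 4;
using Row = int[D2];   // type indexed by the middle subscript
using Plane = Row[D1]; // type indexed by the first subscript

// Direct form: &B[A][I * S][C].
inline int *directAddr(Plane *B, std::size_t A, std::size_t I, std::size_t S,
                       std::size_t C) {
  return &B[A][I * S][C];
}

// Candidate form: zero out the factored index to get the base, then add the
// byte offset (I * ElementSize) * S. The scaled index I * ElementSize is a
// constant when I is a constant.
inline int *candidateAddr(Plane *B, std::size_t A, std::size_t I,
                          std::size_t S, std::size_t C) {
  char *Base = reinterpret_cast<char *>(&B[A][0][C]);
  const std::size_t ElementSize = sizeof(Row);
  return reinterpret_cast<int *>(Base + (I * ElementSize) * S);
}
```

Both functions return the same address as long as I * S stays within the middle dimension, which is what lets the middle index be factored even when it is not the last one.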

I wish I could use SCEV more extensively in this patch, e.g., to discover candidates and compute bumps. However, I didn't do that for two reasons:

Given an LLVM value, SCEV recursively evaluates all subexpressions of this value. The current SLSR algorithm does very shallow pattern matching and rewrites candidates based on these simple patterns. Recursively evaluating subexpressions makes it harder for SLSR to rewrite candidates into the desired forms unless we can easily map SCEVs back to LLVM Values. For example, if S in candidate (B + i) * S can be further SCEV-evaluated, we would rewrite the candidate into some form like Y + (i' - i) * expr_of(S) instead of Y + (i' - i) * S.
ScalarEvolution is designed to be control-flow oblivious, and tends to strip nsw/nuw flags. This makes it harder for SLSR to factor a multiply inside a sext'ed index expression in a GEP, which happens very frequently for array indexing. For example, a GEP &B[i * S] is translated to B + sext(i * S) with the nsw flag on i * S stripped out. Without this flag, SLSR can hardly accept B + sext(i * S) as a valid GEP candidate.

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
242 ↗	(On Diff #19461)	Done. Please check out function `isCompletelyFoldable`.

add some missing comments

A few drop-by comments inline.

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
93 ↗	(On Diff #21524)	Very minor: I'd call this `Kind` to differentiate from `llvm::Type`.
327 ↗	(On Diff #21524)	Unless I'm missing some larger context, for this equivalence to hold you'll have to prove that `Idx * S * ElementSize` does not sign-overflow. For instance, consider: `Idx->getType()` == `i8`, pointers are 16 bit, `Idx` == `4`, `S` == `4` and `ElementSize` == `24`. `sext(Idx *s S) *s ElementSize` is `i16 384` while `sext((Idx * ElementSize) *s S)` is `-128`.
444 ↗	(On Diff #21524)	Have you considered doing a `bitcast` to `i8`, gep on that `i8` and then back instead? That will preserve pointer-ness throughout.
451 ↗	(On Diff #21524)	Maybe use `llvm_unreachable` here?
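
The overflow counterexample in the comment on line 327 can be reproduced with fixed-width integers. The sketch below is illustrative only: the function names are hypothetical, `int8_t` plays the role of the `i8` index type and `int16_t` the 16-bit pointer width, and the narrowing casts assume two's-complement wraparound (guaranteed from C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Multiply two i8 values with wraparound, mimicking an LLVM `mul i8`.
inline int8_t mulI8(int8_t A, int8_t B) {
  return static_cast<int8_t>(static_cast<uint8_t>(
      static_cast<uint8_t>(A) * static_cast<uint8_t>(B)));
}

// sext(Idx * S) * ElementSize: the narrow multiply happens first, then the
// result is widened and scaled in 16 bits.
inline int16_t scaleAfterSext(int8_t Idx, int8_t S, int8_t ElementSize) {
  int16_t Wide = mulI8(Idx, S);
  return static_cast<int16_t>(Wide * ElementSize);
}

// sext((Idx * ElementSize) * S): scaling in the narrow type can overflow.
inline int16_t scaleBeforeSext(int8_t Idx, int8_t S, int8_t ElementSize) {
  return mulI8(mulI8(Idx, ElementSize), S);
}

// (sext(Idx) * ElementSize) * sext(S): widen first, then scale.
inline int16_t scaleInWideType(int8_t Idx, int8_t S, int8_t ElementSize) {
  return static_cast<int16_t>(static_cast<int16_t>(Idx) * ElementSize * S);
}
```

For Idx = 4, S = 4 and ElementSize = 24 the first and third functions give 384, while the second gives -128 because 96 * 4 wraps in 8 bits.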

hfinkel added inline comments.Mar 9 2015, 4:48 PM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
214 ↗	(On Diff #21528)	Don't need the { } here.
366 ↗	(On Diff #21528)	Can't you use SCEV to do that here? If you do, then you get the shift case for free. If the problem here is the loss of the information on the NSW, please comment on that.
444 ↗	(On Diff #21528)	No, please don't do that. Cast the pointer to an i8* (in the appropriate address space), and use a GEP.
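
A rough C++ analogue of the approach requested for line 444: step through a byte pointer instead of converting the pointer to an integer and back. This is only a sketch of the idea, not the pass's IR-level code; `bumpBytes` and `Pair` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct Pair {
  int First;
  int Second;
};

// Bump Basis by a byte offset that need not be a multiple of sizeof(T).
// The pointer is viewed as char * (the analogue of i8*), advanced, and
// viewed as T * again, so the value stays a pointer throughout.
template <typename T> T *bumpBytes(T *Basis, std::ptrdiff_t ByteOffset) {
  char *Raw = reinterpret_cast<char *>(Basis);
  return reinterpret_cast<T *>(Raw + ByteOffset);
}
```

When the offset is a multiple of the element size, the result matches ordinary indexing; when it is not (for example, sizeof(int) bytes into a `Pair`), only the byte-level form can express the bump.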

Thanks Hal and Sanjoy for the review. I will fix all other comments.

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
366 ↗	(On Diff #21528)	NSW is one problem. The other problem, as I replied earlier, is that this may complicate the rewriting phase because the rewriting would have to translate SCEVs back to IR instructions. For example, suppose `S = a + b` and the immediate basis of `X = S * 4` is `Y = S * 3`. The current code simply rewrites `X` as `Y + S` without tracing into how S is computed. However, if we use SCEV, we would represent X as SCEV `(a + b) * 4` and Y as SCEV `(a + b) * 3`, and rewrite X as SCEV `Y + (a + b)`. After that, we would need to translate SCEV `Y + (a + b)` back into IR instructions `Y + (a + b)` and further into `Y + S`. While I think this is doable, given the relatively simple pattern matching this pass currently has, I don't feel it's worthwhile leveraging SCEVExpander or such to rewrite SCEV into IR instructions.
327 ↗	(On Diff #21524)	Thanks for pointing this out! You are right. I'll fix it.

hfinkel added inline comments.Mar 10 2015, 5:07 AM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
366 ↗	(On Diff #21528)	Okay, please add a detailed comment about this in the code.

All comments addressed. PTAL

jingyue added inline comments.Mar 10 2015, 1:23 PM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
214 ↗	(On Diff #21528)	Done.
366 ↗	(On Diff #21528)	Done.
444 ↗	(On Diff #21528)	Done. Also change BumpWithGEP to BumpWithUglyGEP because we use GEPs in both cases.
93 ↗	(On Diff #21524)	Done.
327 ↗	(On Diff #21524)	Done. We now transform sext(Idx *s S) *s ElementSize to (sext(Idx) * ElementSize) * S
444 ↗	(On Diff #21524)	Done.
451 ↗	(On Diff #21524)	Done.

sanjoy added inline comments.Mar 10 2015, 3:09 PM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
325 ↗	(On Diff #21625)	Nit: `s` is confusing -- there is no difference between signed vs. unsigned integer multiplication. If you mean `nsw` then `nsw` is clearer.
378 ↗	(On Diff #21625)	You can use `m_NSWMul` here directly.

Addresses Sanjoy's follow-up comments

hfinkel added inline comments.Mar 10 2015, 4:29 PM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
358 ↗	(On Diff #21649)	I don't understand this loop, could you please add a comment explaining it. What changes between the two iterations, and why two?
490 ↗	(On Diff #21649)	I think we always at least have the base implementation available. Just make TTI a hard dependency.

Adds more comments and makes TTI mandatory

jingyue added inline comments.Mar 10 2015, 9:10 PM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
358 ↗	(On Diff #21649)	Let me know whether this comment helps.
490 ↗	(On Diff #21649)	Done.

hfinkel added inline comments.Mar 11 2015, 12:05 AM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
259 ↗	(On Diff #21673)	We always have TTI, you don't need to check for it.
272 ↗	(On Diff #21673)	If Basis.Basis != nullptr, do we want to set C.Basis to that?
461 ↗	(On Diff #21673)	Either remove this comment, or say that we use i8* GEPs, instead of inttoptr/ptrtoint because the latter interferes with pointer-aliasing analysis.
465 ↗	(On Diff #21673)	This can be just: unsigned AS = Basis.Ins->getType()->getPointerAddressSpace();
503 ↗	(On Diff #21673)	Use range-based for.
358 ↗	(On Diff #21649)	Yes, that's better.

some more minor changes

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
259 ↗	(On Diff #21673)	Thanks. I missed this one.
272 ↗	(On Diff #21673)	Yes. In that case, C.Ins can still be rewritten with respect to Basis.Ins. For example,

  ... &a[s];     // Basis. Assume this is the first memory reference to a, so Basis.Basis == nullptr.
  ... &a[s * 2]; // C. We still want to rewrite C to Basis + s.
465 ↗	(On Diff #21673)	Thanks!
503 ↗	(On Diff #21673)	I may be missing something, but I don't see any API of BasicBlock that returns an iterator_range. http://llvm.org/docs/doxygen/html/classllvm_1_1BasicBlock.html

hfinkel added inline comments.Mar 11 2015, 10:51 AM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
272 ↗	(On Diff #21673)	Okay, please do. That form should enhance ILP.
503 ↗	(On Diff #21673)	Don't need one. You can use a range-based for loop with any object that has a begin()/end(). for (auto &I : *B) allocateCandidateAndFindBasis(&I); or something like that.
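
The point about range-based for loops holds for any type with begin() and end(), not only types that return an iterator_range. A minimal standalone sketch, in which `Instr`, `Block`, and `sumOpcodes` are made-up stand-ins rather than LLVM classes:

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Instr {
  int Opcode;
};

// A hypothetical block type that exposes only begin()/end().
class Block {
public:
  explicit Block(std::vector<Instr> Is) : Instrs(std::move(Is)) {}
  std::vector<Instr>::iterator begin() { return Instrs.begin(); }
  std::vector<Instr>::iterator end() { return Instrs.end(); }

private:
  std::vector<Instr> Instrs;
};

// Range-based for over *B works because Block has begin() and end().
inline int sumOpcodes(Block *B) {
  int Sum = 0;
  for (auto &I : *B)
    Sum += I.Opcode;
  return Sum;
}
```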

use range-based for loops

meheff added inline comments.Mar 11 2015, 11:26 AM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
363 ↗	(On Diff #21743)	I see that this loop enables you to avoid code duplication for the two cases (sext-wrapped and no sext), but it requires a bit of mental effort to see what it is doing. I think this would be clearer if you just broke out the code into a separate function and avoided the loop altogether. Then the code looks like:

  foobarfunction(Base, ArrayIdx, ElementSize, GEP);
  Value *SextIdx;
  if (match(ArrayIdx, m_SExt(m_Value(SextIdx))))
    foobarfunction(Base, SextIdx, ElementSize, GEP);

In this case it is much more obvious that you're considering the two cases (sext and no sext). There is one other location in the file which uses this similar style loop construct and might be clearer expressed as a separate function call.
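
The suggested shape (call a helper on the index as written, then once more on the operand if the index is a sign extension) can be sketched outside LLVM with a toy expression type. Everything here is hypothetical: `Expr`, `recordCandidate`, and `factorIndex` are illustrative names, and a string list stands in for the candidate list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy expression node: SExtOperand is non-null when this node is
// sext(Operand), and null otherwise.
struct Expr {
  std::string Name;
  const Expr *SExtOperand;
};

// Stand-in for the helper that tries to factor one index expression.
inline void recordCandidate(const Expr &Idx, std::vector<std::string> &Out) {
  Out.push_back(Idx.Name);
}

// Two explicit calls instead of a two-iteration loop: first the index as
// written, then the value under the sext if there is one.
inline void factorIndex(const Expr &ArrayIdx, std::vector<std::string> &Out) {
  recordCandidate(ArrayIdx, Out);
  if (ArrayIdx.SExtOperand)
    recordCandidate(*ArrayIdx.SExtOperand, Out);
}
```

The control flow makes the two cases visible at the call site, which is the readability gain the comment argues for.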

jingyue added inline comments.Mar 11 2015, 2:38 PM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
363 ↗	(On Diff #21743)	Both places fixed. PTAL.
461 ↗	(On Diff #21673)	Missed this comment. Fixed in this patch.

Extract symmetric code into a helper function.

Thanks, Jingyue. LGTM

No further comments from me. I don't see any issues and will defer to the other reviewers.

Thanks!

Hal and Sanjoy, do you have more comments?

In D7459#146494, @jingyue wrote:

Thanks!

Hal and Sanjoy, do you have more comments?

I don't have any more comments.

LGTM too.

This revision is now accepted and ready to land.Mar 26 2015, 1:24 AM

jingyue closed this revision.Mar 26 2015, 9:52 AM

Closed by commit rL233286: [SLSR] handle candidate form &B[i * S] (authored by jingyue).Mar 26 2015, 9:52 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp	395 lines
llvm/trunk/test/Transforms/StraightLineStrengthReduce/ (including X86/)	2, 30, 109, 121, and 119 lines

Diff 22729

llvm/trunk/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp

	// This file implements straight-line strength reduction (SLSR). Unlike loop			// This file implements straight-line strength reduction (SLSR). Unlike loop
	// strength reduction, this algorithm is designed to reduce arithmetic			// strength reduction, this algorithm is designed to reduce arithmetic
	// redundancy in straight-line code instead of loops. It has proven to be			// redundancy in straight-line code instead of loops. It has proven to be
	// effective in simplifying arithmetic statements derived from an unrolled loop.			// effective in simplifying arithmetic statements derived from an unrolled loop.
	// It can also simplify the logic of SeparateConstOffsetFromGEP.			// It can also simplify the logic of SeparateConstOffsetFromGEP.
	//			//
	// There are many optimizations we can perform in the domain of SLSR. This file			// There are many optimizations we can perform in the domain of SLSR. This file
	// for now contains only an initial step. Specifically, we look for strength			// for now contains only an initial step. Specifically, we look for strength
	// reduction candidate in the form of			// reduction candidates in two forms:
	//			//
	// (B + i) * S			// Form 1: (B + i) * S
				// Form 2: &B[i * S]
	//			//
	// where B and S are integer constants or variables, and i is a constant			// where S is an integer variable, and i is a constant integer. If we found two
	// integer. If we found two such candidates			// candidates
	//			//
	// S1: X = (B + i) * S S2: Y = (B + i') * S			// S1: X = (B + i) * S
				// S2: Y = (B + i') * S
				//
				// or
				//
				// S1: X = &B[i * S]
				// S2: Y = &B[i' * S]
	//			//
	// and S1 dominates S2, we call S1 a basis of S2, and can replace S2 with			// and S1 dominates S2, we call S1 a basis of S2, and can replace S2 with
	//			//
	// Y = X + (i' - i) * S			// Y = X + (i' - i) * S
	//			//
				// or
				//
				// Y = &X[(i' - i) * S]
				//
	// where (i' - i) * S is folded to the extent possible. When S2 has multiple			// where (i' - i) * S is folded to the extent possible. When S2 has multiple
	// bases, we pick the one that is closest to S2, or S2's "immediate" basis.			// bases, we pick the one that is closest to S2, or S2's "immediate" basis.
	//			//
	// TODO:			// TODO:
	//			//
	// - Handle candidates in the form of B + i * S			// - Handle candidates in the form of B + i * S
	//			//
	// - Handle candidates in the form of pointer arithmetics. e.g., B[i * S]
	//
	// - Floating point arithmetics when fast math is enabled.			// - Floating point arithmetics when fast math is enabled.
	//			//
	// - SLSR may decrease ILP at the architecture level. Targets that are very			// - SLSR may decrease ILP at the architecture level. Targets that are very
	// sensitive to ILP may want to disable it. Having SLSR to consider ILP is			// sensitive to ILP may want to disable it. Having SLSR to consider ILP is
	// left as future work.			// left as future work.
	#include <vector>			#include <vector>

	#include "llvm/ADT/DenseSet.h"			#include "llvm/ADT/DenseSet.h"
				#include "llvm/ADT/FoldingSet.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/IR/DataLayout.h"
	#include "llvm/IR/Dominators.h"			#include "llvm/IR/Dominators.h"
	#include "llvm/IR/IRBuilder.h"			#include "llvm/IR/IRBuilder.h"
	#include "llvm/IR/Module.h"			#include "llvm/IR/Module.h"
	#include "llvm/IR/PatternMatch.h"			#include "llvm/IR/PatternMatch.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"
	#include "llvm/Transforms/Scalar.h"			#include "llvm/Transforms/Scalar.h"

	using namespace llvm;			using namespace llvm;
	using namespace PatternMatch;			using namespace PatternMatch;

	namespace {			namespace {

	class StraightLineStrengthReduce : public FunctionPass {			class StraightLineStrengthReduce : public FunctionPass {
	public:			public:
	// SLSR candidate. Such a candidate must be in the form of			// SLSR candidate. Such a candidate must be in the form of
	// (Base + Index) * Stride			// (Base + Index) * Stride
				// or
				// Base[..][Index * Stride][..]
	struct Candidate : public ilist_node<Candidate> {			struct Candidate : public ilist_node<Candidate> {
	Candidate(Value *B = nullptr, ConstantInt *Idx = nullptr,			enum Kind {
	Value *S = nullptr, Instruction *I = nullptr)			Invalid, // reserved for the default constructor
	: Base(B), Index(Idx), Stride(S), Ins(I), Basis(nullptr) {}			Mul, // (B + i) * S
	Value *Base;			GEP, // &B[..][i * S][..]
				};

				Candidate()
				: CandidateKind(Invalid), Base(nullptr), Index(nullptr),
				Stride(nullptr), Ins(nullptr), Basis(nullptr) {}
				Candidate(Kind CT, const SCEV *B, ConstantInt *Idx, Value *S,
				Instruction *I)
				: CandidateKind(CT), Base(B), Index(Idx), Stride(S), Ins(I),
				Basis(nullptr) {}
				Kind CandidateKind;
				const SCEV *Base;
				// Note that Index and Stride of a GEP candidate may not have the same
				// integer type. In that case, during rewriting, Stride will be
				// sign-extended or truncated to Index's type.
	ConstantInt *Index;			ConstantInt *Index;
	Value *Stride;			Value *Stride;
	// The instruction this candidate corresponds to. It helps us to rewrite a			// The instruction this candidate corresponds to. It helps us to rewrite a
	// candidate with respect to its immediate basis. Note that one instruction			// candidate with respect to its immediate basis. Note that one instruction
	// can correspond to multiple candidates depending on how you associate the			// can correspond to multiple candidates depending on how you associate the
	// expression. For instance,			// expression. For instance,
	//			//
	// (a + 1) * (b + 2)			// (a + 1) * (b + 2)
	//			//
	// can be treated as			// can be treated as
	//			//
	// <Base: a, Index: 1, Stride: b + 2>			// <Base: a, Index: 1, Stride: b + 2>
	//			//
	// or			// or
	//			//
	// <Base: b, Index: 2, Stride: a + 1>			// <Base: b, Index: 2, Stride: a + 1>
	Instruction *Ins;			Instruction *Ins;
	// Points to the immediate basis of this candidate, or nullptr if we cannot			// Points to the immediate basis of this candidate, or nullptr if we cannot
	// find any basis for this candidate.			// find any basis for this candidate.
	Candidate *Basis;			Candidate *Basis;
	};			};

	static char ID;			static char ID;

	StraightLineStrengthReduce() : FunctionPass(ID), DT(nullptr) {			StraightLineStrengthReduce()
				: FunctionPass(ID), DL(nullptr), DT(nullptr), TTI(nullptr) {
	initializeStraightLineStrengthReducePass(*PassRegistry::getPassRegistry());			initializeStraightLineStrengthReducePass(*PassRegistry::getPassRegistry());
	}			}

	void getAnalysisUsage(AnalysisUsage &AU) const override {			void getAnalysisUsage(AnalysisUsage &AU) const override {
	AU.addRequired<DominatorTreeWrapperPass>();			AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<ScalarEvolution>();
				AU.addRequired<TargetTransformInfoWrapperPass>();
	// We do not modify the shape of the CFG.			// We do not modify the shape of the CFG.
	AU.setPreservesCFG();			AU.setPreservesCFG();
	}			}

				bool doInitialization(Module &M) override {
				DL = &M.getDataLayout();
				return false;
				}

	bool runOnFunction(Function &F) override;			bool runOnFunction(Function &F) override;

	private:			private:
	// Returns true if Basis is a basis for C, i.e., Basis dominates C and they			// Returns true if Basis is a basis for C, i.e., Basis dominates C and they
	// share the same base and stride.			// share the same base and stride.
	bool isBasisFor(const Candidate &Basis, const Candidate &C);			bool isBasisFor(const Candidate &Basis, const Candidate &C);
	// Checks whether I is in a candidate form. If so, adds all the matching forms			// Checks whether I is in a candidate form. If so, adds all the matching forms
	// to Candidates, and tries to find the immediate basis for each of them.			// to Candidates, and tries to find the immediate basis for each of them.
	void allocateCandidateAndFindBasis(Instruction *I);			void allocateCandidateAndFindBasis(Instruction *I);
	// Given that I is in the form of "(B + Idx) * S", adds this form to			// Allocate candidates and find bases for Mul instructions.
	// Candidates, and finds its immediate basis.			void allocateCandidateAndFindBasisForMul(Instruction *I);
	void allocateCandidateAndFindBasis(Value *B, ConstantInt *Idx, Value *S,			// Splits LHS into Base + Index and, if succeeds, calls
				// allocateCandidateAndFindBasis.
				void allocateCandidateAndFindBasisForMul(Value *LHS, Value *RHS,
				Instruction *I);
				// Allocate candidates and find bases for GetElementPtr instructions.
				void allocateCandidateAndFindBasisForGEP(GetElementPtrInst *GEP);
				// A helper function that scales Idx with ElementSize before invoking
				// allocateCandidateAndFindBasis.
				void allocateCandidateAndFindBasisForGEP(const SCEV *B, ConstantInt *Idx,
				Value *S, uint64_t ElementSize,
				Instruction *I);
				// Adds the given form <CT, B, Idx, S> to Candidates, and finds its immediate
				// basis.
				void allocateCandidateAndFindBasis(Candidate::Kind CT, const SCEV *B,
				ConstantInt *Idx, Value *S,
	Instruction *I);			Instruction *I);
	// Rewrites candidate C with respect to Basis.			// Rewrites candidate C with respect to Basis.
	void rewriteCandidateWithBasis(const Candidate &C, const Candidate &Basis);			void rewriteCandidateWithBasis(const Candidate &C, const Candidate &Basis);
				// A helper function that factors ArrayIdx to a product of a stride and a
				// constant index, and invokes allocateCandidateAndFindBasis with the
				// factorings.
				void factorArrayIndex(Value *ArrayIdx, const SCEV *Base, uint64_t ElementSize,
				GetElementPtrInst *GEP);
				// Emit code that computes the "bump" from Basis to C. If the candidate is a
				// GEP and the bump is not divisible by the element size of the GEP, this
				// function sets the BumpWithUglyGEP flag to notify its caller to bump the
				// basis using an ugly GEP.
				static Value *emitBump(const Candidate &Basis, const Candidate &C,
				IRBuilder<> &Builder, const DataLayout *DL,
				bool &BumpWithUglyGEP);

				const DataLayout *DL;
	DominatorTree *DT;			DominatorTree *DT;
				ScalarEvolution *SE;
				TargetTransformInfo *TTI;
	ilist<Candidate> Candidates;			ilist<Candidate> Candidates;
	// Temporarily holds all instructions that are unlinked (but not deleted) by			// Temporarily holds all instructions that are unlinked (but not deleted) by
	// rewriteCandidateWithBasis. These instructions will be actually removed			// rewriteCandidateWithBasis. These instructions will be actually removed
	// after all rewriting finishes.			// after all rewriting finishes.
	DenseSet<Instruction *> UnlinkedInstructions;			DenseSet<Instruction *> UnlinkedInstructions;
	};			};
	} // anonymous namespace			} // anonymous namespace

	char StraightLineStrengthReduce::ID = 0;			char StraightLineStrengthReduce::ID = 0;
	INITIALIZE_PASS_BEGIN(StraightLineStrengthReduce, "slsr",			INITIALIZE_PASS_BEGIN(StraightLineStrengthReduce, "slsr",
	"Straight line strength reduction", false, false)			"Straight line strength reduction", false, false)
	INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)			INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)
				INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
	INITIALIZE_PASS_END(StraightLineStrengthReduce, "slsr",			INITIALIZE_PASS_END(StraightLineStrengthReduce, "slsr",
	"Straight line strength reduction", false, false)			"Straight line strength reduction", false, false)

	FunctionPass *llvm::createStraightLineStrengthReducePass() {			FunctionPass *llvm::createStraightLineStrengthReducePass() {
	return new StraightLineStrengthReduce();			return new StraightLineStrengthReduce();
	}			}

	bool StraightLineStrengthReduce::isBasisFor(const Candidate &Basis,			bool StraightLineStrengthReduce::isBasisFor(const Candidate &Basis,
	const Candidate &C) {			const Candidate &C) {
	return (Basis.Ins != C.Ins && // skip the same instruction			return (Basis.Ins != C.Ins && // skip the same instruction
	// Basis must dominate C in order to rewrite C with respect to Basis.			// Basis must dominate C in order to rewrite C with respect to Basis.
	DT->dominates(Basis.Ins->getParent(), C.Ins->getParent()) &&			DT->dominates(Basis.Ins->getParent(), C.Ins->getParent()) &&
	// They share the same base and stride.			// They share the same base, stride, and candidate kind.
	Basis.Base == C.Base &&			Basis.Base == C.Base &&
	Basis.Stride == C.Stride);			Basis.Stride == C.Stride &&
				Basis.CandidateKind == C.CandidateKind);
				}

				static bool isCompletelyFoldable(GetElementPtrInst *GEP,
				const TargetTransformInfo *TTI,
				const DataLayout *DL) {
				GlobalVariable *BaseGV = nullptr;
				int64_t BaseOffset = 0;
				bool HasBaseReg = false;
				int64_t Scale = 0;

				if (GlobalVariable *GV = dyn_cast<GlobalVariable>(GEP->getPointerOperand()))
				BaseGV = GV;
				else
				HasBaseReg = true;

				gep_type_iterator GTI = gep_type_begin(GEP);
				for (auto I = GEP->idx_begin(); I != GEP->idx_end(); ++I, ++GTI) {
				if (isa<SequentialType>(*GTI)) {
				int64_t ElementSize = DL->getTypeAllocSize(GTI.getIndexedType());
				if (ConstantInt *ConstIdx = dyn_cast<ConstantInt>(*I)) {
				BaseOffset += ConstIdx->getSExtValue() * ElementSize;
				} else {
				// Needs scale register.
				if (Scale != 0) {
				// No addressing mode takes two scale registers.
				return false;
				}
				Scale = ElementSize;
				}
				} else {
				StructType *STy = cast<StructType>(*GTI);
				uint64_t Field = cast<ConstantInt>(*I)->getZExtValue();
				BaseOffset += DL->getStructLayout(STy)->getElementOffset(Field);
				}
				}
				return TTI->isLegalAddressingMode(GEP->getType()->getElementType(), BaseGV,
				BaseOffset, HasBaseReg, Scale);
	}			}

	// TODO: We currently implement an algorithm whose time complexity is linear to			// TODO: We currently implement an algorithm whose time complexity is linear to
	// the number of existing candidates. However, a better algorithm exists. We			// the number of existing candidates. However, a better algorithm exists. We
	// could depth-first search the dominator tree, and maintain a hash table that			// could depth-first search the dominator tree, and maintain a hash table that
	// contains all candidates that dominate the node being traversed. This hash			// contains all candidates that dominate the node being traversed. This hash
	// table is indexed by the base and the stride of a candidate. Therefore,			// table is indexed by the base and the stride of a candidate. Therefore,
	// finding the immediate basis of a candidate boils down to one hash-table look			// finding the immediate basis of a candidate boils down to one hash-table look
	// up.			// up.
	void StraightLineStrengthReduce::allocateCandidateAndFindBasis(Value *B,			void StraightLineStrengthReduce::allocateCandidateAndFindBasis(
	ConstantInt *Idx,			Candidate::Kind CT, const SCEV *B, ConstantInt *Idx, Value *S,
	Value *S,
	Instruction *I) {			Instruction *I) {
	Candidate C(B, Idx, S, I);			if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I)) {
				// If &B[Idx * S] fits into an addressing mode, do not turn it into
				// non-free computation.
				if (isCompletelyFoldable(GEP, TTI, DL))
				return;
				}

				Candidate C(CT, B, Idx, S, I);
	// Try to compute the immediate basis of C.			// Try to compute the immediate basis of C.
	unsigned NumIterations = 0;			unsigned NumIterations = 0;
	// Limit the scan radius to avoid running forever.			// Limit the scan radius to avoid running forever.
	static const unsigned MaxNumIterations = 50;			static const unsigned MaxNumIterations = 50;
	for (auto Basis = Candidates.rbegin();			for (auto Basis = Candidates.rbegin();
	Basis != Candidates.rend() && NumIterations < MaxNumIterations;			Basis != Candidates.rend() && NumIterations < MaxNumIterations;
	++Basis, ++NumIterations) {			++Basis, ++NumIterations) {
	if (isBasisFor(*Basis, C)) {			if (isBasisFor(*Basis, C)) {
	C.Basis = &(*Basis);			C.Basis = &(*Basis);
	break;			break;
	}			}
	}			}
	// Regardless of whether we find a basis for C, we need to push C to the			// Regardless of whether we find a basis for C, we need to push C to the
	// candidate list.			// candidate list.
	Candidates.push_back(C);			Candidates.push_back(C);
	}			}

	void StraightLineStrengthReduce::allocateCandidateAndFindBasis(Instruction *I) {			void StraightLineStrengthReduce::allocateCandidateAndFindBasis(Instruction *I) {
				switch (I->getOpcode()) {
				case Instruction::Mul:
				allocateCandidateAndFindBasisForMul(I);
				break;
				case Instruction::GetElementPtr:
				allocateCandidateAndFindBasisForGEP(cast<GetElementPtrInst>(I));
				break;
				}
				}

				void StraightLineStrengthReduce::allocateCandidateAndFindBasisForMul(
				Value *LHS, Value *RHS, Instruction *I) {
	Value *B = nullptr;			Value *B = nullptr;
	ConstantInt *Idx = nullptr;			ConstantInt *Idx = nullptr;
	// "(Base + Index) * Stride" must be a Mul instruction at the first hand.
	if (I->getOpcode() == Instruction::Mul) {
	if (IntegerType *ITy = dyn_cast<IntegerType>(I->getType())) {
	Value *LHS = I->getOperand(0), *RHS = I->getOperand(1);
	for (unsigned Swapped = 0; Swapped < 2; ++Swapped) {
	// Only handle the canonical operand ordering.			// Only handle the canonical operand ordering.
	if (match(LHS, m_Add(m_Value(B), m_ConstantInt(Idx)))) {			if (match(LHS, m_Add(m_Value(B), m_ConstantInt(Idx)))) {
	// If LHS is in the form of "Base + Index", then I is in the form of			// If LHS is in the form of "Base + Index", then I is in the form of
	// "(Base + Index) * RHS".			// "(Base + Index) * RHS".
	allocateCandidateAndFindBasis(B, Idx, RHS, I);			allocateCandidateAndFindBasis(Candidate::Mul, SE->getSCEV(B), Idx, RHS, I);
	} else {			} else {
	// Otherwise, at least try the form (LHS + 0) * RHS.			// Otherwise, at least try the form (LHS + 0) * RHS.
	allocateCandidateAndFindBasis(LHS, ConstantInt::get(ITy, 0), RHS, I);			ConstantInt *Zero = ConstantInt::get(cast<IntegerType>(I->getType()), 0);
				allocateCandidateAndFindBasis(Candidate::Mul, SE->getSCEV(LHS), Zero, RHS,
				I);
				}
				}

				void StraightLineStrengthReduce::allocateCandidateAndFindBasisForMul(
				Instruction *I) {
				// Try matching (B + i) * S.
				// TODO: we could extend SLSR to float and vector types.
				if (!isa<IntegerType>(I->getType()))
				return;

				Value *LHS = I->getOperand(0), *RHS = I->getOperand(1);
				allocateCandidateAndFindBasisForMul(LHS, RHS, I);
				if (LHS != RHS) {
				// Symmetrically, try to split RHS to Base + Index.
				allocateCandidateAndFindBasisForMul(RHS, LHS, I);
	}			}
	// Swap LHS and RHS so that we also cover the cases where LHS is the
	// stride.
	if (LHS == RHS)
	break;
	std::swap(LHS, RHS);
	}			}

				void StraightLineStrengthReduce::allocateCandidateAndFindBasisForGEP(
				const SCEV *B, ConstantInt *Idx, Value *S, uint64_t ElementSize,
				Instruction *I) {
				// I = B + sext(Idx *nsw S) *nsw ElementSize
				// = B + (sext(Idx) * ElementSize) * sext(S)
				// Casting to IntegerType is safe because we skipped vector GEPs.
				IntegerType *IntPtrTy = cast<IntegerType>(DL->getIntPtrType(I->getType()));
				ConstantInt *ScaledIdx = ConstantInt::get(
				IntPtrTy, Idx->getSExtValue() * (int64_t)ElementSize, true);
				allocateCandidateAndFindBasis(Candidate::GEP, B, ScaledIdx, S, I);
				}

				void StraightLineStrengthReduce::factorArrayIndex(Value *ArrayIdx,
				const SCEV *Base,
				uint64_t ElementSize,
				GetElementPtrInst *GEP) {
				// At least, ArrayIdx = ArrayIdx *s 1.
				allocateCandidateAndFindBasisForGEP(
				Base, ConstantInt::get(cast<IntegerType>(ArrayIdx->getType()), 1),
				ArrayIdx, ElementSize, GEP);
				Value *LHS = nullptr;
				ConstantInt *RHS = nullptr;
				// TODO: handle shl. e.g., we could treat (S << 2) as (S * 4).
				//
				// One alternative is matching the SCEV of ArrayIdx instead of ArrayIdx
				// itself. This would allow us to handle the shl case for free. However,
				// matching SCEVs has two issues:
				//
				// 1. this would complicate rewriting because the rewriting procedure
				// would have to translate SCEVs back to IR instructions. This translation
				// is difficult when LHS is further evaluated to a composite SCEV.
				//
				// 2. ScalarEvolution is designed to be control-flow oblivious. It tends
				// to strip nsw/nuw flags which are critical for SLSR to trace into
				// sext'ed multiplication.
				if (match(ArrayIdx, m_NSWMul(m_Value(LHS), m_ConstantInt(RHS)))) {
				// SLSR is currently unsafe if i * S may overflow.
				// GEP = Base + sext(LHS *nsw RHS) *nsw ElementSize
				allocateCandidateAndFindBasisForGEP(Base, RHS, LHS, ElementSize, GEP);
	}			}
	}			}

				void StraightLineStrengthReduce::allocateCandidateAndFindBasisForGEP(
				GetElementPtrInst *GEP) {
				// TODO: handle vector GEPs
				if (GEP->getType()->isVectorTy())
				return;

				const SCEV *GEPExpr = SE->getSCEV(GEP);
				Type *IntPtrTy = DL->getIntPtrType(GEP->getType());

				gep_type_iterator GTI = gep_type_begin(GEP);
				for (auto I = GEP->idx_begin(); I != GEP->idx_end(); ++I) {
				if (!isa<SequentialType>(*GTI++))
				continue;
				Value *ArrayIdx = *I;
				// Compute the byte offset of this index.
				uint64_t ElementSize = DL->getTypeAllocSize(*GTI);
				const SCEV *ElementSizeExpr = SE->getSizeOfExpr(IntPtrTy, *GTI);
				const SCEV *ArrayIdxExpr = SE->getSCEV(ArrayIdx);
				ArrayIdxExpr = SE->getTruncateOrSignExtend(ArrayIdxExpr, IntPtrTy);
				const SCEV *LocalOffset =
				SE->getMulExpr(ArrayIdxExpr, ElementSizeExpr, SCEV::FlagNSW);
				// The base of this candidate equals GEPExpr less the byte offset of this
				// index.
				const SCEV *Base = SE->getMinusSCEV(GEPExpr, LocalOffset);
				factorArrayIndex(ArrayIdx, Base, ElementSize, GEP);
				// When ArrayIdx is the sext of a value, we try to factor that value as
				// well. Handling this case is important because array indices are
				// typically sign-extended to the pointer size.
				Value *TruncatedArrayIdx = nullptr;
				if (match(ArrayIdx, m_SExt(m_Value(TruncatedArrayIdx))))
				factorArrayIndex(TruncatedArrayIdx, Base, ElementSize, GEP);
				}
				}

				// A helper function that unifies the bitwidth of A and B.
				static void unifyBitWidth(APInt &A, APInt &B) {
				if (A.getBitWidth() < B.getBitWidth())
				A = A.sext(B.getBitWidth());
				else if (A.getBitWidth() > B.getBitWidth())
				B = B.sext(A.getBitWidth());
				}

				Value *StraightLineStrengthReduce::emitBump(const Candidate &Basis,
				const Candidate &C,
				IRBuilder<> &Builder,
				const DataLayout *DL,
				bool &BumpWithUglyGEP) {
				APInt Idx = C.Index->getValue(), BasisIdx = Basis.Index->getValue();
				unifyBitWidth(Idx, BasisIdx);
				APInt IndexOffset = Idx - BasisIdx;

				BumpWithUglyGEP = false;
				if (Basis.CandidateKind == Candidate::GEP) {
				APInt ElementSize(
				IndexOffset.getBitWidth(),
				DL->getTypeAllocSize(
				cast<GetElementPtrInst>(Basis.Ins)->getType()->getElementType()));
				APInt Q, R;
				APInt::sdivrem(IndexOffset, ElementSize, Q, R);
				if (R.getSExtValue() == 0)
				IndexOffset = Q;
				else
				BumpWithUglyGEP = true;
				}
				// Compute Bump = C - Basis = (i' - i) * S.
				// Common case 1: if (i' - i) is 1, Bump = S.
				if (IndexOffset.getSExtValue() == 1)
				return C.Stride;
				// Common case 2: if (i' - i) is -1, Bump = -S.
				if (IndexOffset.getSExtValue() == -1)
				return Builder.CreateNeg(C.Stride);
				// Otherwise, Bump = (i' - i) * sext/trunc(S).
				ConstantInt *Delta = ConstantInt::get(Basis.Ins->getContext(), IndexOffset);
				Value *ExtendedStride = Builder.CreateSExtOrTrunc(C.Stride, Delta->getType());
				return Builder.CreateMul(ExtendedStride, Delta);
	}			}

	void StraightLineStrengthReduce::rewriteCandidateWithBasis(			void StraightLineStrengthReduce::rewriteCandidateWithBasis(
	const Candidate &C, const Candidate &Basis) {			const Candidate &C, const Candidate &Basis) {
				assert(C.CandidateKind == Basis.CandidateKind && C.Base == Basis.Base &&
				C.Stride == Basis.Stride);

	// An instruction can correspond to multiple candidates. Therefore, instead of			// An instruction can correspond to multiple candidates. Therefore, instead of
	// simply deleting an instruction when we rewrite it, we mark its parent as			// simply deleting an instruction when we rewrite it, we mark its parent as
	// nullptr (i.e. unlink it) so that we can skip the candidates whose			// nullptr (i.e. unlink it) so that we can skip the candidates whose
	// instruction is already rewritten.			// instruction is already rewritten.
	if (!C.Ins->getParent())			if (!C.Ins->getParent())
	return;			return;
	assert(C.Base == Basis.Base && C.Stride == Basis.Stride);
	// Basis = (B + i) * S
	// C = (B + i') * S
	// ==>
	// C = Basis + (i' - i) * S
	IRBuilder<> Builder(C.Ins);			IRBuilder<> Builder(C.Ins);
	ConstantInt *IndexOffset = ConstantInt::get(			bool BumpWithUglyGEP;
	C.Ins->getContext(), C.Index->getValue() - Basis.Index->getValue());			Value *Bump = emitBump(Basis, C, Builder, DL, BumpWithUglyGEP);
	Value *Reduced;			Value *Reduced = nullptr; // equivalent to but weaker than C.Ins
	// TODO: preserve nsw/nuw in some cases.			switch (C.CandidateKind) {
	if (IndexOffset->isOne()) {			case Candidate::Mul:
	// If (i' - i) is 1, fold C into Basis + S.
	Reduced = Builder.CreateAdd(Basis.Ins, C.Stride);
	} else if (IndexOffset->isMinusOne()) {
	// If (i' - i) is -1, fold C into Basis - S.
	Reduced = Builder.CreateSub(Basis.Ins, C.Stride);
	} else {
	Value *Bump = Builder.CreateMul(C.Stride, IndexOffset);
	Reduced = Builder.CreateAdd(Basis.Ins, Bump);			Reduced = Builder.CreateAdd(Basis.Ins, Bump);
				break;
				case Candidate::GEP:
				{
				Type *IntPtrTy = DL->getIntPtrType(C.Ins->getType());
				if (BumpWithUglyGEP) {
				// C = (char *)Basis + Bump
				unsigned AS = Basis.Ins->getType()->getPointerAddressSpace();
				Type *CharTy = Type::getInt8PtrTy(Basis.Ins->getContext(), AS);
				Reduced = Builder.CreateBitCast(Basis.Ins, CharTy);
				// We only considered inbounds GEP as candidates.
				Reduced = Builder.CreateInBoundsGEP(Reduced, Bump);
				Reduced = Builder.CreateBitCast(Reduced, C.Ins->getType());
				} else {
				// C = gep Basis, Bump
				// Canonicalize bump to pointer size.
				Bump = Builder.CreateSExtOrTrunc(Bump, IntPtrTy);
				Reduced = Builder.CreateInBoundsGEP(Basis.Ins, Bump);
				}
	}			}
				break;
				default:
				llvm_unreachable("C.CandidateKind is invalid");
				};
	Reduced->takeName(C.Ins);			Reduced->takeName(C.Ins);
	C.Ins->replaceAllUsesWith(Reduced);			C.Ins->replaceAllUsesWith(Reduced);
	C.Ins->dropAllReferences();			C.Ins->dropAllReferences();
	// Unlink C.Ins so that we can skip other candidates also corresponding to			// Unlink C.Ins so that we can skip other candidates also corresponding to
	// C.Ins. The actual deletion is postponed to the end of runOnFunction.			// C.Ins. The actual deletion is postponed to the end of runOnFunction.
	C.Ins->removeFromParent();			C.Ins->removeFromParent();
	UnlinkedInstructions.insert(C.Ins);			UnlinkedInstructions.insert(C.Ins);
	}			}

	bool StraightLineStrengthReduce::runOnFunction(Function &F) {			bool StraightLineStrengthReduce::runOnFunction(Function &F) {
	if (skipOptnoneFunction(F))			if (skipOptnoneFunction(F))
	return false;			return false;

				TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
	DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();			DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
				SE = &getAnalysis<ScalarEvolution>();
	// Traverse the dominator tree in the depth-first order. This order makes sure			// Traverse the dominator tree in the depth-first order. This order makes sure
	// all bases of a candidate are in Candidates when we process it.			// all bases of a candidate are in Candidates when we process it.
	for (auto node = GraphTraits<DominatorTree *>::nodes_begin(DT);			for (auto node = GraphTraits<DominatorTree *>::nodes_begin(DT);
	node != GraphTraits<DominatorTree *>::nodes_end(DT); ++node) {			node != GraphTraits<DominatorTree *>::nodes_end(DT); ++node) {
	BasicBlock *B = node->getBlock();			for (auto &I : *node->getBlock())
	for (auto I = B->begin(); I != B->end(); ++I) {			allocateCandidateAndFindBasis(&I);
	allocateCandidateAndFindBasis(I);
	}
	}			}

	// Rewrite candidates in the reverse depth-first order. This order makes sure			// Rewrite candidates in the reverse depth-first order. This order makes sure
	// a candidate being rewritten is not a basis for any other candidate.			// a candidate being rewritten is not a basis for any other candidate.
	while (!Candidates.empty()) {			while (!Candidates.empty()) {
	const Candidate &C = Candidates.back();			const Candidate &C = Candidates.back();
	if (C.Basis != nullptr) {			if (C.Basis != nullptr) {
	rewriteCandidateWithBasis(C, *C.Basis);			rewriteCandidateWithBasis(C, *C.Basis);
	Show All 12 Lines
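A minimal sketch of the arithmetic emitBump performs for GEP candidates, written against plain integers rather than APInt so it compiles without LLVM. The names BumpPlan and planBump are hypothetical and not part of the patch. Both indices are assumed to be already scaled to bytes, as allocateCandidateAndFindBasisForGEP does, and ResultElementSize stands for the alloc size of the type the basis GEP points to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for what emitBump decides: the constant multiplier
// applied to the stride S, and the unit in which the basis is advanced.
struct BumpPlan {
  int64_t Multiplier; // Bump = Multiplier * S
  bool UseByteOffset; // true: advance an i8* (the BumpWithUglyGEP path)
};

// Idx and BasisIdx are byte-scaled candidate indices (i' and i).
BumpPlan planBump(int64_t Idx, int64_t BasisIdx, int64_t ResultElementSize) {
  assert(ResultElementSize > 0 && "element size must be positive");
  int64_t IndexOffset = Idx - BasisIdx;
  // If the byte offset is a whole number of result elements, the bump can be
  // expressed as an element count and used in a typed GEP on the basis.
  if (IndexOffset % ResultElementSize == 0)
    return {IndexOffset / ResultElementSize, false};
  // Otherwise keep the byte offset and bump through an i8* instead.
  return {IndexOffset, true};
}
```

With the numbers from the tests in this revision, planBump(40, 20, 4) gives a multiplier of 5 for @slsr_gep_2d, while planBump(120, 60, 8) keeps the byte offset 60 and selects the i8* path for @slsr_gep_uglygep.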

llvm/trunk/test/Transforms/StraightLineStrengthReduce/X86/lit.local.cfg

				if not 'X86' in config.root.targets:
				config.unsupported = True

llvm/trunk/test/Transforms/StraightLineStrengthReduce/X86/no-slsr.ll

				; RUN: opt < %s -slsr -gvn -dce -S | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Do not perform SLSR on &input[s] and &input[s * 2] which fit into addressing
				; modes of X86.
				define i32 @slsr_gep(i32* %input, i64 %s) {
				; CHECK-LABEL: @slsr_gep(
				; v0 = input[0];
				%p0 = getelementptr inbounds i32, i32* %input, i64 0
				%v0 = load i32, i32* %p0

				; v1 = input[s];
				%p1 = getelementptr inbounds i32, i32* %input, i64 %s
				; CHECK: %p1 = getelementptr inbounds i32, i32* %input, i64 %s
				%v1 = load i32, i32* %p1

				; v2 = input[s * 2];
				%s2 = mul nsw i64 %s, 2
				%p2 = getelementptr inbounds i32, i32* %input, i64 %s2
				; CHECK: %p2 = getelementptr inbounds i32, i32* %input, i64 %s2
				%v2 = load i32, i32* %p2

				; return v0 + v1 + v2;
				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				ret i32 %2
				}

llvm/trunk/test/Transforms/StraightLineStrengthReduce/slsr-gep.ll

				; RUN: opt < %s -slsr -gvn -dce -S | FileCheck %s

				target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"

				define i32 @slsr_gep(i32* %input, i64 %s) {
				; CHECK-LABEL: @slsr_gep(
				; v0 = input[0];
				%p0 = getelementptr inbounds i32, i32* %input, i64 0
				%v0 = load i32, i32* %p0

				; v1 = input[s];
				%p1 = getelementptr inbounds i32, i32* %input, i64 %s
				; CHECK: %p1 = getelementptr inbounds i32, i32* %input, i64 %s
				%v1 = load i32, i32* %p1

				; v2 = input[s * 2];
				%s2 = mul nsw i64 %s, 2
				%p2 = getelementptr inbounds i32, i32* %input, i64 %s2
				; CHECK: %p2 = getelementptr inbounds i32, i32* %p1, i64 %s
				%v2 = load i32, i32* %p2

				; return v0 + v1 + v2;
				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				ret i32 %2
				}

				define i32 @slsr_gep_sext(i32* %input, i32 %s) {
				; CHECK-LABEL: @slsr_gep_sext(
				; v0 = input[0];
				%p0 = getelementptr inbounds i32, i32* %input, i64 0
				%v0 = load i32, i32* %p0

				; v1 = input[(long)s];
				%t = sext i32 %s to i64
				%p1 = getelementptr inbounds i32, i32* %input, i64 %t
				; CHECK: %p1 = getelementptr inbounds i32, i32* %input, i64 %t
				%v1 = load i32, i32* %p1

				; v2 = input[(long)(s * 2)];
				%s2 = mul nsw i32 %s, 2
				%t2 = sext i32 %s2 to i64
				%p2 = getelementptr inbounds i32, i32* %input, i64 %t2
				; CHECK: %p2 = getelementptr inbounds i32, i32* %p1, i64 %t
				%v2 = load i32, i32* %p2

				; return v0 + v1 + v2;
				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				ret i32 %2
				}

				define i32 @slsr_gep_2d([10 x [5 x i32]]* %input, i64 %s, i64 %t) {
				; CHECK-LABEL: @slsr_gep_2d(
				; v0 = input[s][t];
				%p0 = getelementptr inbounds [10 x [5 x i32]], [10 x [5 x i32]]* %input, i64 0, i64 %s, i64 %t
				%v0 = load i32, i32* %p0

				; v1 = input[s * 2][t];
				%s2 = mul nsw i64 %s, 2
				; CHECK: [[BUMP:%[a-zA-Z0-9]+]] = mul i64 %s, 5
				%p1 = getelementptr inbounds [10 x [5 x i32]], [10 x [5 x i32]]* %input, i64 0, i64 %s2, i64 %t
				; CHECK: %p1 = getelementptr inbounds i32, i32* %p0, i64 [[BUMP]]
				%v1 = load i32, i32* %p1

				; v2 = input[s * 3][t];
				%s3 = mul nsw i64 %s, 3
				%p2 = getelementptr inbounds [10 x [5 x i32]], [10 x [5 x i32]]* %input, i64 0, i64 %s3, i64 %t
				; CHECK: %p2 = getelementptr inbounds i32, i32* %p1, i64 [[BUMP]]
				%v2 = load i32, i32* %p2

				; return v0 + v1 + v2;
				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				ret i32 %2
				}

				%struct.S = type <{ i64, i32 }>

				; In this case, the bump
				; = (char *)&input[s * 2][t].f1 - (char *)&input[s][t].f1
				; = 60 * s
				; which may not be divisible by sizeof(input[s][t].f1) = 8. Therefore, we
				; rewrite the candidates using byte offset instead of index offset as in
				; @slsr_gep_2d.
				define i64 @slsr_gep_uglygep([10 x [5 x %struct.S]]* %input, i64 %s, i64 %t) {
				; CHECK-LABEL: @slsr_gep_uglygep(
				; v0 = input[s][t].f1;
				%p0 = getelementptr inbounds [10 x [5 x %struct.S]], [10 x [5 x %struct.S]]* %input, i64 0, i64 %s, i64 %t, i32 0
				%v0 = load i64, i64* %p0

				; v1 = input[s * 2][t].f1;
				%s2 = mul nsw i64 %s, 2
				; CHECK: [[BUMP:%[a-zA-Z0-9]+]] = mul i64 %s, 60
				%p1 = getelementptr inbounds [10 x [5 x %struct.S]], [10 x [5 x %struct.S]]* %input, i64 0, i64 %s2, i64 %t, i32 0
				; CHECK: getelementptr inbounds i8, i8* %{{[0-9]+}}, i64 [[BUMP]]
				%v1 = load i64, i64* %p1

				; v2 = input[s * 3][t].f1;
				%s3 = mul nsw i64 %s, 3
				%p2 = getelementptr inbounds [10 x [5 x %struct.S]], [10 x [5 x %struct.S]]* %input, i64 0, i64 %s3, i64 %t, i32 0
				; CHECK: getelementptr inbounds i8, i8* %{{[0-9]+}}, i64 [[BUMP]]
				%v2 = load i64, i64* %p2

				; return v0 + v1 + v2;
				%1 = add i64 %v0, %v1
				%2 = add i64 %1, %v2
				ret i64 %2
				}
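
The 60 * s figure in the @slsr_gep_uglygep comment can be checked with a small C++ sketch. It assumes a compiler that supports #pragma pack (GCC, Clang and MSVC do) to mirror the packed %struct.S; the field name f2 and the helper rowBumpInBytes are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
// Mirrors %struct.S = type <{ i64, i32 }>: packed, so no tail padding.
struct S {
  int64_t f1;
  int32_t f2; // hypothetical name; the test only reads f1
};
#pragma pack(pop)

static_assert(sizeof(S) == 12, "packed struct is 8 + 4 bytes");
static_assert(sizeof(S[5]) == 60, "one row of five structs is 60 bytes");

// Byte distance from &input[s][t].f1 to &input[2 * s][t].f1. f1 sits at
// offset 0, so the address of the struct is also the address of f1.
// The caller must keep 2 * s < 10 and t < 5.
std::ptrdiff_t rowBumpInBytes(const S (&input)[10][5], std::size_t s,
                              std::size_t t) {
  const char *from = reinterpret_cast<const char *>(&input[s][t]);
  const char *to = reinterpret_cast<const char *>(&input[2 * s][t]);
  return to - from;
}
```

For s = 1 the distance is 60 bytes, which is not a multiple of the 8-byte i64, so the bump cannot be written as an element count on an i64* and the pass falls back to a byte offset.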

llvm/trunk/test/Transforms/StraightLineStrengthReduce/slsr-mul.ll

				; RUN: opt < %s -slsr -gvn -dce -S | FileCheck %s

				target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"

				declare i32 @foo(i32 %a)

				define i32 @slsr1(i32 %b, i32 %s) {
				; CHECK-LABEL: @slsr1(
				; v0 = foo(b * s);
				%mul0 = mul i32 %b, %s
				; CHECK: mul i32
				; CHECK-NOT: mul i32
				%v0 = call i32 @foo(i32 %mul0)

				; v1 = foo((b + 1) * s);
				%b1 = add i32 %b, 1
				%mul1 = mul i32 %b1, %s
				%v1 = call i32 @foo(i32 %mul1)

				; v2 = foo((b + 2) * s);
				%b2 = add i32 %b, 2
				%mul2 = mul i32 %b2, %s
				%v2 = call i32 @foo(i32 %mul2)

				; return v0 + v1 + v2;
				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				ret i32 %2
				}

				; v0 = foo(a * b)
				; v1 = foo((a + 1) * b)
				; v2 = foo(a * (b + 1))
				; v3 = foo((a + 1) * (b + 1))
				define i32 @slsr2(i32 %a, i32 %b) {
				; CHECK-LABEL: @slsr2(
				%a1 = add i32 %a, 1
				%b1 = add i32 %b, 1
				%mul0 = mul i32 %a, %b
				; CHECK: mul i32
				; CHECK-NOT: mul i32
				%mul1 = mul i32 %a1, %b
				%mul2 = mul i32 %a, %b1
				%mul3 = mul i32 %a1, %b1

				%v0 = call i32 @foo(i32 %mul0)
				%v1 = call i32 @foo(i32 %mul1)
				%v2 = call i32 @foo(i32 %mul2)
				%v3 = call i32 @foo(i32 %mul3)

				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				%3 = add i32 %2, %v3
				ret i32 %3
				}

				; The bump is a multiple of the stride.
				;
				; v0 = foo(b * s);
				; v1 = foo((b + 2) * s);
				; v2 = foo((b + 4) * s);
				; return v0 + v1 + v2;
				;
				; ==>
				;
				; mul0 = b * s;
				; v0 = foo(mul0);
				; bump = s * 2;
				; mul1 = mul0 + bump; // GVN ensures mul1 and mul2 use the same bump.
				; v1 = foo(mul1);
				; mul2 = mul1 + bump;
				; v2 = foo(mul2);
				; return v0 + v1 + v2;
				define i32 @slsr3(i32 %b, i32 %s) {
				; CHECK-LABEL: @slsr3(
				%mul0 = mul i32 %b, %s
				; CHECK: mul i32
				%v0 = call i32 @foo(i32 %mul0)

				%b1 = add i32 %b, 2
				%mul1 = mul i32 %b1, %s
				; CHECK: [[BUMP:%[a-zA-Z0-9]+]] = mul i32 %s, 2
				; CHECK: %mul1 = add i32 %mul0, [[BUMP]]
				%v1 = call i32 @foo(i32 %mul1)

				%b2 = add i32 %b, 4
				%mul2 = mul i32 %b2, %s
				; CHECK: %mul2 = add i32 %mul1, [[BUMP]]
				%v2 = call i32 @foo(i32 %mul2)

				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				ret i32 %2
				}

				; Do not rewrite a candidate if its potential basis does not dominate it.
				; v0 = 0;
				; if (cond)
				; v0 = foo(a * b);
				; v1 = foo((a + 1) * b);
				; return v0 + v1;
				define i32 @not_dominate(i1 %cond, i32 %a, i32 %b) {
				; CHECK-LABEL: @not_dominate(
				entry:
				%a1 = add i32 %a, 1
				br i1 %cond, label %then, label %merge

				then:
				%mul0 = mul i32 %a, %b
				; CHECK: %mul0 = mul i32 %a, %b
				%v0 = call i32 @foo(i32 %mul0)
				br label %merge

				merge:
				%v0.phi = phi i32 [ 0, %entry ], [ %mul0, %then ]
				%mul1 = mul i32 %a1, %b
				; CHECK: %mul1 = mul i32 %a1, %b
				%v1 = call i32 @foo(i32 %mul1)
				%sum = add i32 %v0.phi, %v1
				ret i32 %sum
				}
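
The rewrite described in the @slsr3 comment can be sanity-checked outside LLVM. The sketch below is not part of the patch: the names Slsr3Values, original, reduced and sameValues are hypothetical, and uint32_t is used on the assumption that it models the wrapping behaviour of the flag-free i32 mul and add in the test.

```cpp
#include <cstdint>

// The three products computed by @slsr3.
struct Slsr3Values {
  uint32_t mul0, mul1, mul2;
};

// Before SLSR: three independent multiplications.
Slsr3Values original(uint32_t b, uint32_t s) {
  return {b * s, (b + 2) * s, (b + 4) * s};
}

// After SLSR (and GVN merging the two identical bumps): one multiplication
// for the basis, one for the bump, then additions only.
Slsr3Values reduced(uint32_t b, uint32_t s) {
  uint32_t mul0 = b * s;
  uint32_t bump = s * 2;       // (i' - i) * S with i' - i == 2
  uint32_t mul1 = mul0 + bump; // (b + 2) * s
  uint32_t mul2 = mul1 + bump; // (b + 4) * s
  return {mul0, mul1, mul2};
}

// True when both forms agree on all three values.
bool sameValues(uint32_t b, uint32_t s) {
  Slsr3Values o = original(b, s), r = reduced(b, s);
  return o.mul0 == r.mul0 && o.mul1 == r.mul1 && o.mul2 == r.mul2;
}
```

Because unsigned arithmetic wraps modulo 2^32, the two forms agree even when b + 2 or the products overflow, for example with b = 0xFFFFFFFF and s = 7.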

llvm/trunk/test/Transforms/StraightLineStrengthReduce/slsr.ll

	; RUN: opt < %s -slsr -gvn -dce -S | FileCheck %s

	declare i32 @foo(i32 %a)

	define i32 @slsr1(i32 %b, i32 %s) {
	; CHECK-LABEL: @slsr1(
	; v0 = foo(b * s);
	%mul0 = mul i32 %b, %s
	; CHECK: mul i32
	; CHECK-NOT: mul i32
	%v0 = call i32 @foo(i32 %mul0)

	; v1 = foo((b + 1) * s);
	%b1 = add i32 %b, 1
	%mul1 = mul i32 %b1, %s
	%v1 = call i32 @foo(i32 %mul1)

	; v2 = foo((b + 2) * s);
	%b2 = add i32 %b, 2
	%mul2 = mul i32 %b2, %s
	%v2 = call i32 @foo(i32 %mul2)

	; return v0 + v1 + v2;
	%1 = add i32 %v0, %v1
	%2 = add i32 %1, %v2
	ret i32 %2
	}

	; v0 = foo(a * b)
	; v1 = foo((a + 1) * b)
	; v2 = foo(a * (b + 1))
	; v3 = foo((a + 1) * (b + 1))
	define i32 @slsr2(i32 %a, i32 %b) {
	; CHECK-LABEL: @slsr2(
	%a1 = add i32 %a, 1
	%b1 = add i32 %b, 1
	%mul0 = mul i32 %a, %b
	; CHECK: mul i32
	; CHECK-NOT: mul i32
	%mul1 = mul i32 %a1, %b
	%mul2 = mul i32 %a, %b1
	%mul3 = mul i32 %a1, %b1

	%v0 = call i32 @foo(i32 %mul0)
	%v1 = call i32 @foo(i32 %mul1)
	%v2 = call i32 @foo(i32 %mul2)
	%v3 = call i32 @foo(i32 %mul3)

	%1 = add i32 %v0, %v1
	%2 = add i32 %1, %v2
	%3 = add i32 %2, %v3
	ret i32 %3
	}

	; The bump is a multiple of the stride.
	;
	; v0 = foo(b * s);
	; v1 = foo((b + 2) * s);
	; v2 = foo((b + 4) * s);
	; return v0 + v1 + v2;
	;
	; ==>
	;
	; mul0 = b * s;
	; v0 = foo(mul0);
	; bump = s * 2;
	; mul1 = mul0 + bump; // GVN ensures mul1 and mul2 use the same bump.
	; v1 = foo(mul1);
	; mul2 = mul1 + bump;
	; v2 = foo(mul2);
	; return v0 + v1 + v2;
	define i32 @slsr3(i32 %b, i32 %s) {
	; CHECK-LABEL: @slsr3(
	%mul0 = mul i32 %b, %s
	; CHECK: mul i32
	%v0 = call i32 @foo(i32 %mul0)

	%b1 = add i32 %b, 2
	%mul1 = mul i32 %b1, %s
	; CHECK: [[BUMP:%[a-zA-Z0-9]+]] = mul i32 %s, 2
	; CHECK: %mul1 = add i32 %mul0, [[BUMP]]
	%v1 = call i32 @foo(i32 %mul1)

	%b2 = add i32 %b, 4
	%mul2 = mul i32 %b2, %s
	; CHECK: %mul2 = add i32 %mul1, [[BUMP]]
	%v2 = call i32 @foo(i32 %mul2)

	%1 = add i32 %v0, %v1
	%2 = add i32 %1, %v2
	ret i32 %2
	}

	; Do not rewrite a candidate if its potential basis does not dominate it.
	; v0 = 0;
	; if (cond)
	; v0 = foo(a * b);
	; v1 = foo((a + 1) * b);
	; return v0 + v1;
	define i32 @not_dominate(i1 %cond, i32 %a, i32 %b) {
	; CHECK-LABEL: @not_dominate(
	entry:
	%a1 = add i32 %a, 1
	br i1 %cond, label %then, label %merge

	then:
	%mul0 = mul i32 %a, %b
	; CHECK: %mul0 = mul i32 %a, %b
	%v0 = call i32 @foo(i32 %mul0)
	br label %merge

	merge:
	%v0.phi = phi i32 [ 0, %entry ], [ %mul0, %then ]
	%mul1 = mul i32 %a1, %b
	; CHECK: %mul1 = mul i32 %a1, %b
	%v1 = call i32 @foo(i32 %mul1)
	%sum = add i32 %v0.phi, %v1
	ret i32 %sum
	}