This is an archive of the discontinued LLVM Phabricator instance.

Differential D7310

Add straight-line strength reduction to LLVM
ClosedPublic

Authored by jingyue on Jan 30 2015, 5:13 PM.

Details

Reviewers

jholewinski
• HaoLiu
eliben
atrick
meheff
hfinkel

Commits

rGd7966ff3b948: Add straight-line strength reduction to LLVM
rL228016: Add straight-line strength reduction to LLVM

Summary

Straight-line strength reduction (SLSR) is implemented in GCC but not yet in
LLVM. It has proven to effectively simplify statements derived from an unrolled
loop, and can potentially benefit many other cases too. For example,

LLVM unrolls

#pragma unroll
for (int i = 0; i < 3; ++i) {
  sum += foo((b + i) * s);
}

into

sum += foo(b * s);
sum += foo((b + 1) * s);
sum += foo((b + 2) * s);

However, no optimizations yet reduce the internal redundancy of the three
expressions:

b * s
(b + 1) * s
(b + 2) * s

With SLSR, LLVM can optimize these three expressions into:

t1 = b * s
t2 = t1 + s
t3 = t2 + s
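
The equivalence behind this rewrite can be checked with a small standalone sketch. The names `naive`, `reduced`, and `checkEquivalence` are made up for illustration and are not part of the patch; unsigned arithmetic is assumed so that wraparound is well defined.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Computes b*s, (b+1)*s, (b+2)*s directly: three multiplications.
std::array<uint32_t, 3> naive(uint32_t b, uint32_t s) {
  return {b * s, (b + 1) * s, (b + 2) * s};
}

// Same values after straight-line strength reduction: one multiplication,
// then each later value is the previous value plus the stride.
std::array<uint32_t, 3> reduced(uint32_t b, uint32_t s) {
  uint32_t t1 = b * s;
  uint32_t t2 = t1 + s;
  uint32_t t3 = t2 + s;
  return {t1, t2, t3};
}

// Hypothetical helper: aborts if the two forms ever disagree.
void checkEquivalence(uint32_t b, uint32_t s) {
  assert(naive(b, s) == reduced(b, s));
}
```

Calling `checkEquivalence` over a range of inputs, including values near the top of the 32-bit range, exercises the wraparound case as well.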

This commit is only an initial step towards implementing a series of such
optimizations. I will implement more (see TODO in the file commentary) in the
near future. This optimization is enabled for the NVPTX backend for now.
However, I am more than happy to push it to the standard optimization pipeline
after more thorough performance tests.

Diff Detail

Event Timeline

jingyue updated this revision to Diff 19075. Jan 30 2015, 5:13 PM

jingyue retitled this revision to Add straight-line strength reduction to LLVM.

jingyue updated this object.

jingyue edited the test plan for this revision.

jingyue added reviewers: hfinkel, meheff, eliben, • HaoLiu, jholewinski.

jingyue added a subscriber: Unknown Object (MLST).

Herald added a subscriber: jholewinski. Jan 30 2015, 5:13 PM

FYI, GCC's implementation of SLSR is at https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/gimple-ssa-strength-reduction.c?view=markup.

hfinkel added a reviewer: atrick. Jan 31 2015, 12:23 AM

Thanks for working on this! We need to be careful here, however, because:

1. We don't want to turn "free" computations into non-free computations (because we remove the ability to do address-mode folding). LoopStrengthReduce uses part of the TTI interface for dealing with target addressing modes, and we should do that here too.
2. We don't want to lengthen the critical path by unnecessarily decreasing the amount of available ILP. We don't currently have a good interface for asking a target how much ILP is available for simple integer operations (we have one for vectorization interleaving factors, which is similar in spirit, but several targets I work with use a different number there than would be appropriate here).

Regarding (2), we could also decide the long chain is the canonical form, and targets should split these late for ILP. I've not given this a lot of thought, and so I'm not sure how easy or hard that might be. Using the MachineCombiner might be a good place to do such splitting.

What do you think?
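
The second concern above can be made concrete with a toy dependence model. This is only a sketch: `longestChain` and the `basisOf` encoding are hypothetical names introduced here, and the model counts a single operation per candidate, ignoring the `B + i` additions and any multiply needed to form a bump.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// basisOf[k] is the index of the candidate that candidate k is rewritten
// against, or -1 if candidate k keeps its own multiplication.
// Returns the length of the longest dependence chain, counting one operation
// per candidate (its multiply, or the add that replaces the multiply).
int longestChain(const std::vector<int> &basisOf) {
  std::vector<int> depth(basisOf.size(), 0);
  int longest = 0;
  for (std::size_t k = 0; k < basisOf.size(); ++k) {
    int basis = basisOf[k];
    // A basis must come earlier, so depth[basis] is already final.
    assert(basis < static_cast<int>(k));
    depth[k] = (basis < 0) ? 1 : depth[basis] + 1;
    longest = std::max(longest, depth[k]);
  }
  return longest;
}
```

Under this model, four unreduced candidates `{-1, -1, -1, -1}` are independent, the chained form `{-1, 0, 1, 2}` serializes all four, and rewriting everything against the first candidate `{-1, 0, 0, 0}` sits in between at the cost of larger bumps.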

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
12	We don't need to mention GCC here.
183	You can remove the "TODO" part of this. There are a lot of things in LLVM that only handle the canonical operand ordering.
211	If (!C.Ins->getParent()) is fine. Also no { } needed.

Hal, thanks for the comments!

(1) Given the cases I consider so far, I am not aware of any target that can fold (B + i) * S where B and S are variables into an addressing mode. I don't think such form can even fit into LLVM's AddrMode struct. However, once we start considering more cases (e.g., B + i * S), we should definitely check for addressing modes. Does this make sense?

(2) Thanks for pointing this out. I hadn't considered this issue when preparing this diff. I now understand your concern, but don't have a good solution yet. I'll investigate how LSR handles this issue and how MachineCombiner works to get a better picture. Any suggestions from other folks are appreciated!

Jingyue

karthikthecool added a subscriber: karthikthecool. Feb 2 2015, 1:47 AM

Mainly, I would like to reiterate both of Hal's very good points above.

Out of curiosity, can you explain what sort of real-world source code results in this IR pattern?

Keep in mind that this level of optimization is currently the job of LSR. If you're doing this within a loop, the IVChain part of LSR will try to do something similar.

I only reviewed it quickly, but your implementation looks good. Thank you for putting a bound on the number of candidate checks. I hate finding those quadratic behaviors later when compile time blows up.

LGTM given that it's not enabled in the standard pipeline yet.

This revision is now accepted and ready to land. Feb 2 2015, 2:56 PM

Hi Andrew,

The code example in my file commentary

#pragma unroll
for (int i = 0; i < 3; ++i) {
  sum += foo((b + i) * s);
}

pretty much reflects real-world source code. More often, I see (b + i) * s used as an array index which creates the same issue.

LSR cannot yet solve the problem I'm facing here due to phase ordering. Loop unrolling, which happens way before LSR, destroys the loop structure. I agree with you that, if the loop is not unrolled, LSR should take care of this pattern.

LGTM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
167	Since there can be more than one candidate would there ever be any advantage to looking deeper in the list? Some bases could be cheaper (eg, where i' - i == 1 and you don't need a multiply). However, I'd guess in the common unrolled case the latest candidate on the list is best. On the other hand though, from an ILP perspective the earliest candidate might be best.

Disable SLSR if the target schedules for ILP.

Also, added one more test where the bump is a multiple of S.

I don't like piggybacking on the scheduling mode. When a target selects the scheduler it doesn't expect to get different IR out of the optimizer. Don't surprise users.

It would be fine to add a separate hook for this pass. But I don't think it's an issue until the pass goes into the standard pipeline.

This revision now requires changes to proceed. Feb 2 2015, 10:20 PM

jingyue added inline comments. Feb 2 2015, 10:25 PM

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
12	done
167	We may want to improve the strategy in the future. The patterns I have seen so far are derived from loop unrolling, and in those cases choosing the latest candidate very likely produces the best code. Also note that in the cases where the value grows by a multiple of the stride, e.g., `b * s`, `(b + 2) * s`, `(b + 4) * s`, choosing the latest candidate leads to a uniform bump (`2 * s` in this example) for every iteration, and this bump can be GVN'ed later.
183	done
211	done
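
The trade-off between the latest and the earliest basis discussed for line 167 can be illustrated by computing the index differences `i' - i` under each choice. This is a sketch, not code from the patch; `bumpsFromLatest` and `bumpsFromEarliest` are hypothetical names, and the candidates are assumed to be listed in dominance order with a shared base and stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index differences (i' - i) when each candidate uses the closest preceding
// candidate as its basis. The bump actually added is this difference times S.
std::vector<long> bumpsFromLatest(const std::vector<long> &idx) {
  assert(!idx.empty() && "need at least one candidate to act as a basis");
  std::vector<long> out;
  for (std::size_t k = 1; k < idx.size(); ++k)
    out.push_back(idx[k] - idx[k - 1]);
  return out;
}

// Same, but every candidate uses the first candidate as its basis.
std::vector<long> bumpsFromEarliest(const std::vector<long> &idx) {
  assert(!idx.empty() && "need at least one candidate to act as a basis");
  std::vector<long> out;
  for (std::size_t k = 1; k < idx.size(); ++k)
    out.push_back(idx[k] - idx[0]);
  return out;
}
```

For the indices 0, 2, 4 from the example, the latest-basis choice yields the same difference for every candidate, so a single `2 * s` value can be shared, while the earliest-basis choice yields a different difference per candidate.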

stop piggybacking scheduling info

Andrew, thanks for pointing this out! I reverted my change on using scheduling info, and left a TODO in the file comment.

lgtm

This revision is now accepted and ready to land. Feb 2 2015, 10:44 PM

LGTM! Thanks for working on this!

Minor changes, NFC:
Merge from upstream
Do not put usernames in TODO

Hal, do you think this patch is good to go, given we don't push it to the standard pipeline yet? I left a TODO on the ILP issue you raised. I like your point a lot, but I feel I should do it in a separate diff.

In D7310#117774, @jingyue wrote:

Hal, do you think this patch is good to go, given we don't push it to the standard pipeline yet? I left a TODO on the ILP issue you raised. I like your point a lot, but I feel I should do it in a separate diff.

Yes, that's fine with me. Let's iterate in-tree. Please proceed.

jingyue closed this revision. Feb 3 2015, 11:38 AM

Revision Contents

Path                                                  Size
include/llvm/InitializePasses.h                       1 line
include/llvm/LinkAllPasses.h                          1 line
include/llvm/Transforms/Scalar.h                      3 lines
lib/Target/NVPTX/NVPTXTargetMachine.cpp               1 line
lib/Transforms/Scalar/CMakeLists.txt                  1 line
lib/Transforms/Scalar/Scalar.cpp                      1 line
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp  284 lines
test/Transforms/StraightLineStrengthReduce/slsr.ll    119 lines

Diff 19211

include/llvm/InitializePasses.h

 void initializeSingleLoopExtractorPass(PassRegistry&);
 void initializeSinkingPass(PassRegistry&);
 void initializeSeparateConstOffsetFromGEPPass(PassRegistry &);
 void initializeSlotIndexesPass(PassRegistry&);
 void initializeSpillPlacementPass(PassRegistry&);
 void initializeStackProtectorPass(PassRegistry&);
 void initializeStackColoringPass(PassRegistry&);
 void initializeStackSlotColoringPass(PassRegistry&);
+void initializeStraightLineStrengthReducePass(PassRegistry &);
 void initializeStripDeadDebugInfoPass(PassRegistry&);
 void initializeStripDeadPrototypesPassPass(PassRegistry&);
 void initializeStripDebugDeclarePass(PassRegistry&);
 void initializeStripNonDebugSymbolsPass(PassRegistry&);
 void initializeStripSymbolsPass(PassRegistry&);
 void initializeTailCallElimPass(PassRegistry&);
 void initializeTailDuplicatePassPass(PassRegistry&);
 void initializeTargetPassConfigPass(PassRegistry&);

include/llvm/LinkAllPasses.h

@@ ... @@ ForcePassLinking() {
       (void) llvm::createInstructionSimplifierPass();
       (void) llvm::createLoopVectorizePass();
       (void) llvm::createSLPVectorizerPass();
       (void) llvm::createBBVectorizePass();
       (void) llvm::createPartiallyInlineLibCallsPass();
       (void) llvm::createScalarizerPass();
       (void) llvm::createSeparateConstOffsetFromGEPPass();
       (void) llvm::createRewriteSymbolsPass();
+      (void) llvm::createStraightLineStrengthReducePass();

       (void)new llvm::IntervalPartition();
       (void)new llvm::ScalarEvolution();
       ((llvm::Function*)nullptr)->viewCFGOnly();
       llvm::RGPassManager RGM;
       ((llvm::RegionPass*)nullptr)->runOnRegion((llvm::Region*)nullptr, RGM);
       llvm::AliasSetTracker X(*(llvm::AliasAnalysis*)nullptr);
       X.add(nullptr, 0, llvm::AAMDNodes()); // for -print-alias-sets
     }
   } ForcePassLinking; // Force link by creating a global definition.
 }

 #endif

include/llvm/Transforms/Scalar.h

@@ ... @@ createSeparateConstOffsetFromGEPPass(const TargetMachine *TM = nullptr,
                                      bool LowerGEP = false);

 //===----------------------------------------------------------------------===//
 //
 // LoadCombine - Combine loads into bigger loads.
 //
 BasicBlockPass *createLoadCombinePass();

+FunctionPass *
+createStraightLineStrengthReducePass(const TargetMachine *TM = nullptr);
+
 } // End llvm namespace

 #endif

lib/Target/NVPTX/NVPTXTargetMachine.cpp

@@ ... @@ void NVPTXPassConfig::addIRPasses() {
   disablePass(&BranchFolderPassID);
   disablePass(&TailDuplicateID);

   addPass(createNVPTXImageOptimizerPass());
   TargetPassConfig::addIRPasses();
   addPass(createNVPTXAssignValidGlobalNamesPass());
   addPass(createGenericToNVVMPass());
   addPass(createNVPTXFavorNonGenericAddrSpacesPass());
+  addPass(createStraightLineStrengthReducePass(TM));
   addPass(createSeparateConstOffsetFromGEPPass());
   // The SeparateConstOffsetFromGEP pass creates variadic bases that can be used
   // by multiple GEPs. Run GVN or EarlyCSE to really reuse them. GVN generates
   // significantly better code than EarlyCSE for some of our benchmarks.
   if (getOptLevel() == CodeGenOpt::Aggressive)
     addPass(createGVNPass());
   else
     addPass(createEarlyCSEPass());

lib/Transforms/Scalar/CMakeLists.txt

@@ ... @@ add_llvm_library(LLVMScalarOpts
   SROA.cpp
   SampleProfile.cpp
   Scalar.cpp
   ScalarReplAggregates.cpp
   Scalarizer.cpp
   SeparateConstOffsetFromGEP.cpp
   SimplifyCFGPass.cpp
   Sink.cpp
+  StraightLineStrengthReduce.cpp
   StructurizeCFG.cpp
   TailRecursionElimination.cpp
   )

 add_dependencies(LLVMScalarOpts intrinsics_gen)

lib/Transforms/Scalar/Scalar.cpp

@@ ... @@ void llvm::initializeScalarOpts(PassRegistry &Registry) {
   initializeSROAPass(Registry);
   initializeSROA_DTPass(Registry);
   initializeSROA_SSAUpPass(Registry);
   initializeCFGSimplifyPassPass(Registry);
   initializeStructurizeCFGPass(Registry);
   initializeSinkingPass(Registry);
   initializeTailCallElimPass(Registry);
   initializeSeparateConstOffsetFromGEPPass(Registry);
+  initializeStraightLineStrengthReducePass(Registry);
   initializeLoadCombinePass(Registry);
 }

 void LLVMInitializeScalarOpts(LLVMPassRegistryRef R) {
   initializeScalarOpts(*unwrap(R));
 }

 void LLVMAddAggressiveDCEPass(LLVMPassManagerRef PM) {

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp

This file was added.

				//===-- StraightLineStrengthReduce.cpp - ------------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements straight-line strength reduction (SLSR). Unlike loop
				// strength reduction, this algorithm is designed to reduce arithmetic
				// redundancy in straight-line code instead of loops. It has proven to be
				// effective in simplifying arithmetic statements derived from an unrolled loop.
				// It can also simplify the logic of SeparateConstOffsetFromGEP.
				//
				// There are many optimizations we can perform in the domain of SLSR. This file
				// for now contains only an initial step. Specifically, we look for strength
				// reduction candidate in the form of
				//
				// (B + i) * S
				//
				// where B and S are integer constants or variables, and i is a constant
				// integer. If we found two such candidates
				//
				// S1: X = (B + i) * S
				// S2: Y = (B + i') * S
				//
				// and S1 dominates S2, we call S1 a basis of S2, and can replace S2 with
				//
				// Y = X + (i' - i) * S
				//
				// where (i' - i) * S is folded to the extent possible. When S2 has multiple
				// bases, we pick the one that is closest to S2, or S2's "immediate" basis.
				//
				// TODO(jingyue):
				//
				// - Handle candidates in the form of B + i * S
				//
				// - Handle candidates in the form of pointer arithmetics. e.g., B[i * S]
				//
				// - Floating point arithmetics when fast math is enabled.
				#include <vector>

				#include "llvm/ADT/DenseSet.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/Module.h"
				#include "llvm/IR/PatternMatch.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetLowering.h"
				#include "llvm/Target/TargetMachine.h"
				#include "llvm/Target/TargetSubtargetInfo.h"
				#include "llvm/Transforms/Scalar.h"

				using namespace llvm;
				using namespace PatternMatch;

				namespace {

				class StraightLineStrengthReduce : public FunctionPass {
				public:
				// SLSR candidate. Such a candidate must be in the form of
				// (Base + Index) * Stride
				struct Candidate : public ilist_node<Candidate> {
				Candidate(Value *B = nullptr, ConstantInt *Idx = nullptr,
				Value *S = nullptr, Instruction *I = nullptr)
				: Base(B), Index(Idx), Stride(S), Ins(I), Basis(nullptr) {}
				Value *Base;
				ConstantInt *Index;
				Value *Stride;
				// The instruction this candidate corresponds to. It helps us to rewrite a
				// candidate with respect to its immediate basis. Note that one instruction
				// can correspond to multiple candidates depending on how you associate the
				// expression. For instance,
				//
				// (a + 1) * (b + 2)
				//
				// can be treated as
				//
				// <Base: a, Index: 1, Stride: b + 2>
				//
				// or
				//
				// <Base: b, Index: 2, Stride: a + 1>
				Instruction *Ins;
				// Points to the immediate basis of this candidate, or nullptr if we cannot
				// find any basis for this candidate.
				Candidate *Basis;
				};

				static char ID;

				StraightLineStrengthReduce(const TargetMachine *TM = nullptr)
				: FunctionPass(ID), TM(TM), DT(nullptr) {
				initializeStraightLineStrengthReducePass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<DominatorTreeWrapperPass>();
				// We do not modify the shape of the CFG.
				AU.setPreservesCFG();
				}

				bool runOnFunction(Function &F) override;

				private:
				// Returns true if Basis is a basis for C, i.e., Basis dominates C and they
				// share the same base and stride.
				bool isBasisFor(const Candidate &Basis, const Candidate &C);
				// Checks whether I is in a candidate form. If so, adds all the matching forms
				// to Candidates, and tries to find the immediate basis for each of them.
				void allocateCandidateAndFindBasis(Instruction *I);
				// Given that I is in the form of "(B + Idx) * S", adds this form to
				// Candidates, and finds its immediate basis.
				void allocateCandidateAndFindBasis(Value *B, ConstantInt *Idx, Value *S,
				Instruction *I);
				// Rewrites candidate C with respect to Basis.
				void rewriteCandidateWithBasis(const Candidate &C, const Candidate &Basis);

				const TargetMachine *TM;
				DominatorTree *DT;
				ilist<Candidate> Candidates;
				// Temporarily holds all instructions that are unlinked (but not deleted) by
				// rewriteCandidateWithBasis. These instructions will be actually removed
				// after all rewriting finishes.
				DenseSet<Instruction *> UnlinkedInstructions;
				};
				} // anonymous namespace

				char StraightLineStrengthReduce::ID = 0;
				INITIALIZE_PASS_BEGIN(StraightLineStrengthReduce, "slsr",
				"Straight line strength reduction", false, false)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_END(StraightLineStrengthReduce, "slsr",
				"Straight line strength reduction", false, false)

				FunctionPass *
				llvm::createStraightLineStrengthReducePass(const TargetMachine *TM) {
				return new StraightLineStrengthReduce(TM);
				}

				bool StraightLineStrengthReduce::isBasisFor(const Candidate &Basis,
				const Candidate &C) {
				return (Basis.Ins != C.Ins && // skip the same instruction
				// Basis must dominate C in order to rewrite C with respect to Basis.
				DT->dominates(Basis.Ins->getParent(), C.Ins->getParent()) &&
				// They share the same base and stride.
				Basis.Base == C.Base &&
				Basis.Stride == C.Stride);
				}

				// TODO(jingyue): We currently implement an algorithm whose time complexity is
				// linear in the number of existing candidates. However, a better algorithm
				// exists. We could depth-first search the dominator tree, and maintain a hash
				// table that contains all candidates that dominate the node being traversed.
				// This hash table is indexed by the base and the stride of a candidate.
				// Therefore, finding the immediate basis of a candidate boils down to one
				// hash-table look up.
				void StraightLineStrengthReduce::allocateCandidateAndFindBasis(Value *B,
				ConstantInt *Idx,
				Value *S,
				Instruction *I) {
				Candidate C(B, Idx, S, I);
				// Try to compute the immediate basis of C.
				unsigned NumIterations = 0;
				// Limit the scan radius to avoid running forever.
				static const int MaxNumIterations = 50;
				for (auto Basis = Candidates.rbegin();
				Basis != Candidates.rend() && NumIterations < MaxNumIterations;
				++Basis, ++NumIterations) {
				if (isBasisFor(*Basis, C)) {
				C.Basis = &(*Basis);
				break;
				}
				}
				// Regardless of whether we find a basis for C, we need to push C to the
				// candidate list.
				Candidates.push_back(C);
				}

				void StraightLineStrengthReduce::allocateCandidateAndFindBasis(Instruction *I) {
				Value *B = nullptr;
				ConstantInt *Idx = nullptr;
				// "(Base + Index) * Stride" must be a Mul instruction in the first place.
				if (I->getOpcode() == Instruction::Mul) {
				if (IntegerType *ITy = dyn_cast<IntegerType>(I->getType())) {
				Value *LHS = I->getOperand(0), *RHS = I->getOperand(1);
				for (unsigned Swapped = 0; Swapped < 2; ++Swapped) {
				// Only handle the canonical operand ordering.
				if (match(LHS, m_Add(m_Value(B), m_ConstantInt(Idx)))) {
				// If LHS is in the form of "Base + Index", then I is in the form of
				// "(Base + Index) * RHS".
				allocateCandidateAndFindBasis(B, Idx, RHS, I);
				} else {
				// Otherwise, at least try the form (LHS + 0) * RHS.
				allocateCandidateAndFindBasis(LHS, ConstantInt::get(ITy, 0), RHS, I);
				}
				// Swap LHS and RHS so that we also cover the cases where LHS is the
				// stride.
				if (LHS == RHS)
				break;
				std::swap(LHS, RHS);
				}
				}
				}
				}

				void StraightLineStrengthReduce::rewriteCandidateWithBasis(
				const Candidate &C, const Candidate &Basis) {
				// An instruction can correspond to multiple candidates. Therefore, instead of
				// simply deleting an instruction when we rewrite it, we mark its parent as
				// nullptr (i.e. unlink it) so that we can skip the candidates whose
				// instruction is already rewritten.
				if (!C.Ins->getParent())
				return;
				assert(C.Base == Basis.Base && C.Stride == Basis.Stride);
				// Basis = (B + i) * S
				// C = (B + i') * S
				// ==>
				// C = Basis + (i' - i) * S
				IRBuilder<> Builder(C.Ins);
				ConstantInt *IndexOffset = ConstantInt::get(
				C.Ins->getContext(), C.Index->getValue() - Basis.Index->getValue());
				Value *Reduced;
				// TODO: preserve nsw/nuw in some cases.
				if (IndexOffset->isOne()) {
				// If (i' - i) is 1, fold C into Basis + S.
				Reduced = Builder.CreateAdd(Basis.Ins, C.Stride);
				} else if (IndexOffset->isMinusOne()) {
				// If (i' - i) is -1, fold C into Basis - S.
				Reduced = Builder.CreateSub(Basis.Ins, C.Stride);
				} else {
				Value *Bump = Builder.CreateMul(C.Stride, IndexOffset);
				Reduced = Builder.CreateAdd(Basis.Ins, Bump);
				}
				Reduced->takeName(C.Ins);
				C.Ins->replaceAllUsesWith(Reduced);
				C.Ins->dropAllReferences();
				// Unlink C.Ins so that we can skip other candidates also corresponding to
				// C.Ins. The actual deletion is postponed to the end of runOnFunction.
				C.Ins->removeFromParent();
				UnlinkedInstructions.insert(C.Ins);
				}

				bool StraightLineStrengthReduce::runOnFunction(Function &F) {
				if (skipOptnoneFunction(F))
				return false;

				// SLSR may decrease ILP. Therefore, we disable it when the target schedules
				// for ILP.
				if (TM) {
				const TargetLowering *TL = TM->getSubtargetImpl(F)->getTargetLowering();
				if (!TL || TL->getSchedulingPreference() == Sched::ILP)
				return false;
				}

				DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
				// Traverse the dominator tree in the depth-first order. This order makes sure
				// all bases of a candidate are in Candidates when we process it.
				for (auto node = GraphTraits<DominatorTree *>::nodes_begin(DT);
				node != GraphTraits<DominatorTree *>::nodes_end(DT); ++node) {
				BasicBlock *B = node->getBlock();
				for (auto I = B->begin(); I != B->end(); ++I) {
				allocateCandidateAndFindBasis(I);
				}
				}

				// Rewrite candidates in the reverse depth-first order. This order makes sure
				// a candidate being rewritten is not a basis for any other candidate.
				while (!Candidates.empty()) {
				const Candidate &C = Candidates.back();
				if (C.Basis != nullptr) {
				rewriteCandidateWithBasis(C, *C.Basis);
				}
				Candidates.pop_back();
				}

				// Delete all unlinked instructions.
				for (auto I : UnlinkedInstructions) {
				delete I;
				}
				bool Ret = !UnlinkedInstructions.empty();
				UnlinkedInstructions.clear();
				return Ret;
				}
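
The TODO above allocateCandidateAndFindBasis describes a dominator-tree walk with a table keyed by base and stride. A minimal sketch of that idea on a toy model follows; it is not code from the patch. `ToyCandidate`, `ToyBlock`, `LiveTable`, and `findBases` are hypothetical names, integers stand in for `Value *`, and `std::map` is used in place of a hash table only to avoid writing a hash for `std::pair`.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Toy stand-in for an SLSR candidate (B + i) * S. Integers replace Value*.
struct ToyCandidate {
  int base;
  int stride;
  int basis; // index of the immediate basis; set by findBases, -1 if none
};

// Toy dominator-tree node.
struct ToyBlock {
  std::vector<int> candidates; // candidate indices in program order
  std::vector<int> children;   // blocks immediately dominated by this one
};

// Maps (base, stride) to the stack of candidates that dominate the block
// currently being visited; the top of the stack is the closest one.
using LiveTable = std::map<std::pair<int, int>, std::vector<int>>;

void findBases(const std::vector<ToyBlock> &blocks, int block,
               std::vector<ToyCandidate> &cands, LiveTable &live) {
  assert(block >= 0 && block < static_cast<int>(blocks.size()));
  std::vector<std::pair<int, int>> pushed;
  for (int c : blocks[block].candidates) {
    std::pair<int, int> key(cands[c].base, cands[c].stride);
    std::vector<int> &stack = live[key];
    // One lookup yields the immediate basis, if any.
    cands[c].basis = stack.empty() ? -1 : stack.back();
    stack.push_back(c);
    pushed.push_back(key);
  }
  for (int child : blocks[block].children)
    findBases(blocks, child, cands, live);
  // Leaving this subtree: its candidates dominate nothing visited later.
  for (auto it = pushed.rbegin(); it != pushed.rend(); ++it)
    live[*it].pop_back();
}
```

Popping on the way out of a subtree is what keeps candidates in sibling blocks from being treated as bases of each other, which is the dominance condition the patch checks in `isBasisFor`.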

test/Transforms/StraightLineStrengthReduce/slsr.ll

This file was added.

				; RUN: opt < %s -slsr -gvn -dce -S | FileCheck %s

				declare i32 @foo(i32 %a)

				define i32 @slsr1(i32 %b, i32 %s) {
				; CHECK-LABEL: @slsr1(
				; v0 = foo(b * s);
				%mul0 = mul i32 %b, %s
				; CHECK: mul i32
				; CHECK-NOT: mul i32
				%v0 = call i32 @foo(i32 %mul0)

				; v1 = foo((b + 1) * s);
				%b1 = add i32 %b, 1
				%mul1 = mul i32 %b1, %s
				%v1 = call i32 @foo(i32 %mul1)

				; v2 = foo((b + 2) * s);
				%b2 = add i32 %b, 2
				%mul2 = mul i32 %b2, %s
				%v2 = call i32 @foo(i32 %mul2)

				; return v0 + v1 + v2;
				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				ret i32 %2
				}

				; v0 = foo(a * b)
				; v1 = foo((a + 1) * b)
				; v2 = foo(a * (b + 1))
				; v3 = foo((a + 1) * (b + 1))
				define i32 @slsr2(i32 %a, i32 %b) {
				; CHECK-LABEL: @slsr2(
				%a1 = add i32 %a, 1
				%b1 = add i32 %b, 1
				%mul0 = mul i32 %a, %b
				; CHECK: mul i32
				; CHECK-NOT: mul i32
				%mul1 = mul i32 %a1, %b
				%mul2 = mul i32 %a, %b1
				%mul3 = mul i32 %a1, %b1

				%v0 = call i32 @foo(i32 %mul0)
				%v1 = call i32 @foo(i32 %mul1)
				%v2 = call i32 @foo(i32 %mul2)
				%v3 = call i32 @foo(i32 %mul3)

				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				%3 = add i32 %2, %v3
				ret i32 %3
				}

				; The bump is a multiple of the stride.
				;
				; v0 = foo(b * s);
				; v1 = foo((b + 2) * s);
				; v2 = foo((b + 4) * s);
				; return v0 + v1 + v2;
				;
				; ==>
				;
				; mul0 = b * s;
				; v0 = foo(mul0);
				; bump = s * 2;
				; mul1 = mul0 + bump; // GVN ensures mul1 and mul2 use the same bump.
				; v1 = foo(mul1);
				; mul2 = mul1 + bump;
				; v2 = foo(mul2);
				; return v0 + v1 + v2;
				define i32 @slsr3(i32 %b, i32 %s) {
				; CHECK-LABEL: @slsr3(
				%mul0 = mul i32 %b, %s
				; CHECK: mul i32
				%v0 = call i32 @foo(i32 %mul0)

				%b1 = add i32 %b, 2
				%mul1 = mul i32 %b1, %s
				; CHECK: [[BUMP:%[a-zA-Z0-9]+]] = mul i32 %s, 2
				; CHECK: %mul1 = add i32 %mul0, [[BUMP]]
				%v1 = call i32 @foo(i32 %mul1)

				%b2 = add i32 %b, 4
				%mul2 = mul i32 %b2, %s
				; CHECK: %mul2 = add i32 %mul1, [[BUMP]]
				%v2 = call i32 @foo(i32 %mul2)

				%1 = add i32 %v0, %v1
				%2 = add i32 %1, %v2
				ret i32 %2
				}

				; Do not rewrite a candidate if its potential basis does not dominate it.
				; v0 = 0;
				; if (cond)
				; v0 = foo(a * b);
				; v1 = foo((a + 1) * b);
				; return v0 + v1;
				define i32 @not_dominate(i1 %cond, i32 %a, i32 %b) {
				; CHECK-LABEL: @not_dominate(
				entry:
				%a1 = add i32 %a, 1
				br i1 %cond, label %then, label %merge

				then:
				%mul0 = mul i32 %a, %b
				; CHECK: %mul0 = mul i32 %a, %b
				%v0 = call i32 @foo(i32 %mul0)
				br label %merge

				merge:
				%v0.phi = phi i32 [ 0, %entry ], [ %mul0, %then ]
				%mul1 = mul i32 %a1, %b
				; CHECK: %mul1 = mul i32 %a1, %b
				%v1 = call i32 @foo(i32 %mul1)
				%sum = add i32 %v0.phi, %v1
				ret i32 %sum
				}
