This is an archive of the discontinued LLVM Phabricator instance.

[LIR] re-enable generation of memmove with runtime checks
Needs ReviewPublic

Authored by sebpop on Feb 21 2017, 2:15 PM.

Download Raw Diff

Details

Reviewers

chandlerc
eli.friedman
mehdi_amini
kparzysz
hfinkel

Summary

This patch fixes https://bugs.llvm.org//show_bug.cgi?id=31391

Last time memmove was enabled, LIR was using the memory data dependence static analysis to disambiguate the use of memmove().
Those patches were backed out because the dependence analysis is unsafe, and it still contains the same errors as of today.
This patch avoids using the static analysis, and instead versions the loop conditional to whether memmove is legal.

A next patch will add support to generate the runtime checks for memset as described in https://bugs.llvm.org//show_bug.cgi?id=31391#c4

The code of this patch mostly comes from Hexagon-LIR, and the revert of the previous revert of the memmove in LIR.
Krzysztof, we will probably need to refactor this code: Hexagon requires some special handling for volatile memcpy, and the rest is usable for target independent code.

Diff Detail

Event Timeline

sebpop created this revision.Feb 21 2017, 2:15 PM

Herald added a subscriber: mzolotukhin. · View Herald TranscriptFeb 21 2017, 2:15 PM

Test case?

Why are we generating a runtime check for memmove, as opposed to memcpy? I mean, they're the same function on many platforms, but we inline calls to memcpy much more aggressively for small sizes, so there could be a performance benefit to generating llvm.memcpy instead of llvm.memmove.

Does it make sense to perform this transform as part of the loop-idiom pass? loop-idiom runs interleaved with inlining; inserting runtime checks before we inline a function seems like a bad idea.

In D30225#683038, @efriedma wrote:

Why are we generating a runtime check for memmove, as opposed to memcpy?

LIR is very conservative: it will generate a memcpy only when it knows that source and destination do not alias.
When the check for alias fails, this patch adds the needed checks to ensure that the dependence is in the right direction: WAR deps are ok to memmove, RAW deps are unsafe: see comments in the code https://bugs.llvm.org//show_bug.cgi?id=31391#c4
Of course we could add one more check to generate more memcpys than memmoves (when the number of bytes copied is less than the difference between source and destination).

I mean, they're the same function on many platforms, but we inline calls to memcpy much more aggressively for small sizes, so there could be a performance benefit to generating llvm.memcpy instead of llvm.memmove.

Right. If you tell me to add that check, I'll add it.

Does it make sense to perform this transform as part of the loop-idiom pass? loop-idiom runs interleaved with inlining; inserting runtime checks before we inline a function seems like a bad idea.

What is your suggestion?

I guess we can just generate memmove for the first iteration of this; we'll see if any testcases come up where it would help to generate memcpy instead.

In terms of where to run this... not sure there's really any existing pass that's appropriate. You want to run after inlining, but before the loop vectorizer. Maybe a new pass just before the vectorizer; call it loop-versioning-loop-idiom or something like that.

Changes to address comments from Eli and Chad.

Herald added a subscriber: sanjoy. · View Herald TranscriptFeb 25 2017, 9:09 AM

efriedma added inline comments.Feb 27 2017, 11:48 AM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1146	getUniqueExitBlock?
1157	Why do we need to call SimplifyInstruction here?
1238	Can we simplify this code by requiring LoopSimplify? (hasDedicatedExits() makes updating the domtree much more straightforward.)

efriedma added a reviewer: hfinkel.Feb 27 2017, 11:53 AM

efriedma set the repository for this revision to rL LLVM.

kparzysz added inline comments.Feb 27 2017, 12:28 PM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1157	I guess back when this was written, it made some difference. I don't remember what it was though. Do you think it's unnecessary now?

In D30225#683038, @efriedma wrote:

Why are we generating a runtime check for memmove, as opposed to memcpy?

We can check for safety of memcpy: if the source and destination don't overlap, then it's safe to generate memcpy. On the other hand, when they do overlap, it is not always safe to generate memmove.

The volatile memcpy for Hexagon is probably not quite as important now.

It looks like there aren't any calls to createLoopVersioningIdiomPass()?

Oh, also, performance numbers would be nice at some point.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1157	I can't see why it would make a difference... SimplifyInstruction probably won't simplify a non-trivial SCEV expression to a ConstantInt, and the rest will get cleaned up by instcombine later.

The updated patch addresses the comments from Eli and Krzysztof.

Krzysztof, I have removed the code for memmove and memcpy detection
from the Hexagon LIR: this removes the generation of volatile memcpy.
Please make sure that it is ok to remove and to rely on the generic LIR for
the detection of memmove and memcpy.

Eli, I am running the spec2k, 2k6, and the test-suite with and without the patch.
I will either post relative speedup numbers by tomorrow, or I will amend
the patch if testing exposes problems.

Updated patch.

In D30225#688953, @sebpop wrote:

Krzysztof, I have removed the code for memmove and memcpy detection
from the Hexagon LIR: this removes the generation of volatile memcpy.
Please make sure that it is ok to remove and to rely on the generic LIR for
the detection of memmove and memcpy.

Yes, that is ok.

With the patch and compiling with "-mllvm -stats" the spec 2006:
Number of memcpy's formed from loop load+stores: 42
Number of memset's formed from loop stores: 1398
Number of memmove's formed from loop load+stores: 129

Before the patch on spec 2006:
Number of memcpy's formed from loop load+stores: 98
Number of memset's formed from loop stores: 1395
Number of memmove's formed from loop load+stores: 0

With the patch on the test-suite:
Number of memcpy's formed from loop load+stores: 121
Number of memset's formed from loop stores: 3243
Number of memmove's formed from loop load+stores: 1891

without the patch on the test-suite:
Number of memcpy's formed from loop load+stores: 140
Number of memset's formed from loop stores: 3213
Number of memmove's formed from loop load+stores: 0

With the patch and compiling with "-mllvm -stats" the spec 2006:
Number of memmove's formed from loop load+stores: 129

Before the patch on spec 2006:
Number of memmove's formed from loop load+stores: 0

This is nice.

Do you know what the most common strides are?

With the patch and compiling with "-mllvm -stats" the spec 2006:
Number of memcpy's formed from loop load+stores: 42

Before the patch on spec 2006:
Number of memcpy's formed from loop load+stores: 98

This looks bad.

In D30225#689790, @efriedma wrote:

With the patch and compiling with "-mllvm -stats" the spec 2006:
Number of memcpy's formed from loop load+stores: 42

Before the patch on spec 2006:
Number of memcpy's formed from loop load+stores: 98

This looks bad.

I've been trying to understand why the number of memcpys are changing:
the patch should only add new memmoves with runtime checks.
I then decided to rerun the spec2006, and it looks like the specmake was the problem
it truncated some of the stderr output.

With the patch I see the following on cpu2006:
Number of memcpy's formed from loop load+stores: 98
Number of memset's formed from loop stores: 1398
Number of memmove's formed from loop load+stores: 155

Added check for HasMemmove.

Ping.

efriedma added inline comments.Mar 16 2017, 5:33 PM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
508	This would not work right for if the target has memmove, but not memcpy. Maybe not an issue in practice, but still kind of confusing.
986	This whole worklist thing looks very suspicious; for the purpose of figuring out whether the memmove covers the loop, why do you need to special-case the pointer operand of the load instruction?
1240	This is breaking LoopSimplify form for the loop.

kparzysz added inline comments.Mar 17 2017, 8:17 AM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
986	Could you explain the "special-casing" comment? I'm not seeing what you are referring to.

efriedma added inline comments.Mar 17 2017, 10:35 AM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
986	Suppose, for example, you have something like this: ; Assume this is a loop, where %p and %q are induction variables. %p2 = call i8* @foo(i8* returned %p) %v = load i8* %p2 store i8 %v, i8* %q You start off with the load and store on the worklist, add the call to the worklist, then conclude the loop is covered even though the call could have other side-effects. Granted, that's probably unlikely to happen in practice, but I'm not sure what this loop is trying to accomplish.

kparzysz added inline comments.Mar 17 2017, 10:43 AM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
986	The other side-effects should be checked elsewhere. This wouldn't be a strided load, so it shouldn't even get this far.

efriedma added inline comments.Mar 17 2017, 11:10 AM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
986	The other side-effects should be checked elsewhere. Elsewhere, where? Isn't the point of coverLoop to check for side-effects? This wouldn't be a strided load, so it shouldn't even get this far. It's a strided load because of the "returned" attribute on the parameter call. (IIRC, SCEV doesn't actually look through calls like this at the moment, but it could.)

kparzysz added inline comments.Mar 17 2017, 11:16 AM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
986	The point of this loop is to see if the loop has other stuff in it besides the load and the store, and any instructions that are only used by the load and the store. In other words, if we removed the load and the store, and all instructions that would become recursively dead, would this loop still have anything left in it. If not, it means that the code used (only) by the load/store covers the loop entirely. The side-effects of the code related to the load/store should be checked somewhere else, but not here. If it's not checked, the checks should be added.

evandro added a subscriber: evandro.Mar 7 2018, 9:03 AM

Herald added a subscriber: hiraditya. · View Herald TranscriptMar 7 2018, 9:03 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

InitializePasses.h

1 line

LinkAllPasses.h

1 line

Transforms/

Scalar.h

7 lines

Scalar/

LoopIdiomRecognize.h

8 lines

lib/

Passes/

PassBuilder.cpp

5 lines

PassRegistry.def

1 line

Target/

Hexagon/

HexagonLoopIdiomRecognition.cpp

562 lines

Transforms/

IPO/

PassManagerBuilder.cpp

2 lines

Scalar/

LoopIdiomRecognize.cpp

386 lines

Scalar.cpp

5 lines

test/

CodeGen/

Hexagon/

loop-idiom/

Other/

1 line

2 lines

Transforms/

LoopIdiom/

2 lines

43 lines

36 lines

35 lines

51 lines

3 lines

Diff 90343

llvm/include/llvm/InitializePasses.h

	Show First 20 Lines • Show All 191 Lines • ▼ Show 20 Lines
	void initializeLoadStoreVectorizerPass(PassRegistry&);			void initializeLoadStoreVectorizerPass(PassRegistry&);
	void initializeLocalStackSlotPassPass(PassRegistry&);			void initializeLocalStackSlotPassPass(PassRegistry&);
	void initializeLoopAccessLegacyAnalysisPass(PassRegistry&);			void initializeLoopAccessLegacyAnalysisPass(PassRegistry&);
	void initializeLoopDataPrefetchLegacyPassPass(PassRegistry &);			void initializeLoopDataPrefetchLegacyPassPass(PassRegistry &);
	void initializeLoopDeletionLegacyPassPass(PassRegistry&);			void initializeLoopDeletionLegacyPassPass(PassRegistry&);
	void initializeLoopDistributeLegacyPass(PassRegistry&);			void initializeLoopDistributeLegacyPass(PassRegistry&);
	void initializeLoopExtractorPass(PassRegistry&);			void initializeLoopExtractorPass(PassRegistry&);
	void initializeLoopIdiomRecognizeLegacyPassPass(PassRegistry&);			void initializeLoopIdiomRecognizeLegacyPassPass(PassRegistry&);
				void initializeLoopVersioningIdiomRecognizeLegacyPassPass(PassRegistry&);
	void initializeLoopInfoWrapperPassPass(PassRegistry&);			void initializeLoopInfoWrapperPassPass(PassRegistry&);
	void initializeLoopInstSimplifyLegacyPassPass(PassRegistry&);			void initializeLoopInstSimplifyLegacyPassPass(PassRegistry&);
	void initializeLoopInterchangePass(PassRegistry &);			void initializeLoopInterchangePass(PassRegistry &);
	void initializeLoopLoadEliminationPass(PassRegistry&);			void initializeLoopLoadEliminationPass(PassRegistry&);
	void initializeLoopPassPass(PassRegistry&);			void initializeLoopPassPass(PassRegistry&);
	void initializeLoopPredicationLegacyPassPass(PassRegistry&);			void initializeLoopPredicationLegacyPassPass(PassRegistry&);
	void initializeLoopRerollPass(PassRegistry&);			void initializeLoopRerollPass(PassRegistry&);
	void initializeLoopRotateLegacyPassPass(PassRegistry&);			void initializeLoopRotateLegacyPassPass(PassRegistry&);
	▲ Show 20 Lines • Show All 161 Lines • Show Last 20 Lines

llvm/include/llvm/LinkAllPasses.h

Show First 20 Lines • Show All 119 Lines • ▼ Show 20 Lines	ForcePassLinking() {
(void) llvm::createLoopSimplifyPass();		(void) llvm::createLoopSimplifyPass();
(void) llvm::createLoopSimplifyCFGPass();		(void) llvm::createLoopSimplifyCFGPass();
(void) llvm::createLoopStrengthReducePass();		(void) llvm::createLoopStrengthReducePass();
(void) llvm::createLoopRerollPass();		(void) llvm::createLoopRerollPass();
(void) llvm::createLoopUnrollPass();		(void) llvm::createLoopUnrollPass();
(void) llvm::createLoopUnswitchPass();		(void) llvm::createLoopUnswitchPass();
(void) llvm::createLoopVersioningLICMPass();		(void) llvm::createLoopVersioningLICMPass();
(void) llvm::createLoopIdiomPass();		(void) llvm::createLoopIdiomPass();
		(void) llvm::createLoopVersioningIdiomPass();
(void) llvm::createLoopRotatePass();		(void) llvm::createLoopRotatePass();
(void) llvm::createLowerExpectIntrinsicPass();		(void) llvm::createLowerExpectIntrinsicPass();
(void) llvm::createLowerInvokePass();		(void) llvm::createLowerInvokePass();
(void) llvm::createLowerSwitchPass();		(void) llvm::createLowerSwitchPass();
(void) llvm::createNaryReassociatePass();		(void) llvm::createNaryReassociatePass();
(void) llvm::createObjCARCAAWrapperPass();		(void) llvm::createObjCARCAAWrapperPass();
(void) llvm::createObjCARCAPElimPass();		(void) llvm::createObjCARCAPElimPass();
(void) llvm::createObjCARCExpandPass();		(void) llvm::createObjCARCExpandPass();
▲ Show 20 Lines • Show All 88 Lines • Show Last 20 Lines

llvm/include/llvm/Transforms/Scalar.h

	Show First 20 Lines • Show All 201 Lines • ▼ Show 20 Lines
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LoopIdiom - This pass recognizes and replaces idioms in loops.			// LoopIdiom - This pass recognizes and replaces idioms in loops.
	//			//
	Pass *createLoopIdiomPass();			Pass *createLoopIdiomPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
				// LoopVersioningIdiom - This pass recognizes and replaces idioms in loops with the
				// help of loop versioning for runtime information.
				//
				Pass *createLoopVersioningIdiomPass();

				//===----------------------------------------------------------------------===//
				//
	// LoopVersioningLICM - This pass is a loop versioning pass for LICM.			// LoopVersioningLICM - This pass is a loop versioning pass for LICM.
	//			//
	Pass *createLoopVersioningLICMPass();			Pass *createLoopVersioningLICMPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// PromoteMemoryToRegister - This pass is used to promote memory references to			// PromoteMemoryToRegister - This pass is used to promote memory references to
	// be register references. A simple example of the transformation performed by			// be register references. A simple example of the transformation performed by
	▲ Show 20 Lines • Show All 346 Lines • Show Last 20 Lines

llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h

	Show All 22 Lines
	namespace llvm {			namespace llvm {

	/// Performs Loop Idiom Recognize Pass.			/// Performs Loop Idiom Recognize Pass.
	class LoopIdiomRecognizePass : public PassInfoMixin<LoopIdiomRecognizePass> {			class LoopIdiomRecognizePass : public PassInfoMixin<LoopIdiomRecognizePass> {
	public:			public:
	PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,			PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
	LoopStandardAnalysisResults &AR, LPMUpdater &U);			LoopStandardAnalysisResults &AR, LPMUpdater &U);
	};			};

				/// Performs Loop Idiom Recognize Pass with Loop Versioning.
				class LoopVersioningIdiomRecognizePass
				: public PassInfoMixin<LoopVersioningIdiomRecognizePass> {
				public:
				PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
				LoopStandardAnalysisResults &AR, LPMUpdater &U);
				};
	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_TRANSFORMS_SCALAR_LOOPIDIOMRECOGNIZE_H			#endif // LLVM_TRANSFORMS_SCALAR_LOOPIDIOMRECOGNIZE_H

llvm/lib/Passes/PassBuilder.cpp

Show First 20 Lines • Show All 571 Lines • ▼ Show 20 Lines	PassBuilder::buildPerModuleDefaultPipeline(OptimizationLevel Level,
OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopRotatePass()));		OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopRotatePass()));

// Distribute loops to allow partial vectorization. I.e. isolate dependences		// Distribute loops to allow partial vectorization. I.e. isolate dependences
// into separate loop that would otherwise inhibit vectorization. This is		// into separate loop that would otherwise inhibit vectorization. This is
// currently only performed for loops marked with the metadata		// currently only performed for loops marked with the metadata
// llvm.loop.distribute=true or when -enable-loop-distribute is specified.		// llvm.loop.distribute=true or when -enable-loop-distribute is specified.
OptimizePM.addPass(LoopDistributePass());		OptimizePM.addPass(LoopDistributePass());

		// Recognize loop patterns and use loop versioning to compute at run-time
		// correctness and cost of the transform.
		OptimizePM.addPass(
		createFunctionToLoopPassAdaptor(LoopVersioningIdiomRecognizePass()));

// Now run the core loop vectorizer.		// Now run the core loop vectorizer.
OptimizePM.addPass(LoopVectorizePass());		OptimizePM.addPass(LoopVectorizePass());

// Eliminate loads by forwarding stores from the previous iteration to loads		// Eliminate loads by forwarding stores from the previous iteration to loads
// of the current iteration.		// of the current iteration.
OptimizePM.addPass(LoopLoadEliminationPass());		OptimizePM.addPass(LoopLoadEliminationPass());

// Cleanup after the loop optimization passes.		// Cleanup after the loop optimization passes.
▲ Show 20 Lines • Show All 830 Lines • Show Last 20 Lines

llvm/lib/Passes/PassRegistry.def

	Show First 20 Lines • Show All 212 Lines • ▼ Show 20 Lines
	#undef LOOP_ANALYSIS			#undef LOOP_ANALYSIS

	#ifndef LOOP_PASS			#ifndef LOOP_PASS
	#define LOOP_PASS(NAME, CREATE_PASS)			#define LOOP_PASS(NAME, CREATE_PASS)
	#endif			#endif
	LOOP_PASS("invalidate<all>", InvalidateAllAnalysesPass())			LOOP_PASS("invalidate<all>", InvalidateAllAnalysesPass())
	LOOP_PASS("licm", LICMPass())			LOOP_PASS("licm", LICMPass())
	LOOP_PASS("loop-idiom", LoopIdiomRecognizePass())			LOOP_PASS("loop-idiom", LoopIdiomRecognizePass())
				LOOP_PASS("loop-versioning-idiom", LoopVersioningIdiomRecognizePass())
	LOOP_PASS("loop-instsimplify", LoopInstSimplifyPass())			LOOP_PASS("loop-instsimplify", LoopInstSimplifyPass())
	LOOP_PASS("rotate", LoopRotatePass())			LOOP_PASS("rotate", LoopRotatePass())
	LOOP_PASS("no-op-loop", NoOpLoopPass())			LOOP_PASS("no-op-loop", NoOpLoopPass())
	LOOP_PASS("print", PrintLoopPass(dbgs()))			LOOP_PASS("print", PrintLoopPass(dbgs()))
	LOOP_PASS("loop-deletion", LoopDeletionPass())			LOOP_PASS("loop-deletion", LoopDeletionPass())
	LOOP_PASS("simplify-cfg", LoopSimplifyCFGPass())			LOOP_PASS("simplify-cfg", LoopSimplifyCFGPass())
	LOOP_PASS("strength-reduce", LoopStrengthReducePass())			LOOP_PASS("strength-reduce", LoopStrengthReducePass())
	LOOP_PASS("indvars", IndVarSimplifyPass())			LOOP_PASS("indvars", IndVarSimplifyPass())
	LOOP_PASS("unroll", LoopUnrollPass::create())			LOOP_PASS("unroll", LoopUnrollPass::create())
	LOOP_PASS("unroll-full", LoopUnrollPass::createFull())			LOOP_PASS("unroll-full", LoopUnrollPass::createFull())
	LOOP_PASS("print-access-info", LoopAccessInfoPrinterPass(dbgs()))			LOOP_PASS("print-access-info", LoopAccessInfoPrinterPass(dbgs()))
	LOOP_PASS("print<ivusers>", IVUsersPrinterPass(dbgs()))			LOOP_PASS("print<ivusers>", IVUsersPrinterPass(dbgs()))
	LOOP_PASS("loop-predication", LoopPredicationPass())			LOOP_PASS("loop-predication", LoopPredicationPass())
	#undef LOOP_PASS			#undef LOOP_PASS

llvm/lib/Target/Hexagon/HexagonLoopIdiomRecognition.cpp

Show All 27 Lines
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

#include <algorithm>		#include <algorithm>
#include <array>		#include <array>

using namespace llvm;		using namespace llvm;

static cl::opt<bool> DisableMemcpyIdiom("disable-memcpy-idiom",
cl::Hidden, cl::init(false),
cl::desc("Disable generation of memcpy in loop idiom recognition"));

static cl::opt<bool> DisableMemmoveIdiom("disable-memmove-idiom",
cl::Hidden, cl::init(false),
cl::desc("Disable generation of memmove in loop idiom recognition"));

static cl::opt<unsigned> RuntimeMemSizeThreshold("runtime-mem-idiom-threshold",
cl::Hidden, cl::init(0), cl::desc("Threshold (in bytes) for the runtime "
"check guarding the memmove."));

static cl::opt<unsigned> CompileTimeMemSizeThreshold(
"compile-time-mem-idiom-threshold", cl::Hidden, cl::init(64),
cl::desc("Threshold (in bytes) to perform the transformation, if the "
"runtime loop count (mem transfer size) is known at compile-time."));

static cl::opt<bool> OnlyNonNestedMemmove("only-nonnested-memmove-idiom",
cl::Hidden, cl::init(true),
cl::desc("Only enable generating memmove in non-nested loops"));

cl::opt<bool> HexagonVolatileMemcpy("disable-hexagon-volatile-memcpy",
cl::Hidden, cl::init(false),
cl::desc("Enable Hexagon-specific memcpy for volatile destination."));

static const char *HexagonVolatileMemcpyName
= "hexagon_memcpy_forward_vp4cp4n2";


namespace llvm {		namespace llvm {
void initializeHexagonLoopIdiomRecognizePass(PassRegistry&);		void initializeHexagonLoopIdiomRecognizePass(PassRegistry&);
Pass *createHexagonLoopIdiomPass();		Pass *createHexagonLoopIdiomPass();
}		}

namespace {		namespace {
class HexagonLoopIdiomRecognize : public LoopPass {		class HexagonLoopIdiomRecognize : public LoopPass {
public:		public:
Show All 15 Lines	void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addPreserved<TargetLibraryInfoWrapperPass>();		AU.addPreserved<TargetLibraryInfoWrapperPass>();
}		}

bool runOnLoop(Loop *L, LPPassManager &LPM) override;		bool runOnLoop(Loop *L, LPPassManager &LPM) override;

private:		private:
unsigned getStoreSizeInBytes(StoreInst *SI);
int getSCEVStride(const SCEVAddRecExpr *StoreEv);
bool isLegalStore(Loop CurLoop, StoreInst SI);
void collectStores(Loop CurLoop, BasicBlock BB,
SmallVectorImpl<StoreInst*> &Stores);
bool processCopyingStore(Loop CurLoop, StoreInst SI, const SCEV *BECount);
bool coverLoop(Loop L, SmallVectorImpl<Instruction> &Insts) const;
bool runOnLoopBlock(Loop CurLoop, BasicBlock BB, const SCEV *BECount,
SmallVectorImpl<BasicBlock*> &ExitBlocks);
bool runOnCountableLoop(Loop *L);

AliasAnalysis *AA;		AliasAnalysis *AA;
const DataLayout *DL;		const DataLayout *DL;
DominatorTree *DT;		DominatorTree *DT;
LoopInfo *LF;		LoopInfo *LF;
const TargetLibraryInfo *TLI;		const TargetLibraryInfo *TLI;
ScalarEvolution *SE;		ScalarEvolution *SE;
bool HasMemcpy, HasMemmove;
};		};
}		}

char HexagonLoopIdiomRecognize::ID = 0;		char HexagonLoopIdiomRecognize::ID = 0;

INITIALIZE_PASS_BEGIN(HexagonLoopIdiomRecognize, "hexagon-loop-idiom",		INITIALIZE_PASS_BEGIN(HexagonLoopIdiomRecognize, "hexagon-loop-idiom",
"Recognize Hexagon-specific loop idioms", false, false)		"Recognize Hexagon-specific loop idioms", false, false)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
▲ Show 20 Lines • Show All 936 Lines • ▼ Show 20 Lines	bool PolynomialMultiplyRecognize::recognize() {
if (PM->getType() != PV.Res->getType())		if (PM->getType() != PV.Res->getType())
PM = IRBuilder<>(&*At).CreateIntCast(PM, PV.Res->getType(), false);		PM = IRBuilder<>(&*At).CreateIntCast(PM, PV.Res->getType(), false);

PV.Res->replaceAllUsesWith(PM);		PV.Res->replaceAllUsesWith(PM);
PV.Res->eraseFromParent();		PV.Res->eraseFromParent();
return true;		return true;
}		}


unsigned HexagonLoopIdiomRecognize::getStoreSizeInBytes(StoreInst *SI) {
uint64_t SizeInBits = DL->getTypeSizeInBits(SI->getValueOperand()->getType());
assert(((SizeInBits & 7) \|\| (SizeInBits >> 32) == 0) &&
"Don't overflow unsigned.");
return (unsigned)SizeInBits >> 3;
}


int HexagonLoopIdiomRecognize::getSCEVStride(const SCEVAddRecExpr *S) {
if (const SCEVConstant *SC = dyn_cast<SCEVConstant>(S->getOperand(1)))
return SC->getAPInt().getSExtValue();
return 0;
}


bool HexagonLoopIdiomRecognize::isLegalStore(Loop CurLoop, StoreInst SI) {
// Allow volatile stores if HexagonVolatileMemcpy is enabled.
if (!(SI->isVolatile() && HexagonVolatileMemcpy) && !SI->isSimple())
return false;

Value *StoredVal = SI->getValueOperand();
Value *StorePtr = SI->getPointerOperand();

// Reject stores that are so large that they overflow an unsigned.
uint64_t SizeInBits = DL->getTypeSizeInBits(StoredVal->getType());
if ((SizeInBits & 7) \|\| (SizeInBits >> 32) != 0)
return false;

// See if the pointer expression is an AddRec like {base,+,1} on the current
// loop, which indicates a strided store. If we have something else, it's a
// random store we can't handle.
auto *StoreEv = dyn_cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));
if (!StoreEv \|\| StoreEv->getLoop() != CurLoop \|\| !StoreEv->isAffine())
return false;

// Check to see if the stride matches the size of the store. If so, then we
// know that every byte is touched in the loop.
int Stride = getSCEVStride(StoreEv);
if (Stride == 0)
return false;
unsigned StoreSize = getStoreSizeInBytes(SI);
if (StoreSize != unsigned(std::abs(Stride)))
return false;

// The store must be feeding a non-volatile load.
LoadInst *LI = dyn_cast<LoadInst>(SI->getValueOperand());
if (!LI \|\| !LI->isSimple())
return false;

// See if the pointer expression is an AddRec like {base,+,1} on the current
// loop, which indicates a strided load. If we have something else, it's a
// random load we can't handle.
Value *LoadPtr = LI->getPointerOperand();
auto *LoadEv = dyn_cast<SCEVAddRecExpr>(SE->getSCEV(LoadPtr));
if (!LoadEv \|\| LoadEv->getLoop() != CurLoop \|\| !LoadEv->isAffine())
return false;

// The store and load must share the same stride.
if (StoreEv->getOperand(1) != LoadEv->getOperand(1))
return false;

// Success. This store can be converted into a memcpy.
return true;
}


/// mayLoopAccessLocation - Return true if the specified loop might access the
/// specified pointer location, which is a loop-strided access. The 'Access'
/// argument specifies what the verboten forms of access are (read or write).
static bool
mayLoopAccessLocation(Value Ptr, ModRefInfo Access, Loop L,
const SCEV *BECount, unsigned StoreSize,
AliasAnalysis &AA,
SmallPtrSetImpl<Instruction *> &Ignored) {
// Get the location that may be stored across the loop. Since the access
// is strided positively through memory, we say that the modified location
// starts at the pointer and has infinite size.
uint64_t AccessSize = MemoryLocation::UnknownSize;

// If the loop iterates a fixed number of times, we can refine the access
// size to be exactly the size of the memset, which is (BECount+1)*StoreSize
if (const SCEVConstant *BECst = dyn_cast<SCEVConstant>(BECount))
AccessSize = (BECst->getValue()->getZExtValue() + 1) * StoreSize;

// TODO: For this to be really effective, we have to dive into the pointer
// operand in the store. Store to &A[i] of 100 will always return may alias
// with store of &A[100], we need to StoreLoc to be "A" with size of 100,
// which will then no-alias a store to &A[100].
MemoryLocation StoreLoc(Ptr, AccessSize);

for (auto *B : L->blocks())
for (auto &I : *B)
if (Ignored.count(&I) == 0 && (AA.getModRefInfo(&I, StoreLoc) & Access))
return true;

return false;
}


void HexagonLoopIdiomRecognize::collectStores(Loop CurLoop, BasicBlock BB,
SmallVectorImpl<StoreInst*> &Stores) {
Stores.clear();
for (Instruction &I : *BB)
if (StoreInst *SI = dyn_cast<StoreInst>(&I))
if (isLegalStore(CurLoop, SI))
Stores.push_back(SI);
}


bool HexagonLoopIdiomRecognize::processCopyingStore(Loop *CurLoop,
StoreInst SI, const SCEV BECount) {
assert((SI->isSimple() \|\| (SI->isVolatile() && HexagonVolatileMemcpy)) &&
"Expected only non-volatile stores, or Hexagon-specific memcpy"
"to volatile destination.");

Value *StorePtr = SI->getPointerOperand();
auto *StoreEv = cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));
unsigned Stride = getSCEVStride(StoreEv);
unsigned StoreSize = getStoreSizeInBytes(SI);
if (Stride != StoreSize)
return false;

// See if the pointer expression is an AddRec like {base,+,1} on the current
// loop, which indicates a strided load. If we have something else, it's a
// random load we can't handle.
LoadInst *LI = dyn_cast<LoadInst>(SI->getValueOperand());
auto *LoadEv = cast<SCEVAddRecExpr>(SE->getSCEV(LI->getPointerOperand()));

// The trip count of the loop and the base pointer of the addrec SCEV is
// guaranteed to be loop invariant, which means that it should dominate the
// header. This allows us to insert code for it in the preheader.
BasicBlock *Preheader = CurLoop->getLoopPreheader();
Instruction *ExpPt = Preheader->getTerminator();
IRBuilder<> Builder(ExpPt);
SCEVExpander Expander(SE, DL, "hexagon-loop-idiom");

Type IntPtrTy = Builder.getIntPtrTy(DL, SI->getPointerAddressSpace());

// Okay, we have a strided store "p[i]" of a loaded value. We can turn
// this into a memcpy/memmove in the loop preheader now if we want. However,
// this would be unsafe to do if there is anything else in the loop that may
// read or write the memory region we're storing to. For memcpy, this
// includes the load that feeds the stores. Check for an alias by generating
// the base address and checking everything.
Value *StoreBasePtr = Expander.expandCodeFor(StoreEv->getStart(),
Builder.getInt8PtrTy(SI->getPointerAddressSpace()), ExpPt);
Value *LoadBasePtr = nullptr;

bool Overlap = false;
bool DestVolatile = SI->isVolatile();
Type *BECountTy = BECount->getType();

if (DestVolatile) {
// The trip count must fit in i32, since it is the type of the "num_words"
// argument to hexagon_memcpy_forward_vp4cp4n2.
if (StoreSize != 4 \|\| DL->getTypeSizeInBits(BECountTy) > 32) {
CleanupAndExit:
// If we generated new code for the base pointer, clean up.
Expander.clear();
if (StoreBasePtr && (LoadBasePtr != StoreBasePtr)) {
RecursivelyDeleteTriviallyDeadInstructions(StoreBasePtr, TLI);
StoreBasePtr = nullptr;
}
if (LoadBasePtr) {
RecursivelyDeleteTriviallyDeadInstructions(LoadBasePtr, TLI);
LoadBasePtr = nullptr;
}
return false;
}
}

SmallPtrSet<Instruction*, 2> Ignore1;
Ignore1.insert(SI);
if (mayLoopAccessLocation(StoreBasePtr, MRI_ModRef, CurLoop, BECount,
StoreSize, *AA, Ignore1)) {
// Check if the load is the offending instruction.
Ignore1.insert(LI);
if (mayLoopAccessLocation(StoreBasePtr, MRI_ModRef, CurLoop, BECount,
StoreSize, *AA, Ignore1)) {
// Still bad. Nothing we can do.
goto CleanupAndExit;
}
// It worked with the load ignored.
Overlap = true;
}

if (!Overlap) {
if (DisableMemcpyIdiom \|\| !HasMemcpy)
goto CleanupAndExit;
} else {
// Don't generate memmove if this function will be inlined. This is
// because the caller will undergo this transformation after inlining.
Function *Func = CurLoop->getHeader()->getParent();
if (Func->hasFnAttribute(Attribute::AlwaysInline))
goto CleanupAndExit;

// In case of a memmove, the call to memmove will be executed instead
// of the loop, so we need to make sure that there is nothing else in
// the loop than the load, store and instructions that these two depend
// on.
SmallVector<Instruction*,2> Insts;
Insts.push_back(SI);
Insts.push_back(LI);
if (!coverLoop(CurLoop, Insts))
goto CleanupAndExit;

if (DisableMemmoveIdiom \|\| !HasMemmove)
goto CleanupAndExit;
bool IsNested = CurLoop->getParentLoop() != 0;
if (IsNested && OnlyNonNestedMemmove)
goto CleanupAndExit;
}

// For a memcpy, we have to make sure that the input array is not being
// mutated by the loop.
LoadBasePtr = Expander.expandCodeFor(LoadEv->getStart(),
Builder.getInt8PtrTy(LI->getPointerAddressSpace()), ExpPt);

SmallPtrSet<Instruction*, 2> Ignore2;
Ignore2.insert(SI);
if (mayLoopAccessLocation(LoadBasePtr, MRI_Mod, CurLoop, BECount, StoreSize,
*AA, Ignore2))
goto CleanupAndExit;

// Check the stride.
bool StridePos = getSCEVStride(LoadEv) >= 0;

// Currently, the volatile memcpy only emulates traversing memory forward.
if (!StridePos && DestVolatile)
goto CleanupAndExit;

bool RuntimeCheck = (Overlap \|\| DestVolatile);

BasicBlock *ExitB;
if (RuntimeCheck) {
// The runtime check needs a single exit block.
SmallVector<BasicBlock*, 8> ExitBlocks;
CurLoop->getUniqueExitBlocks(ExitBlocks);
if (ExitBlocks.size() != 1)
goto CleanupAndExit;
ExitB = ExitBlocks[0];
}

// The # stored bytes is (BECount+1)*Size. Expand the trip count out to
// pointer size if it isn't already.
LLVMContext &Ctx = SI->getContext();
BECount = SE->getTruncateOrZeroExtend(BECount, IntPtrTy);
unsigned Alignment = std::min(SI->getAlignment(), LI->getAlignment());
DebugLoc DLoc = SI->getDebugLoc();

const SCEV *NumBytesS =
SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);
if (StoreSize != 1)
NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),
SCEV::FlagNUW);
Value *NumBytes = Expander.expandCodeFor(NumBytesS, IntPtrTy, ExpPt);
if (Instruction *In = dyn_cast<Instruction>(NumBytes))
if (Value Simp = SimplifyInstruction(In, DL, TLI, DT))
NumBytes = Simp;

CallInst *NewCall;

if (RuntimeCheck) {
unsigned Threshold = RuntimeMemSizeThreshold;
if (ConstantInt *CI = dyn_cast<ConstantInt>(NumBytes)) {
uint64_t C = CI->getZExtValue();
if (Threshold != 0 && C < Threshold)
goto CleanupAndExit;
if (C < CompileTimeMemSizeThreshold)
goto CleanupAndExit;
}

BasicBlock *Header = CurLoop->getHeader();
Function *Func = Header->getParent();
Loop *ParentL = LF->getLoopFor(Preheader);
StringRef HeaderName = Header->getName();

// Create a new (empty) preheader, and update the PHI nodes in the
// header to use the new preheader.
BasicBlock *NewPreheader = BasicBlock::Create(Ctx, HeaderName+".rtli.ph",
Func, Header);
if (ParentL)
ParentL->addBasicBlockToLoop(NewPreheader, *LF);
IRBuilder<>(NewPreheader).CreateBr(Header);
for (auto &In : *Header) {
PHINode *PN = dyn_cast<PHINode>(&In);
if (!PN)
break;
int bx = PN->getBasicBlockIndex(Preheader);
if (bx >= 0)
PN->setIncomingBlock(bx, NewPreheader);
}
DT->addNewBlock(NewPreheader, Preheader);
DT->changeImmediateDominator(Header, NewPreheader);

// Check for safe conditions to execute memmove.
// If stride is positive, copying things from higher to lower addresses
// is equivalent to memmove. For negative stride, it's the other way
// around. Copying forward in memory with positive stride may not be
// same as memmove since we may be copying values that we just stored
// in some previous iteration.
Value *LA = Builder.CreatePtrToInt(LoadBasePtr, IntPtrTy);
Value *SA = Builder.CreatePtrToInt(StoreBasePtr, IntPtrTy);
Value *LowA = StridePos ? SA : LA;
Value *HighA = StridePos ? LA : SA;
Value *CmpA = Builder.CreateICmpULT(LowA, HighA);
Value *Cond = CmpA;

// Check for distance between pointers.
Value *Dist = Builder.CreateSub(HighA, LowA);
Value *CmpD = Builder.CreateICmpSLT(NumBytes, Dist);
Value *CmpEither = Builder.CreateOr(Cond, CmpD);
Cond = CmpEither;

if (Threshold != 0) {
Type *Ty = NumBytes->getType();
Value *Thr = ConstantInt::get(Ty, Threshold);
Value *CmpB = Builder.CreateICmpULT(Thr, NumBytes);
Value *CmpBoth = Builder.CreateAnd(Cond, CmpB);
Cond = CmpBoth;
}
BasicBlock *MemmoveB = BasicBlock::Create(Ctx, Header->getName()+".rtli",
Func, NewPreheader);
if (ParentL)
ParentL->addBasicBlockToLoop(MemmoveB, *LF);
Instruction *OldT = Preheader->getTerminator();
Builder.CreateCondBr(Cond, MemmoveB, NewPreheader);
OldT->eraseFromParent();
Preheader->setName(Preheader->getName()+".old");
DT->addNewBlock(MemmoveB, Preheader);
// Find the new immediate dominator of the exit block.
BasicBlock *ExitD = Preheader;
for (auto PI = pred_begin(ExitB), PE = pred_end(ExitB); PI != PE; ++PI) {
BasicBlock PB = PI;
ExitD = DT->findNearestCommonDominator(ExitD, PB);
if (!ExitD)
break;
}
// If the prior immediate dominator of ExitB was dominated by the
// old preheader, then the old preheader becomes the new immediate
// dominator. Otherwise don't change anything (because the newly
// added blocks are dominated by the old preheader).
if (ExitD && DT->dominates(Preheader, ExitD)) {
DomTreeNode *BN = DT->getNode(ExitB);
DomTreeNode *DN = DT->getNode(ExitD);
BN->setIDom(DN);
}

// Add a call to memmove to the conditional block.
IRBuilder<> CondBuilder(MemmoveB);
CondBuilder.CreateBr(ExitB);
CondBuilder.SetInsertPoint(MemmoveB->getTerminator());

if (DestVolatile) {
Type *Int32Ty = Type::getInt32Ty(Ctx);
Type *Int32PtrTy = Type::getInt32PtrTy(Ctx);
Type *VoidTy = Type::getVoidTy(Ctx);
Module *M = Func->getParent();
Constant *CF = M->getOrInsertFunction(HexagonVolatileMemcpyName, VoidTy,
Int32PtrTy, Int32PtrTy, Int32Ty,
nullptr);
Function *Fn = cast<Function>(CF);
Fn->setLinkage(Function::ExternalLinkage);

const SCEV *OneS = SE->getConstant(Int32Ty, 1);
const SCEV *BECount32 = SE->getTruncateOrZeroExtend(BECount, Int32Ty);
const SCEV *NumWordsS = SE->getAddExpr(BECount32, OneS, SCEV::FlagNUW);
Value *NumWords = Expander.expandCodeFor(NumWordsS, Int32Ty,
MemmoveB->getTerminator());
if (Instruction *In = dyn_cast<Instruction>(NumWords))
if (Value Simp = SimplifyInstruction(In, DL, TLI, DT))
NumWords = Simp;

Value *Op0 = (StoreBasePtr->getType() == Int32PtrTy)
? StoreBasePtr
: CondBuilder.CreateBitCast(StoreBasePtr, Int32PtrTy);
Value *Op1 = (LoadBasePtr->getType() == Int32PtrTy)
? LoadBasePtr
: CondBuilder.CreateBitCast(LoadBasePtr, Int32PtrTy);
NewCall = CondBuilder.CreateCall(Fn, {Op0, Op1, NumWords});
} else {
NewCall = CondBuilder.CreateMemMove(StoreBasePtr, LoadBasePtr,
NumBytes, Alignment);
}
} else {
NewCall = Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr,
NumBytes, Alignment);
// Okay, the memcpy has been formed. Zap the original store and
// anything that feeds into it.
RecursivelyDeleteTriviallyDeadInstructions(SI, TLI);
}

NewCall->setDebugLoc(DLoc);

DEBUG(dbgs() << " Formed " << (Overlap ? "memmove: " : "memcpy: ")
<< *NewCall << "\n"
<< " from load ptr=" << LoadEv << " at: " << LI << "\n"
<< " from store ptr=" << StoreEv << " at: " << SI << "\n");

return true;
}


// \brief Check if the instructions in Insts, together with their dependencies
// cover the loop in the sense that the loop could be safely eliminated once
// the instructions in Insts are removed.
bool HexagonLoopIdiomRecognize::coverLoop(Loop *L,
SmallVectorImpl<Instruction*> &Insts) const {
SmallSet<BasicBlock*,8> LoopBlocks;
for (auto *B : L->blocks())
LoopBlocks.insert(B);

SetVector<Instruction*> Worklist(Insts.begin(), Insts.end());

// Collect all instructions from the loop that the instructions in Insts
// depend on (plus their dependencies, etc.). These instructions will
// constitute the expression trees that feed those in Insts, but the trees
// will be limited only to instructions contained in the loop.
for (unsigned i = 0; i < Worklist.size(); ++i) {
Instruction *In = Worklist[i];
for (auto I = In->op_begin(), E = In->op_end(); I != E; ++I) {
Instruction *OpI = dyn_cast<Instruction>(I);
if (!OpI)
continue;
BasicBlock *PB = OpI->getParent();
if (!LoopBlocks.count(PB))
continue;
Worklist.insert(OpI);
}
}

// Scan all instructions in the loop, if any of them have a user outside
// of the loop, or outside of the expressions collected above, then either
// the loop has a side-effect visible outside of it, or there are
// instructions in it that are not involved in the original set Insts.
for (auto *B : L->blocks()) {
for (auto &In : *B) {
if (isa<BranchInst>(In) \|\| isa<DbgInfoIntrinsic>(In))
continue;
if (!Worklist.count(&In) && In.mayHaveSideEffects())
return false;
for (const auto &K : In.users()) {
Instruction *UseI = dyn_cast<Instruction>(K);
if (!UseI)
continue;
BasicBlock *UseB = UseI->getParent();
if (LF->getLoopFor(UseB) != L)
return false;
}
}
}

return true;
}

/// runOnLoopBlock - Process the specified block, which lives in a counted loop
/// with the specified backedge count. This block is known to be in the current
/// loop and not in any subloops.
bool HexagonLoopIdiomRecognize::runOnLoopBlock(Loop CurLoop, BasicBlock BB,
const SCEV BECount, SmallVectorImpl<BasicBlock> &ExitBlocks) {
// We can only promote stores in this block if they are unconditionally
// executed in the loop. For a block to be unconditionally executed, it has
// to dominate all the exit blocks of the loop. Verify this now.
auto DominatedByBB = [this,BB] (BasicBlock *EB) -> bool {
return DT->dominates(BB, EB);
};
if (!std::all_of(ExitBlocks.begin(), ExitBlocks.end(), DominatedByBB))
return false;

bool MadeChange = false;
// Look for store instructions, which may be optimized to memset/memcpy.
SmallVector<StoreInst*,8> Stores;
collectStores(CurLoop, BB, Stores);

// Optimize the store into a memcpy, if it feeds an similarly strided load.
for (auto &SI : Stores)
MadeChange \|= processCopyingStore(CurLoop, SI, BECount);

return MadeChange;
}


bool HexagonLoopIdiomRecognize::runOnCountableLoop(Loop *L) {
PolynomialMultiplyRecognize PMR(L, DL, DT, TLI, SE);
if (PMR.recognize())
return true;

if (!HasMemcpy && !HasMemmove)
return false;

const SCEV *BECount = SE->getBackedgeTakenCount(L);
assert(!isa<SCEVCouldNotCompute>(BECount) &&
"runOnCountableLoop() called on a loop without a predictable"
"backedge-taken count");

SmallVector<BasicBlock *, 8> ExitBlocks;
L->getUniqueExitBlocks(ExitBlocks);

bool Changed = false;

// Scan all the blocks in the loop that are not in subloops.
for (auto *BB : L->getBlocks()) {
// Ignore blocks in subloops.
if (LF->getLoopFor(BB) != L)
continue;
Changed \|= runOnLoopBlock(L, BB, BECount, ExitBlocks);
}

return Changed;
}


bool HexagonLoopIdiomRecognize::runOnLoop(Loop *L, LPPassManager &LPM) {		bool HexagonLoopIdiomRecognize::runOnLoop(Loop *L, LPPassManager &LPM) {
const Module &M = *L->getHeader()->getParent()->getParent();		const Module &M = *L->getHeader()->getParent()->getParent();
if (Triple(M.getTargetTriple()).getArch() != Triple::hexagon)		if (Triple(M.getTargetTriple()).getArch() != Triple::hexagon)
return false;		return false;

if (skipLoop(L))		if (skipLoop(L))
return false;		return false;

Show All 9 Lines	bool HexagonLoopIdiomRecognize::runOnLoop(Loop *L, LPPassManager &LPM) {

AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
DL = &L->getHeader()->getModule()->getDataLayout();		DL = &L->getHeader()->getModule()->getDataLayout();
DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
LF = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();		LF = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();		TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();		SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();

HasMemcpy = TLI->has(LibFunc_memcpy);		if (SE->hasLoopInvariantBackedgeTakenCount(L)) {
HasMemmove = TLI->has(LibFunc_memmove);		PolynomialMultiplyRecognize PMR(L, DL, DT, TLI, SE);
		if (PMR.recognize())
		return true;
		}

if (SE->hasLoopInvariantBackedgeTakenCount(L))
return runOnCountableLoop(L);
return false;		return false;
}		}


Pass *llvm::createHexagonLoopIdiomPass() {		Pass *llvm::createHexagonLoopIdiomPass() {
return new HexagonLoopIdiomRecognize();		return new HexagonLoopIdiomRecognize();
}		}

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

Show First 20 Lines • Show All 556 Lines • ▼ Show 20 Lines	void PassManagerBuilder::populateModulePassManager(
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));		MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));

// Distribute loops to allow partial vectorization. I.e. isolate dependences		// Distribute loops to allow partial vectorization. I.e. isolate dependences
// into separate loop that would otherwise inhibit vectorization. This is		// into separate loop that would otherwise inhibit vectorization. This is
// currently only performed for loops marked with the metadata		// currently only performed for loops marked with the metadata
// llvm.loop.distribute=true or when -enable-loop-distribute is specified.		// llvm.loop.distribute=true or when -enable-loop-distribute is specified.
MPM.add(createLoopDistributePass());		MPM.add(createLoopDistributePass());

		MPM.add(createLoopVersioningIdiomPass());

MPM.add(createLoopVectorizePass(DisableUnrollLoops, LoopVectorize));		MPM.add(createLoopVectorizePass(DisableUnrollLoops, LoopVectorize));

// Eliminate loads by forwarding stores from the previous iteration to loads		// Eliminate loads by forwarding stores from the previous iteration to loads
// of the current iteration.		// of the current iteration.
if (EnableLoopLoadElim)		if (EnableLoopLoadElim)
MPM.add(createLoopLoadEliminationPass());		MPM.add(createLoopLoadEliminationPass());

// FIXME: Because of #pragma vectorize enable, the passes below are always		// FIXME: Because of #pragma vectorize enable, the passes below are always
▲ Show 20 Lines • Show All 386 Lines • Show Last 20 Lines

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

Show All 16 Lines
// presence of non-idiom instructions. The initial implementation of the		// presence of non-idiom instructions. The initial implementation of the
// heuristics applies to idioms in multi-block loops.		// heuristics applies to idioms in multi-block loops.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// TODO List:		// TODO List:
//		//
// Future loop memory idioms to recognize:		// Future loop memory idioms to recognize:
// memcmp, memmove, strlen, etc.		// memcmp, strlen, etc.
// Future floating point idioms to recognize in -ffast-math mode:		// Future floating point idioms to recognize in -ffast-math mode:
// fpowi		// fpowi
// Future integer operation idioms to recognize:		// Future integer operation idioms to recognize:
// ctpop, ctlz, cttz		// ctpop, ctlz, cttz
//		//
// Beware that isel's default lowering for ctpop is highly inefficient for		// Beware that isel's default lowering for ctpop is highly inefficient for
// i64 and larger types when i64 is legal and the value has few bits set. It		// i64 and larger types when i64 is legal and the value has few bits set. It
// would be good to enhance isel to emit a loop for ctpop in this case.		// would be good to enhance isel to emit a loop for ctpop in this case.
//		//
// This could recognize common matrix multiplies and dot product idioms and		// This could recognize common matrix multiplies and dot product idioms and
// replace them with calls to BLAS (if linked in??).		// replace them with calls to BLAS (if linked in??).
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"		#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"		#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LoopAccessAnalysis.h"		#include "llvm/Analysis/LoopAccessAnalysis.h"
		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"		#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"		#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
#include "llvm/Analysis/ScalarEvolutionExpander.h"		#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
Show All 9 Lines
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/LoopUtils.h"		#include "llvm/Transforms/Utils/LoopUtils.h"
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-idiom"		#define DEBUG_TYPE "loop-idiom"

STATISTIC(NumMemSet, "Number of memset's formed from loop stores");		STATISTIC(NumMemSet, "Number of memset's formed from loop stores");
STATISTIC(NumMemCpy, "Number of memcpy's formed from loop load+stores");		STATISTIC(NumMemCpy, "Number of memcpy's formed from loop load+stores");
		STATISTIC(NumMemMove, "Number of memmove's formed from loop load+stores");

static cl::opt<bool> UseLIRCodeSizeHeurs(		static cl::opt<bool> UseLIRCodeSizeHeurs(
"use-lir-code-size-heurs",		"use-lir-code-size-heurs",
cl::desc("Use loop idiom recognition code size heuristics when compiling"		cl::desc("Use loop idiom recognition code size heuristics when compiling"
"with -Os/-Oz"),		"with -Os/-Oz"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

		static cl::opt<unsigned>
		RuntimeMemSizeThreshold("runtime-mem-idiom-threshold", cl::Hidden,
		cl::init(0),
		cl::desc("Threshold (in bytes) for the runtime "
		"check guarding the memmove."));

		static cl::opt<unsigned> CompileTimeMemSizeThreshold(
		"compile-time-mem-idiom-threshold", cl::Hidden, cl::init(64),
		cl::desc(
		"Threshold (in bytes) to perform the transformation, if the "
		"runtime loop count (mem transfer size) is known at compile-time."));

		static cl::opt<bool> DisableMemcpyIdiom(
		"disable-memcpy-idiom", cl::Hidden, cl::init(false),
		cl::desc("Disable generation of memcpy in loop idiom recognition"));

		static cl::opt<bool> DisableMemmoveIdiom(
		"disable-memmove-idiom", cl::Hidden, cl::init(false),
		cl::desc("Disable generation of memmove in loop idiom recognition"));

		static cl::opt<bool> OnlyNonNestedMemmove(
		"only-nonnested-memmove-idiom", cl::Hidden, cl::init(false),
		cl::desc("Only enable generating memmove in non-nested loops"));

namespace {		namespace {

class LoopIdiomRecognize {		class LoopIdiomRecognize {
Loop *CurLoop;		Loop *CurLoop;
AliasAnalysis *AA;		AliasAnalysis *AA;
DominatorTree *DT;		DominatorTree *DT;
LoopInfo *LI;		LoopInfo *LI;
ScalarEvolution *SE;		ScalarEvolution *SE;
TargetLibraryInfo *TLI;		TargetLibraryInfo *TLI;
const TargetTransformInfo *TTI;		const TargetTransformInfo *TTI;
const DataLayout *DL;		const DataLayout *DL;
		bool UseLoopVersioning;
bool ApplyCodeSizeHeuristics;		bool ApplyCodeSizeHeuristics;

public:		public:
explicit LoopIdiomRecognize(AliasAnalysis AA, DominatorTree DT,		explicit LoopIdiomRecognize(AliasAnalysis AA, DominatorTree DT,
LoopInfo LI, ScalarEvolution SE,		LoopInfo LI, ScalarEvolution SE,
TargetLibraryInfo *TLI,		TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
const DataLayout *DL)		const DataLayout *DL, bool ULV)
: CurLoop(nullptr), AA(AA), DT(DT), LI(LI), SE(SE), TLI(TLI), TTI(TTI),		: CurLoop(nullptr), AA(AA), DT(DT), LI(LI), SE(SE), TLI(TLI), TTI(TTI),
DL(DL) {}		DL(DL), UseLoopVersioning(ULV) {}

bool runOnLoop(Loop *L);		bool runOnLoop(Loop *L);

private:		private:
typedef SmallVector<StoreInst *, 8> StoreList;		typedef SmallVector<StoreInst *, 8> StoreList;
typedef MapVector<Value *, StoreList> StoreListMap;		typedef MapVector<Value *, StoreList> StoreListMap;
StoreListMap StoreRefsForMemset;		StoreListMap StoreRefsForMemset;
StoreListMap StoreRefsForMemsetPattern;		StoreListMap StoreRefsForMemsetPattern;
StoreList StoreRefsForMemcpy;		StoreList StoreRefsForMemcpy;
bool HasMemset;		bool HasMemset;
bool HasMemsetPattern;		bool HasMemsetPattern;
bool HasMemcpy;		bool HasMemcpy;
		bool HasMemmove;

/// \name Countable Loop Idiom Handling		/// \name Countable Loop Idiom Handling
/// @{		/// @{

bool runOnCountableLoop();		bool runOnCountableLoop();
bool runOnLoopBlock(BasicBlock BB, const SCEV BECount,		bool runOnLoopBlock(BasicBlock BB, const SCEV BECount,
SmallVectorImpl<BasicBlock *> &ExitBlocks);		SmallVectorImpl<BasicBlock *> &ExitBlocks);

void collectStores(BasicBlock *BB);		void collectStores(BasicBlock *BB);
bool isLegalStore(StoreInst *SI, bool &ForMemset, bool &ForMemsetPattern,		bool isLegalStore(StoreInst *SI, bool &ForMemset, bool &ForMemsetPattern,
bool &ForMemcpy);		bool &ForMemcpy);
bool processLoopStores(SmallVectorImpl<StoreInst > &SL, const SCEV BECount,		bool processLoopStores(SmallVectorImpl<StoreInst > &SL, const SCEV BECount,
bool ForMemset);		bool ForMemset);
bool processLoopMemSet(MemSetInst MSI, const SCEV BECount);		bool processLoopMemSet(MemSetInst MSI, const SCEV BECount);

bool processLoopStridedStore(Value *DestPtr, unsigned StoreSize,		bool processLoopStridedStore(Value *DestPtr, unsigned StoreSize,
unsigned StoreAlignment, Value *StoredVal,		unsigned StoreAlignment, Value *StoredVal,
Instruction *TheStore,		Instruction *TheStore,
SmallPtrSetImpl<Instruction *> &Stores,		SmallPtrSetImpl<Instruction *> &Stores,
const SCEVAddRecExpr Ev, const SCEV BECount,		const SCEVAddRecExpr Ev, const SCEV BECount,
bool NegStride, bool IsLoopMemset = false);		bool NegStride, bool IsLoopMemset = false);
		bool coverLoop(Loop L, SmallVectorImpl<Instruction> &Insts) const;
bool processLoopStoreOfLoopLoad(StoreInst SI, const SCEV BECount);		bool processLoopStoreOfLoopLoad(StoreInst SI, const SCEV BECount);
bool avoidLIRForMultiBlockLoop(bool IsMemset = false,		bool avoidLIRForMultiBlockLoop(bool IsMemset = false,
bool IsLoopMemset = false);		bool IsLoopMemset = false);

/// @}		/// @}
/// \name Noncountable Loop Idiom Handling		/// \name Noncountable Loop Idiom Handling
/// @{		/// @{

Show All 24 Lines	bool runOnLoop(Loop *L, LPPassManager &LPM) override {
ScalarEvolution *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();		ScalarEvolution *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
TargetLibraryInfo *TLI =		TargetLibraryInfo *TLI =
&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();		&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
const TargetTransformInfo *TTI =		const TargetTransformInfo *TTI =
&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(		&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
*L->getHeader()->getParent());		*L->getHeader()->getParent());
const DataLayout *DL = &L->getHeader()->getModule()->getDataLayout();		const DataLayout *DL = &L->getHeader()->getModule()->getDataLayout();

LoopIdiomRecognize LIR(AA, DT, LI, SE, TLI, TTI, DL);		LoopIdiomRecognize LIR(AA, DT, LI, SE, TLI, TTI, DL, false);
return LIR.runOnLoop(L);		return LIR.runOnLoop(L);
}		}

/// This transformation requires natural loop information & requires that		/// This transformation requires natural loop information & requires that
/// loop preheaders be inserted into the CFG.		/// loop preheaders be inserted into the CFG.
///		///
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
getLoopAnalysisUsage(AU);		getLoopAnalysisUsage(AU);
}		}
};		};

		class LoopVersioningIdiomRecognizeLegacyPass : public LoopPass {
		public:
		static char ID;
		explicit LoopVersioningIdiomRecognizeLegacyPass() : LoopPass(ID) {
		initializeLoopVersioningIdiomRecognizeLegacyPassPass(
		*PassRegistry::getPassRegistry());
		}

		bool runOnLoop(Loop *L, LPPassManager &LPM) override {
		if (skipLoop(L))
		return false;

		AliasAnalysis *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
		DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
		LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
		ScalarEvolution *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
		TargetLibraryInfo *TLI =
		&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
		const TargetTransformInfo *TTI =
		&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
		*L->getHeader()->getParent());
		const DataLayout *DL = &L->getHeader()->getModule()->getDataLayout();

		LoopIdiomRecognize LIR(AA, DT, LI, SE, TLI, TTI, DL, true);
		return LIR.runOnLoop(L);
		}

		/// This transformation requires natural loop information & requires that
		/// loop preheaders be inserted into the CFG.
		///
		void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.addRequired<TargetLibraryInfoWrapperPass>();
		AU.addRequired<TargetTransformInfoWrapperPass>();
		AU.addRequiredID(LoopSimplifyID);
		AU.addRequiredID(LCSSAID);
		AU.addRequired<ScalarEvolutionWrapperPass>();
		AU.addRequired<DominatorTreeWrapperPass>();
		getLoopAnalysisUsage(AU);
		}
		};

} // End anonymous namespace.		} // End anonymous namespace.

PreservedAnalyses LoopIdiomRecognizePass::run(Loop &L, LoopAnalysisManager &AM,		PreservedAnalyses LoopIdiomRecognizePass::run(Loop &L, LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR,		LoopStandardAnalysisResults &AR,
LPMUpdater &) {		LPMUpdater &) {
const auto *DL = &L.getHeader()->getModule()->getDataLayout();		const auto *DL = &L.getHeader()->getModule()->getDataLayout();

LoopIdiomRecognize LIR(&AR.AA, &AR.DT, &AR.LI, &AR.SE, &AR.TLI, &AR.TTI, DL);		LoopIdiomRecognize LIR(&AR.AA, &AR.DT, &AR.LI, &AR.SE, &AR.TLI, &AR.TTI, DL,
		false);
		if (!LIR.runOnLoop(&L))
		return PreservedAnalyses::all();

		return getLoopPassPreservedAnalyses();
		}

		PreservedAnalyses
		LoopVersioningIdiomRecognizePass::run(Loop &L, LoopAnalysisManager &AM,
		LoopStandardAnalysisResults &AR,
		LPMUpdater &) {
		const auto *DL = &L.getHeader()->getModule()->getDataLayout();

		LoopIdiomRecognize LIR(&AR.AA, &AR.DT, &AR.LI, &AR.SE, &AR.TLI, &AR.TTI, DL,
		true);
if (!LIR.runOnLoop(&L))		if (!LIR.runOnLoop(&L))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

return getLoopPassPreservedAnalyses();		return getLoopPassPreservedAnalyses();
}		}

char LoopIdiomRecognizeLegacyPass::ID = 0;		char LoopIdiomRecognizeLegacyPass::ID = 0;
INITIALIZE_PASS_BEGIN(LoopIdiomRecognizeLegacyPass, "loop-idiom",		INITIALIZE_PASS_BEGIN(LoopIdiomRecognizeLegacyPass, "loop-idiom",
"Recognize loop idioms", false, false)		"Recognize loop idioms", false, false)
INITIALIZE_PASS_DEPENDENCY(LoopPass)		INITIALIZE_PASS_DEPENDENCY(LoopPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_END(LoopIdiomRecognizeLegacyPass, "loop-idiom",		INITIALIZE_PASS_END(LoopIdiomRecognizeLegacyPass, "loop-idiom",
"Recognize loop idioms", false, false)		"Recognize loop idioms", false, false)

		char LoopVersioningIdiomRecognizeLegacyPass::ID = 0;
		INITIALIZE_PASS_BEGIN(LoopVersioningIdiomRecognizeLegacyPass,
		"loop-versioning-idiom",
		"Recognize loop idioms with loop versioning", false,
		false)
		INITIALIZE_PASS_DEPENDENCY(LoopPass)
		INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
		INITIALIZE_PASS_DEPENDENCY(LCSSAWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
		INITIALIZE_PASS_END(LoopVersioningIdiomRecognizeLegacyPass,
		"loop-versioning-idiom",
		"Recognize loop idioms with loop versioning", false, false)

Pass *llvm::createLoopIdiomPass() { return new LoopIdiomRecognizeLegacyPass(); }		Pass *llvm::createLoopIdiomPass() { return new LoopIdiomRecognizeLegacyPass(); }
		Pass *llvm::createLoopVersioningIdiomPass() {
		return new LoopVersioningIdiomRecognizeLegacyPass();
		}

static void deleteDeadInstruction(Instruction *I) {		static void deleteDeadInstruction(Instruction *I) {
I->replaceAllUsesWith(UndefValue::get(I->getType()));		I->replaceAllUsesWith(UndefValue::get(I->getType()));
I->eraseFromParent();		I->eraseFromParent();
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Implementation of LoopIdiomRecognize		// Implementation of LoopIdiomRecognize
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

bool LoopIdiomRecognize::runOnLoop(Loop *L) {		bool LoopIdiomRecognize::runOnLoop(Loop *L) {
CurLoop = L;		CurLoop = L;
// If the loop could not be converted to canonical form, it must have an		// If the loop could not be converted to canonical form, it must have an
// indirectbr in it, just give up.		// indirectbr in it, just give up.
if (!L->getLoopPreheader())		if (!L->getLoopPreheader())
return false;		return false;

// Disable loop idiom recognition if the function's name is a common idiom.		// Disable loop idiom recognition if the function's name is a common idiom.
StringRef Name = L->getHeader()->getParent()->getName();		StringRef Name = L->getHeader()->getParent()->getName();
if (Name == "memset" \|\| Name == "memcpy")		if (Name == "memset" \|\| Name == "memcpy" \|\| Name == "memmove")
return false;		return false;

// Determine if code size heuristics need to be applied.		// Determine if code size heuristics need to be applied.
ApplyCodeSizeHeuristics =		ApplyCodeSizeHeuristics =
L->getHeader()->getParent()->optForSize() && UseLIRCodeSizeHeurs;		L->getHeader()->getParent()->optForSize() && UseLIRCodeSizeHeurs;

HasMemset = TLI->has(LibFunc_memset);		HasMemset = TLI->has(LibFunc_memset);
HasMemsetPattern = TLI->has(LibFunc_memset_pattern16);		HasMemsetPattern = TLI->has(LibFunc_memset_pattern16);
HasMemcpy = TLI->has(LibFunc_memcpy);		HasMemcpy = TLI->has(LibFunc_memcpy);
		HasMemmove = TLI->has(LibFunc_memmove);

if (HasMemset \|\| HasMemsetPattern \|\| HasMemcpy)		if (HasMemset \|\| HasMemsetPattern \|\| HasMemcpy \|\| HasMemmove)
if (SE->hasLoopInvariantBackedgeTakenCount(L))		if (SE->hasLoopInvariantBackedgeTakenCount(L))
return runOnCountableLoop();		return runOnCountableLoop();

return runOnNoncountableLoop();		return runOnNoncountableLoop();
}		}

bool LoopIdiomRecognize::runOnCountableLoop() {		bool LoopIdiomRecognize::runOnCountableLoop() {
const SCEV *BECount = SE->getBackedgeTakenCount(CurLoop);		const SCEV *BECount = SE->getBackedgeTakenCount(CurLoop);
▲ Show 20 Lines • Show All 140 Lines • ▼ Show 20 Lines	return true;
StorePtr->getType()->getPointerAddressSpace() == 0 &&		StorePtr->getType()->getPointerAddressSpace() == 0 &&
(PatternValue = getMemSetPatternValue(StoredVal, DL))) {		(PatternValue = getMemSetPatternValue(StoredVal, DL))) {
// It looks like we can use PatternValue!		// It looks like we can use PatternValue!
ForMemsetPattern = true;		ForMemsetPattern = true;
return true;		return true;
}		}

// Otherwise, see if the store can be turned into a memcpy.		// Otherwise, see if the store can be turned into a memcpy.
if (HasMemcpy) {		if (HasMemcpy \|\| HasMemmove) {
		efriedmaUnsubmitted Not Done Reply Inline Actions This would not work right for if the target has memmove, but not memcpy. Maybe not an issue in practice, but still kind of confusing. efriedma: This would not work right for if the target has memmove, but not memcpy. Maybe not an issue in…
// Check to see if the stride matches the size of the store. If so, then we		// Check to see if the stride matches the size of the store. If so, then we
// know that every byte is touched in the loop.		// know that every byte is touched in the loop.
APInt Stride = getStoreStride(StoreEv);		APInt Stride = getStoreStride(StoreEv);
unsigned StoreSize = getStoreSizeInBytes(SI, DL);		unsigned StoreSize = getStoreSizeInBytes(SI, DL);
if (StoreSize != Stride && StoreSize != -Stride)		if (StoreSize != Stride && StoreSize != -Stride)
return false;		return false;

// The store must be feeding a non-volatile load.		// The store must be feeding a non-volatile load.
▲ Show 20 Lines • Show All 435 Lines • ▼ Show 20 Lines	bool LoopIdiomRecognize::processLoopStridedStore(
// Okay, the memset has been formed. Zap the original store and anything that		// Okay, the memset has been formed. Zap the original store and anything that
// feeds into it.		// feeds into it.
for (auto *I : Stores)		for (auto *I : Stores)
deleteDeadInstruction(I);		deleteDeadInstruction(I);
++NumMemSet;		++NumMemSet;
return true;		return true;
}		}

		// \brief Check if the instructions in Insts, together with their dependencies
		// cover the loop in the sense that the loop could be safely eliminated once
		// the instructions in Insts are removed.
		bool LoopIdiomRecognize::coverLoop(Loop *L,
		SmallVectorImpl<Instruction*> &Insts) const {
		SmallSet<BasicBlock*,8> LoopBlocks;
		for (auto *B : L->blocks())
		LoopBlocks.insert(B);

		SetVector<Instruction*> Worklist(Insts.begin(), Insts.end());

		// Collect all instructions from the loop that the instructions in Insts
		// depend on (plus their dependencies, etc.). These instructions will
		// constitute the expression trees that feed those in Insts, but the trees
		// will be limited only to instructions contained in the loop.
		for (unsigned i = 0; i < Worklist.size(); ++i) {
		Instruction *In = Worklist[i];
		for (auto I = In->op_begin(), E = In->op_end(); I != E; ++I) {
		Instruction *OpI = dyn_cast<Instruction>(I);
		if (!OpI)
		continue;
		BasicBlock *PB = OpI->getParent();
		if (!LoopBlocks.count(PB))
		continue;
		Worklist.insert(OpI);
		}
		}
		efriedmaUnsubmitted Not Done Reply Inline Actions This whole worklist thing looks very suspicious; for the purpose of figuring out whether the memmove covers the loop, why do you need to special-case the pointer operand of the load instruction? efriedma: This whole worklist thing looks very suspicious; for the purpose of figuring out whether the…
		kparzyszUnsubmitted Not Done Reply Inline Actions Could you explain the "special-casing" comment? I'm not seeing what you are referring to. kparzysz: Could you explain the "special-casing" comment? I'm not seeing what you are referring to.
		efriedmaUnsubmitted Not Done Reply Inline Actions Suppose, for example, you have something like this: ; Assume this is a loop, where %p and %q are induction variables. %p2 = call i8* @foo(i8* returned %p) %v = load i8* %p2 store i8 %v, i8* %q You start off with the load and store on the worklist, add the call to the worklist, then conclude the loop is covered even though the call could have other side-effects. Granted, that's probably unlikely to happen in practice, but I'm not sure what this loop is trying to accomplish. efriedma: Suppose, for example, you have something like this: ; Assume this is a loop, where %p and…
		kparzyszUnsubmitted Not Done Reply Inline Actions The other side-effects should be checked elsewhere. This wouldn't be a strided load, so it shouldn't even get this far. kparzysz: The other side-effects should be checked elsewhere. This wouldn't be a strided load, so it…
		efriedmaUnsubmitted Not Done Reply Inline Actions The other side-effects should be checked elsewhere. Elsewhere, where? Isn't the point of coverLoop to check for side-effects? This wouldn't be a strided load, so it shouldn't even get this far. It's a strided load because of the "returned" attribute on the parameter call. (IIRC, SCEV doesn't actually look through calls like this at the moment, but it could.) efriedma: > The other side-effects should be checked elsewhere. Elsewhere, where? Isn't the point of…
		kparzyszUnsubmitted Not Done Reply Inline Actions The point of this loop is to see if the loop has other stuff in it besides the load and the store, and any instructions that are only used by the load and the store. In other words, if we removed the load and the store, and all instructions that would become recursively dead, would this loop still have anything left in it. If not, it means that the code used (only) by the load/store covers the loop entirely. The side-effects of the code related to the load/store should be checked somewhere else, but not here. If it's not checked, the checks should be added. kparzysz: The point of this loop is to see if the loop has other stuff in it besides the load and the…

		// Scan all instructions in the loop, if any of them have a user outside
		// of the loop, or outside of the expressions collected above, then either
		// the loop has a side-effect visible outside of it, or there are
		// instructions in it that are not involved in the original set Insts.
		for (auto *B : L->blocks()) {
		for (auto &In : *B) {
		if (isa<BranchInst>(In) \|\| isa<DbgInfoIntrinsic>(In))
		continue;
		if (!Worklist.count(&In) && In.mayHaveSideEffects())
		return false;
		for (const auto &K : In.users()) {
		Instruction *UseI = dyn_cast<Instruction>(K);
		if (!UseI)
		continue;
		BasicBlock *UseB = UseI->getParent();
		if (LI->getLoopFor(UseB) != L)
		return false;
		}
		}
		}

		return true;
		}

/// If the stored value is a strided load in the same loop with the same stride		/// If the stored value is a strided load in the same loop with the same stride
/// this may be transformable into a memcpy. This kicks in for stuff like		/// this may be transformable into a memcpy. This kicks in for stuff like
/// for (i) A[i] = B[i];		/// for (i) A[i] = B[i];
bool LoopIdiomRecognize::processLoopStoreOfLoopLoad(StoreInst *SI,		bool LoopIdiomRecognize::processLoopStoreOfLoopLoad(StoreInst *SI,
const SCEV *BECount) {		const SCEV *BECount) {
assert(SI->isSimple() && "Expected only non-volatile stores.");		assert(SI->isSimple() && "Expected only non-volatile stores.");

Value *StorePtr = SI->getPointerOperand();		Value *StorePtr = SI->getPointerOperand();
const SCEVAddRecExpr *StoreEv = cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));		auto *StoreEv = cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));
APInt Stride = getStoreStride(StoreEv);		APInt Stride = getStoreStride(StoreEv);
unsigned StoreSize = getStoreSizeInBytes(SI, DL);		unsigned StoreSize = getStoreSizeInBytes(SI, DL);
bool NegStride = StoreSize == -Stride;		bool NegStride = StoreSize == -Stride;

// The store must be feeding a non-volatile load.		// The store must be feeding a non-volatile load.
LoadInst *LI = cast<LoadInst>(SI->getValueOperand());		LoadInst *LdI = cast<LoadInst>(SI->getValueOperand());
assert(LI->isSimple() && "Expected only non-volatile stores.");		assert(LdI->isSimple() && "Expected only non-volatile stores.");

// See if the pointer expression is an AddRec like {base,+,1} on the current		// See if the pointer expression is an AddRec like {base,+,1} on the current
// loop, which indicates a strided load. If we have something else, it's a		// loop, which indicates a strided load. If we have something else, it's a
// random load we can't handle.		// random load we can't handle.
const SCEVAddRecExpr *LoadEv =		auto *LoadEv = cast<SCEVAddRecExpr>(SE->getSCEV(LdI->getPointerOperand()));
cast<SCEVAddRecExpr>(SE->getSCEV(LI->getPointerOperand()));

// The trip count of the loop and the base pointer of the addrec SCEV is		// The trip count of the loop and the base pointer of the addrec SCEV is
// guaranteed to be loop invariant, which means that it should dominate the		// guaranteed to be loop invariant, which means that it should dominate the
// header. This allows us to insert code for it in the preheader.		// header. This allows us to insert code for it in the preheader.
BasicBlock *Preheader = CurLoop->getLoopPreheader();		BasicBlock *Preheader = CurLoop->getLoopPreheader();
IRBuilder<> Builder(Preheader->getTerminator());		Instruction *ExpPt = Preheader->getTerminator();
		IRBuilder<> Builder(ExpPt);
SCEVExpander Expander(SE, DL, "loop-idiom");		SCEVExpander Expander(SE, DL, "loop-idiom");

const SCEV *StrStart = StoreEv->getStart();		const SCEV *StrStart = StoreEv->getStart();
unsigned StrAS = SI->getPointerAddressSpace();		unsigned StrAS = SI->getPointerAddressSpace();
Type IntPtrTy = Builder.getIntPtrTy(DL, StrAS);		Type IntPtrTy = Builder.getIntPtrTy(DL, StrAS);

// Handle negative strided loops.		// Handle negative strided loops.
if (NegStride)		if (NegStride)
StrStart = getStartForNegStride(StrStart, BECount, IntPtrTy, StoreSize, SE);		StrStart = getStartForNegStride(StrStart, BECount, IntPtrTy, StoreSize, SE);

// Okay, we have a strided store "p[i]" of a loaded value. We can turn		// Okay, we have a strided store "p[i]" of a loaded value. We can turn
// this into a memcpy in the loop preheader now if we want. However, this		// this into a memcpy in the loop preheader now if we want. However, this
// would be unsafe to do if there is anything else in the loop that may read		// would be unsafe to do if there is anything else in the loop that may read
// or write the memory region we're storing to. This includes the load that		// or write the memory region we're storing to. This includes the load that
// feeds the stores. Check for an alias by generating the base address and		// feeds the stores. Check for an alias by generating the base address and
// checking everything.		// checking everything.
Value *StoreBasePtr = Expander.expandCodeFor(		Value *StoreBasePtr =
StrStart, Builder.getInt8PtrTy(StrAS), Preheader->getTerminator());		Expander.expandCodeFor(StrStart, Builder.getInt8PtrTy(StrAS), ExpPt);
		Value *LoadBasePtr = nullptr;

		bool RuntimeCheck = false;
		SmallPtrSet<Instruction *, 2> Ignore1;
		Ignore1.insert(SI);
		if (mayLoopAccessLocation(StoreBasePtr, MRI_ModRef, CurLoop, BECount,
		StoreSize, *AA, Ignore1)) {
		if (!UseLoopVersioning)
		goto CleanupAndExit;

SmallPtrSet<Instruction *, 1> Stores;		// Check if the load is the offending instruction.
Stores.insert(SI);		Ignore1.insert(LdI);
if (mayLoopAccessLocation(StoreBasePtr, MRI_ModRef, CurLoop, BECount,		if (mayLoopAccessLocation(StoreBasePtr, MRI_ModRef, CurLoop, BECount,
StoreSize, *AA, Stores)) {		StoreSize, *AA, Ignore1)) {
		CleanupAndExit:
Expander.clear();		Expander.clear();
// If we generated new code for the base pointer, clean up.		// If we generated new code for the base pointer, clean up.
		if (StoreBasePtr && (LoadBasePtr != StoreBasePtr)) {
RecursivelyDeleteTriviallyDeadInstructions(StoreBasePtr, TLI);		RecursivelyDeleteTriviallyDeadInstructions(StoreBasePtr, TLI);
		StoreBasePtr = nullptr;
		}
		if (LoadBasePtr) {
		RecursivelyDeleteTriviallyDeadInstructions(LoadBasePtr, TLI);
		LoadBasePtr = nullptr;
		}
return false;		return false;
}		}
		// It worked with the load ignored: we need a runtime check to disambiguate
		// the dependence relation between the load and the store.
		RuntimeCheck = true;

		// In case of a memmove, the call to memmove will be executed instead
		// of the loop, so we need to make sure that there is nothing else in
		// the loop than the load, store and instructions that these two depend
		// on.
		SmallVector<Instruction*,2> Insts;
		Insts.push_back(SI);
		Insts.push_back(LdI);
		if (!coverLoop(CurLoop, Insts))
		goto CleanupAndExit;
		}

		if (RuntimeCheck) {
		if (DisableMemmoveIdiom)
		goto CleanupAndExit;
		} else if (DisableMemcpyIdiom)
		goto CleanupAndExit;

		if (OnlyNonNestedMemmove && CurLoop->getParentLoop())
		goto CleanupAndExit;

const SCEV *LdStart = LoadEv->getStart();		const SCEV *LdStart = LoadEv->getStart();
unsigned LdAS = LI->getPointerAddressSpace();		unsigned LdAS = LdI->getPointerAddressSpace();

// Handle negative strided loops.		// Handle negative strided loops.
if (NegStride)		if (NegStride)
LdStart = getStartForNegStride(LdStart, BECount, IntPtrTy, StoreSize, SE);		LdStart = getStartForNegStride(LdStart, BECount, IntPtrTy, StoreSize, SE);

// For a memcpy, we have to make sure that the input array is not being		// For a memcpy, we have to make sure that the input array is not being
// mutated by the loop.		// mutated by the loop.
Value *LoadBasePtr = Expander.expandCodeFor(		LoadBasePtr = Expander.expandCodeFor(
LdStart, Builder.getInt8PtrTy(LdAS), Preheader->getTerminator());		LdStart, Builder.getInt8PtrTy(LdAS), Preheader->getTerminator());

		SmallPtrSet<Instruction*, 2> Ignore2;
		Ignore2.insert(SI);
if (mayLoopAccessLocation(LoadBasePtr, MRI_Mod, CurLoop, BECount, StoreSize,		if (mayLoopAccessLocation(LoadBasePtr, MRI_Mod, CurLoop, BECount, StoreSize,
*AA, Stores)) {		*AA, Ignore2))
Expander.clear();		goto CleanupAndExit;
// If we generated new code for the base pointer, clean up.
RecursivelyDeleteTriviallyDeadInstructions(LoadBasePtr, TLI);
RecursivelyDeleteTriviallyDeadInstructions(StoreBasePtr, TLI);
return false;
}

if (avoidLIRForMultiBlockLoop())		if (avoidLIRForMultiBlockLoop())
return false;		goto CleanupAndExit;

// Okay, everything is safe, we can transform this!		// The runtime check needs a single exit block.
		BasicBlock *ExitB = nullptr;
		if (RuntimeCheck) {
		ExitB = CurLoop->getUniqueExitBlock();
		if (!ExitB)
		goto CleanupAndExit;

		// Make sure the loop is in simple form.
		assert(CurLoop->isLoopSimplifyForm() &&
		"loop versioning idiom pass should run on simple loops");
		}

// The # stored bytes is (BECount+1)*Size. Expand the trip count out to		// The # stored bytes is (BECount+1)*Size. Expand the trip count out to
// pointer size if it isn't already.		// pointer size if it isn't already.
		LLVMContext &Ctx = SI->getContext();
BECount = SE->getTruncateOrZeroExtend(BECount, IntPtrTy);		BECount = SE->getTruncateOrZeroExtend(BECount, IntPtrTy);
		unsigned Align = std::min(SI->getAlignment(), LdI->getAlignment());
		efriedmaUnsubmitted Done Reply Inline Actions getUniqueExitBlock? efriedma: getUniqueExitBlock?
		DebugLoc DLoc = SI->getDebugLoc();

const SCEV *NumBytesS =		const SCEV *NumBytesS =
SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);		SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);
if (StoreSize != 1)		if (StoreSize != 1)
NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),		NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),
SCEV::FlagNUW);		SCEV::FlagNUW);

Value *NumBytes =		Value *NumBytes = Expander.expandCodeFor(NumBytesS, IntPtrTy, ExpPt);
Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());		CallInst *NewCall;

		efriedmaUnsubmitted Not Done Reply Inline Actions Why do we need to call SimplifyInstruction here? efriedma: Why do we need to call SimplifyInstruction here?
		kparzyszUnsubmitted Not Done Reply Inline Actions I guess back when this was written, it made some difference. I don't remember what it was though. Do you think it's unnecessary now? kparzysz: I guess back when this was written, it made some difference. I don't remember what it was…
		efriedmaUnsubmitted Done Reply Inline Actions I can't see why it would make a difference... SimplifyInstruction probably won't simplify a non-trivial SCEV expression to a ConstantInt, and the rest will get cleaned up by instcombine later. efriedma: I can't see why it would make a difference... SimplifyInstruction probably won't simplify a non…
CallInst *NewCall =		if (!RuntimeCheck) {
Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes,		NewCall = Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes, Align);
std::min(SI->getAlignment(), LI->getAlignment()));
NewCall->setDebugLoc(SI->getDebugLoc());		NewCall->setDebugLoc(SI->getDebugLoc());
		++NumMemCpy;
DEBUG(dbgs() << " Formed memcpy: " << *NewCall << "\n"
<< " from load ptr=" << LoadEv << " at: " << LI << "\n"
<< " from store ptr=" << StoreEv << " at: " << SI << "\n");

// Okay, the memcpy has been formed. Zap the original store and anything that		// Okay, the memcpy has been formed. Zap the original store and anything that
// feeds into it.		// feeds into it.
deleteDeadInstruction(SI);		deleteDeadInstruction(SI);
++NumMemCpy;		} else {
		unsigned Threshold = RuntimeMemSizeThreshold;
		if (ConstantInt *CI = dyn_cast<ConstantInt>(NumBytes)) {
		uint64_t C = CI->getZExtValue();
		if (Threshold != 0 && C < Threshold)
		goto CleanupAndExit;
		if (C < CompileTimeMemSizeThreshold)
		goto CleanupAndExit;
		}

		BasicBlock *Header = CurLoop->getHeader();
		Function *Func = Header->getParent();
		Loop *ParentL = LI->getLoopFor(Preheader);
		StringRef HeaderName = Header->getName();

		// Create a new (empty) preheader, and update the PHI nodes in the
		// header to use the new preheader.
		BasicBlock *NewPreheader = BasicBlock::Create(Ctx, HeaderName+".rtli.ph",
		Func, Header);
		if (ParentL)
		ParentL->addBasicBlockToLoop(NewPreheader, *LI);
		IRBuilder<>(NewPreheader).CreateBr(Header);
		for (auto &In : *Header) {
		PHINode *PN = dyn_cast<PHINode>(&In);
		if (!PN)
		break;
		int bx = PN->getBasicBlockIndex(Preheader);
		if (bx >= 0)
		PN->setIncomingBlock(bx, NewPreheader);
		}
		DT->addNewBlock(NewPreheader, Preheader);
		DT->changeImmediateDominator(Header, NewPreheader);

		// Check for safe conditions to execute memmove.
		// If stride is positive, copying things from higher to lower addresses
		// is equivalent to memmove. For negative stride, it's the other way
		// around. Copying forward in memory with positive stride may not be
		// same as memmove since we may be copying values that we just stored
		// in some previous iteration.
		Value *LA = Builder.CreatePtrToInt(LoadBasePtr, IntPtrTy);
		Value *SA = Builder.CreatePtrToInt(StoreBasePtr, IntPtrTy);
		Value *LowA = NegStride ? LA : SA;
		Value *HighA = NegStride ? SA : LA;
		// Check whether the dependence is a Write After Read: a WAR dependence has
		// the same semantics as memmove.
		Value *IsWAR = Builder.CreateICmpULT(LowA, HighA);

		// Check for distance between pointers: in the case of a RAW dependence, we
		// need to make sure that there is no overlap.
		Value *Dist = Builder.CreateSub(HighA, LowA);
		Value *CmpD = Builder.CreateICmpSLT(NumBytes, Dist);
		Value *CmpEither = Builder.CreateOr(IsWAR, CmpD);
		Value *Cond = CmpEither;

		if (Threshold != 0) {
		Type *Ty = NumBytes->getType();
		Value *Thr = ConstantInt::get(Ty, Threshold);
		Value *CmpB = Builder.CreateICmpULT(Thr, NumBytes);
		Value *CmpBoth = Builder.CreateAnd(Cond, CmpB);
		Cond = CmpBoth;
		}

		BasicBlock *MemmoveB = BasicBlock::Create(Ctx, Header->getName() + ".rtli",
		Func, NewPreheader);
		if (ParentL)
		ParentL->addBasicBlockToLoop(MemmoveB, *LI);
		Instruction *OldT = Preheader->getTerminator();
		Builder.CreateCondBr(Cond, MemmoveB, NewPreheader);
		OldT->eraseFromParent();
		Preheader->setName(Preheader->getName() + ".old");
		DT->addNewBlock(MemmoveB, Preheader);
		DT->getNode(ExitB)->setIDom(DT->getNode(Preheader));

		// Add a call to memmove in the conditional block.
		efriedmaUnsubmitted Done Reply Inline Actions Can we simplify this code by requiring LoopSimplify? (hasDedicatedExits() makes updating the domtree much more straightforward.) efriedma: Can we simplify this code by requiring LoopSimplify? (hasDedicatedExits() makes updating the…
		IRBuilder<> CondBuilder(MemmoveB);
		CondBuilder.CreateBr(ExitB);
		efriedmaUnsubmitted Not Done Reply Inline Actions This is breaking LoopSimplify form for the loop. efriedma: This is breaking LoopSimplify form for the loop.
		CondBuilder.SetInsertPoint(MemmoveB->getTerminator());
		NewCall =
		CondBuilder.CreateMemMove(StoreBasePtr, LoadBasePtr, NumBytes, Align);
		NewCall->setDebugLoc(SI->getDebugLoc());
		++NumMemMove;
		}

		DEBUG(dbgs() << " Formed " << (!RuntimeCheck ? "memcpy: " : "memmove: ")
		<< *NewCall << "\n"
		<< " from load ptr=" << LoadEv << " at: " << LdI << "\n"
		<< " from store ptr=" << StoreEv << " at: " << SI << "\n");

return true;		return true;
}		}

// When compiling for codesize we avoid idiom recognition for a multi-block loop		// When compiling for codesize we avoid idiom recognition for a multi-block loop
// unless it is a loop_memset idiom or a memset/memcpy idiom in a nested loop.		// unless it is a loop_memset idiom or a memset/memcpy idiom in a nested loop.
//		//
bool LoopIdiomRecognize::avoidLIRForMultiBlockLoop(bool IsMemset,		bool LoopIdiomRecognize::avoidLIRForMultiBlockLoop(bool IsMemset,
bool IsLoopMemset) {		bool IsLoopMemset) {
▲ Show 20 Lines • Show All 346 Lines • Show Last 20 Lines

llvm/lib/Transforms/Scalar/Scalar.cpp

Show First 20 Lines • Show All 61 Lines • ▼ Show 20 Lines	void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeLoopPredicationLegacyPassPass(Registry);		initializeLoopPredicationLegacyPassPass(Registry);
initializeLoopRotateLegacyPassPass(Registry);		initializeLoopRotateLegacyPassPass(Registry);
initializeLoopStrengthReducePass(Registry);		initializeLoopStrengthReducePass(Registry);
initializeLoopRerollPass(Registry);		initializeLoopRerollPass(Registry);
initializeLoopUnrollPass(Registry);		initializeLoopUnrollPass(Registry);
initializeLoopUnswitchPass(Registry);		initializeLoopUnswitchPass(Registry);
initializeLoopVersioningLICMPass(Registry);		initializeLoopVersioningLICMPass(Registry);
initializeLoopIdiomRecognizeLegacyPassPass(Registry);		initializeLoopIdiomRecognizeLegacyPassPass(Registry);
		initializeLoopVersioningIdiomRecognizeLegacyPassPass(Registry);
initializeLowerAtomicLegacyPassPass(Registry);		initializeLowerAtomicLegacyPassPass(Registry);
initializeLowerExpectIntrinsicPass(Registry);		initializeLowerExpectIntrinsicPass(Registry);
initializeLowerGuardIntrinsicLegacyPassPass(Registry);		initializeLowerGuardIntrinsicLegacyPassPass(Registry);
initializeMemCpyOptLegacyPassPass(Registry);		initializeMemCpyOptLegacyPassPass(Registry);
initializeMergedLoadStoreMotionLegacyPassPass(Registry);		initializeMergedLoadStoreMotionLegacyPassPass(Registry);
initializeNaryReassociateLegacyPassPass(Registry);		initializeNaryReassociateLegacyPassPass(Registry);
initializePartiallyInlineLibCallsLegacyPassPass(Registry);		initializePartiallyInlineLibCallsLegacyPassPass(Registry);
initializeReassociateLegacyPassPass(Registry);		initializeReassociateLegacyPassPass(Registry);
▲ Show 20 Lines • Show All 82 Lines • ▼ Show 20 Lines
void LLVMAddLoopDeletionPass(LLVMPassManagerRef PM) {		void LLVMAddLoopDeletionPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopDeletionPass());		unwrap(PM)->add(createLoopDeletionPass());
}		}

void LLVMAddLoopIdiomPass(LLVMPassManagerRef PM) {		void LLVMAddLoopIdiomPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopIdiomPass());		unwrap(PM)->add(createLoopIdiomPass());
}		}

		void LLVMAddLoopVersioningIdiomPass(LLVMPassManagerRef PM) {
		unwrap(PM)->add(createLoopVersioningIdiomPass());
		}

void LLVMAddLoopRotatePass(LLVMPassManagerRef PM) {		void LLVMAddLoopRotatePass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopRotatePass());		unwrap(PM)->add(createLoopRotatePass());
}		}

void LLVMAddLoopRerollPass(LLVMPassManagerRef PM) {		void LLVMAddLoopRerollPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopRerollPass());		unwrap(PM)->add(createLoopRerollPass());
}		}

▲ Show 20 Lines • Show All 100 Lines • Show Last 20 Lines

llvm/test/CodeGen/Hexagon/loop-idiom/hexagon-memmove1.ll

This file was deleted.

	; Check for recognizing the "memmove" idiom.
	; RUN: opt -basicaa -hexagon-loop-idiom -S -mtriple hexagon-unknown-elf < %s \
	; RUN: \| FileCheck %s
	; CHECK: call void @llvm.memmove

	; Function Attrs: norecurse nounwind
	define void @foo(i32* nocapture %A, i32* nocapture readonly %B, i32 %n) #0 {
	entry:
	%cmp1 = icmp sgt i32 %n, 0
	br i1 %cmp1, label %for.body.preheader, label %for.end

	for.body.preheader: ; preds = %entry
	%arrayidx.gep = getelementptr i32, i32* %B, i32 0
	%arrayidx1.gep = getelementptr i32, i32* %A, i32 0
	br label %for.body

	for.body: ; preds = %for.body.preheader, %for.body
	%arrayidx.phi = phi i32* [ %arrayidx.gep, %for.body.preheader ], [ %arrayidx.inc, %for.body ]
	%arrayidx1.phi = phi i32* [ %arrayidx1.gep, %for.body.preheader ], [ %arrayidx1.inc, %for.body ]
	%i.02 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
	%0 = load i32, i32* %arrayidx.phi, align 4
	store i32 %0, i32* %arrayidx1.phi, align 4
	%inc = add nuw nsw i32 %i.02, 1
	%exitcond = icmp ne i32 %inc, %n
	%arrayidx.inc = getelementptr i32, i32* %arrayidx.phi, i32 1
	%arrayidx1.inc = getelementptr i32, i32* %arrayidx1.phi, i32 1
	br i1 %exitcond, label %for.body, label %for.end.loopexit

	for.end.loopexit: ; preds = %for.body
	br label %for.end

	for.end: ; preds = %for.end.loopexit, %entry
	ret void
	}

	attributes #0 = { nounwind }

llvm/test/CodeGen/Hexagon/loop-idiom/hexagon-memmove2.ll

This file was deleted.

	; RUN: opt -basicaa -hexagon-loop-idiom -S -mtriple hexagon-unknown-elf < %s \
	; RUN: \| FileCheck %s

	define void @PR14241(i32* %s, i64 %size) #0 {
	; Ensure that we don't form a memcpy for strided loops. Briefly, when we taught
	; LoopIdiom about memmove and strided loops, this got miscompiled into a memcpy
	; instead of a memmove. If we get the memmove transform back, this will catch
	; regressions.
	;
	; CHECK-LABEL: @PR14241(

	entry:
	%end.idx = add i64 %size, -1
	%end.ptr = getelementptr inbounds i32, i32* %s, i64 %end.idx
	br label %while.body
	; CHECK-NOT: memcpy
	; CHECK: memmove

	while.body:
	%phi.ptr = phi i32* [ %s, %entry ], [ %next.ptr, %while.body ]
	%src.ptr = getelementptr inbounds i32, i32* %phi.ptr, i64 1
	%val = load i32, i32* %src.ptr, align 4
	; CHECK: load
	%dst.ptr = getelementptr inbounds i32, i32* %phi.ptr, i64 0
	store i32 %val, i32* %dst.ptr, align 4
	; CHECK: store
	%next.ptr = getelementptr inbounds i32, i32* %phi.ptr, i64 1
	%cmp = icmp eq i32* %next.ptr, %end.ptr
	br i1 %cmp, label %exit, label %while.body

	exit:
	ret void
	; CHECK: ret void
	}

	attributes #0 = { nounwind }

llvm/test/Other/new-pm-defaults.ll

	Show First 20 Lines • Show All 131 Lines • ▼ Show 20 Lines
	; CHECK-O-NEXT: Running pass: EliminateAvailableExternallyPass			; CHECK-O-NEXT: Running pass: EliminateAvailableExternallyPass
	; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass			; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
	; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA			; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
	; CHECK-O-NEXT: Running pass: ModuleToFunctionPassAdaptor<{{.}}PassManager{{.}}>			; CHECK-O-NEXT: Running pass: ModuleToFunctionPassAdaptor<{{.}}PassManager{{.}}>
	; CHECK-O-NEXT: Starting llvm::Function pass manager run.			; CHECK-O-NEXT: Starting llvm::Function pass manager run.
	; CHECK-O-NEXT: Running pass: Float2IntPass			; CHECK-O-NEXT: Running pass: Float2IntPass
	; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass			; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass
	; CHECK-O-NEXT: Running pass: LoopDistributePass			; CHECK-O-NEXT: Running pass: LoopDistributePass
				; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopVersioningIdiomRecognizePass
	; CHECK-O-NEXT: Running pass: LoopVectorizePass			; CHECK-O-NEXT: Running pass: LoopVectorizePass
	; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis			; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis
	; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis			; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis
	; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass			; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
	; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis			; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
	; CHECK-O-NEXT: Running pass: InstCombinePass			; CHECK-O-NEXT: Running pass: InstCombinePass
	; CHECK-O-NEXT: Running pass: SLPVectorizerPass			; CHECK-O-NEXT: Running pass: SLPVectorizerPass
	; CHECK-O-NEXT: Running pass: SimplifyCFGPass			; CHECK-O-NEXT: Running pass: SimplifyCFGPass
	▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines

llvm/test/Other/pass-pipelines.ll

	Show First 20 Lines • Show All 62 Lines • ▼ Show 20 Lines
	; CHECK-O2-NEXT: Deduce function attributes in RPO			; CHECK-O2-NEXT: Deduce function attributes in RPO
	; CHECK-O2-NOT: Manager			; CHECK-O2-NOT: Manager
	; Next is the late function pass pipeline.			; Next is the late function pass pipeline.
	; CHECK-O2: FunctionPass Manager			; CHECK-O2: FunctionPass Manager
	; CHECK-O2-NOT: Manager			; CHECK-O2-NOT: Manager
	; We rotate loops prior to vectorization.			; We rotate loops prior to vectorization.
	; CHECK-O2: Loop Pass Manager			; CHECK-O2: Loop Pass Manager
	; CHECK-O2-NEXT: Rotate Loops			; CHECK-O2-NEXT: Rotate Loops
				; CHECK-O2: Loop Pass Manager
				; CHECK-O2-NEXT: Recognize loop idioms with loop versioning
	; CHECK-O2-NOT: Manager			; CHECK-O2-NOT: Manager
	; CHECK-O2: Loop Vectorization			; CHECK-O2: Loop Vectorization
	; CHECK-O2-NOT: Manager			; CHECK-O2-NOT: Manager
	; CHECK-O2: SLP Vectorizer			; CHECK-O2: SLP Vectorizer
	; CHECK-O2-NOT: Manager			; CHECK-O2-NOT: Manager
	; After vectorization we do partial unrolling.			; After vectorization we do partial unrolling.
	; CHECK-O2: Loop Pass Manager			; CHECK-O2: Loop Pass Manager
	; CHECK-O2-NEXT: Unroll loops			; CHECK-O2-NEXT: Unroll loops
	Show All 20 Lines

llvm/test/Transforms/LoopIdiom/basic.ll

	Show First 20 Lines • Show All 347 Lines • ▼ Show 20 Lines
	; CHECK-LABEL: @test13_pattern(			; CHECK-LABEL: @test13_pattern(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: bitcast			; CHECK-NEXT: bitcast
	; CHECK-NEXT: memset_pattern			; CHECK-NEXT: memset_pattern
	; CHECK-NOT: store			; CHECK-NOT: store
	; CHECK: ret void			; CHECK: ret void
	}			}



	; PR9815 - This is a partial overlap case that cannot be safely transformed			; PR9815 - This is a partial overlap case that cannot be safely transformed
	; into a memcpy.			; into a memcpy.
	@g_50 = global [7 x i32] [i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0], align 16			@g_50 = global [7 x i32] [i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0], align 16

	define i32 @test14() nounwind {			define i32 @test14() nounwind {
	entry:			entry:
	br label %for.body			br label %for.body

	▲ Show 20 Lines • Show All 203 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopIdiom/memmove.ll

This file was added.

				; RUN: opt -S -basicaa -loop-versioning-idiom < %s \| FileCheck %s
				; RUN: opt -S -basicaa -loop-idiom < %s \| FileCheck %s -check-prefix=LOOP-IDIOM

				declare i64 @foo() nounwind

				; CHECK-LABEL: @test1
				; CHECK: call void @llvm.memmove.p0i8.p0i8.i64(
				; LOOP-IDIOM-LABEL: @test1
				; LOOP-IDIOM-NOT: call void @llvm.memmove.p0i8.p0i8.i64(

				; Nested loops
				define void @test1(i8* nocapture %A, i64 %n) nounwind {
				entry:
				%call8 = tail call i64 @foo() nounwind
				%tobool9 = icmp eq i64 %call8, 0
				br i1 %tobool9, label %while.end, label %for.cond.preheader.lr.ph

				for.cond.preheader.lr.ph: ; preds = %entry
				%cmp6 = icmp eq i64 %n, 0
				br label %for.cond.preheader

				while.cond.loopexit: ; preds = %for.body, %for.cond.preheader
				%call = tail call i64 @foo() nounwind
				%tobool = icmp eq i64 %call, 0
				br i1 %tobool, label %while.end, label %for.cond.preheader

				for.cond.preheader: ; preds = %for.cond.preheader.lr.ph, %while.cond.loopexit
				br i1 %cmp6, label %while.cond.loopexit, label %for.body

				for.body: ; preds = %for.cond.preheader, %for.body
				%i.07 = phi i64 [ %inc, %for.body ], [ 0, %for.cond.preheader ]
				%add = add i64 %i.07, 10
				%arrayidx = getelementptr inbounds i8, i8* %A, i64 %add
				%0 = load i8, i8* %arrayidx, align 1
				%arrayidx1 = getelementptr inbounds i8, i8* %A, i64 %i.07
				store i8 %0, i8* %arrayidx1, align 1
				%inc = add i64 %i.07, 1
				%exitcond = icmp eq i64 %inc, %n
				br i1 %exitcond, label %while.cond.loopexit, label %for.body

				while.end: ; preds = %while.cond.loopexit, %entry
				ret void
				}

llvm/test/Transforms/LoopIdiom/memmove1.ll

This file was added.

				; Check for recognizing the "memmove" idiom.
				; RUN: opt -loop-versioning-idiom -S < %s \| FileCheck %s

				; CHECK-LABEL: @foo(
				; CHECK: call void @llvm.memmove

				define void @foo(i32* nocapture %A, i32* nocapture readonly %B, i32 %n) #0 {
				entry:
				%cmp1 = icmp sgt i32 %n, 0
				br i1 %cmp1, label %for.body.preheader, label %for.end

				for.body.preheader: ; preds = %entry
				%arrayidx.gep = getelementptr i32, i32* %B, i32 0
				%arrayidx1.gep = getelementptr i32, i32* %A, i32 0
				br label %for.body

				for.body: ; preds = %for.body.preheader, %for.body
				%arrayidx.phi = phi i32* [ %arrayidx.gep, %for.body.preheader ], [ %arrayidx.inc, %for.body ]
				%arrayidx1.phi = phi i32* [ %arrayidx1.gep, %for.body.preheader ], [ %arrayidx1.inc, %for.body ]
				%i.02 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
				%0 = load i32, i32* %arrayidx.phi, align 4
				store i32 %0, i32* %arrayidx1.phi, align 4
				%inc = add nuw nsw i32 %i.02, 1
				%exitcond = icmp ne i32 %inc, %n
				%arrayidx.inc = getelementptr i32, i32* %arrayidx.phi, i32 1
				%arrayidx1.inc = getelementptr i32, i32* %arrayidx1.phi, i32 1
				br i1 %exitcond, label %for.body, label %for.end.loopexit

				for.end.loopexit: ; preds = %for.body
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %entry
				ret void
				}

				attributes #0 = { nounwind }

llvm/test/Transforms/LoopIdiom/memmove2.ll

This file was added.

				; RUN: opt -loop-versioning-idiom -S < %s \| FileCheck %s

				define void @PR14241(i32* %s, i64 %size) #0 {
				; Ensure that we don't form a memcpy for strided loops. Briefly, when we taught
				; LoopIdiom about memmove and strided loops, this got miscompiled into a memcpy
				; instead of a memmove. If we get the memmove transform back, this will catch
				; regressions.
				;
				; CHECK-LABEL: @PR14241(

				entry:
				%end.idx = add i64 %size, -1
				%end.ptr = getelementptr inbounds i32, i32* %s, i64 %end.idx
				br label %while.body
				; CHECK-NOT: memcpy
				; CHECK: memmove

				while.body:
				%phi.ptr = phi i32* [ %s, %entry ], [ %next.ptr, %while.body ]
				%src.ptr = getelementptr inbounds i32, i32* %phi.ptr, i64 1
				%val = load i32, i32* %src.ptr, align 4
				; CHECK: load
				%dst.ptr = getelementptr inbounds i32, i32* %phi.ptr, i64 0
				store i32 %val, i32* %dst.ptr, align 4
				; CHECK: store
				%next.ptr = getelementptr inbounds i32, i32* %phi.ptr, i64 1
				%cmp = icmp eq i32* %next.ptr, %end.ptr
				br i1 %cmp, label %exit, label %while.body

				exit:
				ret void
				; CHECK: ret void
				}

				attributes #0 = { nounwind }

llvm/test/Transforms/LoopIdiom/memset_noidiom.ll

	Show All 22 Lines

	for.cond.for.end_crit_edge: ; preds = %for.body			for.cond.for.end_crit_edge: ; preds = %for.body
	br label %for.end			br label %for.end

	for.end: ; preds = %for.cond.for.end_crit_edge, %entry			for.end: ; preds = %for.cond.for.end_crit_edge, %entry
	ret i8* %b			ret i8* %b
	}			}

				; CHECK-LABEL: @memcpy
				; CHECK-NOT: llvm.memcpy
				define i8* @memcpy(i8* noalias %dst, i8* noalias %src, i64 %n) nounwind {
				entry:
				%tobool3 = icmp eq i64 %n, 0
				br i1 %tobool3, label %while.end, label %while.body

				while.body: ; preds = %entry, %while.body
				%c2.06 = phi i8* [ %incdec.ptr, %while.body ], [ %src, %entry ]
				%c1.05 = phi i8* [ %incdec.ptr1, %while.body ], [ %dst, %entry ]
				%n.addr.04 = phi i64 [ %dec, %while.body ], [ %n, %entry ]
				%dec = add i64 %n.addr.04, -1
				%incdec.ptr = getelementptr inbounds i8, i8* %c2.06, i64 1
				%0 = load i8, i8* %c2.06, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %c1.05, i64 1
				store i8 %0, i8* %c1.05, align 1
				%tobool = icmp eq i64 %dec, 0
				br i1 %tobool, label %while.end, label %while.body

				while.end: ; preds = %while.body, %entry
				ret i8* %dst
				}

				; CHECK-LABEL: @memmove
				; CHECK-NOT: llvm.memmove
				define i8* @memmove(i8* %dst, i8* nocapture %src, i64 %count) nounwind {
				entry:
				%sub = add i64 %count, -1
				%tobool9 = icmp eq i64 %count, 0
				br i1 %tobool9, label %while.end, label %while.body.lr.ph

				while.body.lr.ph: ; preds = %entry
				%add.ptr2 = getelementptr inbounds i8, i8* %src, i64 %sub
				%add.ptr = getelementptr inbounds i8, i8* %dst, i64 %sub
				br label %while.body

				while.body: ; preds = %while.body.lr.ph, %while.body
				%b.012 = phi i8* [ %add.ptr2, %while.body.lr.ph ], [ %incdec.ptr, %while.body ]
				%a.011 = phi i8* [ %add.ptr, %while.body.lr.ph ], [ %incdec.ptr3, %while.body ]
				%count.addr.010 = phi i64 [ %count, %while.body.lr.ph ], [ %dec, %while.body ]
				%dec = add i64 %count.addr.010, -1
				%incdec.ptr = getelementptr inbounds i8, i8* %b.012, i64 -1
				%0 = load i8, i8* %b.012, align 1
				%incdec.ptr3 = getelementptr inbounds i8, i8* %a.011, i64 -1
				store i8 %0, i8* %a.011, align 1
				%tobool = icmp eq i64 %dec, 0
				br i1 %tobool, label %while.end, label %while.body

				while.end: ; preds = %while.body, %entry
				ret i8* %dst
				}

llvm/test/Transforms/LoopIdiom/pr28196.ll

	; RUN: opt -loop-idiom -S < %s \| FileCheck %s			; RUN: opt -loop-idiom -S < %s \| FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"

	define void @test1() {			define void @test1() {
	entry:			entry:
	br label %for.body.preheader			br label %for.body.preheader

	for.body.preheader: ; preds = %for.cond			for.body.preheader: ; preds = %for.cond
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %for.body.preheader			for.body: ; preds = %for.body, %for.body.preheader
	Show All 13 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[LIR] re-enable generation of memmove with runtime checksNeeds ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 90343

llvm/include/llvm/InitializePasses.h

llvm/include/llvm/LinkAllPasses.h

llvm/include/llvm/Transforms/Scalar.h

llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h

llvm/lib/Passes/PassBuilder.cpp

llvm/lib/Passes/PassRegistry.def

llvm/lib/Target/Hexagon/HexagonLoopIdiomRecognition.cpp

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

llvm/lib/Transforms/Scalar/Scalar.cpp

llvm/test/CodeGen/Hexagon/loop-idiom/hexagon-memmove1.ll

llvm/test/CodeGen/Hexagon/loop-idiom/hexagon-memmove2.ll

llvm/test/Other/new-pm-defaults.ll

llvm/test/Other/pass-pipelines.ll

llvm/test/Transforms/LoopIdiom/basic.ll

llvm/test/Transforms/LoopIdiom/memmove.ll

llvm/test/Transforms/LoopIdiom/memmove1.ll

llvm/test/Transforms/LoopIdiom/memmove2.ll

llvm/test/Transforms/LoopIdiom/memset_noidiom.ll

llvm/test/Transforms/LoopIdiom/pr28196.ll

[LIR] re-enable generation of memmove with runtime checks
Needs ReviewPublic