This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- llvm/include/llvm/Analysis/CodeMetrics.h
- llvm/include/llvm/Transforms/Scalar.h
- llvm/include/llvm/Transforms/Scalar/LoopRotation.h
- llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h
- llvm/lib/Analysis/CodeMetrics.cpp
- llvm/lib/Passes/PassBuilder.cpp
- llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
- llvm/lib/Transforms/Scalar/LoopRotation.cpp
- llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
- llvm/test/Transforms/LoopRotate/call-prepare-for-lto.ll

Differential D94232

[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands (WIP).
ClosedPublic

Authored by fhahn on Jan 7 2021, 6:14 AM.


Details

Reviewers

spatel
lebedev.ri
sanwou01
asbirlea

Commits

rG83daa49758a1: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands.

Summary

WIP because it is still a bit rough around the edges.

D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.

The problem boils down to this: when compiling the individual files
('prepare for LTO'), we now rotate a loop just before the vectorizer,
which requires duplicating a function call into the preheader. This then
prevents further inlining of the function during LTO.
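To make the mechanism concrete, here is a source-level analogy in C++. It is only a sketch: LoopRotate works on LLVM IR, not on C++ source, and keepGoing, sumWhile and sumRotated are hypothetical names. Rotation turns a top-tested loop into a guard plus a bottom-tested loop, so a call in the header ends up at two call sites.

```cpp
#include <cassert>

// Hypothetical stand-in for a user function that LTO might later inline.
// It counts its dynamic invocations so the two loop forms can be compared.
static int g_calls = 0;
static bool keepGoing(int i, int n) {
  ++g_calls;
  return i < n;
}

// Un-rotated loop: the header evaluates the call before every iteration.
// There is exactly one call site for keepGoing in this function.
int sumWhile(const int *a, int n) {
  assert(n >= 0);
  int s = 0;
  int i = 0;
  while (keepGoing(i, n)) {
    s += a[i];
    ++i;
  }
  return s;
}

// Rotated loop: a guard (the duplicated header check) followed by a
// bottom-tested loop. The same call now appears at two call sites.
int sumRotated(const int *a, int n) {
  assert(n >= 0);
  int s = 0;
  int i = 0;
  if (keepGoing(i, n)) {        // copy of the header check, before the loop
    do {
      s += a[i];
      ++i;
    } while (keepGoing(i, n));  // original check, now at the loop bottom
  }
  return s;
}
```

Both functions compute the same sum and call keepGoing the same number of times at run time; what changes is the number of static call sites. That is what matters to the inliner: for example, the "internal linkage and a single use" heuristic in CodeMetrics no longer applies once the callee has two uses.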

This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.
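The new condition in the CodeMetrics.cpp hunk of this diff can be modelled in isolation. The sketch below uses a hypothetical CallSiteInfo struct in place of LLVM's CallBase/Function queries (isNoInline(), hasInternalLinkage(), hasOneUse()) so that it compiles without LLVM. It only illustrates the predicate as it appears in this diff.

```cpp
#include <cassert>

// Simplified, hypothetical model of the facts CodeMetrics inspects for a
// direct call. These fields are stand-ins, not LLVM's API.
struct CallSiteInfo {
  bool isNoInline;          // call site is marked noinline
  bool hasInternalLinkage;  // callee is internal to the module
  unsigned calleeUses;      // number of uses of the callee
};

// Mirrors the condition in the CodeMetrics.cpp hunk: a call counts as an
// inline candidate if it is not noinline and either the callee is internal
// with a single use, or we are preparing for LTO.
bool isInlineCandidate(const CallSiteInfo &C, bool PrepareForLTO) {
  if (C.isNoInline)
    return false;
  bool LikelyInlinedAnyway = C.hasInternalLinkage && C.calleeUses == 1;
  return LikelyInlinedAnyway || PrepareForLTO;
}
```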

I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens at link time, and a call that was duplicated earlier
may turn out to target a huge function that gets inlined during LTO.
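The resulting rotation decision can be sketched as follows. The LoopRotationUtils.cpp hunk is not shown above, so the early exit on inline candidates is an assumption about how the header check uses NumInlineCandidates. The default threshold of 16 comes from the rotation-max-header-size option in LoopRotation.cpp. HeaderMetrics and shouldRotate are hypothetical names.

```cpp
#include <cassert>

// Hypothetical summary of a loop header, loosely modelled on the CodeMetrics
// fields relevant to loop rotation.
struct HeaderMetrics {
  unsigned NumInsts;             // estimated size of the header
  unsigned NumInlineCandidates;  // calls that might be inlined later
};

// Sketch of the rotation decision, assuming that when preparing for LTO the
// pass refuses to duplicate a header containing inline candidates, and
// otherwise only rejects headers larger than the threshold.
bool shouldRotate(const HeaderMetrics &M, unsigned MaxHeaderSize,
                  bool PrepareForLTO) {
  if (PrepareForLTO && M.NumInlineCandidates > 0)
    return false;  // the call may hide a large body that LTO inlines later
  return M.NumInsts <= MaxHeaderSize;
}
```

The first check exists because the size estimate treats a call as cheap. Without it, a small header containing a call to a large function would pass the size check and be duplicated.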

From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory and has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we have merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.

There could of course be scenarios where we inline a sufficiently large
function with code that is not profitable to vectorize, where the loop would
have been vectorized earlier (by scalarizing the call). But even in that case,
there is probably no big performance impact, because it should mostly be
up to the cost model to reject vectorization in that case. And then
the version with scalarized calls should not have been beneficial either. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).

There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close eye on actual performance and address any
follow-up issues.

I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.

I still need to think about how best to test the change. Perhaps add a
-prepare-for-lto flag to LoopRotate?
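One way to test this without running a full LTO pipeline is the approach the patch eventually took: keep the pipeline-supplied constructor argument and let a hidden, testing-only option force the behaviour on, combining the two as PrepareForLTO || PrepareForLTOOption. The sketch below models that pattern in plain C++. g_rotationPrepareForLTO and RotatePassModel are hypothetical stand-ins for the cl::opt and LoopRotatePass.

```cpp
#include <cassert>

// Hypothetical stand-in for a hidden, testing-only command-line option
// (the patch uses an LLVM cl::opt named -rotation-prepare-for-lto).
bool g_rotationPrepareForLTO = false;

// Hypothetical, simplified pass object: the pipeline decides PrepareForLTO
// when constructing the pass, and the testing flag can force it on.
struct RotatePassModel {
  bool EnableHeaderDuplication;
  bool PrepareForLTO;

  explicit RotatePassModel(bool EnableHeaderDuplication = true,
                           bool PrepareForLTO = false)
      : EnableHeaderDuplication(EnableHeaderDuplication),
        PrepareForLTO(PrepareForLTO) {}

  // The value actually used when rotating: either source can enable it.
  bool effectivePrepareForLTO() const {
    return PrepareForLTO || g_rotationPrepareForLTO;
  }
};
```

Keeping the flag as an override, rather than a replacement for the constructor argument, means normal pipelines are unaffected while a single-pass test can still exercise the prepare-for-LTO path.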

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Jan 7 2021, 6:14 AM

Herald added subscribers: hiraditya, inglorion. Jan 7 2021, 6:14 AM

fhahn requested review of this revision. Jan 7 2021, 6:14 AM

Herald added a project: Restricted Project. Jan 7 2021, 6:14 AM

fhahn mentioned this in D84108: [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline. Jan 7 2021, 6:16 AM

Thanks, Florian. I'll give the patch a run through our benchmarking.

it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call

This was the bit of information I was missing, thanks! With that, I think your proposed change makes a lot of sense.

Harbormaster completed remote builds in B84328: Diff 315120. Jan 7 2021, 7:06 AM

@fhahn I'm afraid I'm seeing runtime and verification errors on perlbench, gcc, gobmk, and astar in SPEC INT 2006. I'm guessing something later in the pipe really doesn't like it when certain loops aren't rotated?

In D94232#2484727, @sanwou01 wrote:

@fhahn I'm afraid I'm seeing runtime and verification errors on perlbench, gcc, gobmk, and astar in SPEC INT 2006. I'm guessing something later in the pipe really doesn't like it when certain loops aren't rotated?

Hmmm, I think it would be very surprising if not rotating some loops exposed any serious miscompiles in SPEC. When building with -Oz, for example, hardly any loops are rotated.

What configuration are you using? I couldn't reproduce any SPEC2006 INT failures on either x86/ARM64 with -O3 -flto.

Our flags are -mcpu=native -O3 -fomit-frame-pointer -flto on a Neoverse-N1 based system, so -mcpu=neoverse-n1 should do the same thing.

It looks like this might be an issue on our end, possibly unrelated to the patch.

Harbormaster completed remote builds in B84445: Diff 315338. Jan 8 2021, 4:19 AM

Yep, definitely a problem on our end; I've got similar symptoms in other runs. Sorry about the noise, and I will try to get some perf data once we sort this out.

Turns out the failing benchmarks are due to a miscompilation with -g3 (which we add to profiled runs). The patch does seem to make that miscompilation more likely. I'll try to reduce that separately, but at least I'll have some performance numbers shortly.

So on SPEC 2006 this fixes astar (+9.6%) as well as shakes things up enough to "fix" h264ref (+9.0%, see D93946). Other notable changes are libquantum (+3.2%) and omnetpp (-2.8%). Geomean is +1.5%.

On SPEC 2017 there is still a problem with omnetpp_r (runtime error), which I'll have a look at next, but is otherwise neutral on geomean with only minor wobbles (+1%, -0.5% at most) on individual benchmarks.

With this having a substantial impact on performance, I'm quite keen to help get this in soon, ideally before the LLVM 12 branch, so please let me know if there is anything else I can do to help.

In D94232#2498633, @sanwou01 wrote:

So on SPEC 2006 this fixes astar (+9.6%) as well as shakes things up enough to "fix" h264ref (+9.0%, see D93946). Other notable changes are libquantum (+3.2%) and omnetpp (-2.8%). Geomean is +1.5%.

Thank you very much for running the numbers!

Just to double check, positive here means good (I assume increase in score)?

On SPEC 2017 there is still a problem with omnetpp_r (runtime error), which I'll have a look at next, but is otherwise neutral on geomean with only minor wobbles (+1%, -0.5% at most) on individual benchmarks.

With this having a substantial impact on performance, I'm quite keen to help get this in soon, ideally before the LLVM 12 branch, so please let me know if there is anything else I can do to help.

Sounds good! I think the theory behind the change suggests it should be quite safe, and the data so far seems encouraging; I can confirm similar improvements for astar on our tracking.

It would probably be good to get this in well ahead of the 12.0 branch, so it has a chance to make it through various downstream perf-tracking systems.

In D94232#2498855, @fhahn wrote:


Just to double check, positive here means good (I assume increase in score)?

Yes, positive is good :)


It would probably be good to get this in well ahead of the 12.0 branch, so it has a chance to make it through various downstream perf-tracking systems.

That's great! Agreed that this could do with a little soak on main before 12 branches, which IIRC is scheduled for 26 January?

In D94232#2500595, @sanwou01 wrote:


That's great! Agreed that this could do with a little soak on main before 12 branches, which IIRC is scheduled for 26 January?

Yes, I think the only thing outstanding with respect to testing is the omnetpp_r failure. Can you confirm if that is caused by the patch or not?

And any review of the patch would be appreciated of course!

In D94232#2498309, @sanwou01 wrote:

Turns out the failing benchmarks are due to a miscompilation with -g3 (which we add to profiled runs). The patch does seem to make that miscompilation more likely. I'll try to reduce that separately, but at least I'll have some performance numbers shortly.

This is worrying, -g3 should not cause miscompiles w/o this patch.

In D94232#2503921, @fhahn wrote:


Yes, I think the only thing outstanding with respect to testing is the omnetpp_r failure. Can you confirm if that is caused by the patch or not?

And any review of the patch would be appreciated of course!

(I thought I posted this on Friday, but looks like I forgot). The omnetpp_r crash on SPEC 2017 has not been seen again in subsequent runs so this might have been a fluke. Performance on the new runs looks fine, no change on omnetpp_r.

In terms of review, could you add a test case? A -prepare-for-lto flag for loop rotate for testing purposes seems reasonable to me.

In D94232#2503930, @xbolva00 wrote:


This is worrying, -g3 should not cause miscompiles w/o this patch.

I agree, but I don't think it should block the patch going in. I'm trying to reduce it on one of the benchmarks. It only shows up with LTO enabled, so it is quite slow going. (I'm having a look at the LLVM test suite as well to see if it shows up at all there, hopefully in a slightly smaller form.)

fhahn mentioned this in rG34a2c138c896: [LoopRotate] Precommit test for prepare-for-lto handling. Jan 18 2021, 7:25 AM

In D94232#2504325, @sanwou01 wrote:


In terms of review, could you add a test case? A -prepare-for-lto flag for loop rotate for testing purposes seems reasonable to me.

Sounds good, I added a new -rotation-prepare-for-lto option and a test using it.

In D94232#2503930, @xbolva00 wrote:


This is worrying, -g3 should not cause miscompiles w/o this patch.

I agree, but I don't think it should block the patch going in. I'm trying to reduce it on one of the benchmarks. It only shows up with LTO enabled, so it is quite slow going. (I'm having a look at the LLVM test suite as well to see if it shows up at all there, hopefully in a slightly smaller form.)

I cannot reproduce this with regular LTO; as I mentioned earlier, I do not think this is caused by the patch.

Herald added a subscriber: steven_wu. Jan 18 2021, 9:01 AM

Harbormaster completed remote builds in B85610: Diff 317374. Jan 18 2021, 9:35 AM

Looks good to me!

This revision is now accepted and ready to land. Jan 18 2021, 9:49 AM

Does the same issue apply to ThinLTO?

In D94232#2505253, @nikic wrote:

Does the same issue apply to ThinLTO?

Possibly, but unfortunately I do not have performance data at the moment to back that up and not enough insight into the specifics of the ThinLTO pipeline. But it should be easy to switch to using it for ThinLTO as well, once we are confident it works as expected for LTO on a large range of workloads.

Closed by commit rG83daa49758a1: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. (authored by fhahn). Jan 19 2021, 2:16 AM

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rG83daa49758a1: [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands.

In D94232#2505253, @nikic wrote:

Does the same issue apply to ThinLTO?

D84108 mentions “size impact varies; for ThinLTO it's actually an improvement”, so it sounds like the original patch only missed the regular LTO pipeline.

cc @lebedev.ri

xbolva00 mentioned this in D88471: [Passes] Run peeling as part of simple/full loop unrolling. Jan 19 2021, 4:26 AM

Just FYI this had a pretty massive (positive) impact on code size: https://llvm-compile-time-tracker.com/compare.php?from=4d3081331ad854e0bff5032c818ec6414fb974c0&to=83daa49758a12d585fe2d9a64448e54d91bcfaff&stat=size-text
And presumably also directly related to that, a large compile-time improvement: https://llvm-compile-time-tracker.com/compare.php?from=4d3081331ad854e0bff5032c818ec6414fb974c0&to=83daa49758a12d585fe2d9a64448e54d91bcfaff&stat=instructions

In D94232#2506618, @nikic wrote:

Just FYI this had a pretty massive (positive) impact on code size: https://llvm-compile-time-tracker.com/compare.php?from=4d3081331ad854e0bff5032c818ec6414fb974c0&to=83daa49758a12d585fe2d9a64448e54d91bcfaff&stat=size-text
And presumably also directly related to that, a large compile-time improvement: https://llvm-compile-time-tracker.com/compare.php?from=4d3081331ad854e0bff5032c818ec6414fb974c0&to=83daa49758a12d585fe2d9a64448e54d91bcfaff&stat=instructions

Wow.
What about runtime performance, eg. mafft? No changes?

In D94232#2506684, @xbolva00 wrote:


Wow.
What about runtime performance, eg. mafft? No changes?

Thanks for the heads-up.

Unfortunately I think this is an unintended bug. It looks like the original code did not skip calls that are not actual calls. With the patch, LoopRotation would also be skipped if the header contains intrinsic calls, like @llvm.dbg.* calls. Those should not block rotation though (there's nothing to inline). I pushed a fix, so we do not consider calls that are not lowered to calls as inline candidates: 3747b69b5312

I expect this to undo the improvements seen with -g.
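The follow-up fix amounts to one extra condition in the same kind of model: calls that are never lowered to real calls, such as @llvm.dbg.* intrinsics, should not count as inline candidates. This is a sketch with hypothetical names; the actual change in 3747b69b5312 is not shown here. LLVM's TargetTransformInfo::isLoweredToCall answers this kind of question, but whether the fix uses it is an assumption.

```cpp
#include <cassert>

// Hypothetical model of a direct call, extended with whether the callee is
// lowered to a real call (debug-info intrinsics such as llvm.dbg.* are not).
struct CallSiteInfo {
  bool isNoInline;
  bool hasInternalLinkage;
  unsigned calleeUses;
  bool loweredToCall;  // false for intrinsics that never become calls
};

// Inline-candidate check with the follow-up fix applied: calls that are not
// lowered to calls have nothing to inline, so they never count.
bool isInlineCandidate(const CallSiteInfo &C, bool PrepareForLTO) {
  if (!C.loweredToCall || C.isNoInline)
    return false;
  return (C.hasInternalLinkage && C.calleeUses == 1) || PrepareForLTO;
}
```

With this guard, a header whose only calls are debug intrinsics no longer blocks rotation in the prepare-for-LTO stage, which is why building with -g stopped skewing the results.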

@fhahn Thanks! I did suspect that this is a bit too good to be true...

The code size changes after your fix look much more sensible: http://llvm-compile-time-tracker.com/compare.php?from=4d3081331ad854e0bff5032c818ec6414fb974c0&to=3747b69b531299f7a2a0289b8a59ac7234e47d4f&stat=size-text

ormris added a subscriber: ormris. Jan 19 2021, 10:10 AM

sanwou01 mentioned this in D96694: Use LoopRotate PrepareForLTO stage in NPM. Feb 15 2021, 1:40 AM

sanwou01 mentioned this in rG93d9a4c95aff: Use LoopRotate PrepareForLTO stage in NPM. Feb 17 2021, 6:07 AM

Revision Contents

- llvm/include/llvm/Analysis/CodeMetrics.h: 3 lines
- llvm/include/llvm/Transforms/Scalar.h: 2 lines
- llvm/include/llvm/Transforms/Scalar/LoopRotation.h: 4 lines
- llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h: 3 lines
- llvm/lib/Analysis/CodeMetrics.cpp: 12 lines
- llvm/lib/Passes/PassBuilder.cpp: 3 lines
- llvm/lib/Transforms/IPO/PassManagerBuilder.cpp: 4 lines
- llvm/lib/Transforms/Scalar/LoopRotation.cpp: 29 lines
- llvm/lib/Transforms/Utils/LoopRotationUtils.cpp: 17 lines
- llvm/test/Transforms/LoopRotate/call-prepare-for-lto.ll: 7 lines

Diff 317496

llvm/include/llvm/Analysis/CodeMetrics.h

[...]	struct CodeMetrics {
/// The inliner is more aggressive with inlining vector kernels.		/// The inliner is more aggressive with inlining vector kernels.
unsigned NumVectorInsts = 0;		unsigned NumVectorInsts = 0;

/// How many 'ret' instructions the blocks contain.		/// How many 'ret' instructions the blocks contain.
unsigned NumRets = 0;		unsigned NumRets = 0;

/// Add information about a block to the current state.		/// Add information about a block to the current state.
void analyzeBasicBlock(const BasicBlock *BB, const TargetTransformInfo &TTI,		void analyzeBasicBlock(const BasicBlock *BB, const TargetTransformInfo &TTI,
const SmallPtrSetImpl<const Value*> &EphValues);		const SmallPtrSetImpl<const Value *> &EphValues,
		bool PrepareForLTO = false);

/// Collect a loop's ephemeral values (those used only by an assume		/// Collect a loop's ephemeral values (those used only by an assume
/// or similar intrinsics in the loop).		/// or similar intrinsics in the loop).
static void collectEphemeralValues(const Loop *L, AssumptionCache *AC,		static void collectEphemeralValues(const Loop *L, AssumptionCache *AC,
SmallPtrSetImpl<const Value *> &EphValues);		SmallPtrSetImpl<const Value *> &EphValues);

/// Collect a functions's ephemeral values (those used only by an		/// Collect a functions's ephemeral values (those used only by an
/// assume or similar intrinsics in the function).		/// assume or similar intrinsics in the function).
static void collectEphemeralValues(const Function *L, AssumptionCache *AC,		static void collectEphemeralValues(const Function *L, AssumptionCache *AC,
SmallPtrSetImpl<const Value *> &EphValues);		SmallPtrSetImpl<const Value *> &EphValues);
};		};

}		}

#endif		#endif

llvm/include/llvm/Transforms/Scalar.h

	[...]
	// LoopReroll - This pass is a simple loop rerolling pass.			// LoopReroll - This pass is a simple loop rerolling pass.
	//			//
	Pass *createLoopRerollPass();			Pass *createLoopRerollPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LoopRotate - This pass is a simple loop rotating pass.			// LoopRotate - This pass is a simple loop rotating pass.
	//			//
	Pass *createLoopRotatePass(int MaxHeaderSize = -1);			Pass *createLoopRotatePass(int MaxHeaderSize = -1, bool PrepareForLTO = false);

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LoopIdiom - This pass recognizes and replaces idioms in loops.			// LoopIdiom - This pass recognizes and replaces idioms in loops.
	//			//
	Pass *createLoopIdiomPass();			Pass *createLoopIdiomPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	[...]

llvm/include/llvm/Transforms/Scalar/LoopRotation.h

	[...]
	#include "llvm/IR/PassManager.h"			#include "llvm/IR/PassManager.h"
	#include "llvm/Transforms/Scalar/LoopPassManager.h"			#include "llvm/Transforms/Scalar/LoopPassManager.h"

	namespace llvm {			namespace llvm {

	/// A simple loop rotation transformation.			/// A simple loop rotation transformation.
	class LoopRotatePass : public PassInfoMixin<LoopRotatePass> {			class LoopRotatePass : public PassInfoMixin<LoopRotatePass> {
	public:			public:
	LoopRotatePass(bool EnableHeaderDuplication = true);			LoopRotatePass(bool EnableHeaderDuplication = true,
				bool PrepareForLTO = false);
	PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,			PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
	LoopStandardAnalysisResults &AR, LPMUpdater &U);			LoopStandardAnalysisResults &AR, LPMUpdater &U);

	private:			private:
	const bool EnableHeaderDuplication;			const bool EnableHeaderDuplication;
				const bool PrepareForLTO;
	};			};
	}			}

	#endif // LLVM_TRANSFORMS_SCALAR_LOOPROTATION_H			#endif // LLVM_TRANSFORMS_SCALAR_LOOPROTATION_H

llvm/include/llvm/Transforms/Utils/LoopRotationUtils.h

	[...]
	/// perform loop latch simplication as well if the flag RotationOnly			/// perform loop latch simplication as well if the flag RotationOnly
	/// is false. The flag Threshold represents the size threshold of the loop			/// is false. The flag Threshold represents the size threshold of the loop
	/// header. If the loop header's size exceeds the threshold, the loop rotation			/// header. If the loop header's size exceeds the threshold, the loop rotation
	/// will give up. The flag IsUtilMode controls the heuristic used in the			/// will give up. The flag IsUtilMode controls the heuristic used in the
	/// LoopRotation. If it is true, the profitability heuristic will be ignored.			/// LoopRotation. If it is true, the profitability heuristic will be ignored.
	bool LoopRotation(Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI,			bool LoopRotation(Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI,
	AssumptionCache *AC, DominatorTree *DT, ScalarEvolution *SE,			AssumptionCache *AC, DominatorTree *DT, ScalarEvolution *SE,
	MemorySSAUpdater *MSSAU, const SimplifyQuery &SQ,			MemorySSAUpdater *MSSAU, const SimplifyQuery &SQ,
	bool RotationOnly, unsigned Threshold, bool IsUtilMode);			bool RotationOnly, unsigned Threshold, bool IsUtilMode,
				bool PrepareForLTO = false);

	} // namespace llvm			} // namespace llvm

	#endif			#endif

llvm/lib/Analysis/CodeMetrics.cpp

[...]	if (EphValues.insert(I).second)
appendSpeculatableOperands(I, Visited, Worklist);		appendSpeculatableOperands(I, Visited, Worklist);
}		}

completeEphemeralValues(Visited, Worklist, EphValues);		completeEphemeralValues(Visited, Worklist, EphValues);
}		}

/// Fill in the current structure with information gleaned from the specified		/// Fill in the current structure with information gleaned from the specified
/// block.		/// block.
void CodeMetrics::analyzeBasicBlock(const BasicBlock *BB,		void CodeMetrics::analyzeBasicBlock(
const TargetTransformInfo &TTI,		const BasicBlock *BB, const TargetTransformInfo &TTI,
const SmallPtrSetImpl<const Value*> &EphValues) {		const SmallPtrSetImpl<const Value *> &EphValues, bool PrepareForLTO) {
++NumBlocks;		++NumBlocks;
unsigned NumInstsBeforeThisBB = NumInsts;		unsigned NumInstsBeforeThisBB = NumInsts;
for (const Instruction &I : *BB) {		for (const Instruction &I : *BB) {
// Skip ephemeral values.		// Skip ephemeral values.
if (EphValues.count(&I))		if (EphValues.count(&I))
continue;		continue;

// Special handling for calls.		// Special handling for calls.
if (const auto *Call = dyn_cast<CallBase>(&I)) {		if (const auto *Call = dyn_cast<CallBase>(&I)) {
if (const Function *F = Call->getCalledFunction()) {		if (const Function *F = Call->getCalledFunction()) {
// If a function is both internal and has a single use, then it is		// If a function is both internal and has a single use, then it is
// extremely likely to get inlined in the future (it was probably		// extremely likely to get inlined in the future (it was probably
// exposed by an interleaved devirtualization pass).		// exposed by an interleaved devirtualization pass).
if (!Call->isNoInline() && F->hasInternalLinkage() && F->hasOneUse())		// When preparing for LTO, liberally consider calls as inline
		// candidates.
		if (!Call->isNoInline() &&
		((F->hasInternalLinkage() && F->hasOneUse()) || PrepareForLTO)) {
++NumInlineCandidates;		++NumInlineCandidates;
		}

// If this call is to function itself, then the function is recursive.		// If this call is to function itself, then the function is recursive.
// Inlining it into other functions is a bad idea, because this is		// Inlining it into other functions is a bad idea, because this is
// basically just a form of loop peeling, and our metrics aren't useful		// basically just a form of loop peeling, and our metrics aren't useful
// for that case.		// for that case.
if (F == BB->getParent())		if (F == BB->getParent())
isRecursive = true;		isRecursive = true;

[...]

llvm/lib/Passes/PassBuilder.cpp

[...]	PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
// function passes.		// function passes.

for (auto &C : VectorizerStartEPCallbacks)		for (auto &C : VectorizerStartEPCallbacks)
C(OptimizePM, Level);		C(OptimizePM, Level);

// First rotate loops that may have been un-rotated by prior passes.		// First rotate loops that may have been un-rotated by prior passes.
// Disable header duplication at -Oz.		// Disable header duplication at -Oz.
OptimizePM.addPass(createFunctionToLoopPassAdaptor(		OptimizePM.addPass(createFunctionToLoopPassAdaptor(
LoopRotatePass(Level != OptimizationLevel::Oz), EnableMSSALoopDependency,		LoopRotatePass(Level != OptimizationLevel::Oz, LTOPreLink),
		EnableMSSALoopDependency,
/*UseBlockFrequencyInfo=*/false, DebugLogging));		/*UseBlockFrequencyInfo=*/false, DebugLogging));

// Distribute loops to allow partial vectorization. I.e. isolate dependences		// Distribute loops to allow partial vectorization. I.e. isolate dependences
// into separate loop that would otherwise inhibit vectorization. This is		// into separate loop that would otherwise inhibit vectorization. This is
// currently only performed for loops marked with the metadata		// currently only performed for loops marked with the metadata
// llvm.loop.distribute=true or when -enable-loop-distribute is specified.		// llvm.loop.distribute=true or when -enable-loop-distribute is specified.
OptimizePM.addPass(LoopDistributePass());		OptimizePM.addPass(LoopDistributePass());

[...]

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

[...]	void PassManagerBuilder::addFunctionSimplificationPasses(
if (EnableSimpleLoopUnswitch) {		if (EnableSimpleLoopUnswitch) {
// The simple loop unswitch pass relies on separate cleanup passes. Schedule		// The simple loop unswitch pass relies on separate cleanup passes. Schedule
// them first so when we re-process a loop they run before other loop		// them first so when we re-process a loop they run before other loop
// passes.		// passes.
MPM.add(createLoopInstSimplifyPass());		MPM.add(createLoopInstSimplifyPass());
MPM.add(createLoopSimplifyCFGPass());		MPM.add(createLoopSimplifyCFGPass());
}		}
// Rotate Loop - disable header duplication at -Oz		// Rotate Loop - disable header duplication at -Oz
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));		MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1, PrepareForLTO));
// TODO: Investigate promotion cap for O1.		// TODO: Investigate promotion cap for O1.
MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));		MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));
if (EnableSimpleLoopUnswitch)		if (EnableSimpleLoopUnswitch)
MPM.add(createSimpleLoopUnswitchLegacyPass());		MPM.add(createSimpleLoopUnswitchLegacyPass());
else		else
MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, DivergentTarget));		MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, DivergentTarget));
// FIXME: We break the loop pass pipeline here in order to do full		// FIXME: We break the loop pass pipeline here in order to do full
// simplify-cfg. Eventually loop-simplifycfg should be enhanced to replace the		// simplify-cfg. Eventually loop-simplifycfg should be enhanced to replace the
[...]	if (EnableMatrix) {
MPM.add(createEarlyCSEPass(false));		MPM.add(createEarlyCSEPass(false));
}		}

addExtensionsToPM(EP_VectorizerStart, MPM);		addExtensionsToPM(EP_VectorizerStart, MPM);

// Re-rotate loops in all our loop nests. These may have fallout out of		// Re-rotate loops in all our loop nests. These may have fallout out of
// rotated form due to GVN or other transformations, and the vectorizer relies		// rotated form due to GVN or other transformations, and the vectorizer relies
// on the rotated form. Disable header duplication at -Oz.		// on the rotated form. Disable header duplication at -Oz.
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));		MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1, PrepareForLTO));

// Distribute loops to allow partial vectorization. I.e. isolate dependences		// Distribute loops to allow partial vectorization. I.e. isolate dependences
// into separate loop that would otherwise inhibit vectorization. This is		// into separate loop that would otherwise inhibit vectorization. This is
// currently only performed for loops marked with the metadata		// currently only performed for loops marked with the metadata
// llvm.loop.distribute=true or when -enable-loop-distribute is specified.		// llvm.loop.distribute=true or when -enable-loop-distribute is specified.
MPM.add(createLoopDistributePass());		MPM.add(createLoopDistributePass());

MPM.add(createLoopVectorizePass(!LoopsInterleaved, !LoopVectorize));		MPM.add(createLoopVectorizePass(!LoopsInterleaved, !LoopVectorize));
[...]

llvm/lib/Transforms/Scalar/LoopRotation.cpp

[...]
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-rotate"		#define DEBUG_TYPE "loop-rotate"

static cl::opt<unsigned> DefaultRotationThreshold(		static cl::opt<unsigned> DefaultRotationThreshold(
"rotation-max-header-size", cl::init(16), cl::Hidden,		"rotation-max-header-size", cl::init(16), cl::Hidden,
cl::desc("The default maximum header size for automatic loop rotation"));		cl::desc("The default maximum header size for automatic loop rotation"));

LoopRotatePass::LoopRotatePass(bool EnableHeaderDuplication)		static cl::opt<bool> PrepareForLTOOption(
: EnableHeaderDuplication(EnableHeaderDuplication) {}		"rotation-prepare-for-lto", cl::init(false), cl::Hidden,
		cl::desc("Run loop-rotation in the prepare-for-lto stage. This option "
		"should be used for testing only."));

		LoopRotatePass::LoopRotatePass(bool EnableHeaderDuplication, bool PrepareForLTO)
		: EnableHeaderDuplication(EnableHeaderDuplication),
		PrepareForLTO(PrepareForLTO) {}

PreservedAnalyses LoopRotatePass::run(Loop &L, LoopAnalysisManager &AM,		PreservedAnalyses LoopRotatePass::run(Loop &L, LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR,		LoopStandardAnalysisResults &AR,
LPMUpdater &) {		LPMUpdater &) {
// Vectorization requires loop-rotation. Use default threshold for loops the		// Vectorization requires loop-rotation. Use default threshold for loops the
// user explicitly marked for vectorization, even when header duplication is		// user explicitly marked for vectorization, even when header duplication is
// disabled.		// disabled.
int Threshold = EnableHeaderDuplication ||		int Threshold = EnableHeaderDuplication ||
hasVectorizeTransformation(&L) == TM_ForcedByUser		hasVectorizeTransformation(&L) == TM_ForcedByUser
? DefaultRotationThreshold		? DefaultRotationThreshold
: 0;		: 0;
const DataLayout &DL = L.getHeader()->getModule()->getDataLayout();		const DataLayout &DL = L.getHeader()->getModule()->getDataLayout();
const SimplifyQuery SQ = getBestSimplifyQuery(AR, DL);		const SimplifyQuery SQ = getBestSimplifyQuery(AR, DL);

Optional<MemorySSAUpdater> MSSAU;		Optional<MemorySSAUpdater> MSSAU;
if (AR.MSSA)		if (AR.MSSA)
MSSAU = MemorySSAUpdater(AR.MSSA);		MSSAU = MemorySSAUpdater(AR.MSSA);
bool Changed = LoopRotation(&L, &AR.LI, &AR.TTI, &AR.AC, &AR.DT, &AR.SE,		bool Changed =
MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,		LoopRotation(&L, &AR.LI, &AR.TTI, &AR.AC, &AR.DT, &AR.SE,
SQ, false, Threshold, false);		MSSAU.hasValue() ? MSSAU.getPointer() : nullptr, SQ, false,
		Threshold, false, PrepareForLTO || PrepareForLTOOption);

if (!Changed)		if (!Changed)
return PreservedAnalyses::all();		return PreservedAnalyses::all();

if (AR.MSSA && VerifyMemorySSA)		if (AR.MSSA && VerifyMemorySSA)
AR.MSSA->verifyMemorySSA();		AR.MSSA->verifyMemorySSA();

auto PA = getLoopPassPreservedAnalyses();		auto PA = getLoopPassPreservedAnalyses();
if (AR.MSSA)		if (AR.MSSA)
PA.preserve<MemorySSAAnalysis>();		PA.preserve<MemorySSAAnalysis>();
return PA;		return PA;
}		}
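In the new pass manager, run() first picks a header-size budget and then passes the combined LTO flag into LoopRotation(). The following is a minimal sketch of just those two decisions in plain C++. `VectorizeHint`, `newPMThreshold` and `effectivePrepareForLTO` are hypothetical names, and the enum deliberately collapses the result of hasVectorizeTransformation() to the single case that matters here.

```cpp
#include <cassert>

// Simplified stand-in for the result of hasVectorizeTransformation();
// only the TM_ForcedByUser case matters for the threshold.
enum class VectorizeHint { Unspecified, ForcedByUser };

// Assumed default of -rotation-max-header-size (cl::init(16) in the patch).
constexpr unsigned DefaultRotationThreshold = 16;

// Mirrors the Threshold computation in LoopRotatePass::run: full budget when
// header duplication is enabled or vectorization was forced, otherwise 0.
unsigned newPMThreshold(bool EnableHeaderDuplication, VectorizeHint Hint) {
  return (EnableHeaderDuplication || Hint == VectorizeHint::ForcedByUser)
             ? DefaultRotationThreshold
             : 0;
}

// Mirrors the last argument passed to LoopRotation(): the pipeline flag and
// the hidden -rotation-prepare-for-lto testing option are simply OR'ed.
bool effectivePrepareForLTO(bool PassFlag, bool CommandLineFlag) {
  return PassFlag || CommandLineFlag;
}
```

Because the option is OR'ed in, -rotation-prepare-for-lto can switch the behaviour on for opt-based tests, but it cannot switch it off for a pipeline that requested it.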

namespace {		namespace {

class LoopRotateLegacyPass : public LoopPass {		class LoopRotateLegacyPass : public LoopPass {
unsigned MaxHeaderSize;		unsigned MaxHeaderSize;
		bool PrepareForLTO;

public:		public:
static char ID; // Pass ID, replacement for typeid		static char ID; // Pass ID, replacement for typeid
LoopRotateLegacyPass(int SpecifiedMaxHeaderSize = -1) : LoopPass(ID) {		LoopRotateLegacyPass(int SpecifiedMaxHeaderSize = -1,
		bool PrepareForLTO = false)
		: LoopPass(ID), PrepareForLTO(PrepareForLTO) {
initializeLoopRotateLegacyPassPass(*PassRegistry::getPassRegistry());		initializeLoopRotateLegacyPassPass(*PassRegistry::getPassRegistry());
if (SpecifiedMaxHeaderSize == -1)		if (SpecifiedMaxHeaderSize == -1)
MaxHeaderSize = DefaultRotationThreshold;		MaxHeaderSize = DefaultRotationThreshold;
else		else
MaxHeaderSize = unsigned(SpecifiedMaxHeaderSize);		MaxHeaderSize = unsigned(SpecifiedMaxHeaderSize);
}		}

// LCSSA form makes instruction renaming easier.		// LCSSA form makes instruction renaming easier.
...
bool runOnLoop(Loop *L, LPPassManager &LPM) override {
...
// user explicitly marked for vectorization, even when header duplication is		// user explicitly marked for vectorization, even when header duplication is
// disabled.		// disabled.
int Threshold = hasVectorizeTransformation(L) == TM_ForcedByUser		int Threshold = hasVectorizeTransformation(L) == TM_ForcedByUser
? DefaultRotationThreshold		? DefaultRotationThreshold
: MaxHeaderSize;		: MaxHeaderSize;

return LoopRotation(L, LI, TTI, AC, &DT, &SE,		return LoopRotation(L, LI, TTI, AC, &DT, &SE,
MSSAU.hasValue() ? MSSAU.getPointer() : nullptr, SQ,		MSSAU.hasValue() ? MSSAU.getPointer() : nullptr, SQ,
false, Threshold, false);		false, Threshold, false,
		PrepareForLTO || PrepareForLTOOption);
}		}
};		};
} // end namespace		} // end namespace

char LoopRotateLegacyPass::ID = 0;		char LoopRotateLegacyPass::ID = 0;
INITIALIZE_PASS_BEGIN(LoopRotateLegacyPass, "loop-rotate", "Rotate Loops",		INITIALIZE_PASS_BEGIN(LoopRotateLegacyPass, "loop-rotate", "Rotate Loops",
false, false)		false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(LoopPass)		INITIALIZE_PASS_DEPENDENCY(LoopPass)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)
INITIALIZE_PASS_END(LoopRotateLegacyPass, "loop-rotate", "Rotate Loops", false,		INITIALIZE_PASS_END(LoopRotateLegacyPass, "loop-rotate", "Rotate Loops", false,
false)		false)

Pass *llvm::createLoopRotatePass(int MaxHeaderSize) {		Pass *llvm::createLoopRotatePass(int MaxHeaderSize, bool PrepareForLTO) {
return new LoopRotateLegacyPass(MaxHeaderSize);		return new LoopRotateLegacyPass(MaxHeaderSize, PrepareForLTO);
}		}
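The legacy pass picks its threshold slightly differently: it has no EnableHeaderDuplication switch and instead falls back to the MaxHeaderSize fixed at construction time (0 at -Oz). Below is a hypothetical sketch of that choice in plain C++, with its own simplified hint enum; `legacyThreshold` is an illustrative name, not part of the patch.

```cpp
#include <cassert>

// Simplified stand-in for the result of hasVectorizeTransformation().
enum class VectorizeHint { Unspecified, ForcedByUser };

// Assumed default of -rotation-max-header-size (cl::init(16) in the patch).
constexpr unsigned DefaultRotationThreshold = 16;

// Mirrors LoopRotateLegacyPass::runOnLoop: loops the user explicitly marked
// for vectorization always get the default budget, even when MaxHeaderSize
// is 0; every other loop uses the per-pass MaxHeaderSize.
unsigned legacyThreshold(unsigned MaxHeaderSize, VectorizeHint Hint) {
  return Hint == VectorizeHint::ForcedByUser ? DefaultRotationThreshold
                                             : MaxHeaderSize;
}
```

In both pass managers the PrepareForLTO value is forwarded separately from this threshold, so forcing vectorization restores the size budget but does not bypass the new inline-candidate check.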

llvm/lib/Transforms/Utils/LoopRotationUtils.cpp

...
class LoopRotate {
...
const TargetTransformInfo *TTI;		const TargetTransformInfo *TTI;
AssumptionCache *AC;		AssumptionCache *AC;
DominatorTree *DT;		DominatorTree *DT;
ScalarEvolution *SE;		ScalarEvolution *SE;
MemorySSAUpdater *MSSAU;		MemorySSAUpdater *MSSAU;
const SimplifyQuery &SQ;		const SimplifyQuery &SQ;
bool RotationOnly;		bool RotationOnly;
bool IsUtilMode;		bool IsUtilMode;
		bool PrepareForLTO;

public:		public:
LoopRotate(unsigned MaxHeaderSize, LoopInfo *LI,		LoopRotate(unsigned MaxHeaderSize, LoopInfo *LI,
const TargetTransformInfo *TTI, AssumptionCache *AC,		const TargetTransformInfo *TTI, AssumptionCache *AC,
DominatorTree *DT, ScalarEvolution *SE, MemorySSAUpdater *MSSAU,		DominatorTree *DT, ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
const SimplifyQuery &SQ, bool RotationOnly, bool IsUtilMode)		const SimplifyQuery &SQ, bool RotationOnly, bool IsUtilMode,
		bool PrepareForLTO)
: MaxHeaderSize(MaxHeaderSize), LI(LI), TTI(TTI), AC(AC), DT(DT), SE(SE),		: MaxHeaderSize(MaxHeaderSize), LI(LI), TTI(TTI), AC(AC), DT(DT), SE(SE),
MSSAU(MSSAU), SQ(SQ), RotationOnly(RotationOnly),		MSSAU(MSSAU), SQ(SQ), RotationOnly(RotationOnly),
IsUtilMode(IsUtilMode) {}		IsUtilMode(IsUtilMode), PrepareForLTO(PrepareForLTO) {}
bool processLoop(Loop *L);		bool processLoop(Loop *L);

private:		private:
bool rotateLoop(Loop *L, bool SimplifiedLatch);		bool rotateLoop(Loop *L, bool SimplifiedLatch);
bool simplifyLoopLatch(Loop *L);		bool simplifyLoopLatch(Loop *L);
};		};
} // end anonymous namespace		} // end anonymous namespace

...
do {
...

// Check size of original header and reject loop if it is very big or we can't		// Check size of original header and reject loop if it is very big or we can't
// duplicate blocks inside it.		// duplicate blocks inside it.
{		{
SmallPtrSet<const Value *, 32> EphValues;		SmallPtrSet<const Value *, 32> EphValues;
CodeMetrics::collectEphemeralValues(L, AC, EphValues);		CodeMetrics::collectEphemeralValues(L, AC, EphValues);

CodeMetrics Metrics;		CodeMetrics Metrics;
Metrics.analyzeBasicBlock(OrigHeader, *TTI, EphValues);		Metrics.analyzeBasicBlock(OrigHeader, *TTI, EphValues, PrepareForLTO);
if (Metrics.notDuplicatable) {		if (Metrics.notDuplicatable) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LoopRotation: NOT rotating - contains non-duplicatable"		dbgs() << "LoopRotation: NOT rotating - contains non-duplicatable"
<< " instructions: ";		<< " instructions: ";
L->dump());		L->dump());
return Rotated;		return Rotated;
}		}
if (Metrics.convergent) {		if (Metrics.convergent) {
LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains convergent "		LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains convergent "
"instructions: ";		"instructions: ";
L->dump());		L->dump());
return Rotated;		return Rotated;
}		}
if (Metrics.NumInsts > MaxHeaderSize) {		if (Metrics.NumInsts > MaxHeaderSize) {
LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "		LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
<< Metrics.NumInsts		<< Metrics.NumInsts
<< " instructions, which is more than the threshold ("		<< " instructions, which is more than the threshold ("
<< MaxHeaderSize << " instructions): ";		<< MaxHeaderSize << " instructions): ";
L->dump());		L->dump());
++NumNotRotatedDueToHeaderSize;		++NumNotRotatedDueToHeaderSize;
return Rotated;		return Rotated;
}		}

		// When preparing for LTO, avoid rotating loops with calls that could be
		// inlined during the LTO stage.
		if (PrepareForLTO && Metrics.NumInlineCandidates > 0)
		return Rotated;
}		}

// Now, this loop is suitable for rotation.		// Now, this loop is suitable for rotation.
BasicBlock *OrigPreheader = L->getLoopPreheader();		BasicBlock *OrigPreheader = L->getLoopPreheader();

// If the loop could not be converted to canonical form, it must have an		// If the loop could not be converted to canonical form, it must have an
// indirectbr in it, just give up.		// indirectbr in it, just give up.
if (!OrigPreheader || !L->hasDedicatedExits())		if (!OrigPreheader || !L->hasDedicatedExits())
...
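The header check in rotateLoop now has four bail-out conditions, and the new one runs only after the existing size checks pass. The sketch below restates that ordering as a pure function in plain C++. `HeaderMetrics` is a hypothetical, trimmed-down stand-in for the llvm::CodeMetrics fields consulted here, and the sketch reports a reason instead of returning Rotated. Which calls count as inline candidates is decided by the CodeMetrics::analyzeBasicBlock change, which is not shown in this view; the sketch simply takes the count as given.

```cpp
#include <cassert>

// Hypothetical stand-in for the llvm::CodeMetrics fields used by the check.
struct HeaderMetrics {
  bool notDuplicatable = false;
  bool convergent = false;
  unsigned NumInsts = 0;
  unsigned NumInlineCandidates = 0;
};

enum class RotateDecision {
  PassesHeaderChecks, // later checks (preheader, exits, ...) still apply
  NotDuplicatable,
  Convergent,
  HeaderTooBig,
  InlineCandidateBeforeLTO,
};

// Applies the checks in the same order as the patched rotateLoop.
RotateDecision checkHeader(const HeaderMetrics &M, unsigned MaxHeaderSize,
                           bool PrepareForLTO) {
  if (M.notDuplicatable)
    return RotateDecision::NotDuplicatable;
  if (M.convergent)
    return RotateDecision::Convergent;
  if (M.NumInsts > MaxHeaderSize)
    return RotateDecision::HeaderTooBig;
  // Before LTO, a call that may still be inlined means the duplicated header
  // could grow far beyond its current instruction count, so skip rotation.
  if (PrepareForLTO && M.NumInlineCandidates > 0)
    return RotateDecision::InlineCandidateBeforeLTO;
  return RotateDecision::PassesHeaderChecks;
}
```

For a small header with a single inline candidate, the result is PassesHeaderChecks without PrepareForLTO and InlineCandidateBeforeLTO with it, which is the FULL versus PREPARE split exercised by the test below.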


/// The utility to convert a loop into a loop with bottom test.		/// The utility to convert a loop into a loop with bottom test.
bool llvm::LoopRotation(Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI,		bool llvm::LoopRotation(Loop *L, LoopInfo *LI, const TargetTransformInfo *TTI,
AssumptionCache *AC, DominatorTree *DT,		AssumptionCache *AC, DominatorTree *DT,
ScalarEvolution *SE, MemorySSAUpdater *MSSAU,		ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
const SimplifyQuery &SQ, bool RotationOnly = true,		const SimplifyQuery &SQ, bool RotationOnly = true,
unsigned Threshold = unsigned(-1),		unsigned Threshold = unsigned(-1),
bool IsUtilMode = true) {		bool IsUtilMode = true, bool PrepareForLTO) {
LoopRotate LR(Threshold, LI, TTI, AC, DT, SE, MSSAU, SQ, RotationOnly,		LoopRotate LR(Threshold, LI, TTI, AC, DT, SE, MSSAU, SQ, RotationOnly,
IsUtilMode);		IsUtilMode, PrepareForLTO);
return LR.processLoop(L);		return LR.processLoop(L);
}		}
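In the definition above, IsUtilMode has a default but the new trailing PrepareForLTO does not. C++ only accepts that if an earlier declaration already supplies a default for PrepareForLTO, presumably the one in LoopRotationUtils.h (not shown here). The self-contained sketch below shows the pattern with a hypothetical rotateSketch function: the declaration gives the new parameter its default, the definition adds defaults for the earlier ones, and callers that omit the flag get false.

```cpp
#include <cassert>

// Declaration (think: the header). Only the new trailing parameter gets a
// default here. rotateSketch and its body are hypothetical.
bool rotateSketch(unsigned Threshold, bool IsUtilMode,
                  bool PrepareForLTO = false);

// Definition (think: the .cpp). Defaults for the earlier parameters are
// added here; PrepareForLTO needs none because the declaration supplied it.
bool rotateSketch(unsigned Threshold = unsigned(-1), bool IsUtilMode = true,
                  bool PrepareForLTO) {
  // Stand-in body: just report whether the pre-link restriction is active.
  (void)Threshold;
  (void)IsUtilMode;
  return PrepareForLTO;
}
```

Putting the new flag last with a default of false keeps existing callers of the utility source-compatible; only the two pass entry points need to pass it explicitly.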

llvm/test/Transforms/LoopRotate/call-prepare-for-lto.ll

	; RUN: opt -S -loop-rotate < %s | FileCheck --check-prefix=FULL %s			; RUN: opt -S -loop-rotate < %s | FileCheck --check-prefix=FULL %s
				; RUN: opt -S -loop-rotate -rotation-prepare-for-lto < %s | FileCheck --check-prefix=PREPARE %s
	; RUN: opt -S -passes='require<targetir>,require<assumptions>,loop(loop-rotate)' < %s | FileCheck --check-prefix=FULL %s			; RUN: opt -S -passes='require<targetir>,require<assumptions>,loop(loop-rotate)' < %s | FileCheck --check-prefix=FULL %s
				; RUN: opt -S -passes='require<targetir>,require<assumptions>,loop(loop-rotate)' -rotation-prepare-for-lto < %s | FileCheck --check-prefix=PREPARE %s

	; Test case to make sure loop-rotate avoids rotating during the prepare-for-lto			; Test case to make sure loop-rotate avoids rotating during the prepare-for-lto
	; stage, when the header contains a call which may be inlined during the LTO stage.			; stage, when the header contains a call which may be inlined during the LTO stage.
	define void @test_prepare_for_lto() {			define void @test_prepare_for_lto() {
	; FULL-LABEL: @test_prepare_for_lto(			; FULL-LABEL: @test_prepare_for_lto(
	; FULL-NEXT: entry:			; FULL-NEXT: entry:
	; FULL-NEXT: %array = alloca [20 x i32], align 16			; FULL-NEXT: %array = alloca [20 x i32], align 16
	; FULL-NEXT: %arrayidx = getelementptr inbounds [20 x i32], [20 x i32]* %array, i64 0, i64 0			; FULL-NEXT: %arrayidx = getelementptr inbounds [20 x i32], [20 x i32]* %array, i64 0, i64 0
	; FULL-NEXT: call void @may_be_inlined()			; FULL-NEXT: call void @may_be_inlined()
	; FULL-NEXT: br label %for.body			; FULL-NEXT: br label %for.body
	;			;
				; PREPARE-LABEL: @test_prepare_for_lto(
				; PREPARE-NEXT: entry:
				; PREPARE-NEXT: %array = alloca [20 x i32], align 16
				; PREPARE-NEXT: br label %for.cond
				;
	entry:			entry:
	%array = alloca [20 x i32], align 16			%array = alloca [20 x i32], align 16
	br label %for.cond			br label %for.cond

	for.cond: ; preds = %for.body, %entry			for.cond: ; preds = %for.body, %entry
	%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%cmp = icmp slt i32 %i.0, 100			%cmp = icmp slt i32 %i.0, 100
	%arrayidx = getelementptr inbounds [20 x i32], [20 x i32]* %array, i64 0, i64 0			%arrayidx = getelementptr inbounds [20 x i32], [20 x i32]* %array, i64 0, i64 0
...
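The test exercises the problem described in the summary: rotation copies the loop header into the preheader, so a call in the header ends up in the code twice (the FULL checks expect call void @may_be_inlined() in entry), while the PREPARE checks expect entry to branch straight to the unrotated %for.cond. Below is a simplified source-level C++ sketch of the two shapes. The counter standing in for @may_be_inlined and the generic bound N (instead of the test's constant 100) are assumptions for illustration, and the loop body is reduced to the increment because the rest of the test body is elided in this view.

```cpp
#include <cassert>

// Hypothetical counter standing in for @may_be_inlined from the test.
static int CallCount = 0;
void may_be_inlined() { ++CallCount; }

// Unrotated shape (PREPARE): the header makes the call, then tests the exit
// condition; the call appears once in the code.
int unrotated(int N) {
  int I = 0;
  while (true) {
    may_be_inlined();    // header: call
    if (!(I < N))        // header: exit test
      break;
    ++I;                 // body
  }
  return I;
}

// Rotated shape (FULL): a copy of the header sits in the preheader and the
// exit test moves to the bottom, so the call now appears twice in the code.
int rotated(int N) {
  int I = 0;
  may_be_inlined();      // header copy in the preheader
  if (!(I < N))
    return I;
  do {
    ++I;                 // body
    may_be_inlined();    // header copy at the bottom of the loop
  } while (I < N);
  return I;
}
```

Both versions make the same number of calls at run time (N + 1 for N >= 0). The difference is purely static: if the callee is later inlined during LTO, the rotated form carries two copies of its body.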

Diff 317496
