This is an archive of the discontinued LLVM Phabricator instance.


Differential D99926

[GVN] Introduce loop load PRE
ClosedPublic

Authored by mkazantsev on Apr 5 2021, 10:57 PM.

Details

Reviewers

nikic
reames
nickdesaulniers
fhahn

Commits

rG8fe62b7af112: [GVN] Introduce loop load PRE

Summary

This patch allows PRE of the following type of loads:

preheader:
  br label %loop

loop:
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  br label %merge

merge:
  ...
  br i1 ..., label %loop, label %exit

Into

preheader:
  %x0 = load %p
  br label %loop

loop:
  %x.pre = phi(%x0, %x2)
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  %x1 = load %p
  br label %merge

merge:
  %x2 = phi(%x.pre, %x1)
  ...
  br i1 ..., label %loop, label %exit

So instead of loading from %p on every iteration, we load only when the clobber actually happens.
The typical pattern this addresses is a hot loop in which all code is inlined and
provably free of side effects, with some side-effecting calls on a cold path.
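
As a rough source-level analogue of the IR above, the two functions below sketch the loop shape before and after the transform. Everything here is illustrative: `clobber`, `sumBefore`, `sumAfter`, and the convention that a negative entry means "the cold path is not taken" are assumptions made for this example and are not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the cold, side-effecting call: it overwrites *P.
inline void clobber(int *P, int NewValue) { *P = NewValue; }

// Before: *P is reloaded at the top of every iteration.
// Writes[i] >= 0 means iteration i takes the cold path and stores Writes[i].
int sumBefore(int *P, const std::vector<int> &Writes) {
  assert(P && "pointer must be dereferenceable");
  int Sum = 0;
  for (int W : Writes) {
    int X = *P; // load on every iteration
    Sum += X;
    if (W >= 0)
      clobber(P, W);
  }
  return Sum;
}

// After: one load before the loop, plus a reload right after the clobber.
int sumAfter(int *P, const std::vector<int> &Writes) {
  assert(P && "pointer must be dereferenceable");
  int X = *P; // preheader load (%x0)
  int Sum = 0;
  for (int W : Writes) {
    Sum += X; // value carried by the phi (%x.pre)
    if (W >= 0) {
      clobber(P, W);
      X = *P; // reload in the clobber block (%x1)
    }
  }
  return Sum;
}
```

Note that `sumAfter` performs its first load before the loop body has run at all, which is why the pointer has to be known safe to load from at that point.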

In the worst case, when the clobber block is taken on every iteration, we execute one extra load
overall (the one in the preheader). This only matters if the loop has very few iterations. If the clobber
block is skipped on at least one iteration, the transform is neutral or profitable.
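
The cost argument can be checked with a small counting model. This is only a sketch under stated assumptions: `Trace`, `loadsBefore`, and `loadsAfter` are hypothetical names, and the model counts executed loads only, ignoring everything else the loop does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry per loop iteration: true if that iteration ran the clobber block.
using Trace = std::vector<bool>;

// Original loop: the load is in the loop body, so it runs once per iteration.
std::size_t loadsBefore(const Trace &T) {
  assert(!T.empty() && "an entered loop runs at least one iteration");
  return T.size();
}

// Transformed loop: one preheader load plus one reload per clobbering iteration.
std::size_t loadsAfter(const Trace &T) {
  assert(!T.empty() && "an entered loop runs at least one iteration");
  std::size_t Loads = 1; // the preheader load
  for (bool TookClobber : T)
    if (TookClobber)
      ++Loads;
  return Loads;
}
```

With a trace that always takes the clobber block, `loadsAfter` exceeds `loadsBefore` by exactly one; as soon as one iteration skips the clobber block, the counts are equal or the transformed loop wins.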

This opens up several prospects for improvement:

- We can sometimes be smarter in loop-exiting blocks by splitting critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that we don't know whether their combined frequency is colder than the header.
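
The multiple-clobber condition could look roughly like the check below. This is a hypothetical sketch using plain integer frequencies; it does not use LLVM's block frequency API, and `clobbersColderThanHeader` is a name invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical profitability test: PRE into several clobber blocks only pays
// off if, taken together, they execute less often than the loop header.
// Overflow of the sum is ignored in this sketch.
bool clobbersColderThanHeader(std::uint64_t HeaderFreq,
                              const std::vector<std::uint64_t> &ClobberFreqs) {
  assert(HeaderFreq > 0 && "header of an entered loop has nonzero frequency");
  std::uint64_t Sum = std::accumulate(ClobberFreqs.begin(), ClobberFreqs.end(),
                                      std::uint64_t{0});
  return Sum < HeaderFreq;
}
```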

Diff Detail

Unit Tests: Failed

	Time	Test
	50 ms	x64 windows > LLVM.Transforms/GVN/PRE::lpre-call-wrap.ll
	50 ms	x64 windows > LLVM.Transforms/GVN/PRE::pre-aliasning-path.ll
	90 ms	x64 windows > LLVM.Transforms/GVN/PRE::pre-loop-load.ll

Event Timeline

mkazantsev created this revision. Apr 5 2021, 10:57 PM

Herald added a subscriber: hiraditya. Apr 5 2021, 10:57 PM

mkazantsev requested review of this revision. Apr 5 2021, 10:57 PM

Herald added a project: Restricted Project. Apr 5 2021, 10:57 PM

Herald added a subscriber: llvm-commits.

mkazantsev edited the summary of this revision. Apr 5 2021, 10:57 PM

Harbormaster completed remote builds in B97235: Diff 335410. Apr 5 2021, 11:28 PM

lkail added a subscriber: lkail. Apr 5 2021, 11:50 PM

Just for context, I'd explored a very similar transform before in https://reviews.llvm.org/D7061. One really key difference between the prior attempt and this one is that previously I hadn't explicitly handled loops and instead tried to match this from the original IR. I don't think loop info was available at the time. Though, looking at the current code, it looks like LoopInfo is optional for the pass even now.

One other key detail that has changed is we now support speculation, and don't necessarily have to prove anticipation (which the earlier change struggled with.)

In general, I think the new approach is much more likely to be successful than the original since we're solving a subset of the problems.

Code structure wise, I want to suggest we don't try to shove this new transformation into the existing performLoadPRE codepath. Several of the concerns of that code (e.g. address translation) don't apply for the loop case, and you have at least one bug (whether we need to check speculation safety) because of trying to reuse the code.

I'd suggest instead that you split out the last third or so of that function into a helper which blindly performs the insertion, and replacement, and then implement a second performLoopLoadPRE entry which checks the appropriate legality for the new transform.

I also seriously question whether this is worth doing in old-GVN at all. The only infrastructure you actually need for this is memory aliasing and speculation safety. I'd seriously suggest writing a standalone pass which uses MemorySSA and ValueTracking, and maybe reuses the extracted helper function mentioned above.

I'm pretty sure that availability problem you are referencing does not exist. See last 2 tests with guards.

As for refactoring, I'm going to do it. Putting WIP in the patch.

Looking more into the code, I don't think that loop PRE needs any other legality checks than what we have now.

Harbormaster completed remote builds in B97897: Diff 336335. Apr 9 2021, 1:42 AM

Split code out into different method. Haven't figured out yet how to make it in a separate pass with MemorySSA, but I think having it in GVN won't harm.

Harbormaster completed remote builds in B97975: Diff 336440. Apr 9 2021, 7:42 AM

Comments inline include one serious correctness issue.

This is much cleaner than the original patch. I was initially hesitant to take this at all - as opposed to using MemorySSA or NewGVN - but with the new structure this looks a lot less invasive.

llvm/lib/Transforms/Scalar/GVN.cpp
1467	Extend this comment to emphasize that this means we have proven the load must execute if the loop is entered, and is thus safe to hoist to the end of the preheader without introducing a new fault. Similarly, the in-loop clobber must be dominated by the original load and is thus fault safe. Er, hm, there's an issue here I just realized. This isn't sound. The counter example here is when the clobber is a call to free and the loop actually runs one iteration. You need to prove that LI is safe to execute in both locations. You have multiple options in terms of reasoning, I'll let you decide which you want to explore: speculation safety, must execute, or unfreeable allocations. The last (using allocas as an example for test), might be the easiest.
1481	Tweak this comment a bit to emphasize that this ensures the new load executes at most as often as the original, and likely less often.
1486	I don't understand this restriction. Why is a switch not allowed?
1497	I don't think this loop does what you want, except maybe by accident. You allowed blocks outside the loop, as a result, you can end up with a bunch of available addresses and a bunch of loads before the preheader. This will likely later be DCEd since the preheader load will be the one actually used by SSA gen. I strongly suspect you want exactly two available load locations: preheader, and your one in-loop clobber block.
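
The soundness problem raised at line 1467 can be made concrete with a toy model. This is a sketch only: `Cell`, `load`, `originalFaults`, and `transformedFaults` are hypothetical names, and "fault" here simply means that a load touched memory after it was freed.

```cpp
#include <cassert>

// Toy memory cell that only tracks whether it has been freed.
struct Cell {
  bool Freed = false;
};

// Records a fault if the load touches freed memory.
inline void load(const Cell &C, bool &Fault) {
  if (C.Freed)
    Fault = true;
}

// Original shape: load at the top of each iteration, clobber afterwards.
// FreeAt is the iteration whose cold path frees the cell (-1: never).
bool originalFaults(int Iterations, int FreeAt) {
  assert(Iterations >= 1 && "an entered loop runs at least one iteration");
  Cell C;
  bool Fault = false;
  for (int I = 0; I < Iterations; ++I) {
    load(C, Fault); // original load in the loop header
    if (I == FreeAt)
      C.Freed = true; // cold path: the clobber is a call to free
  }
  return Fault;
}

// Transformed shape: preheader load, then a reload right after the clobber.
bool transformedFaults(int Iterations, int FreeAt) {
  assert(Iterations >= 1 && "an entered loop runs at least one iteration");
  Cell C;
  bool Fault = false;
  load(C, Fault); // new preheader load
  for (int I = 0; I < Iterations; ++I) {
    if (I == FreeAt) {
      C.Freed = true;
      load(C, Fault); // new reload inserted after the clobber
    }
  }
  return Fault;
}
```

For a single iteration that frees the cell, the original shape never touches freed memory while the transformed shape does, which matches the counter example in the comment.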

This revision now requires changes to proceed. Apr 13 2021, 12:11 PM

mkazantsev added inline comments. Apr 13 2021, 8:17 PM

llvm/lib/Transforms/Scalar/GVN.cpp
1467	Free on the last iteration (the loop may have multiple though) is a nasty case indeed...
1486	This was originally the protection against invokes. Switch is allowed, will fix.
1497	Yes, this check was lost during the refactoring. I'm pretty sure that `eliminatePartiallyRedundantLoad` will deal with it correctly, but it's at least not obvious. Thanks for catching.

Addressed comments, fixed bug with

Harbormaster completed remote builds in B98818: Diff 337637. Apr 15 2021, 12:19 AM

I'm planning to add support for pointers based on D99135, maybe as a follow-up or on top of this.

mkazantsev planned changes to this revision. Apr 15 2021, 11:50 PM

Looks close to ready for an LGTM if you're willing to split patch as suggested.

llvm/lib/Transforms/Scalar/GVN.cpp
1497	continue the comment with something like: "because we need a place to insert a copy of the load". p.s. I'm fine with this in an initial patch, but you really should be using an alias check here as the trailing invoke might not alias the memory being PREed. Would make a good follow up patch.
1516	You can generalize the first check as !LoadPtr->canBeFreed()
1517	This last check is incorrect. Counter example:
	for (i = 0; i < 1; i++) {
	  v = o.f
	  if (c) { // this is loop block
	    clobber();
	  }
	  atomic store g = o;
	  while (wait for other thread to free) {}
	}
	Can I ask you to pull this into a separate patch? (e.g. handle only the first two cases in this patch, and come back to the third in a follow on.)
1520	Style: hasFnAttribute(AttributeKind::NoFree) handles both of these cases. (Or will once D100226 lands).

Reused canBeFreed.

Removed code related to nofree analysis for further follow-up.

Improved comments.

llvm/lib/Transforms/Scalar/GVN.cpp
1517	Good idea. Removed from this patch, need to make it more carefully.
1520	Removed.

Harbormaster completed remote builds in B99624: Diff 338727. Apr 20 2021, 12:27 AM

LGTM w/minor comments.

llvm/lib/Transforms/Scalar/GVN.cpp
1468	Typo: In order
1512	In a follow up, please generalize by stripping bitcasts and inbounds geps before calling canBeFreed. Please don't do this in the change being lgtmed now.
llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
8–9	Please add a positive test (analogous to this one), but using an alloca.

This revision is now accepted and ready to land. Apr 20 2021, 9:18 AM

nikic added inline comments. Apr 20 2021, 10:23 AM

llvm/lib/Transforms/Scalar/GVN.cpp
1518	Where do we check that LoadPtr is loop invariant (and thus available in preheader)?
llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
45–46	This test looks broken. I think for it to do something useful you'll want to pass %p to may_free_memory (or another function). Otherwise the load is just undef and there is no clobber in the loop either.
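
Why the loop-invariance question matters can be seen from a loop whose address changes on every iteration. The sketch below is a plain C++ illustration with hypothetical names (`sumVaryingPointer`, `sumWronglyHoisted`); it shows the wrong result a hoist would produce at source level. In IR terms, an address computed inside the loop is not even defined in the preheader, so the hoisted load could not be formed there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The pointer advances every iteration, so its value is not loop invariant.
int sumVaryingPointer(const std::vector<int> &Data) {
  int Sum = 0;
  for (const int *P = Data.data(), *E = P + Data.size(); P != E; ++P)
    Sum += *P; // the address differs on each iteration
  return Sum;
}

// What an unchecked hoist would compute: a single load through the initial
// pointer value, reused on every iteration.
int sumWronglyHoisted(const std::vector<int> &Data) {
  assert(!Data.empty() && "needs at least one element to load");
  int X = Data.front(); // "preheader" load of the first address only
  int Sum = 0;
  for (std::size_t I = 0; I < Data.size(); ++I)
    Sum += X;
  return Sum;
}
```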

LGTM withdrawn due to issue noticed by Nikita. Please address and refresh.

llvm/lib/Transforms/Scalar/GVN.cpp
1518	Er good question, and good catch. I remember this being here, but maybe it got lost in rebase. Max, please add back the check, and a test which would have caught this.

This revision now requires changes to proceed. Apr 20 2021, 10:51 AM

mkazantsev added inline comments. Apr 20 2021, 10:00 PM

llvm/lib/Transforms/Scalar/GVN.cpp
1518	Wow, I was sure it was there. Thanks for catching!

Added loop invariant check. Test added as underlying patch.

Typo fix

mkazantsev added inline comments. Apr 21 2021, 12:14 AM

llvm/lib/Transforms/Scalar/GVN.cpp
1512	Shouldn't it be a part of `canBeFreed`?

Harbormaster completed remote builds in B99898: Diff 339116. Apr 21 2021, 1:31 AM

Harbormaster completed remote builds in B99899: Diff 339117. Apr 21 2021, 1:57 AM

LGTM

This revision is now accepted and ready to land. Apr 21 2021, 8:05 AM

Closed by commit rG8fe62b7af112: [GVN] Introduce loop load PRE (authored by mkazantsev). Apr 21 2021, 11:03 PM

This revision was automatically updated to reflect the committed changes.

mkazantsev added a commit: rG8fe62b7af112: [GVN] Introduce loop load PRE.

nikic mentioned this in D126382: [GVN] Enable enable-split-backedge-in-load-pre option by default. May 25 2022, 7:55 AM

nikic mentioned this in rG1721ff1dfd45: [GVN] Enable enable-split-backedge-in-load-pre option by default. May 30 2022, 12:56 AM

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Scalar/GVN.h	3 lines
llvm/lib/Transforms/Scalar/GVN.cpp	73 lines
llvm/test/Transforms/GVN/PRE/lpre-call-wrap.ll	21 lines
llvm/test/Transforms/GVN/PRE/pre-aliasning-path.ll	16 lines
llvm/test/Transforms/GVN/PRE/pre-loop-load.ll	37 lines

Diff 336440

llvm/include/llvm/Transforms/Scalar/GVN.h

[322 lines hidden; context: private:]
   /// ValuesPerBlock. If not, add it to UnavailableBlocks.
   void AnalyzeLoadAvailability(LoadInst *Load, LoadDepVect &Deps,
                                AvailValInBlkVect &ValuesPerBlock,
                                UnavailBlkVect &UnavailableBlocks);
 
   bool PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
                       UnavailBlkVect &UnavailableBlocks);
 
+  bool PerformLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+      UnavailBlkVect &UnavailableBlocks);
+
   /// Eliminates partially redundant \p Load, replacing it with \p
   /// AvailableLoads (connected by Phis if needed).
   void eliminatePartiallyRedundantLoad(
       LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
       MapVector<BasicBlock *, Value *> &AvailableLoads);
 
   // Other helper routines
   bool processInstruction(Instruction *I);
[42 lines hidden]

Lint: Pre-merge checks
clang-tidy: warning: invalid case style for function 'PerformLoopLoadPRE' [readability-identifier-naming]
clang-format: please reformat the code (continuation line of the PerformLoopLoadPRE declaration)

llvm/lib/Transforms/Scalar/GVN.cpp

[85 lines hidden]

 using namespace llvm;
 using namespace llvm::gvn;
 using namespace llvm::VNCoercion;
 using namespace PatternMatch;
 
 #define DEBUG_TYPE "gvn"
 
 STATISTIC(NumGVNInstr, "Number of instructions deleted");
 STATISTIC(NumGVNLoad, "Number of loads deleted");
 STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
 STATISTIC(NumGVNBlocks, "Number of blocks merged");
 STATISTIC(NumGVNSimpl, "Number of instructions simplified");
 STATISTIC(NumGVNEqProp, "Number of equalities propagated");
 STATISTIC(NumPRELoad, "Number of loads PRE'd");
+STATISTIC(NumPRELoopLoad, "Number of loop loads PRE'd");
 
+
 STATISTIC(IsValueFullyAvailableInBlockNumSpeculationsMax,
           "Number of blocks speculated as available in "
           "IsValueFullyAvailableInBlock(), max");
 STATISTIC(MaxBBSpeculationCutoffReachedTimes,
           "Number of times we we reached gvn-max-block-speculations cut-off "
           "preventing further exploration");
 
 static cl::opt<bool> GVNEnablePRE("enable-pre", cl::init(true), cl::Hidden);
[1,332 lines hidden; context: for (Instruction *I : NewInsts) {]
     VN.lookupOrAdd(I);
   }
 
   eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, PredLoads);
   ++NumPRELoad;
   return true;
 }
 
+bool GVN::PerformLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+    UnavailBlkVect &UnavailableBlocks) {
+  if (!LI)
+    return false;
+
+  const Loop *L = LI->getLoopFor(Load->getParent());
+  // TODO: Generalize to other loop blocks that dominate the latch.
+  if (!L || L->getHeader() != Load->getParent())
+    return false;
+
+  BasicBlock *Preheader = L->getLoopPreheader();
+  BasicBlock *Latch = L->getLoopLatch();
+  if (!Preheader || !Latch)
+    return false;
+
+  // We can side-exit before the load is executed.
+  if (ICF->isDominatedByICFIFromSameBlock(Load))
+    return false;
+
+  unsigned UnavailableLoopBlocks = 0;
+  for (auto *Blocker : UnavailableBlocks) {
+    // Blockers from outside the loop are handled in preheader.
+    if (!L->contains(Blocker))
+      continue;
+    // Do not sink into inner loops.
+    if (L != LI->getLoopFor(Blocker))
+      return false;
+
+    UnavailableLoopBlocks++;
+    // So far, only PRE into not-mustexecute blocks.
+    if (DT->dominates(Blocker, Latch)) {
+      return false;
+    }
+
+    if (!isa<BranchInst>(Blocker->getTerminator()))
+      return false;
+  }
+
+  // TODO: We could do it if we had block frequency info here.
+  if (UnavailableLoopBlocks != 1)
+    return false;
+
+  // TODO: Support critical edge splitting if blocker has more than 1 successor.
+  Value *LoadPtr = Load->getPointerOperand();
+  MapVector<BasicBlock *, Value *> AvailableLoads;
+  for (auto *Blocker : UnavailableBlocks)
+    AvailableLoads[Blocker] = LoadPtr;
+  AvailableLoads[Preheader] = LoadPtr;
+
+  LLVM_DEBUG(dbgs() << "GVN REMOVING PRE LOOP LOAD: " << *Load << '\n');
+  eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, AvailableLoads);
+  ++NumPRELoopLoad;
+  return true;
+}
+
 static void reportLoadElim(LoadInst *Load, Value *AvailableValue,
                            OptimizationRemarkEmitter *ORE) {
   using namespace ore;
 
   ORE->emit([&]() {
     return OptimizationRemark(DEBUG_TYPE, "LoadElim", Load)
            << "load of type " << NV("Type", Load->getType()) << " eliminated"
            << setExtraArgs() << " in favor of "
            << NV("InfavorOfValue", AvailableValue);
   });
 }
 
 /// Attempt to eliminate a load whose dependencies are
 /// non-local by performing PHI construction.
 bool GVN::processNonLocalLoad(LoadInst *Load) {
   // non-local speculations are not allowed under asan.
   if (Load->getParent()->getParent()->hasFnAttribute(
           Attribute::SanitizeAddress) ||
       Load->getParent()->getParent()->hasFnAttribute(
           Attribute::SanitizeHWAddress))
     return false;
 
[67 lines hidden; context: bool GVN::processNonLocalLoad(LoadInst *Load) {]
   }
 
   // Step 4: Eliminate partial redundancy.
   if (!isPREEnabled() || !isLoadPREEnabled())
     return Changed;
   if (!isLoadInLoopPREEnabled() && LI && LI->getLoopFor(Load->getParent()))
     return Changed;
 
-  return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
+  return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks) || PerformLoopLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
 }
 
 static bool impliesEquivalanceIfTrue(CmpInst* Cmp) {
   if (Cmp->getPredicate() == CmpInst::Predicate::ICMP_EQ)
     return true;
 
   // Floating point comparisons can be equal, but not equivalent. Cases:
   // NaNs for unordered operators
[1,414 lines hidden]

Lint: Pre-merge checks
clang-format: please reformat the code (the STATISTIC block, the extra blank line after NumPRELoopLoad, the continuation line of the PerformLoopLoadPRE definition, and the return statement in processNonLocalLoad)
clang-tidy: warning: invalid case style for function 'PerformLoopLoadPRE' [readability-identifier-naming]

llvm/test/Transforms/GVN/PRE/lpre-call-wrap.ll

[21 lines hidden]
 ; CHECK-LABEL: @_Z12testfunctionR1A(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = getelementptr [[STRUCT_A:%.*]], %struct.A* [[ITER:%.*]], i32 0, i32 0
 ; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TMP0]], align 4
 ; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[TMP1]], 0
 ; CHECK-NEXT: br i1 [[TMP2]], label [[RETURN:%.*]], label [[BB_NPH:%.*]]
 ; CHECK: bb.nph:
 ; CHECK-NEXT: [[TMP3:%.*]] = getelementptr [[STRUCT_A]], %struct.A* [[ITER]], i32 0, i32 1
+; CHECK-NEXT: [[DOTPRE1:%.*]] = load i32, i32* [[TMP3]], align 4
 ; CHECK-NEXT: br label [[BB:%.*]]
 ; CHECK: bb:
-; CHECK-NEXT: [[DOTRLE:%.*]] = phi i32 [ [[TMP1]], [[BB_NPH]] ], [ [[TMP7:%.*]], [[BB3_BACKEDGE:%.*]] ]
-; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[DOTRLE]], 1
-; CHECK-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4
-; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP3]], align 4
-; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP4]], [[TMP5]]
+; CHECK-NEXT: [[TMP4:%.*]] = phi i32 [ [[DOTPRE1]], [[BB_NPH]] ], [ [[TMP8:%.*]], [[BB3_BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[DOTRLE:%.*]] = phi i32 [ [[TMP1]], [[BB_NPH]] ], [ [[TMP7:%.*]], [[BB3_BACKEDGE]] ]
+; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[DOTRLE]], 1
+; CHECK-NEXT: store i32 [[TMP5]], i32* [[TMP0]], align 4
+; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], [[TMP4]]
 ; CHECK-NEXT: br i1 [[TMP6]], label [[BB1:%.*]], label [[BB3_BACKEDGE]]
 ; CHECK: bb1:
 ; CHECK-NEXT: tail call void @_Z1gv()
-; CHECK-NEXT: [[DOTPRE:%.*]] = load i32, i32* [[TMP0]], align 4
+; CHECK-NEXT: [[DOTPRE:%.*]] = load i32, i32* [[TMP3]], align 4
+; CHECK-NEXT: [[DOTPRE2:%.*]] = load i32, i32* [[TMP0]], align 4
 ; CHECK-NEXT: br label [[BB3_BACKEDGE]]
 ; CHECK: bb3.backedge:
-; CHECK-NEXT: [[TMP7]] = phi i32 [ [[DOTPRE]], [[BB1]] ], [ [[TMP4]], [[BB]] ]
-; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i32 [[TMP7]], 0
-; CHECK-NEXT: br i1 [[TMP8]], label [[RETURN]], label [[BB]]
+; CHECK-NEXT: [[TMP7]] = phi i32 [ [[DOTPRE2]], [[BB1]] ], [ [[TMP5]], [[BB]] ]
+; CHECK-NEXT: [[TMP8]] = phi i32 [ [[DOTPRE]], [[BB1]] ], [ [[TMP4]], [[BB]] ]
+; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP7]], 0
+; CHECK-NEXT: br i1 [[TMP9]], label [[RETURN]], label [[BB]]
 ; CHECK: return:
 ; CHECK-NEXT: ret void
 ;
 entry:
   %0 = getelementptr %struct.A, %struct.A* %iter, i32 0, i32 0 ; <i32*> [#uses=3]
   %1 = load i32, i32* %0, align 4 ; <i32> [#uses=2]
   %2 = icmp eq i32 %1, 0 ; <i1> [#uses=1]
   br i1 %2, label %return, label %bb.nph
[27 lines hidden]

llvm/test/Transforms/GVN/PRE/pre-aliasning-path.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -basic-aa -enable-load-pre -enable-pre -gvn -S < %s \| FileCheck %s		; RUN: opt -basic-aa -enable-load-pre -enable-pre -gvn -S < %s \| FileCheck %s

declare void @side_effect_0()		declare void @side_effect_0()

declare void @side_effect_1(i32 %x)		declare void @side_effect_1(i32 %x)

declare void @no_side_effect() readonly		declare void @no_side_effect() readonly

; TODO: We can PRE the load into the cold path, removing it from the hot path.
define i32 @test_01(i32* %p) {		define i32 @test_01(i32* %p) {
; CHECK-LABEL: @test_01(		; CHECK-LABEL: @test_01(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32* [[P:%.*]], align 4
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]		; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE1]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
; CHECK-NEXT: [[X:%.*]] = load i32, i32* [[P:%.*]], align 4		; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[X]], 100		; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[X]], 100
; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]		; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
; CHECK: hot_path:		; CHECK: hot_path:
; CHECK-NEXT: br label [[BACKEDGE]]		; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: cold_path:		; CHECK: cold_path:
; CHECK-NEXT: call void @side_effect_0()		; CHECK-NEXT: call void @side_effect_0()
		; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32* [[P]], align 4
; CHECK-NEXT: br label [[BACKEDGE]]		; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: backedge:		; CHECK: backedge:
		; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE]], [[COLD_PATH]] ], [ [[X]], [[HOT_PATH]] ]
; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]		; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000		; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]		; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: ret i32 [[X]]		; CHECK-NEXT: ret i32 [[X]]
;		;
entry:		entry:
br label %loop		br label %loop
Show All 15 Lines	backedge:
%iv.next = add i32 %iv, %x		%iv.next = add i32 %iv, %x
%loop.cond = icmp ult i32 %iv.next, 1000		%loop.cond = icmp ult i32 %iv.next, 1000
br i1 %loop.cond, label %loop, label %exit		br i1 %loop.cond, label %loop, label %exit

exit:		exit:
ret i32 %x		ret i32 %x
}		}

; TODO: We can PRE the load into the cold path, removing it from the hot path.
define i32 @test_02(i32* %p) {		define i32 @test_02(i32* %p) {
; CHECK-LABEL: @test_02(		; CHECK-LABEL: @test_02(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32* [[P:%.*]], align 4
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]		; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE1]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
; CHECK-NEXT: [[X:%.*]] = load i32, i32* [[P:%.*]], align 4		; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[X]], 100		; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[X]], 100
; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]		; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
; CHECK: hot_path:		; CHECK: hot_path:
; CHECK-NEXT: br label [[BACKEDGE]]		; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: cold_path:		; CHECK: cold_path:
; CHECK-NEXT: call void @side_effect_1(i32 [[X]])		; CHECK-NEXT: call void @side_effect_1(i32 [[X]])
		; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32* [[P]], align 4
; CHECK-NEXT: br label [[BACKEDGE]]		; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: backedge:		; CHECK: backedge:
		; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE]], [[COLD_PATH]] ], [ [[X]], [[HOT_PATH]] ]
; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]		; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000		; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]		; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: ret i32 [[X]]		; CHECK-NEXT: ret i32 [[X]]
;		;
entry:		entry:
br label %loop		br label %loop
▲ Show 20 Lines • Show All 68 Lines • Show Last 20 Lines

llvm/test/Transforms/GVN/PRE/pre-loop-load.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -basic-aa -enable-load-pre -enable-pre -gvn -S < %s | FileCheck %s			; RUN: opt -basic-aa -enable-load-pre -enable-pre -gvn -S < %s | FileCheck %s

	declare void @side_effect()			declare void @side_effect()
	declare i1 @side_effect_cond()			declare i1 @side_effect_cond()

	declare i32 @personality_function()			declare i32 @personality_function()

	; TODO: We can PRE the load away from the hot path.
	reames (inline comment): Please add a positive test (analogous to this one), but using an alloca.
	define i32 @test_load_on_cold_path(i32* %p) {			define i32 @test_load_on_cold_path(i32* %p) {
	; CHECK-LABEL: @test_load_on_cold_path(			; CHECK-LABEL: @test_load_on_cold_path(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32* [[P:%.*]], align 4
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]			; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
	; CHECK-NEXT: [[X:%.*]] = load i32, i32* [[P:%.*]], align 4			; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
	; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0			; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0
	; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]			; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
	; CHECK: hot_path:			; CHECK: hot_path:
	; CHECK-NEXT: br label [[BACKEDGE]]			; CHECK-NEXT: br label [[BACKEDGE]]
	; CHECK: cold_path:			; CHECK: cold_path:
	; CHECK-NEXT: call void @side_effect()			; CHECK-NEXT: call void @side_effect()
				; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32* [[P]], align 4
	; CHECK-NEXT: br label [[BACKEDGE]]			; CHECK-NEXT: br label [[BACKEDGE]]
	; CHECK: backedge:			; CHECK: backedge:
				; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE1]], [[COLD_PATH]] ], [ [[X]], [[HOT_PATH]] ]
	; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]			; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
	; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000			; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
	; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 [[X]]			; CHECK-NEXT: ret i32 [[X]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iv = phi i32 [ 0, %entry], [%iv.next, %backedge]			%iv = phi i32 [ 0, %entry], [%iv.next, %backedge]
	%x = load i32, i32* %p			%x = load i32, i32* %p
	%cond = icmp ne i32 %x, 0			%cond = icmp ne i32 %x, 0
	br i1 %cond, label %hot_path, label %cold_path			br i1 %cond, label %hot_path, label %cold_path

	hot_path:			hot_path:
	br label %backedge			br label %backedge

	cold_path:			cold_path:
	call void @side_effect()			call void @side_effect()
				nikic (inline comment): This test looks broken. I think for it to do something useful you'll want to pass %p to may_free_memory (or another function). Otherwise the load is just undef and there is no clobber in the loop either.
	br label %backedge			br label %backedge

	backedge:			backedge:
	%iv.next = add i32 %iv, %x			%iv.next = add i32 %iv, %x
	%loop.cond = icmp ult i32 %iv.next, 1000			%loop.cond = icmp ult i32 %iv.next, 1000
	br i1 %loop.cond, label %loop, label %exit			br i1 %loop.cond, label %loop, label %exit

	exit:			exit:
	▲ Show 20 Lines • Show All 96 Lines • ▼ Show 20 Lines
	exit:			exit:
	ret i32 %x			ret i32 %x
	}			}

	; TODO: We can PRE via splitting of the critical edge in the cold path.			; TODO: We can PRE via splitting of the critical edge in the cold path.
	define i32 @test_load_on_exiting_cold_path_01(i32* %p) {			define i32 @test_load_on_exiting_cold_path_01(i32* %p) {
	; CHECK-LABEL: @test_load_on_exiting_cold_path_01(			; CHECK-LABEL: @test_load_on_exiting_cold_path_01(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32* [[P:%.*]], align 4
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]			; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
	; CHECK-NEXT: [[X:%.*]] = load i32, i32* [[P:%.*]], align 4			; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
	; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0			; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0
	; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]			; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
	; CHECK: hot_path:			; CHECK: hot_path:
	; CHECK-NEXT: br label [[BACKEDGE]]			; CHECK-NEXT: br label [[BACKEDGE]]
	; CHECK: cold_path:			; CHECK: cold_path:
	; CHECK-NEXT: [[SIDE_COND:%.*]] = call i1 @side_effect_cond()			; CHECK-NEXT: [[SIDE_COND:%.*]] = call i1 @side_effect_cond()
				; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32* [[P]], align 4
	; CHECK-NEXT: br i1 [[SIDE_COND]], label [[BACKEDGE]], label [[COLD_EXIT:%.*]]			; CHECK-NEXT: br i1 [[SIDE_COND]], label [[BACKEDGE]], label [[COLD_EXIT:%.*]]
	; CHECK: backedge:			; CHECK: backedge:
				; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE1]], [[COLD_PATH]] ], [ [[X]], [[HOT_PATH]] ]
	; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]			; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
	; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000			; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
	; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 [[X]]			; CHECK-NEXT: ret i32 [[X]]
	; CHECK: cold_exit:			; CHECK: cold_exit:
	; CHECK-NEXT: ret i32 -1			; CHECK-NEXT: ret i32 -1
	;			;
	▲ Show 20 Lines • Show All 187 Lines • ▼ Show 20 Lines
	exit:			exit:
	ret i32 %x			ret i32 %x
	}			}

	; TODO: We can PRE via splitting of the critical edge in the cold path. Make sure we only insert 1 load.			; TODO: We can PRE via splitting of the critical edge in the cold path. Make sure we only insert 1 load.
	define i32 @test_load_on_multi_exiting_cold_path(i32* %p) {			define i32 @test_load_on_multi_exiting_cold_path(i32* %p) {
	; CHECK-LABEL: @test_load_on_multi_exiting_cold_path(			; CHECK-LABEL: @test_load_on_multi_exiting_cold_path(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32* [[P:%.*]], align 4
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]			; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
	; CHECK-NEXT: [[X:%.*]] = load i32, i32* [[P:%.*]], align 4			; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
	; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0			; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0
	; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH_1:%.*]]			; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH_1:%.*]]
	; CHECK: hot_path:			; CHECK: hot_path:
	; CHECK-NEXT: br label [[BACKEDGE]]			; CHECK-NEXT: br label [[BACKEDGE]]
	; CHECK: cold_path.1:			; CHECK: cold_path.1:
	; CHECK-NEXT: [[SIDE_COND_1:%.*]] = call i1 @side_effect_cond()			; CHECK-NEXT: [[SIDE_COND_1:%.*]] = call i1 @side_effect_cond()
	; CHECK-NEXT: br i1 [[SIDE_COND_1]], label [[COLD_PATH_2:%.*]], label [[COLD_EXIT:%.*]]			; CHECK-NEXT: br i1 [[SIDE_COND_1]], label [[COLD_PATH_2:%.*]], label [[COLD_EXIT:%.*]]
	; CHECK: cold_path.2:			; CHECK: cold_path.2:
	; CHECK-NEXT: [[SIDE_COND_2:%.*]] = call i1 @side_effect_cond()			; CHECK-NEXT: [[SIDE_COND_2:%.*]] = call i1 @side_effect_cond()
	; CHECK-NEXT: br i1 [[SIDE_COND_2]], label [[COLD_PATH_3:%.*]], label [[COLD_EXIT]]			; CHECK-NEXT: br i1 [[SIDE_COND_2]], label [[COLD_PATH_3:%.*]], label [[COLD_EXIT]]
	; CHECK: cold_path.3:			; CHECK: cold_path.3:
	; CHECK-NEXT: [[SIDE_COND_3:%.*]] = call i1 @side_effect_cond()			; CHECK-NEXT: [[SIDE_COND_3:%.*]] = call i1 @side_effect_cond()
				; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32* [[P]], align 4
	; CHECK-NEXT: br i1 [[SIDE_COND_3]], label [[BACKEDGE]], label [[COLD_EXIT]]			; CHECK-NEXT: br i1 [[SIDE_COND_3]], label [[BACKEDGE]], label [[COLD_EXIT]]
	; CHECK: backedge:			; CHECK: backedge:
				; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE1]], [[COLD_PATH_3]] ], [ [[X]], [[HOT_PATH]] ]
	; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]			; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
	; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000			; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
	; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 [[X]]			; CHECK-NEXT: ret i32 [[X]]
	; CHECK: cold_exit:			; CHECK: cold_exit:
	; CHECK-NEXT: ret i32 -1			; CHECK-NEXT: ret i32 -1
	;			;
	▲ Show 20 Lines • Show All 169 Lines • ▼ Show 20 Lines
	exit:			exit:
	ret i32 %x			ret i32 %x
	}			}

	; TODO: We can PRE via split of critical edge.			; TODO: We can PRE via split of critical edge.
	define i32 @test_side_exit_after_merge(i32* %p) {			define i32 @test_side_exit_after_merge(i32* %p) {
	; CHECK-LABEL: @test_side_exit_after_merge(			; CHECK-LABEL: @test_side_exit_after_merge(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32* [[P:%.*]], align 4
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]			; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
	; CHECK-NEXT: [[X:%.*]] = load i32, i32* [[P:%.*]], align 4			; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
	; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0			; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0
	; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]			; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
	; CHECK: hot_path:			; CHECK: hot_path:
	; CHECK-NEXT: br label [[BACKEDGE]]			; CHECK-NEXT: br label [[BACKEDGE]]
	; CHECK: cold_path:			; CHECK: cold_path:
	; CHECK-NEXT: [[COND_1:%.*]] = icmp ne i32 [[IV]], 1			; CHECK-NEXT: [[COND_1:%.*]] = icmp ne i32 [[IV]], 1
	; CHECK-NEXT: br i1 [[COND_1]], label [[DO_CALL:%.*]], label [[SIDE_EXITING:%.*]]			; CHECK-NEXT: br i1 [[COND_1]], label [[DO_CALL:%.*]], label [[SIDE_EXITING:%.*]]
	; CHECK: do_call:			; CHECK: do_call:
	; CHECK-NEXT: [[SIDE_COND:%.*]] = call i1 @side_effect_cond()			; CHECK-NEXT: [[SIDE_COND:%.*]] = call i1 @side_effect_cond()
				; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32* [[P]], align 4
	; CHECK-NEXT: br label [[SIDE_EXITING]]			; CHECK-NEXT: br label [[SIDE_EXITING]]
	; CHECK: side_exiting:			; CHECK: side_exiting:
				; CHECK-NEXT: [[X3:%.*]] = phi i32 [ [[X_PRE1]], [[DO_CALL]] ], [ 0, [[COLD_PATH]] ]
	; CHECK-NEXT: [[SIDE_COND_PHI:%.*]] = phi i1 [ [[SIDE_COND]], [[DO_CALL]] ], [ true, [[COLD_PATH]] ]			; CHECK-NEXT: [[SIDE_COND_PHI:%.*]] = phi i1 [ [[SIDE_COND]], [[DO_CALL]] ], [ true, [[COLD_PATH]] ]
	; CHECK-NEXT: br i1 [[SIDE_COND_PHI]], label [[BACKEDGE]], label [[COLD_EXIT:%.*]]			; CHECK-NEXT: br i1 [[SIDE_COND_PHI]], label [[BACKEDGE]], label [[COLD_EXIT:%.*]]
	; CHECK: backedge:			; CHECK: backedge:
				; CHECK-NEXT: [[X2]] = phi i32 [ [[X3]], [[SIDE_EXITING]] ], [ [[X]], [[HOT_PATH]] ]
	; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]			; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
	; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000			; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
	; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 [[X]]			; CHECK-NEXT: ret i32 [[X]]
	; CHECK: cold_exit:			; CHECK: cold_exit:
	; CHECK-NEXT: ret i32 -1			; CHECK-NEXT: ret i32 -1
	;			;
	▲ Show 20 Lines • Show All 83 Lines • ▼ Show 20 Lines

	exit:			exit:
	ret i32 %x			ret i32 %x
	}			}

	define i32 @test_guard_2(i32* %p, i32 %g) {			define i32 @test_guard_2(i32* %p, i32 %g) {
	; CHECK-LABEL: @test_guard_2(			; CHECK-LABEL: @test_guard_2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32* [[P:%.*]], align 4
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]			; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
	; CHECK-NEXT: [[GUARD_COND:%.*]] = icmp ne i32 [[IV]], [[G:%.*]]			; CHECK-NEXT: [[GUARD_COND:%.*]] = icmp ne i32 [[IV]], [[G:%.*]]
	; CHECK-NEXT: [[X:%.*]] = load i32, i32* [[P:%.*]], align 4
	; CHECK-NEXT: call void (i1, ...) @llvm.experimental.guard(i1 [[GUARD_COND]]) [ "deopt"() ]			; CHECK-NEXT: call void (i1, ...) @llvm.experimental.guard(i1 [[GUARD_COND]]) [ "deopt"() ]
	; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[X]], 100			; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[X]], 100
	; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]			; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
	; CHECK: hot_path:			; CHECK: hot_path:
	; CHECK-NEXT: br label [[BACKEDGE]]			; CHECK-NEXT: br label [[BACKEDGE]]
	; CHECK: cold_path:			; CHECK: cold_path:
	; CHECK-NEXT: call void @side_effect()			; CHECK-NEXT: call void @side_effect()
				; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32* [[P]], align 4
	; CHECK-NEXT: br label [[BACKEDGE]]			; CHECK-NEXT: br label [[BACKEDGE]]
	; CHECK: backedge:			; CHECK: backedge:
				; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE1]], [[COLD_PATH]] ], [ [[X]], [[HOT_PATH]] ]
	; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]			; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
	; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000			; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
	; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 [[X]]			; CHECK-NEXT: ret i32 [[X]]
	;			;
	entry:			entry:
	br label %loop			br label %loop
	Show All 24 Lines
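
The effect captured by the updated CHECK lines in pre-loop-load.ll can be illustrated with a small model. The following C++ sketch is not part of the patch. It hand-translates `@test_load_on_cold_path` into two functions, one shaped like the original IR and one shaped like the expected output after loop load PRE, and counts how often each reads `*p`. The names `Result`, `runOriginal`, `runAfterPRE` and the `sideEffect` callback are hypothetical and exist only for this illustration; the callback stands in for the opaque `@side_effect()` call, under the assumption that it may write to `*p`.

```cpp
#include <cstdint>
#include <functional>

// Value returned by the loop plus the number of loads from *p it executed.
struct Result {
  uint32_t value;
  unsigned loads;
};

// Shape of the original IR: %x is loaded at the top of every iteration.
Result runOriginal(const uint32_t *p, const std::function<void()> &sideEffect) {
  unsigned loads = 0;
  uint32_t iv = 0;
  for (;;) {
    uint32_t x = *p;  // %x = load i32, i32* %p
    ++loads;
    if (x == 0)       // cold_path: the call may clobber *p
      sideEffect();
    iv += x;          // %iv.next = add i32 %iv, %x
    if (iv >= 1000)   // leave the loop when %loop.cond is false
      return {x, loads};
  }
}

// Shape of the expected output: one load in the preheader, one reload on the
// cold path after the clobbering call, and phis carrying the value otherwise.
Result runAfterPRE(const uint32_t *p, const std::function<void()> &sideEffect) {
  unsigned loads = 0;
  uint32_t x = *p;    // %x.pre, loaded once before the loop
  ++loads;
  uint32_t iv = 0;
  for (;;) {
    uint32_t x2 = x;  // backedge phi: the hot path reuses the current value
    if (x == 0) {
      sideEffect();
      x2 = *p;        // reload only after the potential clobber
      ++loads;
    }
    iv += x;
    if (iv >= 1000)
      return {x, loads};
    x = x2;           // header phi takes the backedge value
  }
}
```

With a callback that stores a nonzero value on the cold path, both versions return the same value while the second performs fewer loads. In general, if the cold path were taken on every iteration, the second version would perform one extra load (the one in the preheader), which matches the worst case described in the summary.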

Diff 336440