This is an archive of the discontinued LLVM Phabricator instance.


Differential D99926

[GVN] Introduce loop load PRE
ClosedPublic

Authored by mkazantsev on Apr 5 2021, 10:57 PM.

Details

Reviewers

nikic
reames
nickdesaulniers
fhahn

Commits

rG8fe62b7af112: [GVN] Introduce loop load PRE

Summary

This patch allows PRE of the following type of loads:

preheader:
  br label %loop

loop:
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  br label %merge

merge:
  ...
  br i1 ..., label %loop, label %exit

Into

preheader:
  %x0 = load %p
  br label %loop

loop:
  %x.pre = phi(%x0, %x2)
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  %x1 = load %p
  br label %merge

merge:
  %x2 = phi(%x.pre, %x1)
  ...
  br i1 ..., label %loop, label %exit

So instead of loading from %p on every iteration, we load only when a clobber actually happens.
The typical pattern this addresses is a hot loop in which all code is inlined and
provably free of side effects, with some side-effecting calls on a cold path.

In the worst case, when the clobber block is taken on every iteration, we execute one more load
overall (the one in the preheader). This only matters if the loop has very few iterations. If the clobber block is skipped
on at least one iteration, the transform is neutral or profitable.
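
The cost argument above can be checked with a small counting model. This is only an illustrative sketch, not part of the patch: the function names are hypothetical, and each vector element stands for one loop iteration, with `true` meaning that iteration goes through the clobber block.

```cpp
#include <cstddef>
#include <vector>

// Loads executed by the original loop: one load of %p per iteration.
std::size_t loadsBefore(const std::vector<bool> &takesClobber) {
  return takesClobber.size();
}

// Loads executed after the transform: one in the preheader, plus one
// reload for every iteration that runs the clobber block.
std::size_t loadsAfter(const std::vector<bool> &takesClobber) {
  std::size_t loads = 1; // preheader load (%x0)
  for (bool clobbered : takesClobber)
    if (clobbered)
      ++loads; // reload in the clobber block (%x1)
  return loads;
}
```

With N iterations of which k take the clobber block, the model gives N loads before and 1 + k after, so the transform only loses (by exactly one load) when k equals N.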

This opens up several possible follow-up improvements:

We can sometimes be smarter in loop-exiting blocks by splitting critical edges;
If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that we don't know whether their combined frequency is lower than that of the header.
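
As a rough illustration of the block-frequency idea, the profitability test for several clobber blocks could compare the summed clobber frequencies against the header frequency. The function below is a hypothetical sketch operating on plain integers; it is not taken from the patch and does not call LLVM's block frequency analysis.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical profitability test for PRE into several clobber blocks:
// the reloads pay off only if, taken together, they run less often than
// the header load they replace.
bool isMultiClobberPREProfitable(std::uint64_t headerFreq,
                                 const std::vector<std::uint64_t> &clobberFreqs) {
  std::uint64_t sum = std::accumulate(clobberFreqs.begin(), clobberFreqs.end(),
                                      std::uint64_t{0});
  return sum < headerFreq;
}
```

The strict comparison is a design choice of this sketch; the summary only says the clobbers must be colder than the header.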

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mkazantsev created this revision. Apr 5 2021, 10:57 PM

Herald added a subscriber: hiraditya. Apr 5 2021, 10:57 PM

mkazantsev requested review of this revision. Apr 5 2021, 10:57 PM

Herald added a project: Restricted Project. Apr 5 2021, 10:57 PM

Herald added a subscriber: llvm-commits.

mkazantsev edited the summary of this revision. Apr 5 2021, 10:57 PM

Harbormaster completed remote builds in B97235: Diff 335410. Apr 5 2021, 11:28 PM

lkail added a subscriber: lkail. Apr 5 2021, 11:50 PM

Just for context, I'd explored a very similar transform before in https://reviews.llvm.org/D7061. One really key difference between the prior attempt and this one is that previously I hadn't explicitly handled loops and instead tried to match this from the original IR. I don't think loop info was available at the time. Though, looking at the current code, it looks like LoopInfo is optional for the pass even now.

One other key detail that has changed is we now support speculation, and don't necessarily have to prove anticipation (which the earlier change struggled with.)

In general, I think the new approach is much more likely to be successful than the original since we're solving a subset of the problems.

Code structure wise, I want to suggest we don't try to shove this new transformation into the existing performLoadPRE codepath. Several of the concerns of that code (e.g. address translation) don't apply for the loop case, and you have at least one bug (whether we need to check speculation safety) because of trying to reuse the code.

I'd suggest instead that you split out the last third or so of that function into a helper which blindly performs the insertion, and replacement, and then implement a second performLoopLoadPRE entry which checks the appropriate legality for the new transform.

I also seriously question whether this is worth doing in old-GVN at all. The only infrastructure you actually need for this is memory aliasing and speculation safety. I'd seriously suggest writing a standalone pass which uses MemorySSA and ValueTracking, and maybe reuses the extracted helper function mentioned above.

I'm pretty sure that availability problem you are referencing does not exist. See last 2 tests with guards.

As for refactoring, I'm going to do it. Putting WIP in the patch.

Looking more into the code, I don't think that loop PRE needs any other legality checks than what we have now.

Harbormaster completed remote builds in B97897: Diff 336335. Apr 9 2021, 1:42 AM

Split code out into different method. Haven't figured out yet how to make it in a separate pass with MemorySSA, but I think having it in GVN won't harm.

Harbormaster completed remote builds in B97975: Diff 336440. Apr 9 2021, 7:42 AM

Comments inline include one serious correctness issue.

This is much cleaner than the original patch. I was initially hesitant to take this at all - as opposed to using MemorySSA or NewGVN - but with the new structure this looks a lot less invasive.

llvm/lib/Transforms/Scalar/GVN.cpp
1466	Extend this comment to emphasize that this means we have proven the load must execute if the loop is entered, and is thus safe to hoist to the end of the preheader without introducing a new fault. Similarly, the in-loop clobber must be dominated by the original load and is thus fault safe. Er, hm, there's an issue here I just realized. This isn't sound. The counter example here is when the clobber is a call to free and the loop actually runs one iteration. You need to prove that LI is safe to execute in both locations. You have multiple options in terms of reasoning, I'll let you decide which you want to explore: speculation safety, must execute, or unfreeable allocations. The last (using allocas as an example for test), might be the easiest.
1480	Tweak this comment a bit to emphasize that this ensures the new load executes at most as often as the original, and likely less often.
1485	I don't understand this restriction. Why is a switch not allowed?
1496	I don't think this loop does what you want, except maybe by accident. You allowed blocks outside the loop, as a result, you can end up with a bunch of available addresses and a bunch of loads before the preheader. This will likely later be DCEd since the preheader load will be the one actually used by SSA gen. I strongly suspect you want exactly two available load locations: preheader, and your one in-loop clobber block.
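
The soundness issue raised at line 1466 (the clobber is a call to free and the loop runs one iteration) can be illustrated with a toy model. This is a sketch under stated assumptions, not LLVM code: `Cell`, `faultsOriginal` and `faultsTransformed` are hypothetical names, and a "fault" is simply a load performed after the cell has been freed.

```cpp
#include <vector>

// Toy model of one heap cell: a load after the cell is freed counts as a fault.
struct Cell {
  bool freed = false;
  int faults = 0;
  void load() {
    if (freed)
      ++faults;
  }
  void release() { freed = true; }
};

// Original loop: load in the header, then optionally run the clobber block.
int faultsOriginal(const std::vector<bool> &takesClobber) {
  Cell p;
  for (bool clobbered : takesClobber) {
    p.load(); // header load
    if (clobbered)
      p.release(); // the clobber is a call to free
  }
  return p.faults;
}

// Transformed loop: load in the preheader, reload right after the clobber.
int faultsTransformed(const std::vector<bool> &takesClobber) {
  Cell p;
  p.load(); // preheader load
  for (bool clobbered : takesClobber) {
    if (clobbered) {
      p.release();
      p.load(); // reload inserted after the clobber
    }
  }
  return p.faults;
}
```

For a single iteration that takes the clobber block, the original loop never touches the freed cell while the transformed loop does, which is why the load has to be proven safe at both insertion points.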

This revision now requires changes to proceed. Apr 13 2021, 12:11 PM

mkazantsev added inline comments. Apr 13 2021, 8:17 PM

llvm/lib/Transforms/Scalar/GVN.cpp
1466	Free on the last iteration (the loop may have multiple though) is a nasty case indeed...
1485	This was originally the protection against invokes. Switch is allowed, will fix.
1496	Yes, this check was lost during the refactoring. I'm pretty sure that `eliminatePartiallyRedundantLoad` will deal with it correctly, but it's at least not obvious. Thanks for catching.

Addressed comments, fixed bug with

Harbormaster completed remote builds in B98818: Diff 337637. Apr 15 2021, 12:19 AM

I'm planning to add support for pointers based on D99135, maybe as a follow-up or on top of this.

mkazantsev planned changes to this revision. Apr 15 2021, 11:50 PM

Looks close to ready for an LGTM if you're willing to split the patch as suggested.

llvm/lib/Transforms/Scalar/GVN.cpp
1496	continue the comment with something like: "because we need a place to insert a copy of the load". p.s. I'm fine with this in an initial patch, but you really should be using an alias check here as the trailing invoke might not alias the memory being PREed. Would make a good follow up patch.
1515	You can generalize the first check as !LoadPtr->canBeFreed()
1516	This last check is incorrect. Counter example:
	for (i = 0; i < 1; i++) {
	  v = o.f
	  if (c) { // this is loop block
	    clobber();
	  }
	  atomic store g = o;
	  while (wait for other thread to free) {}
	}
	Can I ask you to pull this into a separate patch? (e.g. handle only the first two cases in this patch, and come back to the third in a follow on.)
1519	Style: hasFnAttribute(AttributeKind::NoFree) handles both of these cases. (Or will once D100226 lands).

Reused canBeFreed.

Removed code related to nofree analysis for further follow-up.

Improved comments.

llvm/lib/Transforms/Scalar/GVN.cpp
1516	Good idea. Removed from this patch, need to make it more carefully.
1519	Removed.

Harbormaster completed remote builds in B99624: Diff 338727. Apr 20 2021, 12:27 AM

LGTM w/minor comments.

llvm/lib/Transforms/Scalar/GVN.cpp
1467	Typo: In order
1511	In a follow up, please generalize by stripping bitcasts and inbounds geps before calling canBeFreed. Please don't do this in the change being lgtmed now.
llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
10	Please add a positive test (analogous to this one), but using an alloca.

This revision is now accepted and ready to land. Apr 20 2021, 9:18 AM

nikic added inline comments. Apr 20 2021, 10:23 AM

llvm/lib/Transforms/Scalar/GVN.cpp
1517	Where do we check that LoadPtr is loop invariant (and thus available in preheader)?
llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
214	This test looks broken. I think for it to do something useful you'll want to pass %p to may_free_memory (or another function). Otherwise the load is just undef and there is no clobber in the loop either.

LGTM withdrawn due to issue noticed by Nikita. Please address and refresh.

llvm/lib/Transforms/Scalar/GVN.cpp
1517	Er good question, and good catch. I remember this being here, but maybe it got lost in rebase. Max, please add back the check, and a test which would have caught this.

This revision now requires changes to proceed. Apr 20 2021, 10:51 AM

mkazantsev added inline comments. Apr 20 2021, 10:00 PM

llvm/lib/Transforms/Scalar/GVN.cpp
1517	Wow, I was sure it was there. Thanks for catching!

Added loop invariant check. Test added as underlying patch.

Typo fix

mkazantsev added inline comments. Apr 21 2021, 12:14 AM

llvm/lib/Transforms/Scalar/GVN.cpp
1511	Shouldn't it be a part of `canBeFreed`?

Harbormaster completed remote builds in B99898: Diff 339116. Apr 21 2021, 1:31 AM

Harbormaster completed remote builds in B99899: Diff 339117. Apr 21 2021, 1:57 AM

LGTM

This revision is now accepted and ready to land. Apr 21 2021, 8:05 AM

Closed by commit rG8fe62b7af112: [GVN] Introduce loop load PRE (authored by mkazantsev). Apr 21 2021, 11:03 PM

This revision was automatically updated to reflect the committed changes.

mkazantsev added a commit: rG8fe62b7af112: [GVN] Introduce loop load PRE.

nikic mentioned this in D126382: [GVN] Enable enable-split-backedge-in-load-pre option by default. May 25 2022, 7:55 AM

nikic mentioned this in rG1721ff1dfd45: [GVN] Enable enable-split-backedge-in-load-pre option by default. May 30 2022, 12:56 AM

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Scalar/GVN.h	6 lines
llvm/lib/Transforms/Scalar/GVN.cpp	92 lines
llvm/test/Transforms/GVN/PRE/pre-loop-load.ll	9 lines

Diff 339480

llvm/include/llvm/Transforms/Scalar/GVN.h

[... 322 lines elided; context: private: ...]
   /// ValuesPerBlock. If not, add it to UnavailableBlocks.
   void AnalyzeLoadAvailability(LoadInst *Load, LoadDepVect &Deps,
                                AvailValInBlkVect &ValuesPerBlock,
                                UnavailBlkVect &UnavailableBlocks);

   bool PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
                       UnavailBlkVect &UnavailableBlocks);

+  /// Try to replace a load which executes on each loop iteraiton with Phi
+  /// translation of load in preheader and load(s) in conditionally executed
+  /// paths.
+  bool performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+                          UnavailBlkVect &UnavailableBlocks);

   /// Eliminates partially redundant \p Load, replacing it with \p
   /// AvailableLoads (connected by Phis if needed).
   void eliminatePartiallyRedundantLoad(
       LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
       MapVector<BasicBlock *, Value *> &AvailableLoads);

   // Other helper routines
   bool processInstruction(Instruction *I);
[... 42 lines elided ...]

llvm/lib/Transforms/Scalar/GVN.cpp

[... 85 lines elided ...]

 using namespace llvm;
 using namespace llvm::gvn;
 using namespace llvm::VNCoercion;
 using namespace PatternMatch;

 #define DEBUG_TYPE "gvn"

 STATISTIC(NumGVNInstr, "Number of instructions deleted");
 STATISTIC(NumGVNLoad, "Number of loads deleted");
 STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
 STATISTIC(NumGVNBlocks, "Number of blocks merged");
 STATISTIC(NumGVNSimpl, "Number of instructions simplified");
 STATISTIC(NumGVNEqProp, "Number of equalities propagated");
 STATISTIC(NumPRELoad, "Number of loads PRE'd");
+STATISTIC(NumPRELoopLoad, "Number of loop loads PRE'd");

 STATISTIC(IsValueFullyAvailableInBlockNumSpeculationsMax,
           "Number of blocks speculated as available in "
           "IsValueFullyAvailableInBlock(), max");
 STATISTIC(MaxBBSpeculationCutoffReachedTimes,
           "Number of times we we reached gvn-max-block-speculations cut-off "
           "preventing further exploration");

[... 1,333 lines elided; context: for (Instruction *I : NewInsts) { ...]
     VN.lookupOrAdd(I);
   }

   eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, PredLoads);
   ++NumPRELoad;
   return true;
 }
+bool GVN::performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+                             UnavailBlkVect &UnavailableBlocks) {
+  if (!LI)
+    return false;
+
+  const Loop *L = LI->getLoopFor(Load->getParent());
+  // TODO: Generalize to other loop blocks that dominate the latch.
+  if (!L || L->getHeader() != Load->getParent())
+    return false;
+
+  BasicBlock *Preheader = L->getLoopPreheader();
+  BasicBlock *Latch = L->getLoopLatch();
+  if (!Preheader || !Latch)
+    return false;
+
+  Value *LoadPtr = Load->getPointerOperand();
>> reames (Not Done): Extend this comment to emphasize that this means we have proven the load must execute if the…
>> mkazantsev (Done): Free on the last iteration (the loop may have multiple though) is a nasty case indeed...
+  // Must be available in preheader.
>> reames (Not Done): Typo: In order
+  if (!L->isLoopInvariant(LoadPtr))
+    return false;
+
+  // We plan to hoist the load to preheader without introducing a new fault.
+  // In order to do it, we need to prove that we cannot side-exit the loop
+  // once loop header is first entered before execution of the load.
+  if (ICF->isDominatedByICFIFromSameBlock(Load))
+    return false;
+
+  BasicBlock *LoopBlock = nullptr;
+  for (auto *Blocker : UnavailableBlocks) {
+    // Blockers from outside the loop are handled in preheader.
+    if (!L->contains(Blocker))
>> reames (Done): Tweak this comment a bit to emphasize that this ensures the new load executes at most as often…
+      continue;
+
+    // Only allow one loop block. Loop header is not less frequently executed
+    // than each loop block, and likely it is much more frequently executed. But
+    // in case of multiple loop blocks, we need extra information (such as block
>> reames (Not Done): I don't understand this restriction. Why is a switch not allowed?
>> mkazantsev (Done): This was originally the protection against invokes. Switch is allowed, will fix.
+    // frequency info) to understand whether it is profitable to PRE into
+    // multiple loop blocks.
+    if (LoopBlock)
+      return false;
+
+    // Do not sink into inner loops. This may be non-profitable.
+    if (L != LI->getLoopFor(Blocker))
+      return false;
+
+    // Blocks that dominate the latch execute on every single iteration, maybe
+    // except the last one. So PREing into these blocks doesn't make much sense
>> reames (Not Done): I don't think this loop does what you want, except maybe by accident. You allowed blocks…
>> mkazantsev (Done): Yes, this check was lost during the refactoring. I'm pretty sure that…
>> reames (Not Done): continue the comment with something like: "because we need a place to insert a copy of the…
+    // in most cases. But the blocks that do not necessarily execute on each
+    // iteration are sometimes much colder than the header, and this is when
+    // PRE is potentially profitable.
+    if (DT->dominates(Blocker, Latch))
+      return false;
+
+    // Make sure that the terminator itself doesn't clobber.
+    if (Blocker->getTerminator()->mayWriteToMemory())
+      return false;
+
+    LoopBlock = Blocker;
+  }
+
+  if (!LoopBlock)
+    return false;
>> reames (Not Done): In a follow up, please generalize by using stripping bitcasts and inbounds geps before calling…
>> mkazantsev (Done): Shouldn't it be a part of `canBeFreed`?
+
+  // Make sure the memory at this pointer cannot be freed, therefore we can
+  // safely reload from it after clobber.
+  if (LoadPtr->canBeFreed())
>> reames (Done): You can generalize the first check as !LoadPtr->canBeFreed()
+    return false;
>> reames (Done): This last check is incorrect. Counter example: for (i = 0; i < 1; i++) v = o.f if (c) {…
>> mkazantsev (Done): Good idea. Removed from this patch, need to make it more carefully.
+
>> nikic (Not Done): Where do we check that LoadPtr is loop invariant (and thus available in preheader)?
>> reames (Not Done): Er good question, and good catch. I remember this being here, but maybe it got lost in rebase.
>> mkazantsev (Done): Wow, I was sure it was there. Thanks for catching!
+  // TODO: Support critical edge splitting if blocker has more than 1 successor.
+  MapVector<BasicBlock *, Value *> AvailableLoads;
>> reames (Not Done): Style: hasFnAttribute(AttributeKind::NoFree) handles both of these cases. (Or will once…
>> mkazantsev (Done): Removed.
+  AvailableLoads[LoopBlock] = LoadPtr;
+  AvailableLoads[Preheader] = LoadPtr;
+
+  LLVM_DEBUG(dbgs() << "GVN REMOVING PRE LOOP LOAD: " << *Load << '\n');
+  eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, AvailableLoads);
+  ++NumPRELoopLoad;
+  return true;
+}

 static void reportLoadElim(LoadInst *Load, Value *AvailableValue,
                            OptimizationRemarkEmitter *ORE) {
   using namespace ore;

   ORE->emit([&]() {
     return OptimizationRemark(DEBUG_TYPE, "LoadElim", Load)
            << "load of type " << NV("Type", Load->getType()) << " eliminated"
            << setExtraArgs() << " in favor of "
[... 81 lines elided; context: bool GVN::processNonLocalLoad(LoadInst *Load) { ...]
   }

   // Step 4: Eliminate partial redundancy.
   if (!isPREEnabled() || !isLoadPREEnabled())
     return Changed;
   if (!isLoadInLoopPREEnabled() && LI && LI->getLoopFor(Load->getParent()))
     return Changed;

-  return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
+  return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks) ||
+         performLoopLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
 }

 static bool impliesEquivalanceIfTrue(CmpInst* Cmp) {
   if (Cmp->getPredicate() == CmpInst::Predicate::ICMP_EQ)
     return true;

   // Floating point comparisons can be equal, but not equivalent. Cases:
   // NaNs for unordered operators
[... 1,414 lines elided ...]
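
For readers skimming the GVN.cpp change, the legality conditions that performLoopLoadPRE tests can be summarized as a single predicate. The sketch below is only a paraphrase for illustration: `LoopLoadFacts` and `mayPerformLoopLoadPRE` are hypothetical names that do not exist in LLVM, and each field stands in for one query the patch makes (LoopInfo, the dominator tree, implicit control flow tracking, `canBeFreed`).

```cpp
// Hypothetical summary of the facts the patch queries; the defaults describe
// a loop on which the transform would be allowed.
struct LoopLoadFacts {
  bool HasLoopInfo = true;
  bool LoadInLoopHeader = true;
  bool HasPreheaderAndLatch = true;
  bool PointerIsLoopInvariant = true;        // load can be placed in the preheader
  bool ImplicitControlFlowBeforeLoad = false; // possible side exit before the load
  unsigned NumInLoopBlockers = 1;             // unavailable blocks inside the loop
  bool BlockerInInnerLoop = false;
  bool BlockerDominatesLatch = false;         // runs on (almost) every iteration
  bool BlockerTerminatorMayWrite = false;
  bool PointerCanBeFreed = false;
};

// Mirrors the order of the early returns in the patch.
bool mayPerformLoopLoadPRE(const LoopLoadFacts &F) {
  if (!F.HasLoopInfo || !F.LoadInLoopHeader || !F.HasPreheaderAndLatch)
    return false;
  if (!F.PointerIsLoopInvariant)
    return false;
  if (F.ImplicitControlFlowBeforeLoad)
    return false;
  if (F.NumInLoopBlockers != 1)
    return false;
  if (F.BlockerInInnerLoop || F.BlockerDominatesLatch ||
      F.BlockerTerminatorMayWrite)
    return false;
  return !F.PointerCanBeFreed;
}
```

Collapsing the per-blocker loop into a single count is a simplification of this sketch; the patch walks UnavailableBlocks and skips blockers outside the loop.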

llvm/test/Transforms/GVN/PRE/pre-loop-load.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -basic-aa -enable-load-pre -enable-pre -lcssa -gvn -S < %s | FileCheck %s

 declare void @side_effect() nofree
 declare i1 @side_effect_cond() nofree
 declare void @may_free_memory()

 declare i32 @personality_function()

-; TODO: We can PRE the load from gc-managed memory away from the hot path.
+; We can PRE the load from gc-managed memory away from the hot path.
>> reames (Not Done): Please add a positive test (analogous to this one), but using an alloca.
 define i32 @test_load_on_cold_path_gc(i32 addrspace(1)* %p) gc "statepoint-example" personality i32 ()* @"personality_function" {
 ; CHECK-LABEL: @test_load_on_cold_path_gc(
 ; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32 addrspace(1)* [[P:%.*]], align 4
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
-; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
-; CHECK-NEXT: [[X:%.*]] = load i32, i32 addrspace(1)* [[P:%.*]], align 4
+; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE1]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
 ; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0
 ; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
 ; CHECK: hot_path:
 ; CHECK-NEXT: br label [[BACKEDGE]]
 ; CHECK: cold_path:
 ; CHECK-NEXT: call void @may_free_memory()
+; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32 addrspace(1)* [[P]], align 4
 ; CHECK-NEXT: br label [[BACKEDGE]]
 ; CHECK: backedge:
+; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE]], [[COLD_PATH]] ], [ [[X]], [[HOT_PATH]] ]
 ; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
 ; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
 ; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
 ; CHECK: exit:
 ; CHECK-NEXT: ret i32 [[X]]
 ;
 entry:
   br label %loop
[... 169 lines elided ...]
 ; cannot free memory allocated by alloca.
 define i32 @test_load_on_cold_path_may_free_memory_alloca() {
 ; CHECK-LABEL: @test_load_on_cold_path_may_free_memory_alloca(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[P:%.*]] = alloca i32, align 4
 ; CHECK-NEXT: call void @may_free_memory()
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: br i1 undef, label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
>> nikic (Not Done): This test looks broken. I think for it to do something useful you'll want to pass %p to…
 ; CHECK: hot_path:
 ; CHECK-NEXT: br label [[BACKEDGE:%.*]]
 ; CHECK: cold_path:
 ; CHECK-NEXT: call void @may_free_memory()
 ; CHECK-NEXT: br label [[BACKEDGE]]
 ; CHECK: backedge:
 ; CHECK-NEXT: br i1 false, label [[BACKEDGE_LOOP_CRIT_EDGE:%.*]], label [[EXIT:%.*]]
 ; CHECK: backedge.loop_crit_edge:
[... 714 lines elided ...]
