Phabricator

[LICM] Make promotion faster
Needs Review · Public

Authored by nikic on Mon, Oct 12, 1:05 PM.

Details

Reviewers
asbirlea
fhahn
Summary

As mentioned at the MSSA roundtable, LICM spends a lot of time constructing an AST, despite the caps already in place. This patch is a proposal for reducing the impact.

The idea here is pretty simple: We're only interested in must-alias mod sets of loop-invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets that alias any of the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other accesses loop-invariant and thus possible promotion candidates.
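Schematically, the filtering step can be modeled as below. This is a self-contained sketch, not the actual LLVM code: the `MemAccess` struct, the `aliases` stand-in, and `promotionCandidates` are all invented names for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Schematic model of a memory access in the loop.
struct MemAccess {
  std::string Ptr;        // the pointer operand
  bool LoopInvariantPtr;  // is the pointer loop-invariant?
  bool LoadOrStore;       // plain load/store (anything else is unpromotable)
};

// Stand-in for a real alias query: here, two accesses alias iff they
// use the same pointer name.
static bool aliases(const MemAccess &A, const MemAccess &B) {
  return A.Ptr == B.Ptr;
}

// Return the pointers that remain promotion candidates: loop-invariant
// loads/stores whose set is not touched by any definitely non-promotable
// access (variant pointer, or not a plain load/store).
std::vector<std::string>
promotionCandidates(const std::vector<MemAccess> &Accesses) {
  std::vector<std::string> Candidates;
  for (const MemAccess &A : Accesses) {
    if (!A.LoopInvariantPtr || !A.LoadOrStore)
      continue;  // never enters the AST in the first place
    bool Clobbered = false;
    for (const MemAccess &B : Accesses) {
      if (&A == &B)
        continue;
      // Any aliasing access that would not itself be in the AST makes
      // the whole set non-promotable, so it is discarded.
      if ((!B.LoopInvariantPtr || !B.LoadOrStore) && aliases(A, B))
        Clobbered = true;
    }
    if (!Clobbered)
      Candidates.push_back(A.Ptr);
  }
  return Candidates;
}
```

The point of the restructuring is that the quadratic alias-query work only runs over the (usually small) loop-invariant load/store subset, rather than every access in the loop.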

This has a large positive compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=d7186fe3710828fab03de69f78f01f001d70e1aa&to=264016756045aba11ae326bfab6a632bc5ef1855&stat=instructions We save ~2% geomean at O3, and lencod in particular saves 6%.

There is no impact on the number of promotions (licm.NumPromoted) in test-suite with this change.

Diff Detail

Event Timeline

nikic created this revision. Mon, Oct 12, 1:05 PM
nikic requested review of this revision. Mon, Oct 12, 1:05 PM

An ignorant word of caution: anything can be made faster by making it do less stuff.

> There's no code-size impact on any of the CTMark programs, so code quality impact of the change should be low.

That doesn't really sound like exhaustive-enough test coverage to me.

nikic edited the summary of this revision. Mon, Oct 12, 2:24 PM
nikic added a comment. Mon, Oct 12, 2:30 PM

> An ignorant word of caution: anything can be made faster by making it do less stuff.
>
>> There's no code-size impact on any of the CTMark programs, so code quality impact of the change should be low.
>
> That doesn't really sound like exhaustive-enough test coverage to me.

Yes, of course :) I have checked this on the full test-suite now, and there are no codegen differences with this patch. The only differences in statistics are...

| Statistic | Before | After |
| basicaa.SearchLimitReached | 146199 | 128184 |
| basicaa.SearchTimes | 50855724 | 23825700 |
| memory-builtins.ObjectVisitorArgument | 331614 | 254610 |
| memory-builtins.ObjectVisitorLoad | 149739 | 81225 |

...which just tells us that we're doing a lot fewer AA queries, which is exactly the intention behind this. Of course, the lit test failures indicate that regressions here are possible, even if they don't show up in practice.

>> An ignorant word of caution: anything can be made faster by making it do less stuff.
>>
>>> There's no code-size impact on any of the CTMark programs, so code quality impact of the change should be low.
>>
>> That doesn't really sound like exhaustive-enough test coverage to me.
>
> Yes, of course :) I have checked this on the full test-suite now, and there are no codegen differences with this patch. The only differences in statistics are...

I guess I was thinking of even bigger coverage (SPEC?), but I personally don't have access to it either.

> | Statistic | Before | After |
> | basicaa.SearchLimitReached | 146199 | 128184 |
> | basicaa.SearchTimes | 50855724 | 23825700 |
> | memory-builtins.ObjectVisitorArgument | 331614 | 254610 |
> | memory-builtins.ObjectVisitorLoad | 149739 | 81225 |
>
> ...which just tells us that we're doing a lot fewer AA queries, which is exactly the intention behind this. Of course, the lit test failures indicate that regressions here are possible, even if they don't show up in practice.

IIRC those run-to-run stats changes are always present, and are therefore indicative of nondeterminism.

Could you address the loop versioning test failures?

nikic updated this revision to Diff 297900. Tue, Oct 13, 10:04 AM

Rebase to pick up fix for the LoopVersioningLICM noalias metadata. Those regressions are no longer present now.

nikic updated this revision to Diff 297937. Tue, Oct 13, 1:13 PM
nikic edited the summary of this revision.

Rebase over NFC to make this patch smaller.

I tested this in two configurations we use for compiler releases, and I'm seeing quite a few runtime regressions in the 10-20% range and a couple of 40% runtime regressions.
The compile-time gain doesn't look worth such large runtime regressions.

nikic updated this revision to Diff 298196. Wed, Oct 14, 11:15 AM
nikic edited the summary of this revision.

Add back support for "promotion of promotion".

> I tested this in two configurations we use for compiler releases, and I'm seeing quite a few runtime regressions in the 10-20% range and a couple of 40% runtime regressions.
> The compile-time gain doesn't look worth such large runtime regressions.

Thank you for running those tests! Those results are pretty unexpected to me, as this was not supposed to change the behavior in any substantial way. The two possibilities that come to mind are:

  • Those regressions were using the "promotion of promotion" behavior. I originally did not bother supporting this because the situation seemed so contrived, and I would expect EarlyCSE/GVN or anything else doing store-load forwarding to handle it. However, it's also really simple to support and doesn't come with additional compile-time cost, so I extended the patch to handle it (thus also removing the one lit test regression).
  • The different processing order for accesses ends up mattering in those cases. Due to AATags intersection, the AST is order-dependent, and the different order can result in more or fewer accesses being promoted (and either direction could be either good or bad for performance).
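For illustration, the first case might look like the following contrived, hypothetical example (not taken from any regressing benchmark; the function name is invented): once the load of *pp is promoted to a register, the inner pointer q becomes loop-invariant, and the accesses through q become promotion candidates in a second round.

```c
/* Hypothetical "promotion of promotion" case: nothing in the loop
 * writes *pp, so the load of *pp can be promoted. After that
 * promotion, q is loop-invariant, so *q can be promoted as well. */
static int sum_via_indirect(int **pp, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) {
    int *q = *pp;  /* promotable load of *pp */
    *q += i;       /* promotable only once q is known loop-invariant */
    total += *q;
  }
  return total;
}
```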

It's hard to guess without a test case to look at, and unfortunately test-suite doesn't have one. Are any of the regressions you observed in publicly available code that I could check?

I don't really want to give up on this change yet, as the compile-time impact is fairly substantial for some types of code (a few >20% wins on individual files).

A couple of public benchmarks:
SingleSource polybench jacobi-1d-imper is seeing an 11.9% runtime regression.
MultiSource MallocBench_gs is seeing a 15.8% runtime regression.
Both on Haswell, without ThinLTO.
With ThinLTO, Eigen is seeing a lot of regressions in the 5-20% range, and another benchmark has the 40% spike.

I reran the performance numbers after the promotion-of-promotion patch update, and the results are improved.
In opt, on Haswell, there are still a few notable regressions in the 5-10% range, with one 32% outlier; SingleSource looks resolved; MultiSource MallocBench_gs is still seeing a 16.3% regression.
With ThinLTO there are still lots of 5-17% regressions in Eigen, but the 40% spike is replaced with a range from a 5% regression to one significant improvement.

I'll run additional configurations to determine if the gains outweigh the regressions.
On the two configurations I tested so far, there are enough regressions that I wouldn't push this. I wouldn't drop it just yet either; it has good potential for both compile time and run time.