This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Promote conditional, loop-invariant memory accesses to scalars
Needs ReviewPublic

Authored by dmilosevic141 on Dec 7 2021, 5:36 AM.

Details

Summary

Promotion of conditional accesses can violate the safety property, which has two parts:

  • The memory may not be dereferenceable on loop entry. In that case, we cannot hoist the load instructions into the preheader basic block.
  • The memory model does not allow us to insert a store along any dynamic path which did not originally have one.

As of D113289, the LICM pass hoists load instructions that are not guaranteed to execute into the preheader basic block, provided the memory is proven dereferenceable on loop entry. Note that that change does not sink the corresponding store instructions, which are not guaranteed to execute, into the exit blocks.
Sinking stores that are not guaranteed to execute directly breaks the second part of the safety property above; more precisely, sinking conditional store instructions may introduce data races or traps. For example, if the conditional store never executes on a given run, another thread may legally access the memory concurrently, and an unconditionally sunk store would race with it. To keep the transformation thread-safe, we need to make sure the sunk store instructions are executed conditionally, depending on whether the value was actually written.
Consider the following source code:

int u;

void f(int a[restrict], int n)
{
	for (int i = 0; i < n; ++i)
		if (a[i])
			++u;
}

To promote the conditional access of u (which includes hoisting the load instruction and sinking the store instruction), this patch allows the LICM pass to insert a simple flag, which is initially down (i.e., false in the preheader). Whenever control flow reaches the conditional store to u, the flag is raised. Finally, the sunk store instruction is executed conditionally (using the llvm.masked.store intrinsic) in the exit blocks, depending on whether the flag was raised.
For the source code above, the following pseudo LLVM IR (only the relevant parts are shown) corresponds to the actual LLVM IR that LICM now generates:

...
entry:
  u.promoted = load u
  br for.cond
for.cond:
  u.flag = phi [ 0, entry ], [ u.flag.next, for.inc ] ; Note that the flag is down if we get here through the preheader basic block.
  inc = phi [ u.promoted, entry ], [ inc.next, for.inc ]
  ...
for.body: 
...
if.then:
  inc.if.then = add inc, 1
  br for.inc
for.inc:
  u.flag.next = phi [ 1, if.then ], [ u.flag, for.body ] ; Note that the flag is raised if we get here through the if.then basic block.
  inc.next = phi [ inc.if.then, if.then ], [ inc, for.body ]
  ...
for.cond.cleanup: ; This is the only exit block.
  u.flag.lcssa = phi [ u.flag, for.cond ] ; Get the flag value.
  inc.lcssa = phi [ inc, for.cond ]
  call void @llvm.masked.store(<1 x i32> inc.lcssa, <1 x i32>* &u, i32 <alignment>, <1 x i1> u.flag.lcssa)
  ret void
}
...
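In source terms, the promotion above is roughly equivalent to the following hand-written C sketch (illustrative only; the names u_promoted and u_flag are invented here, and the final guarded store is what the llvm.masked.store call implements):

int u;

void f(int a[restrict], int n)
{
	int u_promoted = u; /* hoisted load; safe because u is dereferenceable on loop entry */
	_Bool u_flag = 0;   /* flag starts down in the preheader */
	for (int i = 0; i < n; ++i)
		if (a[i]) {
			++u_promoted;
			u_flag = 1; /* flag is raised on the conditional path */
		}
	if (u_flag)         /* conditional store in the exit block */
		u = u_promoted;
}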

This patch addresses potential optimizations proposed in: Missed opportunities for register promotion.

Diff Detail

Event Timeline

dmilosevic141 created this revision. Dec 7 2021, 5:36 AM
dmilosevic141 requested review of this revision. Dec 7 2021, 5:36 AM
Herald added a project: Restricted Project. Dec 7 2021, 5:36 AM
djtodoro added a subscriber: petarj.

Thanks for working on this! This looks like an addition to https://reviews.llvm.org/D113289.

Can you please simplify the summary (e.g., the IR code could be written as pseudo IR)?

llvm/lib/Transforms/Scalar/LICM.cpp
147

IIUC, this should be a target-independent optimization? And GCC generates such code for both x86_64 and AArch64, right?

The generated code will be bigger, so this will impact code size; I guess this needs motivation in terms of (SPEC) benchmarking.

1929

This comment is redundant.

llvm/test/Transforms/LICM/conditional-access-promotion.ll
14–17

not needed

89–107

I guess we don't need these.

dmilosevic141 edited the summary of this revision. Dec 7 2021, 5:54 AM
dmilosevic141 edited the summary of this revision.

Thanks @djtodoro!
I've updated the following:

  • Simplified the summary example.
  • Removed the unnecessary comment from the LICM.cpp file, as well as unnecessary attributes and metadata from the conditional-access-promotion.ll file.
dmilosevic141 marked 3 inline comments as done. Dec 7 2021, 6:36 AM
dmilosevic141 added inline comments.
llvm/lib/Transforms/Scalar/LICM.cpp
147

IIUC, this should be a target-independent optimization? And GCC generates such code for both x86_64 and AArch64, right?

Correct, GCC (excluding the Ofast optimization level, which allows data races) generates such code for both x86_64 and AArch64, which is why this should be target-independent.

The generated code will be bigger, so this will impact code size; I guess this needs motivation in terms of (SPEC) benchmarking.

Sure, will work on that.

ntesic added a subscriber: ntesic. Dec 7 2021, 8:00 AM
reames added a comment. Jan 5 2022, 8:51 AM

Conditional store promotion is one of those transforms I've been thinking about for a long time, and have never quite felt safe implementing.

My high-level concern here is profitability. It's unclear to me that inserting a flag IV and a conditional store outside the loop is generally profitable. It's clearly profitable in many cases, but I'm a bit hesitant on whether it makes sense to do as a canonicalization. I'd love to see some data on the impact of this.

Structure-wise, I strongly encourage you to use predicated stores, not CFG manipulation. LICM does not generally modify the CFG; we can, but the analysis updates are complicated. (See the diamond hoisting code.) I generally think a predicated store instruction is the "right" construct here, as it leads to more obvious optimization outcomes.

However, our masked.load handling for scalars (i.e., single-element vectors) leaves a bit to be desired. I'd started improving it a bit (with this goal in mind), but if you're serious about this, you'd need to spend some time working through the related issues there first.

llvm/lib/Transforms/Scalar/LICM.cpp
1894

See macro comment.

2014

The abnormal exit property applies to classic store promotion too. This is either redundant, or you have some bug you should discuss and fix separately.

2065

Stray change?

2091

This was already checked above.

2093

This set does not make sense. A store only needs to be conditional if we can't find any store to the address which dominates all exits.

You should only need to track some global state here, not anything store-specific.

2219

Please use i1 here, not i8.

Matt added a subscriber: Matt. Jan 7 2022, 7:33 AM
dmilosevic141 marked 2 inline comments as done.

Thanks @reames! Sorry for the delayed response; I haven't been able to fully focus on this topic in the past couple of months.

My high-level concern here is profitability. It's unclear to me that inserting a flag IV and a conditional store outside the loop is generally profitable. It's clearly profitable in many cases, but I'm a bit hesitant on whether it makes sense to do as a canonicalization. I'd love to see some data on the impact of this.

Definitely; hopefully I'll be able to provide the data soon.

Structure-wise, I strongly encourage you to use predicated stores, not CFG manipulation. LICM does not generally modify the CFG; we can, but the analysis updates are complicated. (See the diamond hoisting code.) I generally think a predicated store instruction is the "right" construct here, as it leads to more obvious optimization outcomes.

I hadn't come across predicated store instructions before. At first look, they definitely fit the needs better than CFG manipulation; thanks for the directions.
Here's my vision for using them, please let me know your thoughts:

  • Hoisting the load instructions into the preheader would stay the same (using regular load instructions).
  • An additional flag would still be needed. It would be used the same way (e.g., initialized in the preheader basic block, its value propagated through the basic blocks).
  • Sinking the conditional store instructions into the exit block(s) via CFG manipulation would be replaced with appropriate predicated store instructions sunk into the exit block(s).

The summary example would then look like this:

...
entry:
  u.promoted = load u
  br for.cond
for.cond:
  u.flag = phi [ false, entry ], [ u.flag.next, for.inc ]
  inc = phi [ u.promoted, entry ], [ inc.next, for.inc ]
  ...
for.body: 
...
if.then:
  inc.if.then = add inc, 1
  br for.inc
for.inc:
  u.flag.next = phi [ true, if.then ], [ u.flag, for.body ]
  inc.next = phi [ inc.if.then, if.then ], [ inc, for.body ]
  ...
for.cond.cleanup: ; This is the only exit block.
  u.flag.lcssa = phi [ u.flag, for.cond ] ; Get the flag value.
  inc.lcssa = phi [ inc, for.cond ]
  call @llvm.masked.store(<1x<u_type>> inc.lcssa, <1x<u_type>>* &u, i32 <u_alignment>, <1xi1> u.flag.lcssa)
  ret void
}
...

I'll also take some more time diving deeper into the predicated instructions.

However, our masked.load handling for scalars (i.e., single-element vectors) leaves a bit to be desired. I'd started improving it a bit (with this goal in mind), but if you're serious about this, you'd need to spend some time working through the related issues there first.

I'm guessing the same goes for the masked.store.* intrinsics, so

call @llvm.masked.store(<1x<u_type>> inc.lcssa, <1x<u_type>>* &u, i32 <u_alignment>, <1xi1> u.flag.lcssa)

should be swapped with something more appropriate for scalar values?
To summarize, I've rebased and fixed a few things @reames pointed out.

Herald added a project: Restricted Project. Mar 31 2022, 12:56 AM
dmilosevic141 added inline comments. Mar 31 2022, 12:57 AM
llvm/lib/Transforms/Scalar/LICM.cpp
2014

Thanks for pointing this out! I implemented the loopHasNoAbnormalExits function because I was unsure whether the LICM pass makes sure there are no abnormal loop exits before it does 'classic' (non-conditional) promotion. There was no difference before and after calling this function (i.e., no bugs that this call 'fixes'), except for the unnecessary overhead, which is now removed.

2065

Just something I needed for the set I was using. Reverting this.

2091

Indeed, thanks!

2093

This set was used to propagate the value of the flag through the basic blocks, via the SSAUpdater's interface.
For that purpose, we already had the LoopUses set, so this set was redundant. Thanks!

2219

Thanks!

reames added a comment. May 1 2022, 9:44 AM

@dmilosevic141, your last comment was spot on. I do worry I might have confused you on one point, though. When I said "predicated store" I meant the llvm.masked.store intrinsic. We have another family of vector-predicated intrinsics, but they're a lot less mature at the moment; I'm not exactly sure what their status is.

dmilosevic141 edited the summary of this revision.
dmilosevic141 set the repository for this revision to rG LLVM Github Monorepo.

Thanks @reames!
I took some time to dive deeper into the predicated instructions. So far, two things stand out to me:

  • The ScalarizeMaskedMemIntrin pass, a standard part of the CodeGen pipeline. This pass translates the masked memory intrinsics into chains of basic blocks that load/store the elements one by one; for single-element vectors, the handling looks pretty much identical to the initial version of this patch (see the sketch after this list). Scalarization of the masked memory intrinsics is, however, target-dependent (e.g., for x86, calls with single-element vectors passed in as arguments do get scalarized; for RISC-V, they do not).
  • The Vector Predication Roadmap.
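For a single-element vector, the scalarized form of the masked store amounts to a branch on the one extracted mask bit around a plain scalar store; in C-like terms (a rough sketch with invented names, not the pass's actual output):

	if (u_flag)  /* test the single mask bit */
		u = inc; /* ordinary scalar store in the newly created block */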

Having said that, I'm a bit stuck regarding the next steps towards better handling of single-element vectors, as I'm not sure the ScalarizeMaskedMemIntrin pass is *the* place for that. The Vector Predication Roadmap is a little overwhelming, i.e., I'm not sure which part of it relates to better handling of single-element vectors. If you (or anyone else) have something concrete to point me to, I'd be really thankful. :)
I've updated the patch to use the llvm.masked.store intrinsic instead of the complicated CFG manipulation.