This is an archive of the discontinued LLVM Phabricator instance.

[MergedLoadStoreMotion] Sink stores if they have common GEP
Needs Review · Public

Authored by dendibakh on Sep 5 2019, 11:03 AM.


Details

Reviewers

bjope
davide
Gerolf
efriedma
lebedev.ri

Summary

If two stores in a diamond share a common GEP:

bb1:
  %tmp = getelementptr inbounds i32, i32* %arg1, i64 1
  br i1 %arg2, label %bb2, label %bb3
bb2:                                              ; preds = %bb1
  store i32 42, i32* %tmp, align 4
  br label %bb4
bb3:                                              ; preds = %bb1
  store i32 42, i32* %tmp, align 4
  br label %bb4
bb4:                                              ; preds = %bb2, %bb3
  ret void

Both stores will be sunk into %bb4.
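The effect on the diamond above can be modelled outside LLVM. What follows is a toy sketch under stated assumptions, not the implementation in this patch: `Store`, `Diamond`, and `sinkCommonStores` are hypothetical names, addresses are modelled as plain value names, differently named addresses are assumed not to alias, and loads, calls, and other barriers are not modelled at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Toy model (hypothetical, not LLVM API): a store writes `value` through `ptr`.
struct Store {
  std::string ptr; // name of the address value, e.g. "%tmp"
  int value;
};

// The two arms of the diamond and the join block.
struct Diamond {
  std::vector<Store> left, right, tail;
};

// Walk the left arm bottom-up. A store is sunk when the last store to the
// same address in the right arm writes the same value. The walk stops at the
// first store that has to stay, so nothing is sunk past a store left behind.
// Assumption: differently named addresses never alias.
int sinkCommonStores(Diamond &d) {
  int sunk = 0;
  for (std::size_t i = d.left.size(); i-- > 0;) {
    const Store cand = d.left[i]; // copy: the vector is modified below
    auto rit = std::find_if(d.right.rbegin(), d.right.rend(),
                            [&](const Store &s) { return s.ptr == cand.ptr; });
    if (rit == d.right.rend() || rit->value != cand.value)
      break;
    d.tail.insert(d.tail.begin(), cand);  // head of the join block
    d.right.erase(std::next(rit).base()); // erase the matched right-arm store
    d.left.erase(d.left.begin() + static_cast<std::ptrdiff_t>(i));
    ++sunk;
  }
  return sunk;
}
```

With one `store 42` through `%tmp` in each arm, the call moves a single store into the tail and leaves both arms empty, which mirrors the IR example in the summary.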

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dendibakh created this revision. Sep 5 2019, 11:03 AM

Herald added a project: Restricted Project. Sep 5 2019, 11:03 AM

Herald added subscribers: llvm-commits, hiraditya.

dendibakh edited the summary of this revision. Sep 5 2019, 11:04 AM

lebedev.ri resigned from this revision. Sep 6 2019, 5:20 AM

Kind reminder: @bjope, please review.

Sorry, I haven't had time to look at this.

As with the previous patch (which I did help review), all I can do is try to figure out whether the code does what you intend, not whether this is good in general or how it fits together with other passes.
You should probably add some more reviewers here. I suppose there should be a code owner for these kinds of transforms? Maybe find someone who knows more about the strategy for these kinds of optimizations?
I happen to know that the load hoisting that used to be in this pass was moved, I think to GVNHoist. So is the long-term plan to add these kinds of rewrites to GVNSink?

Another thing to consider as a reviewer is the cost/value perspective (if the benefits of these changes are worth the added code complexity).
You do not write anything about how this impacts any benchmarks etc., so it is really hard for me to understand if this is good in general or just a micro optimization.

FWIW, personally I'm not that interested in this pass. I just happened to fix a problem with codegen not being debug invariant once upon a time.

Maybe you can start out by adding some more info in the description about:

- any measured results (what is the benefit)
- the "origin" of this patch (is this part of some RFC, any trouble reports about lack of optimizations, just something that you discovered was missing)

And if you do not find anyone who feels more concerned about these kinds of transforms, then I can try to find some time to look at it (since I'm trying to contribute with some patches myself from time to time I actually try to also contribute with reviews... but I actually don't have that much knowledge when it comes to optimization strategies...).

Thanks for getting back to this.
GVNSink and SimplifyCFG only sink the common tails of two BBs, i.e. they will not cherry-pick individual stores from the middle of a BB.
That was the motivation for this small improvement. According to my testing, it has no measurable performance impact: I saw fluctuations within 1%, which I think are caused by code placement.
I will try to find someone to review.
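The distinction drawn in the reply above, between sinking a common tail and picking a matching store out of the middle of a block, can be illustrated with a toy comparison. This is only a sketch under stated assumptions: instructions are modelled as plain strings, `commonTailLength` and `storeInBothBlocks` are hypothetical helpers, and nothing here reflects how GVNSink, SimplifyCFG, or this pass are actually implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Block = std::vector<std::string>; // toy block: one string per instruction

// Number of identical trailing instructions shared by both blocks, i.e. what
// a "sink only the common tail" strategy would be able to move.
std::size_t commonTailLength(const Block &a, const Block &b) {
  std::size_t n = 0;
  while (n < a.size() && n < b.size() &&
         a[a.size() - 1 - n] == b[b.size() - 1 - n])
    ++n;
  return n;
}

// True if `store` occurs somewhere in both blocks, regardless of position.
// A real pass must also prove that nothing after the store blocks the motion;
// that check is deliberately left out of this sketch.
bool storeInBothBlocks(const Block &a, const Block &b,
                       const std::string &store) {
  return std::find(a.begin(), a.end(), store) != a.end() &&
         std::find(b.begin(), b.end(), store) != b.end();
}
```

For two blocks that both start with the same store but end with different arithmetic instructions, `commonTailLength` returns 0 while `storeInBothBlocks` returns true for that store: the store is a sinking candidate only for a transform that looks past the tail.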

dendibakh added a reviewer: Gerolf. Sep 18 2019, 12:42 PM

Maybe @efriedma can review this.

dendibakh added a reviewer: efriedma. Sep 19 2019, 9:07 AM

Broadly, this seems like a small enough change that it would be worth taking for now, even if we expect the pass to become obsolete eventually. But there isn't much point if you aren't seeing any measurable benefit. Could you give some idea how frequently this triggers?

Inline comments from efriedma on llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:
Line 284: Why does it matter here if the address is a GEP, as opposed to some other Value? Instead of checking equality, could we check if the pointers are MustAlias? Would that have any significant impact on how frequently the transform triggers?
Line 296: Is there a check somewhere that ensures InsertPt is actually a legal insertion point? You can't insert instructions into a catchswitch.
Line 299: Do we need to do something with the store's alignment here?
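On the alignment question, one conservative answer is that a store merged from two stores may only claim the weaker of the two alignments, because the merged store executes on both paths and only the smaller guarantee holds on each of them. A minimal sketch of that rule follows. `mergedStoreAlign` is a hypothetical helper operating on plain byte counts, not code from this patch, and the thread does not say whether the pass already rejects store pairs with differing alignments earlier on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Alignment in bytes that is safe to claim for a single store replacing two
// stores with alignments a0 and a1: the smaller (weaker) guarantee.
// Assumption of this sketch: both inputs are non-zero byte counts.
std::uint64_t mergedStoreAlign(std::uint64_t a0, std::uint64_t a1) {
  assert(a0 != 0 && a1 != 0 && "alignments are expected to be non-zero");
  return std::min(a0, a1);
}
```

In the test case from this revision both stores are `align 4`, so the rule would leave the alignment unchanged.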

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp	53 lines
llvm/test/Transforms/InstMerge/st_sink_no_geps.ll	35 lines

Diff 218955

llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp

[... 115 lines not shown; enclosing context: private: ...]
// Routines for sinking stores		// Routines for sinking stores
StoreInst *canSinkFromBlock(BasicBlock *BB, StoreInst *SI);		StoreInst *canSinkFromBlock(BasicBlock *BB, StoreInst *SI);
PHINode *getPHIOperand(BasicBlock *BB, StoreInst *S0, StoreInst *S1);		PHINode *getPHIOperand(BasicBlock *BB, StoreInst *S0, StoreInst *S1);
bool isStoreSinkBarrierInRange(const Instruction &Start,		bool isStoreSinkBarrierInRange(const Instruction &Start,
const Instruction &End, MemoryLocation Loc);		const Instruction &End, MemoryLocation Loc);
bool canSinkStoresAndGEPs(StoreInst *S0, StoreInst *S1) const;		bool canSinkStoresAndGEPs(StoreInst *S0, StoreInst *S1) const;
void sinkStoresAndGEPs(BasicBlock *BB, StoreInst *SinkCand,		void sinkStoresAndGEPs(BasicBlock *BB, StoreInst *SinkCand,
StoreInst *ElseInst);		StoreInst *ElseInst);
		bool canSinkStoresWithSameGEPs(StoreInst *S0, StoreInst *S1) const;
		void sinkStores(BasicBlock *BB, StoreInst *SinkCand, StoreInst *ElseInst);
bool mergeStores(BasicBlock *BB);		bool mergeStores(BasicBlock *BB);
};		};
} // end anonymous namespace		} // end anonymous namespace

///		///
/// Return tail block of a diamond.		/// Return tail block of a diamond.
///		///
BasicBlock *MergedLoadStoreMotion::getDiamondTail(BasicBlock *BB) {		BasicBlock *MergedLoadStoreMotion::getDiamondTail(BasicBlock *BB) {
[... 107 lines not shown ...]
///		///
/// Also sinks GEP instruction computing the store address		/// Also sinks GEP instruction computing the store address
///		///
void MergedLoadStoreMotion::sinkStoresAndGEPs(BasicBlock *BB, StoreInst *S0,		void MergedLoadStoreMotion::sinkStoresAndGEPs(BasicBlock *BB, StoreInst *S0,
StoreInst *S1) {		StoreInst *S1) {
// Only one definition?		// Only one definition?
auto *A0 = dyn_cast<Instruction>(S0->getPointerOperand());		auto *A0 = dyn_cast<Instruction>(S0->getPointerOperand());
auto *A1 = dyn_cast<Instruction>(S1->getPointerOperand());		auto *A1 = dyn_cast<Instruction>(S1->getPointerOperand());
LLVM_DEBUG(dbgs() << "Sink Instruction into BB \n"; BB->dump();		LLVM_DEBUG(dbgs() << "Sink Stores and GEPs into BB \n"; BB->dump();
dbgs() << "Instruction Left\n"; S0->dump(); dbgs() << "\n";		dbgs() << "Store Left\n"; S0->dump(); dbgs() << "\n";
dbgs() << "Instruction Right\n"; S1->dump(); dbgs() << "\n");		dbgs() << "Store Right\n"; S1->dump(); dbgs() << "\n");
// Hoist the instruction.		// Hoist the instruction.
BasicBlock::iterator InsertPt = BB->getFirstInsertionPt();		BasicBlock::iterator InsertPt = BB->getFirstInsertionPt();
// Intersect optional metadata.		// Intersect optional metadata.
S0->andIRFlags(S1);		S0->andIRFlags(S1);
S0->dropUnknownNonDebugMetadata();		S0->dropUnknownNonDebugMetadata();

// Create the new store to be inserted at the join point.		// Create the new store to be inserted at the join point.
StoreInst *SNew = cast<StoreInst>(S0->clone());		StoreInst *SNew = cast<StoreInst>(S0->clone());
[... 11 lines not shown; enclosing context: void MergedLoadStoreMotion::sinkStoresAndGEPs(BasicBlock *BB, StoreInst *S0, ...]
S1->eraseFromParent();		S1->eraseFromParent();
A0->replaceAllUsesWith(ANew);		A0->replaceAllUsesWith(ANew);
A0->eraseFromParent();		A0->eraseFromParent();
A1->replaceAllUsesWith(ANew);		A1->replaceAllUsesWith(ANew);
A1->eraseFromParent();		A1->eraseFromParent();
}		}

///		///
		/// Check if 2 stores can be sunk if they have the same GEP
		///
		bool MergedLoadStoreMotion::canSinkStoresWithSameGEPs(StoreInst *S0,
		StoreInst *S1) const {
		auto *A0 = dyn_cast<GetElementPtrInst>(S0->getPointerOperand());
		auto *A1 = dyn_cast<GetElementPtrInst>(S1->getPointerOperand());
		return A0 && A1 && A0 == A1;
		}

		///
		/// Merge two stores to same address and sink into \p BB
		///
		void MergedLoadStoreMotion::sinkStores(BasicBlock *BB, StoreInst *S0,
		StoreInst *S1) {
		LLVM_DEBUG(dbgs() << "Sink Stores into BB \n"; BB->dump();
		dbgs() << "Store Left\n"; S0->dump(); dbgs() << "\n";
		dbgs() << "Store Right\n"; S1->dump(); dbgs() << "\n");
		BasicBlock::iterator InsertPt = BB->getFirstInsertionPt();
		// Intersect optional metadata.
		S0->andIRFlags(S1);
		S0->dropUnknownNonDebugMetadata();

		// Create the new store to be inserted at the join point.
		StoreInst *SNew = cast<StoreInst>(S0->clone());
		SNew->insertBefore(&*InsertPt);

		// New PHI operand? Use it.
		if (PHINode *NewPN = getPHIOperand(BB, S0, S1))
		SNew->setOperand(0, NewPN);
		S0->eraseFromParent();
		S1->eraseFromParent();
		}

		///
/// True when two stores are equivalent and can sink into the footer		/// True when two stores are equivalent and can sink into the footer
///		///
/// Starting from a diamond head block, iterate over the instructions in one		/// Starting from a diamond head block, iterate over the instructions in one
/// successor block and try to match a store in the second successor.		/// successor block and try to match a store in the second successor.
///		///
bool MergedLoadStoreMotion::mergeStores(BasicBlock *HeadBB) {		bool MergedLoadStoreMotion::mergeStores(BasicBlock *HeadBB) {

bool MergedStores = false;		bool MergedStores = false;
[... 28 lines not shown; enclosing context: for (BasicBlock::reverse_iterator RBI = Pred0->rbegin(), RBE = Pred0->rend(); ...]
auto *S0 = dyn_cast<StoreInst>(I);		auto *S0 = dyn_cast<StoreInst>(I);
if (!S0 || !S0->isSimple())		if (!S0 || !S0->isSimple())
continue;		continue;

++NStores;		++NStores;
if (NStores * Size1 >= MagicCompileTimeControl)		if (NStores * Size1 >= MagicCompileTimeControl)
break;		break;
if (StoreInst *S1 = canSinkFromBlock(Pred1, S0)) {		if (StoreInst *S1 = canSinkFromBlock(Pred1, S0)) {
if (!canSinkStoresAndGEPs(S0, S1))		bool SinkWithGEPs = canSinkStoresAndGEPs(S0, S1);
		bool SinkOnlyStores = canSinkStoresWithSameGEPs(S0, S1);
		if (!SinkWithGEPs && !SinkOnlyStores)
// Don't attempt to sink below stores that had to stick around		// Don't attempt to sink below stores that had to stick around
// But after removal of a store and some of its feeding		// But after removal of a store and some of its feeding
// instruction search again from the beginning since the iterator		// instruction search again from the beginning since the iterator
// is likely stale at this point.		// is likely stale at this point.
break;		break;

if (SinkBB == TailBB && TailBB->hasNPredecessorsOrMore(3)) {		if (SinkBB == TailBB && TailBB->hasNPredecessorsOrMore(3)) {
// We have more than 2 predecessors. Insert a new block		// We have more than 2 predecessors. Insert a new block
// postdominating 2 predecessors we're going to sink from.		// postdominating 2 predecessors we're going to sink from.
SinkBB = SplitBlockPredecessors(TailBB, {Pred0, Pred1}, ".sink.split");		SinkBB = SplitBlockPredecessors(TailBB, {Pred0, Pred1}, ".sink.split");
if (!SinkBB)		if (!SinkBB)
break;		break;
}		}

MergedStores = true;		MergedStores = true;
		if (SinkWithGEPs)
sinkStoresAndGEPs(SinkBB, S0, S1);		sinkStoresAndGEPs(SinkBB, S0, S1);
		else if (SinkOnlyStores)
		sinkStores(SinkBB, S0, S1);
RBI = Pred0->rbegin();		RBI = Pred0->rbegin();
RBE = Pred0->rend();		RBE = Pred0->rend();
LLVM_DEBUG(dbgs() << "Search again\n"; Instruction *I = &*RBI; I->dump());		LLVM_DEBUG(dbgs() << "Search again\n"; Instruction *I = &*RBI; I->dump());
}		}
}		}
return MergedStores;		return MergedStores;
}		}

bool MergedLoadStoreMotion::run(Function &F, AliasAnalysis &AA) {		bool MergedLoadStoreMotion::run(Function &F, AliasAnalysis &AA) {
this->AA = &AA;		this->AA = &AA;

bool Changed = false;		bool Changed = false;
LLVM_DEBUG(dbgs() << "Instruction Merger\n");		LLVM_DEBUG(dbgs() << "Instruction Merger\n");

// Merge unconditional branches, allowing PRE to catch more		// Merge unconditional branches, allowing PRE to catch more
// optimization opportunities.		// optimization opportunities.
// This loop doesn't care about newly inserted/split blocks		// This loop doesn't care about newly inserted/split blocks
// since they never will be diamond heads.		// since they never will be diamond heads.
for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE;) {		for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE;) {
BasicBlock *BB = &*FI++;		BasicBlock *BB = &*FI++;

// Hoist equivalent loads and sink stores		// Hoist equivalent loads and sink stores
// outside diamonds when possible		// outside diamonds when possible
if (isDiamondHead(BB)) {		if (isDiamondHead(BB)) {
Changed |= mergeStores(BB);		Changed |= mergeStores(BB);
[... 64 lines not shown ...]

llvm/test/Transforms/InstMerge/st_sink_no_geps.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; Test to make sure that we sink stores if they have common GEP.
				; RUN: opt -basicaa -memdep -mldst-motion -S < %s | FileCheck %s
				; RUN: opt -aa-pipeline=basic-aa -passes='require<memdep>,mldst-motion' -S < %s 2>&1 | FileCheck %s
				target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

				; Function Attrs: nounwind uwtable
				define dso_local void @st_sink_no_geps(i32* nocapture %arg1, i1 zeroext %arg2) local_unnamed_addr {
				; CHECK-LABEL: @st_sink_no_geps(
				; CHECK-NEXT: bb1:
				; CHECK-NEXT: [[TMP:%.*]] = getelementptr inbounds i32, i32* [[ARG1:%.*]], i64 1
				; CHECK-NEXT: br i1 [[ARG2:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
				; CHECK: bb2:
				; CHECK-NEXT: br label [[BB4:%.*]]
				; CHECK: bb3:
				; CHECK-NEXT: br label [[BB4]]
				; CHECK: bb4:
				; CHECK-NEXT: store i32 42, i32* [[TMP]], align 4
				; CHECK-NEXT: ret void
				;
				bb1:
				%tmp = getelementptr inbounds i32, i32* %arg1, i64 1
				br i1 %arg2, label %bb2, label %bb3

				bb2: ; preds = %bb1
				store i32 42, i32* %tmp, align 4
				br label %bb4

				bb3: ; preds = %bb1
				store i32 42, i32* %tmp, align 4
				br label %bb4

				bb4: ; preds = %bb2, %bb3
				ret void
				}