
Details

Reviewers

deadalnix
mehdi_amini
efriedma

Commits

rG7cb744621b28: [MemCpyOpt] Don't sink LoadInst below possible clobber.
rL290611: [MemCpyOpt] Don't sink LoadInst below possible clobber.

Summary

Currently, MemCpyOpt::processStore will convert this

val = load A
store _, B      ; may-alias A
store _, C      ; may-alias A + D
store val, D

into

store _, C      ; may-alias A + D
val = load A
store val, D
store _, B      ; may-alias A

and then into

store _, C      ; may-alias A + D
memcpy(D, A)
store _, B      ; may-alias A

This is incorrect: the store to C may alias A, so the load from A cannot be reordered below it.
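To make the miscompile concrete, here is a minimal sketch in plain C++ rather than LLVM IR. It models memory as four integer cells and runs both instruction orders. The function names, the cell indices, and the stored constants 23 and 44 are assumptions chosen for illustration; only the ordering of the operations comes from the summary above.

```cpp
#include <array>

// Toy model (an assumption for illustration): memory is four int cells,
// and A, B, C, D are indices into it. Passing C == A models "C may-alias A".
using Memory = std::array<int, 4>;

// Original order: val = load A; store B; store C; store val, D.
inline Memory runOriginal(Memory M, int A, int B, int C, int D) {
  int Val = M[A];
  M[B] = 23;
  M[C] = 44;
  M[D] = Val;
  return M;
}

// Order after the transformation: store C; memcpy(D, A); store B.
inline Memory runTransformed(Memory M, int A, int B, int C, int D) {
  M[C] = 44;
  M[D] = M[A]; // stands in for memcpy(D, A)
  M[B] = 23;
  return M;
}
```

When C and A name the same cell, the original order leaves the old value of A in D, while the transformed order leaves the value just stored through C. When they name different cells, both orders agree.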

Adding Amaury since this was introduced in http://reviews.llvm.org/D16523 .

Diff Detail

Repository: rL LLVM

Event Timeline

bryant updated this revision to Diff 78398. Nov 17 2016, 12:34 PM

bryant retitled this revision from to [MemCpyOpt] Don't sink LoadInst below possible clobber..

bryant updated this object.

bryant added reviewers: deadalnix, efriedma, mehdi_amini.

bryant set the repository for this revision to rL LLVM.

bryant added a subscriber: llvm-commits.

For the sake of clarity, I've broken down the precise sequence of
transformations.

Original:

val = load A
store _, B      ; may-alias A
store _, C      ; may-alias A + D
store val, D

Lift store val, D and its dependents and clobbers above the store to B:

val = load A
store _, C      ; may-alias A + D
store val, D
store _, B      ; may-alias A

Implicitly sink the load down to its counterpart store (this is the bug):

store _, C      ; may-alias A + D
val = load A
store val, D
store _, B      ; may-alias A

Replace the load and store with a memcpy:

store _, C      ; may-alias A + D
memcpy(D, A)
store _, B      ; may-alias A

EDIT: Formatting.
EDIT 2: Ordering.

efriedma requested changes to this revision. Nov 21 2016, 4:13 PM

efriedma edited edge metadata.

efriedma added inline comments.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
619	I'm not really following what this patch is doing. Whether it's legal to sink the load to location P isn't related to whether it's legal to hoist the store to location P. On a side-note, this whole block of code will never run in a normal pass pipeline; "T->isAggregateType()" will never be true because instcombine breaks up aggregate loads and stores.
test/Transforms/MemCpyOpt/load-store-to-memcpy.ll
2	-aa-eval doesn't do anything useful here.

This revision now requires changes to proceed. Nov 21 2016, 4:13 PM

better test:

inject some padding into the agg to thwart instcombine from exploding load/store.
remove -aa-eval.

bryant marked an inline comment as done. Nov 22 2016, 6:05 AM

bryant added inline comments.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
619	> I'm not really following what this patch is doing. Whether it's legal to sink the load to location P isn't related to whether it's legal to hoist the store to location P.

The store isn't necessarily the only thing to be hoisted to P. Instructions that the store 1) depends on or 2) aliases also need to be hoisted. That's `moveUp`'s job: To figure out which instructions, in addition to the store, need to be hoisted. Once all the hoisting is done, a memcpy/memmove is created to replace the load and store. However, this implies sinking the load past the hoisted stuff to just before the store, which may not be legal. So this patch ensures that it's legal for the load to sink past the hoisted stuff, even though no explicit sinking is needed. Does that make sense?

> On a side-note, this whole block of code will never run in a normal pass pipeline; "T->isAggregateType()" will never be true because instcombine breaks up aggregate loads and stores.

InstCombine refrains from exploding aggregate loads and stores when there's padding. I've updated the test case accordingly.
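The legality check described in this reply can be sketched as a small standalone model. This is not the LLVM implementation: `ToyInst`, `overlaps`, and `canLift` are hypothetical names, alias analysis is replaced by overlap of integer location sets, and the separate test of whether each instruction can move above P is omitted.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for an instruction: the abstract memory locations
// it may read and may write. Set overlap plays the role of a may-alias
// answer from alias analysis.
struct ToyInst {
  std::set<int> Reads;
  std::set<int> Writes;
};

static bool overlaps(const std::set<int> &X, const std::set<int> &Y) {
  for (int V : X)
    if (Y.count(V))
      return true;
  return false;
}

// Between holds the instructions after P and before the store, in program
// order. Returns true if the store and everything that must travel with it
// can be lifted without any lifted instruction writing the load's source.
bool canLift(const std::vector<ToyInst> &Between,
             const std::set<int> &StoreDst, const std::set<int> &LoadSrc) {
  std::set<int> Lifted = StoreDst; // locations touched by lifted instructions
  // Walk backwards from the store towards P.
  for (auto It = Between.rbegin(); It != Between.rend(); ++It) {
    bool NeedLift =
        overlaps(It->Reads, Lifted) || overlaps(It->Writes, Lifted);
    if (!NeedLift)
      continue;
    // The load implicitly sinks below this instruction, so the instruction
    // must not write the memory the load reads.
    if (overlaps(It->Writes, LoadSrc))
      return false;
    Lifted.insert(It->Reads.begin(), It->Reads.end());
    Lifted.insert(It->Writes.begin(), It->Writes.end());
  }
  return true;
}
```

With the load source at location 0 and the store destination at location 3, an intervening instruction that may write both 0 and 3 makes `canLift` return false, which corresponds to the case this patch rejects.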

Update FileChecks to the new aggregate.

bryant added inline comments. Nov 22 2016, 6:12 AM

lib/Transforms/Scalar/MemCpyOptimizer.cpp
619	Here's an example, in case the explanation wasn't clear:

%x = load
P           ; clobbers the load
X           ; clobbers the load and store
store %x

MCO will turn this into

X           ; clobbers the load and store
%x = load
store %x    ; merged with the load into a memcpy
P           ; clobbers the load

which is bad, since the load is moved past X.

LGTM with the testcase fixed... but please don't commit changes to this code unless you're intending to write a followup which actually makes it useful. (If you're going to leave it in its current state, we might as well just delete it.)

lib/Transforms/Scalar/MemCpyOptimizer.cpp
619	Oh, wait, I see, I was misunderstanding how the algorithm works (I somehow thought one of the loops was iterating the wrong way). Yes, your approach makes sense.
test/Transforms/MemCpyOpt/load-store-to-memcpy.ll
3	evaluate-aa-metadata doesn't do anything here.

This revision is now accepted and ready to land. Nov 22 2016, 1:29 PM

In D26811#603131, @efriedma wrote:

> LGTM with the testcase fixed... but please don't commit changes to this code unless you're intending to write a followup which actually makes it useful. (If you're going to leave it in its current state, we might as well just delete it.)

Not sure that I understand. Are you suggesting that we leave the bug unfixed?

In D26811#603131, @efriedma wrote:

> LGTM with the testcase fixed... but please don't commit changes to this code unless you're intending to write a followup which actually makes it useful. (If you're going to leave it in its current state, we might as well just delete it.)

What makes you think this isn't useful and should be deleted? This is the only thing in the codebase that can do load/store forwarding when aggregates contain padding, so it ends up being important. I think it is very much worth committing.

However, if you have a better alternative to suggest, please do; I can probably even spend some time on it. Improving aggregate support is important to me and several other non-clang front-end devs.

Oh, we special-case stores of aggregate types which specifically contain padding? Ugh, I missed that... that's *really* subtle. Also, there's probably some better way to set padding bits to undef, but that's an argument for another time.

@efriedma Loads and stores of aggregates cannot be optimized the same way as other loads and stores, for various reasons.

If the aggregate does not contain padding, it is possible to deaggregate it into a series of loads/stores and recompose the aggregate. But when the aggregate has padding, this transformation is lossy. Doing it as a memcpy or similar is not lossy, so here you go. This is also useful for large aggregates. This transformation is already gated on aggregate types.
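As a rough illustration of the field-wise versus byte-wise distinction, here is a sketch in plain C++ rather than LLVM IR. The `Padded` struct and both helper names are hypothetical, and whether the struct actually contains padding depends on the target ABI.

```cpp
#include <cstring>

// Hypothetical aggregate. On common ABIs there are padding bytes between
// Tag and Value, but the exact layout is implementation-defined.
struct Padded {
  char Tag;
  int Value;
};

// Field-wise copy: roughly what splitting an aggregate load/store into
// scalar loads and stores amounts to. Only the named fields are transferred;
// nothing is guaranteed about the padding bytes of Dst.
inline void copyFieldwise(Padded &Dst, const Padded &Src) {
  Dst.Tag = Src.Tag;
  Dst.Value = Src.Value;
}

// Byte-wise copy: transfers the whole object representation, padding
// included, which is what a memcpy of the aggregate does.
inline void copyBytes(Padded &Dst, const Padded &Src) {
  std::memcpy(&Dst, &Src, sizeof(Padded));
}
```

After `copyBytes` the two objects compare equal under `memcmp`; after `copyFieldwise` only the fields are known to match.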

lib/Transforms/Scalar/MemCpyOptimizer.cpp
598	This is where the transformation is gated on aggregates.

Removed -evaluate-aa-metadata.

bryant marked an inline comment as done. Nov 26 2016, 11:35 AM

bryant added inline comments. Dec 27 2016, 9:24 AM

test/Transforms/MemCpyOpt/load-store-to-memcpy.ll
7	fix typo.

"hould" to "should."
remove unneeded stderr redirect.

bryant marked an inline comment as done. Dec 27 2016, 10:06 AM

Closed by commit rL290611: [MemCpyOpt] Don't sink LoadInst below possible clobber. (authored by bryant). Dec 27 2016, 10:08 AM

This revision was automatically updated to reflect the committed changes.

Diff 78398

lib/Transforms/Scalar/MemCpyOptimizer.cpp

[483 lines hidden] static unsigned findCommonAlignment(const DataLayout &DL, const StoreInst *SI,

   return std::min(StoreAlign, LoadAlign);
 }

 // This method try to lift a store instruction before position P.
 // It will lift the store and its argument + that anything that
 // may alias with these.
 // The method returns true if it was successful.
-static bool moveUp(AliasAnalysis &AA, StoreInst *SI, Instruction *P) {
+static bool moveUp(AliasAnalysis &AA, StoreInst *SI, Instruction *P,
+                   const LoadInst *LI) {
   // If the store alias this position, early bail out.
   MemoryLocation StoreLoc = MemoryLocation::get(SI);
   if (AA.getModRefInfo(P, StoreLoc) != MRI_NoModRef)
     return false;

   // Keep track of the arguments of all instruction we plan to lift
   // so we can make sure to lift them as well if apropriate.
   DenseSet<Instruction*> Args;
   if (auto *Ptr = dyn_cast<Instruction>(SI->getPointerOperand()))
     if (Ptr->getParent() == SI->getParent())
       Args.insert(Ptr);

   // Instruction to lift before P.
   SmallVector<Instruction*, 8> ToLift;

   // Memory locations of lifted instructions.
-  SmallVector<MemoryLocation, 8> MemLocs;
-  MemLocs.push_back(StoreLoc);
+  SmallVector<MemoryLocation, 8> MemLocs{StoreLoc};

   // Lifted callsites.
   SmallVector<ImmutableCallSite, 8> CallSites;

+  const MemoryLocation LoadLoc = MemoryLocation::get(LI);
+
   for (auto I = --SI->getIterator(), E = P->getIterator(); I != E; --I) {
     auto *C = &*I;

     bool MayAlias = AA.getModRefInfo(C) != MRI_NoModRef;

     bool NeedLift = false;
     if (Args.erase(C))
       NeedLift = true;
     else if (MayAlias) {
       NeedLift = any_of(MemLocs, [C, &AA](const MemoryLocation &ML) {
         return AA.getModRefInfo(C, ML);
       });

       if (!NeedLift)
         NeedLift = any_of(CallSites, [C, &AA](const ImmutableCallSite &CS) {
           return AA.getModRefInfo(C, CS);
         });
     }

     if (!NeedLift)
       continue;

     if (MayAlias) {
-      if (auto CS = ImmutableCallSite(C)) {
+      // Since LI is implicitly moved downwards past the lifted instructions,
+      // none of them may modify its source.
+      if (AA.getModRefInfo(C, LoadLoc) & MRI_Mod)
+        return false;
+      else if (auto CS = ImmutableCallSite(C)) {
         // If we can't lift this before P, it's game over.
         if (AA.getModRefInfo(P, CS) != MRI_NoModRef)
           return false;

         CallSites.push_back(CS);
       } else if (isa<LoadInst>(C) || isa<StoreInst>(C) || isa<VAArgInst>(C)) {
         // If we can't lift this before P, it's game over.
         auto ML = MemoryLocation::get(C);

[37 lines hidden] bool MemCpyOptPass::processStore(StoreInst *SI, BasicBlock::iterator &BBI) {

   const DataLayout &DL = SI->getModule()->getDataLayout();

   // Load to store forwarding can be interpreted as memcpy.
   if (LoadInst *LI = dyn_cast<LoadInst>(SI->getOperand(0))) {
     if (LI->isSimple() && LI->hasOneUse() &&
         LI->getParent() == SI->getParent()) {

       auto *T = LI->getType();
       if (T->isAggregateType()) {
         AliasAnalysis &AA = LookupAliasAnalysis();
         MemoryLocation LoadLoc = MemoryLocation::get(LI);

         // We use alias analysis to check if an instruction may store to
         // the memory we load from in between the load and the store. If
         // such an instruction is found, we try to promote there instead
         // of at the store position.
         Instruction *P = SI;
         for (auto &I : make_range(++LI->getIterator(), SI->getIterator())) {
           if (AA.getModRefInfo(&I, LoadLoc) & MRI_Mod) {
             P = &I;
             break;
           }
         }

         // We found an instruction that may write to the loaded memory.
         // We can try to promote at this position instead of the store
         // position if nothing alias the store memory after this and the store
         // destination is not in the range.
         if (P && P != SI) {
-          if (!moveUp(AA, SI, P))
+          if (!moveUp(AA, SI, P, LI))
             P = nullptr;
         }

         // If a valid insertion position is found, then we can promote
         // the load/store pair to a memcpy.
         if (P) {
           // If we load from memory that may alias the memory we store to,
           // memmove must be used to preserve semantic. If not, memcpy can

[823 lines hidden]

test/Transforms/MemCpyOpt/load-store-to-memcpy.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -basicaa -scoped-noalias -aa-eval -evaluate-aa-metadata -memcpyopt -S %s 2>/dev/null | FileCheck %s

%T = type {i32, i32}

; memcpy(%d, %a) hould not be generated since store2 may-aliases load %a.
define void @f(%T* %a, %T* %b, %T* %c, %T* %d) {
; CHECK-LABEL: @f(
; CHECK-NEXT:    [[VAL:%.*]] = load %T, %T* %a, !alias.scope !0
; CHECK-NEXT:    store %T { i32 23, i32 23 }, %T* %b, !alias.scope !3
; CHECK-NEXT:    store %T { i32 44, i32 44 }, %T* %c, !alias.scope !6, !noalias !3
; CHECK-NEXT:    store %T [[VAL]], %T* %d, !alias.scope !9, !noalias !12
; CHECK-NEXT:    ret void
;
  %val = load %T, %T* %a, !alias.scope !{!10}

  ; store1 may-aliases the load
  store %T { i32 23, i32 23 }, %T* %b, !alias.scope !{!11}

  ; store2 may-aliases the load and store3
  store %T { i32 44, i32 44 }, %T* %c, !alias.scope !{!12}, !noalias !{!11}

  ; store3
  store %T %val, %T* %d, !alias.scope !{!13}, !noalias !{!10, !11}
  ret void
}

!0 = !{!0}
!1 = !{!1}
!2 = !{!2}
!3 = !{!3}

!10 = !{ !10, !0 }
!11 = !{ !11, !1 }
!12 = !{ !12, !2 }
!13 = !{ !13, !3 }

This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Don't sink LoadInst below possible clobber.
ClosedPublic
