This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Optimize defining access of defs while walking upwards.
ClosedPublic

Authored by fhahn on Oct 22 2021, 6:33 AM.

Details

Summary

This patch extends the code that walks memory defs upwards to find
clobbering accesses to also try to optimize the clobbering defining
access.

We should be able to set the optimized access of our starting def
(KillingDef) if the following holds:

  1. It is the first call of getDomMemoryDef for KillingDef (so Current == KillingDef->getDefiningAccess()).
  2. No potentially aliasing defs are skipped.

Then if a (partly) aliasing def is encountered, it can be used as the
optimized access for KillingDef. No further optimizations can be
applied to KillingDef.
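The conditions above can be sketched as a hypothetical toy model (plain C++, not the actual DeadStoreElimination.cpp code; the names and the AliasKind classification are invented for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model of the upward walk: each entry describes how an earlier
// MemoryDef relates to KillingLoc, ordered from the earliest def up to
// (just below) KillingDef's defining access. Walking upward, the first
// clobber found may be recorded as KillingDef's optimized access, but
// only if no potentially aliasing def was skipped along the way.
enum AliasKind { NoAlias, MayAlias, Clobber };

std::optional<std::size_t>
findOptimizedAccess(const std::vector<AliasKind> &Defs) {
  // Models condition 1: this is the first walk for KillingDef.
  bool CanOptimize = true;
  for (std::size_t I = Defs.size(); I-- > 0;) {
    if (Defs[I] == Clobber)
      // The clobber can be recorded as the optimized access only if every
      // def stepped over was provably non-aliasing (condition 2).
      return CanOptimize ? std::optional<std::size_t>(I) : std::nullopt;
    if (Defs[I] == MayAlias)
      CanOptimize = false; // a potentially aliasing def was skipped
  }
  return std::nullopt; // ran out of defs without hitting a clobber
}
```

In the model, a walk that steps only over NoAlias defs before reaching a Clobber yields that clobber as the optimized access, while a MayAlias def anywhere on the path disables the optimization (the clobbering access is still found, it just is not recorded as optimized).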

I'd appreciate a careful look, as the existing documentation is not too
clear on what is expected for optimized accesses.

The motivation for this patch is to use the optimized accesses to cover
more cases of redundant stores as follow-up to D111727.

Diff Detail

Event Timeline

fhahn created this revision. Oct 22 2021, 6:33 AM
fhahn requested review of this revision. Oct 22 2021, 6:33 AM
Herald added a project: Restricted Project. Oct 22 2021, 6:33 AM
fhahn updated this revision to Diff 381565. Oct 22 2021, 9:00 AM

remove stray edit of getLocForWriteEx, add some comments.

nikic added inline comments. Oct 25 2021, 2:55 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1410

I think this is incorrect if KillingDef accesses multiple locations. Say you have a memcpy, then KillingLoc will point to the write location only, and that's the only one for which we will check aliasing. This means that the optimized access might look past a clobber of the read location.

nikic requested changes to this revision. Oct 25 2021, 2:28 PM

Marking as changes requested per above comment.

As a more general comment, I'm somewhat concerned about setting and accessing the optimized access outside the caching MSSA walker, as it may result in unexpected behavior changes.

E.g. if you run a pass before DSE that uses the walker and which sets an optimized access that would not be set by this code (e.g. because it requires looking through a MemoryPhi), then that optimized access would be used by D112315. If you did not run the pass beforehand, it wouldn't be used. Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker? Without looking into it in detail, I'm going to guess "no": they can differ due to e.g. different cutoffs.

Of course, behavior can already be influenced by adjacent passes because MSSA updating is imprecise (in the sense that it is not equivalent to rebuilding MSSA from scratch), but I think this may make things worse in that a pass dependency would exist even if the IR is not modified -- only caches in MSSA are populated/used in an inconsistent manner.

This revision now requires changes to proceed. Oct 25 2021, 2:28 PM
fhahn added a comment. Oct 26 2021, 1:50 PM

Marking as changes requested per above comment.

As a more general comment, I'm somewhat concerned about setting and accessing the optimized access outside the caching MSSA walker, as it may result in unexpected behavior changes.

E.g. if you run a pass before DSE that uses the walker and which sets an optimized access that would not be set by this code (e.g. because it requires looking through a MemoryPhi), then that optimized access would be used by D112315. If you did not run the pass beforehand, it wouldn't be used. Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker? Without looking into it in detail, I'm going to guess "no": they can differ due to e.g. different cutoffs.

Of course, behavior can already be influenced by adjacent passes because MSSA updating is imprecise (in the sense that it is not equivalent to rebuilding MSSA from scratch), but I think this may make things worse in that a pass dependency would exist even if the IR is not modified -- only caches in MSSA are populated/used in an inconsistent manner.

That's a good point; having results depend on MemorySSA caching can lead to unexpected/hard-to-reproduce changes. But I think we already have the same issue with the existing users of the walkers. I'd be fine with just keeping a mapping in DSE, but it seems like it might be helpful to other passes if we optimize MemorySSA where possible.

Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker?

I *think* the patch should only set the optimized access *if* we hit a 'guaranteed' write-clobber, so this should be the 'nearest' dominating write-clobber. It intentionally does not set the optimized access for mem-phis or non-aliasing writes. I think this should be consistent with the walker as we only set the optimized access if it could not be skipped by the walker. That is modulo differences in the AA interpretation used between the walker and isOverwrite.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1410

But isn't the optimized access only for write clobbers? This is the part that is not really documented well, but going from the use in getClobberingMemoryAccess, it looks like it should be the nearest dominating write clobber.

nikic added inline comments. Oct 26 2021, 2:54 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1410

Yes, it's about write clobbers. The problem is that a MemoryDef may access multiple locations, and the optimized access is the nearest write clobber on any of those locations. Your code (unless I'm missing something) will only find clobbers to the write location of KillingDef, but not to any additional read locations it may have.

; assuming p, q noalias
write(p) <-- optimized access found by this implementation
write(q) <-- correct optimized access
memcpy(p, q)
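This can be mirrored in a small self-contained sketch (a toy model, not the DSE code; names are invented for illustration): the walk must stop at the first earlier write that clobbers any queried location, so querying only the write location of the memcpy gives the wrong answer.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model of the example above: earlier writes are listed from earliest
// to latest, so write(p) is at index 0 and write(q) at index 1, with the
// memcpy(p, q) conceptually following them. The walk goes upward (from the
// last write back to the first) and stops at the first write that clobbers
// any of the queried locations.
std::optional<std::size_t>
firstClobber(const std::vector<char> &Writes, const std::vector<char> &Locs) {
  for (std::size_t I = Writes.size(); I-- > 0;)
    for (char L : Locs)
      if (Writes[I] == L) // this earlier write clobbers a queried location
        return I;
  return std::nullopt;
}
```

Querying only the write location p walks past write(q) and stops at write(p) (index 0), while querying both the write and the read location stops at write(q) (index 1), which is the correct optimized access for the memcpy.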
fhahn updated this revision to Diff 382589. Oct 27 2021, 3:23 AM

Do not optimize accesses for instructions that also read from memory. Given the current guarantee that getLocForWriteEx only supports instructions that modify a single location, this should ensure that KillingDef only accesses a single location.

fhahn marked an inline comment as done. Oct 27 2021, 3:27 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1410

You are right, my thinking about mem-defs was a bit backwards; we also need to consider defs of the read locations here. I updated the code to not optimize killing defs that also read from memory. With the current restriction to only support writing instructions with a single memory location, that should guarantee that only defs which access a single location (KillingLoc) are optimized.

I added a test in 1a2a7cca3e43 which I think may be mis-optimized due to the issue you described, if we support eliminating redundant memcpys based on earlier memcpys.

Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker?

I *think* the patch should only set the optimized access *if* we hit a 'guaranteed' write-clobber, so this should be the 'nearest' dominating write-clobber. It intentionally does not set the optimized access for mem-phis or non-aliasing writes. I think this should be consistent with the walker as we only set the optimized access if it could not be skipped by the walker. That is modulo differences in the AA interpretation used between the walker and isOverwrite.

Right. For example, DSE uses earliest-escape analysis, so it may find and cache a better (i.e. more dominating) clobber than the walker usually would.

Maybe it's okay. Would be great for @asbirlea to chime in.

fhahn marked an inline comment as done. Nov 3 2021, 6:33 AM

Conversely, if another pass runs after DSE and uses the caching walker, it will now return the access optimized by DSE, but will it always be the same as the one produced by the walker?

I *think* the patch should only set the optimized access *if* we hit a 'guaranteed' write-clobber, so this should be the 'nearest' dominating write-clobber. It intentionally does not set the optimized access for mem-phis or non-aliasing writes. I think this should be consistent with the walker as we only set the optimized access if it could not be skipped by the walker. That is modulo differences in the AA interpretation used between the walker and isOverwrite.

Right. For example, DSE uses earliest-escape analysis, so it may find and cache a better (i.e. more dominating) clobber than the walker usually would.

Good point, I should have said 'at least as good' compared to most uses of the walkers, depending on the AA the walker uses.

Maybe it's okay. Would be great for @asbirlea to chime in.

That would be great, as this seems a fundamental question when interacting with MemSSA, i.e. how passes should or shouldn't optimize MemorySSA as they go along.

fhahn added a comment. Nov 9 2021, 10:44 AM

ping @asbirlea :)

Do you think DSE should optimize MemorySSA or is that better left to the official walkers only?

Allowing passes to do this is a slippery slope...
We already have the issue that it is sometimes hard to reproduce an issue with a single pass, due to the cached state in MemorySSA. This cached state is already dependent on what other passes do, a mix between queries and transforms, which may leave MemorySSA either "over-optimized" (has stored info beyond what it could deduce if built from scratch) or "under-optimized" (could deduce more). Having DSE optimize more is not that far fetched considering this.
I'm inclined to let DSE do this, only because 1) the traversals and inferences are beyond what MSSA alone does now and 2) they're "free" (i.e. DSE does them anyway).
However, I would not be ok with any pass being able to set optimized accesses.

Actionable feedback for this patch: can you introduce a cl::opt flag and set CanOptimize = flag && current_conditions, and document what the flag does and why.
Set the flag to false until there is a concrete case where this will be used (does it further affect the results in D112315?).
Have a separate patch that just turns the flag to true, and evaluate that.

The flag can be used when attempting to reproduce behavior, to determine if the caching from DSE affects results. It's also a quick way to reverse the decision of allowing any pass to do this.
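A minimal sketch of the suggested gating (in LLVM this would be a cl::opt; it is modeled here as a plain bool so the snippet is self-contained, and all names are hypothetical):

```cpp
#include <cassert>

// Hypothetical stand-in for a cl::opt<bool> flag. Defaults to false until
// there is a concrete user, as suggested above, so behavior is unchanged
// and the caching effect can be toggled when reproducing issues.
bool OptimizeMemorySSAFlag = false;

// CanOptimize = flag && current_conditions: the walk may only record an
// optimized access when both the flag is set and the existing structural
// conditions (first walk for KillingDef, no aliasing def skipped) hold.
bool canOptimize(bool CurrentConditions) {
  return OptimizeMemorySSAFlag && CurrentConditions;
}
```

With the flag off, canOptimize is false regardless of the walk state; flipping it on in a separate patch then makes the structural conditions the only gate.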

fhahn updated this revision to Diff 386647. Nov 11 2021, 1:32 PM

Allowing passes to do this is a slippery slope...
We already have the issue that it is sometimes hard to reproduce an issue with a single pass, due to the cached state in MemorySSA. This cached state is already dependent on what other passes do, a mix between queries and transforms, which may leave MemorySSA either "over-optimized" (has stored info beyond what it could deduce if built from scratch) or "under-optimized" (could deduce more). Having DSE optimize more is not that far fetched considering this.
I'm inclined to let DSE do this, only because 1) the traversals and inferences are beyond what MSSA alone does now and 2) they're "free" (i.e. DSE does them anyway).
However, I would not be ok with any pass being able to set optimized accesses.

That policy sounds good to me, thanks for sharing! Should we add something like that to the MemorySSA docs?

Actionable feedback for this patch: can you introduce a cl::opt flag and set CanOptimize = flag && current_conditions, and document what the flag does and why.
Set the flag to false until there is a concrete case where this will be used (does it further affect the results in D112315?).
Have a separate patch that just turns the flag to true, and evaluate that.

I added a flag in the latest version.

Allowing passes to do this is a slippery slope...
We already have the issue that it is sometimes hard to reproduce an issue with a single pass, due to the cached state in MemorySSA. This cached state is already dependent on what other passes do, a mix between queries and transforms, which may leave MemorySSA either "over-optimized" (has stored info beyond what it could deduce if built from scratch) or "under-optimized" (could deduce more). Having DSE optimize more is not that far fetched considering this.
I'm inclined to let DSE do this, only because 1) the traversals and inferences are beyond what MSSA alone does now and 2) they're "free" (i.e. DSE does them anyway).
However, I would not be ok with any pass being able to set optimized accesses.

That policy sounds good to me, thanks for sharing! Should we add something like that to the MemorySSA docs?

Yes. I'll send something for review tomorrow.

Actionable feedback for this patch: can you introduce a cl::opt flag and set CanOptimize = flag && current_conditions, and document what the flag does and why.
Set the flag to false until there is a concrete case where this will be used (does it further affect the results in D112315?).
Have a separate patch that just turns the flag to true, and evaluate that.

I added a flag in the latest version.

Changes LG. @nikic?

nikic requested changes to this revision. Nov 20 2021, 1:21 PM
nikic added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1337

Looking at how canSkipDef() is defined, I think this is wrong on two fronts:

First, canSkipDef() skips any mayThrow() instruction if !DefVisibleToCaller. But what if this is a non-nounwind call that clobbers KillingLoc?

Second, canSkipDef() skips lifetime intrinsics, but I believe those are considered clobbering by MSSA (they effectively write an undef value to the location).

1386–1387

On an unrelated note, why does this explicitly adjust WalkerStepLimit?

2119

Spurious change

This revision now requires changes to proceed. Nov 20 2021, 1:21 PM
fhahn updated this revision to Diff 388914. Nov 22 2021, 7:16 AM

disable MSSA optimizations when stepping over a 'skippable' def

fhahn marked 2 inline comments as done. Nov 22 2021, 7:21 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1337

Good point; it's probably best to disable optimizations for any skippable def to start with.

1386–1387

I'm not sure, but looking at the other continues, we should be able to drop it.

2119

Removed it here, but I need to remove the one below as well.

fhahn updated this revision to Diff 388915. Nov 22 2021, 7:23 AM
fhahn marked 2 inline comments as done.

remove unrelated whitespace changes

nikic accepted this revision. Nov 26 2021, 10:20 AM

LGTM

This revision is now accepted and ready to land. Nov 26 2021, 10:20 AM
This revision was landed with ongoing or failed builds. Nov 27 2021, 5:05 AM
This revision was automatically updated to reflect the committed changes.