This is an archive of the discontinued LLVM Phabricator instance.

[RFC WIP] Fix DSE for asm outputs (aka PR44913)
Needs Revision · Public

Authored by glider on Feb 19 2020, 10:40 AM.

Details

Reviewers

fhahn
efriedma

Summary

We need to collect the outputs of inline assembly statements that do not
also serve as inputs (i.e. those declared as "=m", not "+m"). Doing so
requires changing the DSE code so that every instruction can potentially
have multiple outputs.
Right now it seems sufficient to just change eliminateDeadStores() to
handle that case, but probably other places also need to be changed.
It also makes sense to factor out the "for (auto Loc: Locs)" loop from
that function to make it more readable.
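
As an illustration of the kind of pattern the summary appears to target, here is a minimal sketch at the C++ source level. The function name `overwrittenByAsm` is hypothetical, and the x86 instruction is only an assumption made to keep the example concrete; other targets fall back to a plain store.

```cpp
#include <cassert>

// Hypothetical example, not taken from the patch or from PR44913.
// Returns the value that the asm statement leaves in x.
int overwrittenByAsm() {
  int x;
  x = 1; // Candidate dead store: nothing reads x before the asm overwrites it.
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  // "=m" declares x as a write-only memory output. With "+m" the asm would
  // also read x, and the store above would have to stay.
  asm volatile("movl $2, %0" : "=m"(x));
#else
  x = 2; // Portable fallback so the sketch still builds on other targets.
#endif
  return x;
}
```

If DSE knew that the asm statement writes `x` without reading it, the `x = 1` store could be removed without changing the result.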

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 46829
Build 49477: arc lint + arc unit

Event Timeline

glider created this revision. Feb 19 2020, 10:40 AM

Herald added a project: Restricted Project. Feb 19 2020, 10:40 AM

Herald added subscribers: llvm-commits, hiraditya.

glider added a subscriber: dvyukov. Feb 19 2020, 10:41 AM

Hi Florian, Eli et al.,

I tried attacking https://bugs.llvm.org/show_bug.cgi?id=44913, and the provided patch seems to fix the problem.
Can you please take a look and tell me whether the general idea makes sense?
If it does, I can polish it by replacing all uses of getLocForWrite() with getLocsForWrite() (we may also need a similar function for reads) and move the code around a bit.

Harbormaster completed remote builds in B46829: Diff 245462. Feb 19 2020, 10:49 AM

fhahn added inline comments. Feb 19 2020, 2:01 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
291	I didn't have time to take a close look yet, but I think the reasoning here should be based on semantics defined in the LangRef. It might also be necessary to improve the LangRef.

I guess reasoning about inline asm directly while we don't have call attributes that allow equivalent reasoning makes sense. Sort of serves as a demonstration for why the attributes are useful. But you'll need to be *very* careful that we're actually specifying and implementing it properly. LangRef needs to clearly state the rules, and we need a bunch of tests to compare various combinations against gcc.

I assume you're aware this is missing a lot of logic that's necessary for correctness.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
320	Iterating over a SmallSet of pointers is non-deterministic.
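
The concern is that the visiting order of a pointer-keyed set can depend on pointer values, which may differ between runs. A minimal sketch of one common remedy follows: remember insertion order alongside the membership check. The class `OrderedPtrSet` is a hypothetical stand-in written with the standard library; LLVM ships `SetVector` for this purpose, and nothing here is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical insertion-ordered pointer set (illustration only).
template <typename T> class OrderedPtrSet {
  std::vector<T *> Order;       // Elements in first-insertion order.
  std::unordered_set<T *> Seen; // Used only for membership tests.

public:
  // Returns true if P was newly inserted, false if it was already present.
  bool insert(T *P) {
    if (!Seen.insert(P).second)
      return false;
    Order.push_back(P);
    return true;
  }
  std::size_t size() const { return Order.size(); }
  // Iteration walks the vector, so the order never depends on addresses.
  typename std::vector<T *>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T *>::const_iterator end() const { return Order.end(); }
};
```

Iterating such a container yields the elements in the order they were first inserted, regardless of where the pointees live in memory.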

glider marked 2 inline comments as done. Mar 3 2020, 7:40 AM

glider added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
291	You are right, and my understanding that "=r" must always precede "=*m" didn't match LangRef. For example, some inline assembly instructions in the Linux kernel, e.g. `{ i8, i32 } (i32*, i32, i32*, i32) asm sideeffect ".pushsection .smp_locks,\22a\22\0A.balign 4\0A.long 671f - .\0A.popsection\0A671:\0A\09lock; cmpxchgl $3, $1\0A\09/* output condition code z*/\0A", "={@ccz},=*m,={ax},r,*m,2,~{memory},~{dirflag},~{fpsr},~{flags}"`, have the following constraints: `"={@ccz},=*m,={ax},r,*m,2,~{memory},~{dirflag},~{fpsr},~{flags}"`, where a memory output stands between two register outputs. Instead, we need to iterate over the constraints and pick all indirect outputs that don't have a matching input.
320	Correct, will fix.

Not sure if you are still interested in pushing this patch, but I think it would need to be rebased to work on the MemorySSA-backed DSE implementation, which is the default now. Also, as Eli mentioned, there are missing correctness checks.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
291	> Instead, we need to iterate over the constraints and pick all indirect outputs that don't have a matching input.
I am not sure I follow. Could you elaborate? As Eli mentioned, there are probably some additional legality checks missing. It would be good to start by adding some tests. Also, do we need to account for the assembly using already assigned registers to read/write memory?
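
One possible reading of "iterate over the constraints and pick all indirect outputs that don't have a matching input" can be sketched as follows. This is not the author's implementation and not an LLVM API: the helper `writeOnlyIndirectOutputs` is hypothetical, it works on plain strings, and it assumes that call arguments exist only for indirect outputs and for inputs, in constraint order. It treats an indirect output as "matched" when the same pointer is also passed as an indirect input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (illustration only). `Constraints` is an LLVM-style
// constraint string; `Args` names the call arguments in order.
std::vector<std::string>
writeOnlyIndirectOutputs(const std::string &Constraints,
                         const std::vector<std::string> &Args) {
  std::vector<std::string> OutPtrs, InPtrs;
  std::size_t ArgIdx = 0;
  std::istringstream Stream(Constraints);
  std::string C;
  while (std::getline(Stream, C, ',')) {
    if (C.empty() || C[0] == '~')
      continue; // Clobber such as ~{memory}: consumes no argument.
    const bool IsOutput = C[0] == '=';
    const bool IsIndirect = C.find('*') != std::string::npos;
    if (IsOutput && !IsIndirect)
      continue; // Register output: returned by value, no argument.
    if (ArgIdx >= Args.size())
      break; // Malformed input for this sketch; stop instead of guessing.
    const std::string &Arg = Args[ArgIdx++];
    if (IsOutput)
      OutPtrs.push_back(Arg); // Indirect ("=*m") output pointer.
    else if (IsIndirect)
      InPtrs.push_back(Arg); // Indirect ("*m") input pointer.
  }
  // Keep, in constraint order, the output pointers no indirect input names.
  std::vector<std::string> Result;
  for (const std::string &P : OutPtrs)
    if (std::find(InPtrs.begin(), InPtrs.end(), P) == InPtrs.end())
      Result.push_back(P);
  return Result;
}
```

For the cmpxchgl constraint string quoted above, with the same pointer passed for both "=*m" and "*m", this returns an empty list, since the only memory output is also read. The sketch does not address the other concern raised in this thread, namely assembly that reaches memory through pointers held in registers.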

This revision now requires changes to proceed. Nov 9 2020, 12:48 PM

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp	444 lines

Diff 245462

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

... 37 lines not shown ...
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
		#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
... 115 lines not shown ...	else
DeadInst->eraseFromParent();		DeadInst->eraseFromParent();
} while (!NowDeadInsts.empty());		} while (!NowDeadInsts.empty());
*BBI = NewIter;		*BBI = NewIter;
// Pop dead entries from back of ThrowableInst till we find an alive entry.		// Pop dead entries from back of ThrowableInst till we find an alive entry.
while (!ThrowableInst.empty() && !ThrowableInst.back().second)		while (!ThrowableInst.empty() && !ThrowableInst.back().second)
ThrowableInst.pop_back();		ThrowableInst.pop_back();
}		}

		/// Get the number of assembly output arguments returned by pointers.
		/// Stolen from MemorySanitizer.cpp, needs to be factored out.
		int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
		int NumRetOutputs = 0;
		int NumOutputs = 0;
		Type *RetTy = cast<Value>(CB)->getType();
		if (!RetTy->isVoidTy()) {
		// Register outputs are returned via the CallInst return value.
		auto *ST = dyn_cast<StructType>(RetTy);
		if (ST)
		NumRetOutputs = ST->getNumElements();
		else
		NumRetOutputs = 1;
		}
		InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
		for (size_t i = 0, n = Constraints.size(); i < n; i++) {
		InlineAsm::ConstraintInfo Info = Constraints[i];
		switch (Info.Type) {
		case InlineAsm::isOutput:
		NumOutputs++;
		break;
		default:
		break;
		}
		}
		return NumOutputs - NumRetOutputs;
		}

/// Does this instruction write some memory? This only returns true for things		/// Does this instruction write some memory? This only returns true for things
/// that we can analyze with other helpers below.		/// that we can analyze with other helpers below.
static bool hasAnalyzableMemoryWrite(Instruction *I,		static bool hasAnalyzableMemoryWrite(Instruction *I,
const TargetLibraryInfo &TLI) {		const TargetLibraryInfo &TLI) {
if (isa<StoreInst>(I))		if (isa<StoreInst>(I))
return true;		return true;
if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {		if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
... 20 lines not shown ...	if (Function *F = CS.getCalledFunction()) {
case LibFunc_strcat:		case LibFunc_strcat:
case LibFunc_strncat:		case LibFunc_strncat:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}
}		}
		if (CS.isCallBr() || (CS.isCall() && cast<CallInst>(I)->isInlineAsm())) {
		Instruction &I = *CS.getInstruction();
		CallBase *CB = cast<CallBase>(&I);
		InlineAsm *IA = cast<InlineAsm>(CB->getCalledValue());
		if (getNumOutputArgs(IA, CB)) {
		return true;
		} else
		return false;
		}
}		}
return false;		return false;
}		}

/// Return a Location stored to by the specified instruction. If isRemovable		/// Return a Location stored to by the specified instruction. If isRemovable
/// returns true, this function and getLocForRead completely describe the memory		/// returns true, this function and getLocForRead completely describe the memory
/// operations for this instruction.		/// operations for this instruction.
static MemoryLocation getLocForWrite(Instruction *Inst) {		static MemoryLocation getLocForWrite(Instruction *Inst) {
... 21 lines not shown ...	static MemoryLocation getLocForWrite(Instruction *Inst) {
}		}
if (auto CS = CallSite(Inst))		if (auto CS = CallSite(Inst))
// All the supported TLI functions so far happen to have dest as their		// All the supported TLI functions so far happen to have dest as their
// first argument.		// first argument.
return MemoryLocation(CS.getArgument(0));		return MemoryLocation(CS.getArgument(0));
return MemoryLocation();		return MemoryLocation();
}		}

		/// Get memory locations written by inline assembly.
		/// This doesn't include locations that are both read and written.
		SmallVector <MemoryLocation, 4> getLocsForAsm(Instruction &I, const DataLayout &DL) {
		// An inline asm() statement in C++ contains lists of input and output
		// arguments used by the assembly code. These are mapped to operands of the
		// CallInst as follows:
		// - nR register outputs ("=r") are returned by value in a single structure
		// (SSA value of the CallInst);
		// - nO other outputs ("=m" and others) are returned by pointer as first
		// nO operands of the CallInst;
		// - nI inputs ("r", "m" and others) are passed to CallInst as the
		// remaining nI operands.
		// The total number of asm() arguments in the source is nR+nO+nI, and the
		// corresponding CallInst has nO+nI+1 operands (the last operand is the
		// function to be called).
		SmallSet <Value *, 4> Inputs, Outputs;
		SmallVector<MemoryLocation, 4> Ret;
		CallBase *CB = cast<CallBase>(&I);
		InlineAsm *IA = cast<InlineAsm>(CB->getCalledValue());
		int OutputArgs = getNumOutputArgs(IA, CB);
		// The last operand of a CallInst is the function itself.
		int NumOperands = CB->getNumOperands() - 1;

		// Collect input arguments.
		for (int i = OutputArgs; i < NumOperands; i++)
		Inputs.insert(CB->getOperand(i));
		// Collect output arguments that aren't inputs.
		for (int i = 0; i < OutputArgs; i++) {
		Value *Val = CB->getOperand(i);
		if (!Inputs.count(Val))
		Outputs.insert(Val);
		}
		for (auto V : Outputs) {
		int Size = DL.getTypeStoreSize(V->getType());
		Ret.push_back(MemoryLocation(V, Size));
		}
		return Ret;
		}

		static SmallVector<MemoryLocation, 4> getLocsForWrite(Instruction *Inst, const DataLayout &DL) {
		SmallVector <MemoryLocation, 4> Ret;
		if (auto CS = CallSite(Inst)) {
		if (CS.isCall() && cast<CallInst>(Inst)->isInlineAsm()) {
		Ret = getLocsForAsm(*Inst, DL);
		return Ret;
		}
		}
		MemoryLocation Loc = getLocForWrite(Inst);
		if (Loc.Ptr)
		Ret.push_back(Loc);
		return Ret;
		}

/// Return the location read by the specified "hasAnalyzableMemoryWrite"		/// Return the location read by the specified "hasAnalyzableMemoryWrite"
/// instruction if any.		/// instruction if any.
static MemoryLocation getLocForRead(Instruction *Inst,		static MemoryLocation getLocForRead(Instruction *Inst,
const TargetLibraryInfo &TLI) {		const TargetLibraryInfo &TLI) {
assert(hasAnalyzableMemoryWrite(Inst, TLI) && "Unknown instruction case");		assert(hasAnalyzableMemoryWrite(Inst, TLI) && "Unknown instruction case");

// The only instructions that both read and write are the mem transfer		// The only instructions that both read and write are the mem transfer
// instructions (memcpy/memmove).		// instructions (memcpy/memmove).
... 891 lines not shown ...	for (BasicBlock::iterator BBI = BB.begin(), BBE = BB.end(); BBI != BBE; ) {
MemDepResult InstDep = MD->getDependency(Inst, &OBB);		MemDepResult InstDep = MD->getDependency(Inst, &OBB);

// Ignore any store where we can't find a local dependence.		// Ignore any store where we can't find a local dependence.
// FIXME: cross-block DSE would be fun. :)		// FIXME: cross-block DSE would be fun. :)
if (!InstDep.isDef() && !InstDep.isClobber())		if (!InstDep.isDef() && !InstDep.isClobber())
continue;		continue;

// Figure out what location is being stored to.		// Figure out what location is being stored to.
MemoryLocation Loc = getLocForWrite(Inst);		SmallVector <MemoryLocation, 4> Locs = getLocsForWrite(Inst, DL);

// If we didn't get a useful location, fail.		// If we didn't get a useful location, fail.
if (!Loc.Ptr)		if (!Locs.size())
continue;		continue;
		for (auto Loc: Locs) {
// Loop until we find a store we can eliminate or a load that		// Loop until we find a store we can eliminate or a load that
// invalidates the analysis. Without an upper bound on the number of		// invalidates the analysis. Without an upper bound on the number of
// instructions examined, this analysis can become very time-consuming.		// instructions examined, this analysis can become very time-consuming.
// However, the potential gain diminishes as we process more instructions		// However, the potential gain diminishes as we process more instructions
// without eliminating any of them. Therefore, we limit the number of		// without eliminating any of them. Therefore, we limit the number of
// instructions we look at.		// instructions we look at.
auto Limit = MD->getDefaultBlockScanLimit();		auto Limit = MD->getDefaultBlockScanLimit();
while (InstDep.isDef() || InstDep.isClobber()) {		while (InstDep.isDef() || InstDep.isClobber()) {
// Get the memory clobbered by the instruction we depend on. MemDep will		// Get the memory clobbered by the instruction we depend on. MemDep will
// skip any instructions that 'Loc' clearly doesn't interact with. If we		// skip any instructions that 'Loc' clearly doesn't interact with. If we
// end up depending on a may- or must-aliased load, then we can't optimize		// end up depending on a may- or must-aliased load, then we can't optimize
// away the store and we bail out. However, if we depend on something		// away the store and we bail out. However, if we depend on something
// that overwrites the memory location we can potentially optimize it.		// that overwrites the memory location we can potentially optimize it.
//		//
// Find out what memory location the dependent instruction stores.		// Find out what memory location the dependent instruction stores.
Instruction *DepWrite = InstDep.getInst();		Instruction *DepWrite = InstDep.getInst();
if (!hasAnalyzableMemoryWrite(DepWrite, *TLI))		if (!hasAnalyzableMemoryWrite(DepWrite, *TLI))
break;		break;
MemoryLocation DepLoc = getLocForWrite(DepWrite);		MemoryLocation DepLoc = getLocForWrite(DepWrite);
// If we didn't get a useful location, or if it isn't a size, bail out.		// If we didn't get a useful location, or if it isn't a size, bail out.
if (!DepLoc.Ptr)		if (!DepLoc.Ptr)
break;		break;

// Find the last throwable instruction not removed by call to		// Find the last throwable instruction not removed by call to
// deleteDeadInstruction.		// deleteDeadInstruction.
Instruction *LastThrowing = nullptr;		Instruction *LastThrowing = nullptr;
if (!ThrowableInst.empty())		if (!ThrowableInst.empty())
LastThrowing = ThrowableInst.back().first;		LastThrowing = ThrowableInst.back().first;

// Make sure we don't look past a call which might throw. This is an		// Make sure we don't look past a call which might throw. This is an
// issue because MemoryDependenceAnalysis works in the wrong direction:		// issue because MemoryDependenceAnalysis works in the wrong direction:
// it finds instructions which dominate the current instruction, rather than		// it finds instructions which dominate the current instruction, rather than
// instructions which are post-dominated by the current instruction.		// instructions which are post-dominated by the current instruction.
//		//
// If the underlying object is a non-escaping memory allocation, any store		// If the underlying object is a non-escaping memory allocation, any store
// to it is dead along the unwind edge. Otherwise, we need to preserve		// to it is dead along the unwind edge. Otherwise, we need to preserve
// the store.		// the store.
if (LastThrowing && OBB.dominates(DepWrite, LastThrowing)) {		if (LastThrowing && OBB.dominates(DepWrite, LastThrowing)) {
const Value* Underlying = GetUnderlyingObject(DepLoc.Ptr, DL);		const Value* Underlying = GetUnderlyingObject(DepLoc.Ptr, DL);
bool IsStoreDeadOnUnwind = isa<AllocaInst>(Underlying);		bool IsStoreDeadOnUnwind = isa<AllocaInst>(Underlying);
if (!IsStoreDeadOnUnwind) {		if (!IsStoreDeadOnUnwind) {
// We're looking for a call to an allocation function		// We're looking for a call to an allocation function
// where the allocation doesn't escape before the last		// where the allocation doesn't escape before the last
// throwing instruction; PointerMayBeCaptured		// throwing instruction; PointerMayBeCaptured
// reasonably fast approximation.		// reasonably fast approximation.
IsStoreDeadOnUnwind = isAllocLikeFn(Underlying, TLI) &&		IsStoreDeadOnUnwind = isAllocLikeFn(Underlying, TLI) &&
!PointerMayBeCaptured(Underlying, false, true);		!PointerMayBeCaptured(Underlying, false, true);
}		}
if (!IsStoreDeadOnUnwind)		if (!IsStoreDeadOnUnwind)
break;		break;
}		}

// If we find a write that is a) removable (i.e., non-volatile), b) is		// If we find a write that is a) removable (i.e., non-volatile), b) is
// completely obliterated by the store to 'Loc', and c) which we know that		// completely obliterated by the store to 'Loc', and c) which we know that
// 'Inst' doesn't load from, then we can remove it.		// 'Inst' doesn't load from, then we can remove it.
// Also try to merge two stores if a later one only touches memory written		// Also try to merge two stores if a later one only touches memory written
// to by the earlier one.		// to by the earlier one.
if (isRemovable(DepWrite) &&		if (isRemovable(DepWrite) &&
!isPossibleSelfRead(Inst, Loc, DepWrite, TLI, AA)) {		!isPossibleSelfRead(Inst, Loc, DepWrite, TLI, AA)) {
int64_t InstWriteOffset, DepWriteOffset;		int64_t InstWriteOffset, DepWriteOffset;
OverwriteResult OR = isOverwrite(Loc, DepLoc, DL, *TLI, DepWriteOffset,		OverwriteResult OR = isOverwrite(Loc, DepLoc, DL, *TLI, DepWriteOffset,
InstWriteOffset, DepWrite, IOL, *AA,		InstWriteOffset, DepWrite, IOL, *AA,
BB.getParent());		BB.getParent());
if (OR == OW_Complete) {		if (OR == OW_Complete) {
LLVM_DEBUG(dbgs() << "DSE: Remove Dead Store:\n DEAD: " << *DepWrite		LLVM_DEBUG(dbgs() << "DSE: Remove Dead Store:\n DEAD: " << *DepWrite
<< "\n KILLER: " << *Inst << '\n');		<< "\n KILLER: " << *Inst << '\n');

// Delete the store and now-dead instructions that feed it.		// Delete the store and now-dead instructions that feed it.
deleteDeadInstruction(DepWrite, &BBI, MD, TLI, IOL, OBB,		deleteDeadInstruction(DepWrite, &BBI, MD, TLI, IOL, OBB,
ThrowableInst);		ThrowableInst);
++NumFastStores;		++NumFastStores;
MadeChange = true;		MadeChange = true;

// We erased DepWrite; start over.		// We erased DepWrite; start over.
InstDep = MD->getDependency(Inst, &OBB);		InstDep = MD->getDependency(Inst, &OBB);
continue;		continue;
} else if ((OR == OW_End && isShortenableAtTheEnd(DepWrite)) ||		} else if ((OR == OW_End && isShortenableAtTheEnd(DepWrite)) ||
((OR == OW_Begin &&		((OR == OW_Begin &&
isShortenableAtTheBeginning(DepWrite)))) {		isShortenableAtTheBeginning(DepWrite)))) {
assert(!EnablePartialOverwriteTracking && "Do not expect to perform "		assert(!EnablePartialOverwriteTracking && "Do not expect to perform "
"when partial-overwrite "		"when partial-overwrite "
"tracking is enabled");		"tracking is enabled");
// The overwrite result is known, so these must be known, too.		// The overwrite result is known, so these must be known, too.
int64_t EarlierSize = DepLoc.Size.getValue();		int64_t EarlierSize = DepLoc.Size.getValue();
int64_t LaterSize = Loc.Size.getValue();		int64_t LaterSize = Loc.Size.getValue();
bool IsOverwriteEnd = (OR == OW_End);		bool IsOverwriteEnd = (OR == OW_End);
MadeChange |= tryToShorten(DepWrite, DepWriteOffset, EarlierSize,		MadeChange |= tryToShorten(DepWrite, DepWriteOffset, EarlierSize,
InstWriteOffset, LaterSize, IsOverwriteEnd);		InstWriteOffset, LaterSize, IsOverwriteEnd);
} else if (EnablePartialStoreMerging &&		} else if (EnablePartialStoreMerging &&
OR == OW_PartialEarlierWithFullLater) {		OR == OW_PartialEarlierWithFullLater) {
auto *Earlier = dyn_cast<StoreInst>(DepWrite);		auto *Earlier = dyn_cast<StoreInst>(DepWrite);
auto *Later = dyn_cast<StoreInst>(Inst);		auto *Later = dyn_cast<StoreInst>(Inst);
if (Earlier && isa<ConstantInt>(Earlier->getValueOperand()) &&		if (Earlier && isa<ConstantInt>(Earlier->getValueOperand()) &&
DL.typeSizeEqualsStoreSize(		DL.typeSizeEqualsStoreSize(
Earlier->getValueOperand()->getType()) &&		Earlier->getValueOperand()->getType()) &&
Later && isa<ConstantInt>(Later->getValueOperand()) &&		Later && isa<ConstantInt>(Later->getValueOperand()) &&
DL.typeSizeEqualsStoreSize(		DL.typeSizeEqualsStoreSize(
Later->getValueOperand()->getType()) &&		Later->getValueOperand()->getType()) &&
memoryIsNotModifiedBetween(Earlier, Later, AA)) {		memoryIsNotModifiedBetween(Earlier, Later, AA)) {
// If the store we find is:		// If the store we find is:
// a) partially overwritten by the store to 'Loc'		// a) partially overwritten by the store to 'Loc'
// b) the later store is fully contained in the earlier one and		// b) the later store is fully contained in the earlier one and
// c) they both have a constant value		// c) they both have a constant value
// d) none of the two stores need padding		// d) none of the two stores need padding
// Merge the two stores, replacing the earlier store's value with a		// Merge the two stores, replacing the earlier store's value with a
// merge of both values.		// merge of both values.
// TODO: Deal with other constant types (vectors, etc), and probably		// TODO: Deal with other constant types (vectors, etc), and probably
// some mem intrinsics (if needed)		// some mem intrinsics (if needed)

APInt EarlierValue =		APInt EarlierValue =
cast<ConstantInt>(Earlier->getValueOperand())->getValue();		cast<ConstantInt>(Earlier->getValueOperand())->getValue();
APInt LaterValue =		APInt LaterValue =
cast<ConstantInt>(Later->getValueOperand())->getValue();		cast<ConstantInt>(Later->getValueOperand())->getValue();
unsigned LaterBits = LaterValue.getBitWidth();		unsigned LaterBits = LaterValue.getBitWidth();
assert(EarlierValue.getBitWidth() > LaterValue.getBitWidth());		assert(EarlierValue.getBitWidth() > LaterValue.getBitWidth());
LaterValue = LaterValue.zext(EarlierValue.getBitWidth());		LaterValue = LaterValue.zext(EarlierValue.getBitWidth());

// Offset of the smaller store inside the larger store		// Offset of the smaller store inside the larger store
unsigned BitOffsetDiff = (InstWriteOffset - DepWriteOffset) * 8;		unsigned BitOffsetDiff = (InstWriteOffset - DepWriteOffset) * 8;
unsigned LShiftAmount =		unsigned LShiftAmount =
DL.isBigEndian()		DL.isBigEndian()
? EarlierValue.getBitWidth() - BitOffsetDiff - LaterBits		? EarlierValue.getBitWidth() - BitOffsetDiff - LaterBits
: BitOffsetDiff;		: BitOffsetDiff;
APInt Mask =		APInt Mask =
APInt::getBitsSet(EarlierValue.getBitWidth(), LShiftAmount,		APInt::getBitsSet(EarlierValue.getBitWidth(), LShiftAmount,
LShiftAmount + LaterBits);		LShiftAmount + LaterBits);
// Clear the bits we'll be replacing, then OR with the smaller		// Clear the bits we'll be replacing, then OR with the smaller
// store, shifted appropriately.		// store, shifted appropriately.
APInt Merged =		APInt Merged =
(EarlierValue & ~Mask) | (LaterValue << LShiftAmount);		(EarlierValue & ~Mask) | (LaterValue << LShiftAmount);
LLVM_DEBUG(dbgs() << "DSE: Merge Stores:\n Earlier: " << *DepWrite		LLVM_DEBUG(dbgs() << "DSE: Merge Stores:\n Earlier: " << *DepWrite
<< "\n Later: " << *Inst		<< "\n Later: " << *Inst
<< "\n Merged Value: " << Merged << '\n');		<< "\n Merged Value: " << Merged << '\n');

auto *SI = new StoreInst(		auto *SI = new StoreInst(
ConstantInt::get(Earlier->getValueOperand()->getType(), Merged),		ConstantInt::get(Earlier->getValueOperand()->getType(), Merged),
Earlier->getPointerOperand(), false,		Earlier->getPointerOperand(), false,
MaybeAlign(Earlier->getAlignment()), Earlier->getOrdering(),		MaybeAlign(Earlier->getAlignment()), Earlier->getOrdering(),
Earlier->getSyncScopeID(), DepWrite);		Earlier->getSyncScopeID(), DepWrite);

unsigned MDToKeep[] = {LLVMContext::MD_dbg, LLVMContext::MD_tbaa,		unsigned MDToKeep[] = {LLVMContext::MD_dbg, LLVMContext::MD_tbaa,
LLVMContext::MD_alias_scope,		LLVMContext::MD_alias_scope,
LLVMContext::MD_noalias,		LLVMContext::MD_noalias,
LLVMContext::MD_nontemporal};		LLVMContext::MD_nontemporal};
SI->copyMetadata(*DepWrite, MDToKeep);		SI->copyMetadata(*DepWrite, MDToKeep);
++NumModifiedStores;		++NumModifiedStores;

// Remove earlier, wider, store		// Remove earlier, wider, store
OBB.replaceInstruction(DepWrite, SI);		OBB.replaceInstruction(DepWrite, SI);

// Delete the old stores and now-dead instructions that feed them.		// Delete the old stores and now-dead instructions that feed them.
deleteDeadInstruction(Inst, &BBI, MD, TLI, IOL, OBB,		deleteDeadInstruction(Inst, &BBI, MD, TLI, IOL, OBB,
ThrowableInst);		ThrowableInst);
deleteDeadInstruction(DepWrite, &BBI, MD, TLI, IOL, OBB,		deleteDeadInstruction(DepWrite, &BBI, MD, TLI, IOL, OBB,
ThrowableInst);		ThrowableInst);
MadeChange = true;		MadeChange = true;

// We erased DepWrite and Inst (Loc); start over.		// We erased DepWrite and Inst (Loc); start over.
break;		break;
}		}
}		}
}		}

// If this is a may-aliased store that is clobbering the store value, we		// If this is a may-aliased store that is clobbering the store value, we
// can keep searching past it for another must-aliased pointer that stores		// can keep searching past it for another must-aliased pointer that stores
// to the same location. For example, in:		// to the same location. For example, in:
// store -> P		// store -> P
// store -> Q		// store -> Q
// store -> P		// store -> P
// we can remove the first store to P even though we don't know if P and Q		// we can remove the first store to P even though we don't know if P and Q
// alias.		// alias.
if (DepWrite == &BB.front()) break;		if (DepWrite == &BB.front()) break;

// Can't look past this instruction if it might read 'Loc'.		// Can't look past this instruction if it might read 'Loc'.
if (isRefSet(AA->getModRefInfo(DepWrite, Loc)))		if (isRefSet(AA->getModRefInfo(DepWrite, Loc)))
break;		break;

InstDep = MD->getPointerDependencyFrom(Loc, /*isLoad=*/ false,		InstDep = MD->getPointerDependencyFrom(Loc, /*isLoad=*/ false,
DepWrite->getIterator(), &BB,		DepWrite->getIterator(), &BB,
/*QueryInst=*/ nullptr, &Limit);		/*QueryInst=*/ nullptr, &Limit);
}		}
}		}
		}

if (EnablePartialOverwriteTracking)		if (EnablePartialOverwriteTracking)
MadeChange |= removePartiallyOverlappedStores(AA, DL, IOL);		MadeChange |= removePartiallyOverlappedStores(AA, DL, IOL);

// If this block ends in a return, unwind, or unreachable, all allocas are		// If this block ends in a return, unwind, or unreachable, all allocas are
// dead at its end, which means stores to them are also dead.		// dead at its end, which means stores to them are also dead.
if (BB.getTerminator()->getNumSuccessors() == 0)		if (BB.getTerminator()->getNumSuccessors() == 0)
MadeChange |= handleEndBlock(BB, AA, MD, TLI, IOL, OBB, ThrowableInst);		MadeChange |= handleEndBlock(BB, AA, MD, TLI, IOL, OBB, ThrowableInst);
... 577 lines not shown ...