This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- include/llvm/Analysis/ValueTracking.h
- lib/Analysis/ValueTracking.cpp
- lib/Transforms/Scalar/LoopIdiomRecognize.cpp
- lib/Transforms/Scalar/MemCpyOptimizer.cpp
- test/Transforms/MemCpyOpt/fca2memcpy.ll

Differential D51751

Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue
Closed, Public

Authored by jfb on Sep 6 2018, 2:09 PM.


Details

Reviewers

MatzeB
efriedma

Commits

rG73d8e4e53186: Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue
rL342709: Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue

Summary

This code was in CGDecl.cpp and really belongs in LLVM's isBytewiseValue. Teach isBytewiseValue the tricks clang's isRepeatedBytePattern had, including merging undef properly, and recursing on more types.

clang part of this patch: D51752
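The two ideas in the summary, merging undef and recursing into aggregates, can be sketched outside LLVM. The following minimal C++17 example uses a hypothetical toy constant model: `ToyConst`, `ByteResult`, `merge`, and `bytewise` are illustrative names, not LLVM API. Its merge rule follows the `Merge` lambda in Diff 166359 (undef yields to the other side, two different bytes fail), but the sketch ignores padding, vectors, floating point, and the early return for arbitrary i8 values.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy constant (not LLVM's API): an integer of a given bit
// width, an undef value, or an aggregate (array/struct) of members.
struct ToyConst {
  enum Kind { Int, Undef, Aggregate } kind;
  uint64_t bits = 0;           // payload for Int
  unsigned width = 0;          // bit width for Int (at most 64 here)
  std::vector<ToyConst> elems; // members for Aggregate
};

ToyConst intC(uint64_t bits, unsigned width) {
  assert(width <= 64 && "toy model only handles up to 64 bits");
  return {ToyConst::Int, bits, width, {}};
}
ToyConst undefC() { return {ToyConst::Undef, 0, 0, {}}; }
ToyConst aggC(std::vector<ToyConst> elems) {
  return {ToyConst::Aggregate, 0, 0, std::move(elems)};
}

// Analysis result: no single byte works, any byte works (undef), or one byte.
struct ByteResult {
  enum Kind { NotSplat, AnyByte, Byte } kind;
  uint8_t byte = 0;
};

// Undef yields to the other side; two different concrete bytes fail.
ByteResult merge(ByteResult a, ByteResult b) {
  if (a.kind == ByteResult::NotSplat || b.kind == ByteResult::NotSplat)
    return {ByteResult::NotSplat};
  if (a.kind == ByteResult::AnyByte)
    return b;
  if (b.kind == ByteResult::AnyByte)
    return a;
  if (a.byte == b.byte)
    return a;
  return {ByteResult::NotSplat};
}

ByteResult bytewise(const ToyConst &c) {
  switch (c.kind) {
  case ToyConst::Undef:
    return {ByteResult::AnyByte};
  case ToyConst::Int: {
    // Only whole bytes can be splatted, so e.g. i1 is rejected.
    if (c.width == 0 || c.width % 8 != 0)
      return {ByteResult::NotSplat};
    uint8_t low = static_cast<uint8_t>(c.bits);
    for (unsigned shift = 8; shift < c.width; shift += 8)
      if (static_cast<uint8_t>(c.bits >> shift) != low)
        return {ByteResult::NotSplat};
    return {ByteResult::Byte, low};
  }
  case ToyConst::Aggregate: {
    // Start from "undef" and fold every member in (an empty aggregate
    // therefore stays undef in this toy model).
    ByteResult acc{ByteResult::AnyByte};
    for (const ToyConst &e : c.elems) {
      acc = merge(acc, bytewise(e));
      if (acc.kind == ByteResult::NotSplat)
        break;
    }
    return acc;
  }
  }
  return {ByteResult::NotSplat};
}
```

With this model, an aggregate of i16 0xF0F0, undef, and i8 0xF0 yields the byte 0xF0, while i16 0x1234 or an aggregate mixing 0x01 and 0x02 yields no byte.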

Diff Detail

Repository

rL LLVM

Build Status

Buildable 22891
Build 22891: arc lint + arc unit

Event Timeline

jfb created this revision. Sep 6 2018, 2:09 PM

Harbormaster completed remote builds in B22343: Diff 164286. Sep 6 2018, 2:09 PM

Herald added subscribers: llvm-commits, dexonsmith. Sep 6 2018, 2:09 PM

jfb mentioned this in D51752: NFC: deduplicate isRepeatedBytePattern from clang to LLVM's isBytewiseValue. Sep 6 2018, 2:09 PM

jfb edited the summary of this revision. Sep 6 2018, 2:10 PM

jfb added a reviewer: MatzeB.

As noted in D51752, I'll instead improve isBytewiseValue to do this.

Augment isBytewiseValue to be as powerful as clang's version. Update code that used it to handle undef merging. This captures more memcpy patterns and can generate memcpy of undef (which selection DAG later gets rid of).

jfb retitled this revision from "NFC: move isRepeatedBytePattern from clang to Constant" to "Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue". Sep 7 2018, 11:50 AM

jfb edited the summary of this revision.

Harbormaster completed remote builds in B22381: Diff 164485. Sep 7 2018, 11:52 AM

efriedma added a subscriber: efriedma. Sep 7 2018, 12:09 PM

efriedma added inline comments.

lib/Analysis/ValueTracking.cpp
3127	Do you really need separate loops for ConstantVector and ConstantArray?
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
657	No test coverage for this change?
lib/Transforms/Scalar/MemCpyOptimizer.cpp
391	What's the point of changing this signature? It doesn't seem to have any effect.

LGTM once Eli approves.

lib/Analysis/ValueTracking.cpp
3046	You could move this down, closer to its first user.
3058–3063	Indeed an interesting discussion whether we should call into something like InstructionSimplify here. I agree with you that we'd rather see this handled in a separate step by whoever calls `isBytewiseValue()`. That way we can keep this function simple and only handle the patterns that are already simplified.
3130	not sure how valuable this comment is :)

Update comments and move Ctx down a bit.

lib/Analysis/ValueTracking.cpp
3058–3063	This comment was originally at the bottom of the function, I moved it to early-exit if it's not a constant because all the subsequent code is about constants (not values). I can drop the comment if you want?
3127	I don't think ConstantArray has getSplatValue, which I use for ConstantVector. Or are you suggesting something else?
lib/Transforms/Scalar/LoopIdiomRecognize.cpp
657	Indeed, right now getMemSetPatternValue doesn't check for UndefValue so there's no way to trigger this line. I'm adding it opportunistically because it's obviously correct, and I added a FIXME in getMemSetPatternValue to do this.
lib/Transforms/Scalar/MemCpyOptimizer.cpp
391	If ByteVal is UndefValue then it can be merged with any other valid pattern. The below change does this. The signature change allows updating ByteVal. I could instead return a pair if you want.
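The undef-adoption rule that ended up in `tryMergingIntoMemset` in Diff 166359 can be modelled in isolation. This is a minimal C++17 sketch, assuming each store has already been summarized as "not splattable", "undef", or "a concrete byte"; `StoredByte`, `RunResult`, and `scanRun` are hypothetical names, and pointer offsets and intervening instructions are ignored. As written in that diff, only a run whose byte is still undef adopts a later byte; an undef store that follows a concrete byte still ends the run, and the sketch reproduces that rather than a symmetric merge.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of what isBytewiseValue reported for one store.
struct StoredByte {
  enum Kind { NotSplat, Undef, Byte } kind;
  uint8_t byte = 0;
};

struct RunResult {
  std::size_t count; // number of leading stores that join the run
  StoredByte value;  // byte the combined memset would store
};

// A run whose byte is still undef adopts the next splattable store's byte;
// any mismatch ends the run (mirroring the StoredByte/ByteVal loop).
RunResult scanRun(const std::vector<StoredByte> &stores) {
  assert(!stores.empty() && stores[0].kind != StoredByte::NotSplat);
  StoredByte run = stores[0];
  std::size_t n = 1;
  for (; n < stores.size(); ++n) {
    const StoredByte &next = stores[n];
    if (run.kind == StoredByte::Undef && next.kind != StoredByte::NotSplat)
      run = next;
    bool same = run.kind == next.kind &&
                (run.kind != StoredByte::Byte || run.byte == next.byte);
    if (!same)
      break;
  }
  return {n, run};
}
```

For example, an undef store followed by two stores of 0xAB forms one run of three with byte 0xAB, whereas 0xAB followed by undef stops after the first store.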

Harbormaster completed remote builds in B22395: Diff 164531. Sep 7 2018, 3:43 PM

@mehdi_amini, from our discussion in D49771, I think this patch takes us partway to what you wanted (having LLVM do at least as much work as clang when it comes to generating memset). After this commits, I'll look at making clang less smart and relying on LLVM being smart instead. I want to balance compile time against early code quality.

efriedma added inline comments. Sep 7 2018, 4:50 PM

lib/Analysis/ValueTracking.cpp
3127	I meant you could just use the for loop for ConstantVector. Maybe a bit less efficient, but the difference is unlikely to matter.
lib/Transforms/Scalar/MemCpyOptimizer.cpp
391	Whether you use an out parameter or a pair, the value still isn't actually used by either of the callers of tryMergingIntoMemset. Looking again, I guess technically it's used in MemCpyOptPass::processStore, but only in the case where tryMergingIntoMemset fails, so I'm guessing that isn't intentional.

Remove ref param

Harbormaster completed remote builds in B22675: Diff 165602. Sep 14 2018, 3:36 PM

jfb added inline comments. Sep 14 2018, 3:37 PM

lib/Analysis/ValueTracking.cpp
3127	I'd rather not since `getSplatValue` does exactly what we want here.
lib/Transforms/Scalar/MemCpyOptimizer.cpp
391	You're right, changed back.

Missing test coverage for half.

Missing test coverage here for ConstantStruct/ConstantArray/ConstantVector, although I guess the array/struct bits are covered by the clang patch. I'd like to see some coverage for vectors with unusual element sizes, though, like <16 x i1>.
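The floating-point path in this diff bitcasts half, float, and double to i16, i32, and i64 and then reuses the integer check. A standalone C++17 sketch of that idea, assuming IEEE-754 `float`/`double`; since portable C++17 has no half-precision type, half values are passed as raw binary16 bit patterns. `splatByte`, `splatByteOf`, and `splatByteOfHalfBits` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "sketch assumes 32-bit float and 64-bit double");

// Repeated byte of the low `width` bits, if every byte is identical.
std::optional<uint8_t> splatByte(uint64_t bits, unsigned width) {
  assert(width > 0 && width % 8 == 0 && width <= 64);
  uint8_t low = static_cast<uint8_t>(bits);
  for (unsigned shift = 8; shift < width; shift += 8)
    if (static_cast<uint8_t>(bits >> shift) != low)
      return std::nullopt;
  return low;
}

// Reinterpret the value as a same-width integer, then ask the integer question.
std::optional<uint8_t> splatByteOf(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return splatByte(bits, 32);
}

std::optional<uint8_t> splatByteOf(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return splatByte(bits, 64);
}

// Half values as raw IEEE-754 binary16 bit patterns.
std::optional<uint8_t> splatByteOfHalfBits(uint16_t bits) {
  return splatByte(bits, 16);
}
```

This shows why 0.0 is the important case: +0.0 is all zero bytes, while -0.0 (sign bit set) and 1.0 are not byte splats in any of the three widths.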

lib/Transforms/Scalar/MemCpyOptimizer.cpp
758–759	Unnecessary change.
798	Unnecessary change.

This patch will conflict with my patch here: https://reviews.llvm.org/D52092

Background: In our out-of-tree target we have 16-bit bytes (so memset/memcpy etc. operate on 16-bit units). We need to patch isBytewiseValue to handle both 8-bit and 16-bit bytes (we store BitsPerByte in various places such as DataLayout, so it could be 8 or 16 depending on the target). The idea with D52092 was to make it possible to check BitsPerByte in isBytewiseValue, and then convert the existing isBytewiseValue into a more general isSplatValue that should work for any given bit length (similar to isSplat in APInt.h).

Do you think it is possible to use such a design here? (I doubt that it costs too much to have a more general isSplatValue in LLVM trunk)
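One way to read the proposed generalization is a splat test parameterized by the unit size instead of a hard-coded 8. The sketch below is a hypothetical helper (`splatUnit`), not the API from D52092; it works on `uint64_t` rather than `APInt`, so widths above 64 bits are out of scope, and it returns the repeated chunk in the spirit of `isSplat(N)` followed by `trunc(N)`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// If the low `width` bits of `value` are one `unit`-bit chunk repeated,
// return that chunk; otherwise return nullopt.
std::optional<uint64_t> splatUnit(uint64_t value, unsigned width,
                                  unsigned unit) {
  assert(unit > 0 && unit <= 64 && width <= 64);
  if (width == 0 || width % unit != 0)
    return std::nullopt;
  const uint64_t mask =
      unit == 64 ? ~uint64_t{0} : (uint64_t{1} << unit) - 1;
  const uint64_t first = value & mask;
  for (unsigned shift = unit; shift < width; shift += unit)
    if (((value >> shift) & mask) != first)
      return std::nullopt;
  return first;
}
```

With `unit == 8` this reduces to today's byte check (0xF0F0 gives 0xF0, 0x1234 gives nothing); with `unit == 16`, 0xABCDABCD is a splat of 0xABCD even though it is not an 8-bit splat.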

Augment isBytewiseValue to be as powerful as clang's version. Update code that used it to handle undef merging. This captures more memcpy patterns and can generate memcpy of undef (which selection DAG later gets rid of).
Update comments and move Ctx down a bit.
Remove ref param
Remove extra changes, left over from clang-format

Harbormaster completed remote builds in B22891: Diff 166359. Sep 20 2018, 2:09 PM

I'll add more tests soon.

Add a bit more testing

Harbormaster completed remote builds in B22900: Diff 166377. Sep 20 2018, 3:55 PM

In D51751#1237272, @efriedma wrote:

Missing test coverage for half.

Missing test coverage here for ConstantStruct/ConstantArray/ConstantVector, although I guess the array/struct bits are covered by the clang patch. I'd like to see some coverage for vectors with unusual element sizes, though, like <16 x i1>.

I added some tests. Indeed the clang-side test/CodeGenCXX/auto-var-init.cpp already exercises a bunch of this.

efriedma added inline comments. Sep 20 2018, 4:10 PM

test/Transforms/MemCpyOpt/memcpy-to-memset.ll
54 ↗	(On Diff #166377)	You're not really testing any interesting codepaths with a constant zero? We just hit the "isNullValue()" early exit.

jfb added inline comments. Sep 20 2018, 4:57 PM

test/Transforms/MemCpyOpt/memcpy-to-memset.ll
54 ↗	(On Diff #166377)	It's the only interesting codepath for `i1` though. Unless you're saying that we should add handling for `i1` in `isBytewiseValue`?

jfb added inline comments. Sep 20 2018, 5:00 PM

test/Transforms/MemCpyOpt/memcpy-to-memset.ll
54 ↗	(On Diff #166377)	To be clear, this is why we return nullptr for all other i1: a bit width of 1 fails the `% 8` check below, so we fall through to the final `return nullptr`.

// We can handle constant integers that are multiple of 8 bits.
if (ConstantInt *CI = dyn_cast<ConstantInt>(C)) {
  if (CI->getBitWidth() % 8 == 0) {
    assert(CI->getBitWidth() > 8 && "8 bits should be handled above!");
    if (!CI->getValue().isSplat(8))
      return nullptr;
    return ConstantInt::get(Ctx, CI->getValue().trunc(8));
  }
}

efriedma added inline comments. Sep 20 2018, 5:12 PM

test/Transforms/MemCpyOpt/memcpy-to-memset.ll
54 ↗	(On Diff #166377)	Well, an improved isBytewiseValue could theoretically handle it... but I guess it's probably not worth bothering, at least for now. But better to have some test coverage so it's clear what isn't handled.

Test i1, show it isn't handled.

Added the negative test.

Harbormaster completed remote builds in B22904: Diff 166385. Sep 20 2018, 5:18 PM

LGTM

This revision is now accepted and ready to land. Sep 20 2018, 5:47 PM

Closed by commit rL342709: Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue (authored by jfb). Sep 20 2018, 10:22 PM

This revision was automatically updated to reflect the committed changes.

jfb mentioned this in rL342734: NFC: deduplicate isRepeatedBytePattern from clang to LLVM's isBytewiseValue. Sep 21 2018, 6:55 AM

jfb mentioned this in rC342734: NFC: deduplicate isRepeatedBytePattern from clang to LLVM's isBytewiseValue.

vitalybuka added a subscriber: vitalybuka. Jul 12 2019, 10:32 AM

vitalybuka added inline comments.

lib/Analysis/ValueTracking.cpp
3127	@jfb This vector processing does not handle undefs correctly. I'd like to fix that with D64031
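To illustrate the kind of undef problem that can arise with the `getSplatValue` approach (an illustration only, not a description of what D64031 changes): as we understand it, `ConstantVector::getSplatValue` at the time only returned a value when every element was the identical constant, so `<i8 42, i8 undef, i8 42, i8 42>` is not a splat, even though the merge loop used for arrays and structs would accept it with byte 42. The toy comparison below uses integer lanes with -1 standing for undef; `kUndef`, `strictSplat`, and `mergedSplat` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <vector>

constexpr int kUndef = -1; // hypothetical encoding of an undef lane

// Strict test: every lane must equal the first, so an undef lane
// next to concrete lanes breaks the splat.
std::optional<int> strictSplat(const std::vector<int> &lanes) {
  assert(!lanes.empty());
  for (int lane : lanes)
    if (lane != lanes.front())
      return std::nullopt;
  return lanes.front();
}

// Merge-style test: undef lanes are wildcards and the concrete lanes must
// agree. Returns kUndef when every lane is undef.
std::optional<int> mergedSplat(const std::vector<int> &lanes) {
  assert(!lanes.empty());
  int acc = kUndef;
  for (int lane : lanes) {
    if (lane == kUndef)
      continue;
    if (acc != kUndef && acc != lane)
      return std::nullopt;
    acc = lane;
  }
  return acc;
}
```

Both agree on fully defined vectors; they differ only when some, but not all, lanes are undef.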

Herald added a project: Restricted Project. Jul 12 2019, 10:32 AM

Herald added a subscriber: jkorous.

Revision Contents

Path	Size
include/llvm/Analysis/ValueTracking.h	3 lines
lib/Analysis/ValueTracking.cpp	94 lines
lib/Transforms/Scalar/LoopIdiomRecognize.cpp	7 lines
lib/Transforms/Scalar/MemCpyOptimizer.cpp	5 lines
test/Transforms/MemCpyOpt/fca2memcpy.ll	33 lines

Diff 166359

include/llvm/Analysis/ValueTracking.h

[215 lines hidden; context: class Value;]
 /// x > +0 --> true
 /// x < -0 --> false
 bool SignBitMustBeZero(const Value *V, const TargetLibraryInfo *TLI);

 /// If the specified value can be set by repeating the same byte in memory,
 /// return the i8 value that it is represented with. This is true for all i8
 /// values obviously, but is also true for i32 0, i32 -1, i16 0xF0F0, double
 /// 0.0 etc. If the value can't be handled with a repeated byte store (e.g.
-/// i16 0x1234), return null.
+/// i16 0x1234), return null. If the value is entirely undef and padding,
+/// return undef.
 Value *isBytewiseValue(Value *V);

 /// Given an aggregrate and an sequence of indices, see if the scalar value
 /// indexed is already around as a register, for example if it were inserted
 /// directly into the aggregrate.
 ///
 /// If InsertBefore is not null, this function will duplicate (modified)
 /// insertvalues when a part of a nested struct is extracted.
[382 lines hidden]

lib/Analysis/ValueTracking.cpp

[3,036 lines hidden; context: for (unsigned i = 0; i != NumElts; ++i) {]
     auto *CElt = dyn_cast<ConstantFP>(Elt);
     if (!CElt || CElt->isNaN())
       return false;
   }
   // All elements were confirmed not-NaN or undefined.
   return true;
 }

-/// If the specified value can be set by repeating the same byte in memory,
-/// return the i8 value that it is represented with. This is
-/// true for all i8 values obviously, but is also true for i32 0, i32 -1,
-/// i16 0xF0F0, double 0.0 etc. If the value can't be handled with a repeated
-/// byte store (e.g. i16 0x1234), return null.
 Value *llvm::isBytewiseValue(Value *V) {
+
   // All byte-wide stores are splatable, even of arbitrary variables.
-  if (V->getType()->isIntegerTy(8)) return V;
+  if (V->getType()->isIntegerTy(8))
+    return V;

+  LLVMContext &Ctx = V->getContext();
+
+  // Undef don't care.
+  auto *UndefInt8 = UndefValue::get(Type::getInt8Ty(Ctx));
+  if (isa<UndefValue>(V))
+    return UndefInt8;
+
+  Constant *C = dyn_cast<Constant>(V);
+  if (!C) {
+    // Conceptually, we could handle things like:
+    // %a = zext i8 %X to i16
+    // %b = shl i16 %a, 8
+    // %c = or i16 %a, %b
+    // but until there is an example that actually needs this, it doesn't seem
+    // worth worrying about.
+    return nullptr;
+  }
+
   // Handle 'null' ConstantArrayZero etc.
-  if (Constant *C = dyn_cast<Constant>(V))
-    if (C->isNullValue())
-      return Constant::getNullValue(Type::getInt8Ty(V->getContext()));
+  if (C->isNullValue())
+    return Constant::getNullValue(Type::getInt8Ty(Ctx));

-  // Constant float and double values can be handled as integer values if the
+  // Constant floating-point values can be handled as integer values if the
   // corresponding integer value is "byteable". An important case is 0.0.
-  if (ConstantFP *CFP = dyn_cast<ConstantFP>(V)) {
-    if (CFP->getType()->isFloatTy())
-      V = ConstantExpr::getBitCast(CFP, Type::getInt32Ty(V->getContext()));
-    if (CFP->getType()->isDoubleTy())
-      V = ConstantExpr::getBitCast(CFP, Type::getInt64Ty(V->getContext()));
+  if (ConstantFP *CFP = dyn_cast<ConstantFP>(C)) {
+    Type *Ty = nullptr;
+    if (CFP->getType()->isHalfTy())
+      Ty = Type::getInt16Ty(Ctx);
+    else if (CFP->getType()->isFloatTy())
+      Ty = Type::getInt32Ty(Ctx);
+    else if (CFP->getType()->isDoubleTy())
+      Ty = Type::getInt64Ty(Ctx);
     // Don't handle long double formats, which have strange constraints.
+    return Ty ? isBytewiseValue(ConstantExpr::getBitCast(CFP, Ty)) : nullptr;
   }

   // We can handle constant integers that are multiple of 8 bits.
-  if (ConstantInt *CI = dyn_cast<ConstantInt>(V)) {
+  if (ConstantInt *CI = dyn_cast<ConstantInt>(C)) {
     if (CI->getBitWidth() % 8 == 0) {
       assert(CI->getBitWidth() > 8 && "8 bits should be handled above!");

       if (!CI->getValue().isSplat(8))
         return nullptr;
-      return ConstantInt::get(V->getContext(), CI->getValue().trunc(8));
+      return ConstantInt::get(Ctx, CI->getValue().trunc(8));
     }
   }

-  // A ConstantDataArray/Vector is splatable if all its members are equal and
-  // also splatable.
-  if (ConstantDataSequential *CA = dyn_cast<ConstantDataSequential>(V)) {
-    Value *Elt = CA->getElementAsConstant(0);
-    Value *Val = isBytewiseValue(Elt);
-    if (!Val)
-      return nullptr;
-
-    for (unsigned I = 1, E = CA->getNumElements(); I != E; ++I)
-      if (CA->getElementAsConstant(I) != Elt)
-        return nullptr;
-
-    return Val;
-  }
+  auto Merge = [&](Value *LHS, Value *RHS) -> Value * {
+    if (LHS == RHS)
+      return LHS;
+    if (!LHS || !RHS)
+      return nullptr;
+    if (LHS == UndefInt8)
+      return RHS;
+    if (RHS == UndefInt8)
+      return LHS;
+    return nullptr;
+  };
+
+  if (ConstantDataSequential *CA = dyn_cast<ConstantDataSequential>(C)) {
+    Value *Val = UndefInt8;
+    for (unsigned I = 0, E = CA->getNumElements(); I != E; ++I)
+      if (!(Val = Merge(Val, isBytewiseValue(CA->getElementAsConstant(I)))))
+        return nullptr;
+    return Val;
+  }
+
+  if (isa<ConstantVector>(C)) {
+    Constant *Splat = cast<ConstantVector>(C)->getSplatValue();
+    return Splat ? isBytewiseValue(Splat) : nullptr;
+  }
+
+  if (isa<ConstantArray>(C) || isa<ConstantStruct>(C)) {
+    Value *Val = UndefInt8;
+    for (unsigned I = 0, E = C->getNumOperands(); I != E; ++I)
+      if (!(Val = Merge(Val, isBytewiseValue(C->getOperand(I)))))
+        return nullptr;
+    return Val;
+  }

-  // Conceptually, we could handle things like:
-  // %a = zext i8 %X to i16
-  // %b = shl i16 %a, 8
-  // %c = or i16 %a, %b
-  // but until there is an example that actually needs this, it doesn't seem
-  // worth worrying about.
+  // Don't try to handle the handful of other constants.
   return nullptr;
 }

 // This is the recursive version of BuildSubAggregate. It takes a few different
 // arguments. Idxs is the index within the nested struct From that we are
 // looking at now (which is of type IndexedType). IdxSkip is the number of
 // indices from Idxs that should be left out when inserting into the resulting
 // struct. To is the result struct built so far, new insertvalue instructions
[2,150 lines hidden]

lib/Transforms/Scalar/LoopIdiomRecognize.cpp

[342 lines hidden]

 /// getMemSetPatternValue - If a strided store of the specified value is safe to
 /// turn into a memset_pattern16, return a ConstantArray of 16 bytes that should
 /// be passed in. Otherwise, return null.
 ///
 /// Note that we don't ever attempt to use memset_pattern8 or 4, because these
 /// just replicate their input array and then pass on to memset_pattern16.
 static Constant *getMemSetPatternValue(Value *V, const DataLayout *DL) {
+  // FIXME: This could check for UndefValue because it can be merged into any
+  // other valid pattern.
+
   // If the value isn't a constant, we can't promote it to being in a constant
   // array. We could theoretically do a store to an alloca or something, but
   // that doesn't seem worthwhile.
   Constant *C = dyn_cast<Constant>(V);
   if (!C)
     return nullptr;

   // Only handle simple values that are a power of two bytes in size.
[281 lines hidden; context: for (auto &k : IndexQueue) {]
       else
         SecondPatternValue = getMemSetPatternValue(SecondStoredVal, DL);

       assert((SecondSplatValue || SecondPatternValue) &&
              "Expected either splat value or pattern value.");

       if (isConsecutiveAccess(SL[i], SL[k], *DL, *SE, false)) {
         if (For == ForMemset::Yes) {
+          if (isa<UndefValue>(FirstSplatValue))
+            FirstSplatValue = SecondSplatValue;
           if (FirstSplatValue != SecondSplatValue)
             continue;
         } else {
+          if (isa<UndefValue>(FirstPatternValue))
+            FirstPatternValue = SecondPatternValue;
           if (FirstPatternValue != SecondPatternValue)
             continue;
         }
         Tails.insert(SL[k]);
         Heads.insert(SL[i]);
         ConsecutiveChain[SL[i]] = SL[k];
         break;
       }
[1,091 lines hidden]

lib/Transforms/Scalar/MemCpyOptimizer.cpp

[382 lines hidden; context: INITIALIZE_PASS_END(MemCpyOptLegacyPass, "memcpyopt", "MemCpy Optimization",]
                     false, false)

 /// When scanning forward over instructions, we look for some other patterns to
 /// fold away. In particular, this looks for stores to neighboring locations of
 /// memory. If it sees enough consecutive ones, it attempts to merge them
 /// together into a memcpy/memset.
 Instruction *MemCpyOptPass::tryMergingIntoMemset(Instruction *StartInst,
                                                  Value *StartPtr,
                                                  Value *ByteVal) {
   const DataLayout &DL = StartInst->getModule()->getDataLayout();

   // Okay, so we now have a single store that can be splatable. Scan to find
   // all subsequent stores of the same value to offset from the same pointer.
   // Join these together into ranges, so we can decide whether contiguous blocks
   // are stored.
   MemsetRanges Ranges(DL);

   BasicBlock::iterator BI(StartInst);
   for (++BI; !BI->isTerminator(); ++BI) {
     if (!isa<StoreInst>(BI) && !isa<MemSetInst>(BI)) {
       // If the instruction is readnone, ignore it, otherwise bail out. We
       // don't even allow readonly here because we don't want something like:
       // A[1] = 2; strlen(A); A[2] = 2; -> memcpy(A, ...); strlen(A).
       if (BI->mayWriteToMemory() || BI->mayReadFromMemory())
         break;
       continue;
     }

     if (StoreInst *NextStore = dyn_cast<StoreInst>(BI)) {
       // If this is a store, see if we can merge it in.
       if (!NextStore->isSimple()) break;

       // Check to see if this stored value is of the same byte-splattable value.
-      if (ByteVal != isBytewiseValue(NextStore->getOperand(0)))
+      Value *StoredByte = isBytewiseValue(NextStore->getOperand(0));
+      if (isa<UndefValue>(ByteVal) && StoredByte)
+        ByteVal = StoredByte;
+      if (ByteVal != StoredByte)
         break;

       // Check to see if this store is to a constant offset from the start ptr.
       int64_t Offset;
       if (!IsPointerOffset(StartPtr, NextStore->getPointerOperand(), Offset,
                            DL))
         break;

[322 lines hidden; context: bool MemCpyOptPass::processStore(StoreInst *SI, BasicBlock::iterator &BBI) {]
   // and memset. Right now we only handle memset.

   // Ensure that the value being stored is something that can be memset'able a
   // byte at a time like "0" or "-1" or any width, as well as things like
   // 0xA0A0A0A0 and 0.0.
   auto *V = SI->getOperand(0);
   if (Value *ByteVal = isBytewiseValue(V)) {
     if (Instruction *I = tryMergingIntoMemset(SI, SI->getPointerOperand(),
                                               ByteVal)) {
       BBI = I->getIterator(); // Don't invalidate iterator.
       return true;
     }

     // If we have an aggregate, we try to promote it to memset regardless
     // of opportunity for merging as it can expose optimization opportunities
     // in subsequent passes.
     auto *T = V->getType();
     if (T->isAggregateType()) {
[22 lines hidden]

 bool MemCpyOptPass::processMemSet(MemSetInst *MSI, BasicBlock::iterator &BBI) {
   // See if there is another memset or store neighboring this memset which
   // allows us to widen out the memset to do a single larger store.
   if (isa<ConstantInt>(MSI->getLength()) && !MSI->isVolatile())
     if (Instruction *I = tryMergingIntoMemset(MSI, MSI->getDest(),
                                               MSI->getValue())) {
       BBI = I->getIterator(); // Don't invalidate iterator.
       return true;
     }
   return false;
 }

 /// Takes a memcpy and a call that it depends on,
 /// and checks for the possibility of a call slot optimization by having
 /// the call write its result directly into the destination of the memcpy.
 bool MemCpyOptPass::performCallSlotOptzn(Instruction *cpy, Value *cpyDest,
[708 lines hidden]

test/Transforms/MemCpyOpt/fca2memcpy.ll

[67 lines hidden]
 ; CHECK-NEXT: ret void
   %1 = load %S, %S* %src
   %2 = load %S, %S* %src
   store %S %1, %S* %dst
   store %S %2, %S* %dst
   ret void
 }

-; If the store address is computed ina complex manner, make
+; If the store address is computed in a complex manner, make
 ; sure we lift the computation as well if needed and possible.
 define void @addrproducer(%S* %src, %S* %dst) {
-; CHECK-LABEL: addrproducer
-; CHECK: %dst2 = getelementptr %S, %S* %dst, i64 1
-; CHECK: call void @llvm.memmove.p0i8.p0i8.i64
-; CHECK-NEXT: store %S undef, %S* %dst
+; CHECK-LABEL: addrproducer(
+; CHECK-NEXT: %[[DSTCAST:[0-9]+]] = bitcast %S* %dst to i8*
+; CHECK-NEXT: %dst2 = getelementptr %S, %S* %dst, i64 1
+; CHECK-NEXT: %[[DST2CAST:[0-9]+]] = bitcast %S* %dst2 to i8*
+; CHECK-NEXT: %[[SRCCAST:[0-9]+]] = bitcast %S* %src to i8*
+; CHECK-NEXT: call void @llvm.memmove.p0i8.p0i8.i64(i8* align 8 %[[DST2CAST]], i8* align 8 %[[SRCCAST]], i64 16, i1 false)
+; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* align 8 %[[DSTCAST]], i8 undef, i64 16, i1 false)
 ; CHECK-NEXT: ret void
   %1 = load %S, %S* %src
   store %S undef, %S* %dst
   %dst2 = getelementptr %S , %S* %dst, i64 1
   store %S %1, %S* %dst2
   ret void
 }

 define void @aliasaddrproducer(%S* %src, %S* %dst, i32* %dstidptr) {
-; CHECK-LABEL: aliasaddrproducer
+; CHECK-LABEL: aliasaddrproducer(
+; CHECK-NEXT: %[[SRC:[0-9]+]] = load %S, %S* %src
+; CHECK-NEXT: %[[DSTCAST:[0-9]+]] = bitcast %S* %dst to i8*
+; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* align 8 %[[DSTCAST]], i8 undef, i64 16, i1 false)
+; CHECK-NEXT: %dstindex = load i32, i32* %dstidptr
+; CHECK-NEXT: %dst2 = getelementptr %S, %S* %dst, i32 %dstindex
+; CHECK-NEXT: store %S %[[SRC]], %S* %dst2
+; CHECK-NEXT: ret void
   %1 = load %S, %S* %src
   store %S undef, %S* %dst
   %dstindex = load i32, i32* %dstidptr
   %dst2 = getelementptr %S , %S* %dst, i32 %dstindex
   store %S %1, %S* %dst2
   ret void
 }

 define void @noaliasaddrproducer(%S* %src, %S* noalias %dst, i32* noalias %dstidptr) {
-; CHECK-LABEL: noaliasaddrproducer
+; CHECK-LABEL: noaliasaddrproducer(
+; CHECK-NEXT: %[[SRCCAST:[0-9]+]] = bitcast %S* %src to i8*
+; CHECK-NEXT: %[[LOADED:[0-9]+]] = load i32, i32* %dstidptr
+; CHECK-NEXT: %dstindex = or i32 %[[LOADED]], 1
+; CHECK-NEXT: %dst2 = getelementptr %S, %S* %dst, i32 %dstindex
+; CHECK-NEXT: %[[DST2CAST:[0-9]+]] = bitcast %S* %dst2 to i8*
+; CHECK-NEXT: %[[SRCCAST2:[0-9]+]] = bitcast %S* %src to i8*
+; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %[[DST2CAST]], i8* align 8 %[[SRCCAST2]], i64 16, i1 false)
+; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* align 8 %[[SRCCAST]], i8 undef, i64 16, i1 false)
+; CHECK-NEXT: ret void
   %1 = load %S, %S* %src
   store %S undef, %S* %src
   %2 = load i32, i32* %dstidptr
   %dstindex = or i32 %2, 1
   %dst2 = getelementptr %S , %S* %dst, i32 %dstindex
   store %S %1, %S* %dst2
   ret void
 }