This is an archive of the discontinued LLVM Phabricator instance.

Differential D49771

CodeGen: use non-zero memset when possible for automatic variables
Closed, Public

Authored by jfb on Jul 24 2018, 5:16 PM.

Details

Reviewers

dexonsmith
bogner

Commits

rG6508929da9b2: CodeGen: use non-zero memset when possible for automatic variables
rC337887: CodeGen: use non-zero memset when possible for automatic variables
rL337887: CodeGen: use non-zero memset when possible for automatic variables

Summary

Right now automatic variables are either initialized with bzero followed by a few stores, or memcpy'd from a synthesized global. We end up encountering a fair amount of code where memset of a non-zero byte pattern would be better than memcpy from a global because it touches less memory and generates a smaller binary. The optimizer could reason about this, but it's not really worth it when clang already knows.
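To make the trade-off concrete, here is a minimal sketch (not code from the patch) of the two strategies for a 33-byte array of 42s, the same shape as the `nonzeroMemseti8` test. The names `kInitImage`, `initWithMemcpy`, and `initWithMemset` are hypothetical. Both produce identical bytes, but only the memcpy variant needs a 33-byte constant in the binary's read-only data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical initializer image. With the memcpy strategy, a constant like
// this has to be emitted into read-only data and read back at run time.
static const unsigned char kInitImage[33] = {
    42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
    42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
    42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42};

// Strategy A: copy the whole image from the constant global.
std::array<unsigned char, 33> initWithMemcpy() {
  std::array<unsigned char, 33> Arr;
  std::memcpy(Arr.data(), kInitImage, Arr.size());
  return Arr;
}

// Strategy B: the image is one repeated byte, so a memset suffices and no
// global is needed at all.
std::array<unsigned char, 33> initWithMemset() {
  std::array<unsigned char, 33> Arr;
  std::memset(Arr.data(), 42, Arr.size());
  return Arr;
}
```

The patch makes clang pick the second form itself when the initializer is a repeated byte and larger than 32 bytes.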

This code could definitely be more clever but I'm not sure it's worth it. In particular we could track a histogram of bytes seen and figure out (as we do with bzero) if a memset could be followed by a handful of stores. Similarly, we could tune the heuristics for GlobalSize, but using the same as for bzero seems conservatively OK for now.
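The histogram idea mentioned above is not implemented in this revision. As a rough sketch of what it might look like, the following counts byte values in a flattened initializer image, picks the most frequent one as the memset value, and reports how many bytes would still need fixing up. `MemsetPlan`, `planMemsetPlusStores`, and `worthMemsetPlusStores` are hypothetical names, and the budget here is counted in bytes for simplicity, whereas real stores would cover whole scalars.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result: which byte to memset with, and how many bytes of the
// image would still differ afterwards.
struct MemsetPlan {
  uint8_t FillByte;
  size_t FixupBytes;
};

// Build a histogram of byte values and choose the most frequent one.
MemsetPlan planMemsetPlusStores(const std::vector<uint8_t> &Image) {
  std::array<size_t, 256> Histogram{};
  for (uint8_t B : Image)
    ++Histogram[B];
  size_t Best = 0;
  for (size_t V = 1; V < Histogram.size(); ++V)
    if (Histogram[V] > Histogram[Best])
      Best = V;
  return {static_cast<uint8_t>(Best), Image.size() - Histogram[Best]};
}

// Assumed policy: accept memset + stores if the number of differing bytes
// fits within a small budget (compare StoreBudget = 6 in the bzero path).
bool worthMemsetPlusStores(const std::vector<uint8_t> &Image,
                           size_t StoreBudget) {
  return !Image.empty() &&
         planMemsetPlusStores(Image).FixupBytes <= StoreBudget;
}
```

For a 40-byte image of 0xAA with two bytes changed, this sketch would choose 0xAA and report two fix-up bytes.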

rdar://problem/42563091

Diff Detail

Repository

rC Clang

Build Status

Buildable 20679
Build 20679: arc lint + arc unit

Event Timeline

jfb created this revision. Jul 24 2018, 5:16 PM

Harbormaster completed remote builds in B20679: Diff 157176. Jul 24 2018, 5:16 PM

Herald added a subscriber: cfe-commits. Jul 24 2018, 5:16 PM

Quuxplusone added a subscriber: Quuxplusone. Jul 24 2018, 5:40 PM

Quuxplusone added inline comments.

test/CodeGen/init.c
202	Drive-by suggestion: If you make this `struct S { union U u; short s; };` then you'll also be testing the case of "padding between struct fields", which is otherwise untested here.

Seems straightforward and correct to me.

lib/CodeGen/CGDecl.cpp
956–957	Probably makes sense to swap the order of these or give the enum class a smaller underlying type than int.
996–998	Very much a nitpick, but this would be slightly easier to follow written in the order without a negation.

This revision is now accepted and ready to land. Jul 24 2018, 5:52 PM

Use short to test padding between array elements.
Define enum class storage type; swap order of if / else to make it more readable.

Addressed all comments.

lib/CodeGen/CGDecl.cpp
956–957	I defined the enum class' storage type as `uint8_t`.

Closed by commit rL337887: CodeGen: use non-zero memset when possible for automatic variables (authored by jfb). Jul 24 2018, 9:30 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Jul 24 2018, 9:30 PM

I'm curious: isn't this the kind of optimization we should expect LLVM to provide?

In D49771#1181008, @mehdi_amini wrote:

I'm curious: isn't this the kind of optimization we should expect LLVM to provide?

Maybe? It seems obvious to do here since we know we'll probably want to be doing it, and I have another patch I'm working on which will make it that much more obviously useful to have here. The middle-end can definitely figure it out but it just seems like more work, later, so in the meantime we'd be looking at more stuff.

In D49771#1183380, @jfb wrote:

Maybe? It seems obvious to do here since we know we'll probably want to be doing it, and I have another patch I'm working on which will make it that much more obviously useful to have here. The middle-end can definitely figure it out but it just seems like more work, later, so in the meantime we'd be looking at more stuff.

I'm not asking where it is easier to do, but where it makes the most sense :)
Doing this in LLVM generally means catching more patterns (e.g. after inlining), and also catching them from multiple frontends. So in general I'm worried when I see optimizations implemented in the frontend instead of the middle end.

In D49771#1183562, @mehdi_amini wrote:

I'm not asking where it is easier to do, but where it makes the most sense :)

What I mean by "easy" is: we know we're likely to want this type of code, there's not much pattern recognition needed on our part here. Were we to wait we'd need to do more work. I believe this statement will become truer over time.

Doing this in LLVM generally means catching more patterns (e.g. after inlining), and also catching them from multiple frontends. So in general I'm worried when I see optimizations implemented in the frontend instead of the middle end.

Agreed, LLVM could also do it, and it would likely be useful to do so. I'm worried, however, about generating a bunch more code than needed from clang in the hopes that the compiler will clean it up later.

I'm worried, however, about generating a bunch more code than needed from clang in the hopes that the compiler will clean it up later.

Isn't that a strong design component of clang/LLVM? Clang does not try to generate "smart" code and leaves it up to LLVM to clean it up.

In D49771#1183641, @mehdi_amini wrote:

I'm worried, however, about generating a bunch more code than needed from clang in the hopes that the compiler will clean it up later.

Isn't that a strong design component of clang/LLVM? Clang does not try to generate "smart" code and leaves it up to LLVM to clean it up.

The code around this one, and lack of code in LLVM, seem to disagree. :-)

There are two different considerations here:
(1) Create less target code
(2) Create less IR

If this code can significantly reduce the amount of IR, it can be useful in general. That's why the existing memset logic is helpful.

jfb mentioned this in D51751: Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue. Sep 7 2018, 4:08 PM

Revision Contents

Path	Size
lib/CodeGen/CGDecl.cpp	167 lines
test/CodeGen/init.c	66 lines

Diff 157176

lib/CodeGen/CGDecl.cpp

[942 lines not shown] static bool shouldUseBZeroPlusStoresToInitialize(llvm::Constant *Init,
   // plopping in more stores.
   unsigned StoreBudget = 6;
   uint64_t SizeLimit = 32;
 
   return GlobalSize > SizeLimit &&
          canEmitInitWithFewStoresAfterBZero(Init, StoreBudget);
 }
 
+/// A byte pattern.
+///
+/// Can be "any" pattern if the value was padding or known to be undef.
+/// Can be "none" pattern if a sequence doesn't exist.
+class BytePattern {
+  uint8_t Val;
+  enum class ValueType { Specific, Any, None } Type;
+  BytePattern(ValueType Type) : Type(Type) {}
+
+public:
+  BytePattern(uint8_t Value) : Val(Value), Type(ValueType::Specific) {}
+  static BytePattern Any() { return BytePattern(ValueType::Any); }
+  static BytePattern None() { return BytePattern(ValueType::None); }
+  bool isAny() const { return Type == ValueType::Any; }
+  bool isNone() const { return Type == ValueType::None; }
+  bool isValued() const { return Type == ValueType::Specific; }
+  uint8_t getValue() const {
+    assert(isValued());
+    return Val;
+  }
+  BytePattern merge(const BytePattern Other) const {
+    if (isNone() || Other.isNone())
+      return None();
+    if (isAny())
+      return Other;
+    if (Other.isAny())
+      return *this;
+    if (getValue() == Other.getValue())
+      return *this;
+    return None();
+  }
+};
+
+/// Figures out whether the constant can be initialized with memset.
+static BytePattern constantIsRepeatedBytePattern(llvm::Constant *C) {
+  if (isa<llvm::ConstantAggregateZero>(C) || isa<llvm::ConstantPointerNull>(C))
+    return BytePattern(0x00);
+  if (isa<llvm::UndefValue>(C))
+    return BytePattern::Any();
+
+  if (isa<llvm::ConstantInt>(C)) {
+    auto *Int = cast<llvm::ConstantInt>(C);
+    if (Int->getBitWidth() % 8 != 0)
+      return BytePattern::None();
+    const llvm::APInt &Value = Int->getValue();
+    if (!Value.isSplat(8))
+      return BytePattern::None();
+    return BytePattern(Value.getLoBits(8).getLimitedValue());
+  }
+
+  if (isa<llvm::ConstantFP>(C)) {
+    auto *FP = cast<llvm::ConstantFP>(C);
+    llvm::APInt Bits = FP->getValueAPF().bitcastToAPInt();
+    if (Bits.getBitWidth() % 8 != 0)
+      return BytePattern::None();
+    if (!Bits.isSplat(8))
+      return BytePattern::None();
+    return BytePattern(Bits.getLimitedValue() & 0xFF);
+  }
+
+  if (isa<llvm::ConstantVector>(C)) {
+    llvm::Constant *Splat = cast<llvm::ConstantVector>(C)->getSplatValue();
+    if (Splat)
+      return constantIsRepeatedBytePattern(Splat);
+    return BytePattern::None();
+  }
+
+  if (isa<llvm::ConstantArray>(C) || isa<llvm::ConstantStruct>(C)) {
+    BytePattern Pattern(BytePattern::Any());
+    for (unsigned I = 0, E = C->getNumOperands(); I != E; ++I) {
+      llvm::Constant *Elt = cast<llvm::Constant>(C->getOperand(I));
+      Pattern = Pattern.merge(constantIsRepeatedBytePattern(Elt));
+      if (Pattern.isNone())
+        return Pattern;
+    }
+    return Pattern;
+  }
+
+  if (llvm::ConstantDataSequential *CDS =
+          dyn_cast<llvm::ConstantDataSequential>(C)) {
+    BytePattern Pattern(BytePattern::Any());
+    for (unsigned I = 0, E = CDS->getNumElements(); I != E; ++I) {
+      llvm::Constant *Elt = CDS->getElementAsConstant(I);
+      Pattern = Pattern.merge(constantIsRepeatedBytePattern(Elt));
+      if (Pattern.isNone())
+        return Pattern;
+    }
+    return Pattern;
+  }
+
+  // BlockAddress, ConstantExpr, and everything else is scary.
+  return BytePattern::None();
+}
+
+/// Decide whether we should use memset to initialize a local variable instead
+/// of using a memcpy from a constant global. Assumes we've already decided to
+/// not user bzero.
+/// FIXME We could be more clever, as we are for bzero above, and generate
+/// memset followed by stores. It's unclear that's worth the effort.
+static BytePattern shouldUseMemSetToInitialize(llvm::Constant *Init,
+                                               uint64_t GlobalSize) {
+  uint64_t SizeLimit = 32;
+  if (GlobalSize <= SizeLimit)
+    return BytePattern::None();
+  return constantIsRepeatedBytePattern(Init);
+}
+
 /// EmitAutoVarDecl - Emit code and set up an entry in LocalDeclMap for a
 /// variable declaration with auto, register, or no storage class specifier.
 /// These turn into simple stack objects, or GlobalValues depending on target.
 void CodeGenFunction::EmitAutoVarDecl(const VarDecl &D) {
   AutoVarEmission emission = EmitAutoVarAlloca(D);
   EmitAutoVarInit(emission);
   EmitAutoVarCleanups(emission);
 }
[437 lines not shown] void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
   llvm::Value *SizeVal =
     llvm::ConstantInt::get(IntPtrTy,
                            getContext().getTypeSizeInChars(type).getQuantity());
 
   llvm::Type *BP = CGM.Int8Ty->getPointerTo(Loc.getAddressSpace());
   if (Loc.getType() != BP)
     Loc = Builder.CreateBitCast(Loc, BP);
 
-  // If the initializer is all or mostly zeros, codegen with bzero then do a
-  // few stores afterward.
-  if (shouldUseBZeroPlusStoresToInitialize(
-          constant,
-          CGM.getDataLayout().getTypeAllocSize(constant->getType()))) {
+  // If the initializer is all or mostly the same, codegen with bzero / memset
+  // then do a few stores afterward.
+  uint64_t ConstantSize =
+      CGM.getDataLayout().getTypeAllocSize(constant->getType());
+  if (shouldUseBZeroPlusStoresToInitialize(constant, ConstantSize)) {
     Builder.CreateMemSet(Loc, llvm::ConstantInt::get(Int8Ty, 0), SizeVal,
                          isVolatile);
     // Zero and undef don't require a stores.
     if (!constant->isNullValue() && !isa<llvm::UndefValue>(constant)) {
       Loc = Builder.CreateBitCast(Loc,
         constant->getType()->getPointerTo(Loc.getAddressSpace()));
       emitStoresForInitAfterBZero(CGM, constant, Loc, isVolatile, Builder);
     }
-  } else {
+    return;
+  }
+
+  BytePattern Pattern = shouldUseMemSetToInitialize(constant, ConstantSize);
+  if (!Pattern.isNone()) {
+    uint8_t Value = Pattern.isAny() ? 0x00 : Pattern.getValue();
+    Builder.CreateMemSet(Loc, llvm::ConstantInt::get(Int8Ty, Value), SizeVal,
+                         isVolatile);
+    return;
+  }
+
   // Otherwise, create a temporary global with the initializer then
   // memcpy from the global to the alloca.
   std::string Name = getStaticDeclName(CGM, D);
   unsigned AS = CGM.getContext().getTargetAddressSpace(
       CGM.getStringLiteralAddressSpace());
   BP = llvm::PointerType::getInt8PtrTy(getLLVMContext(), AS);
 
-  llvm::GlobalVariable *GV =
-    new llvm::GlobalVariable(CGM.getModule(), constant->getType(), true,
-                             llvm::GlobalValue::PrivateLinkage,
-                             constant, Name, nullptr,
-                             llvm::GlobalValue::NotThreadLocal, AS);
+  llvm::GlobalVariable *GV = new llvm::GlobalVariable(
+      CGM.getModule(), constant->getType(), true,
+      llvm::GlobalValue::PrivateLinkage, constant, Name, nullptr,
+      llvm::GlobalValue::NotThreadLocal, AS);
   GV->setAlignment(Loc.getAlignment().getQuantity());
   GV->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
 
   Address SrcPtr = Address(GV, Loc.getAlignment());
   if (SrcPtr.getType() != BP)
     SrcPtr = Builder.CreateBitCast(SrcPtr, BP);
 
   Builder.CreateMemCpy(Loc, SrcPtr, SizeVal, isVolatile);
-  }
 }
 
 /// Emit an expression as an initializer for an object (variable, field, etc.)
 /// at the given location. The expression is not necessarily the normal
 /// initializer for the object, and the address is not necessarily
 /// its normal location.
 ///
 /// \param init the initializing expression
 /// \param D the object to act as if we're initializing
[658 lines not shown]
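For readers without an LLVM checkout, the merge rule in `BytePattern` and the per-integer splat check can be approximated in plain C++. This is a simplified stand-in, not the patch's code: `Pattern`, `specificPattern`, `anyPattern`, `noPattern`, `mergePatterns`, and `patternOfInt` are hypothetical names, and `patternOfInt` only handles widths up to 64 bits, where the patch uses `APInt::isSplat(8)`.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the patch's BytePattern: either one specific byte,
// "any" (padding or undef, compatible with everything), or "none".
struct Pattern {
  enum Kind : uint8_t { Specific, Any, None } K;
  uint8_t Val;
};

Pattern specificPattern(uint8_t V) { return {Pattern::Specific, V}; }
Pattern anyPattern() { return {Pattern::Any, 0}; }
Pattern noPattern() { return {Pattern::None, 0}; }

// Same rules as BytePattern::merge: None is absorbing, Any is neutral, and
// two specific bytes agree only if they are equal.
Pattern mergePatterns(Pattern A, Pattern B) {
  if (A.K == Pattern::None || B.K == Pattern::None)
    return noPattern();
  if (A.K == Pattern::Any)
    return B;
  if (B.K == Pattern::Any)
    return A;
  return A.Val == B.Val ? A : noPattern();
}

// Rough equivalent of the ConstantInt case: the width must be a whole number
// of bytes and every byte must equal the lowest byte.
Pattern patternOfInt(uint64_t Value, unsigned Bits) {
  if (Bits == 0 || Bits > 64 || Bits % 8 != 0)
    return noPattern();
  uint8_t Low = Value & 0xFF;
  for (unsigned Shift = 8; Shift < Bits; Shift += 8)
    if (((Value >> Shift) & 0xFF) != Low)
      return noPattern();
  return specificPattern(Low);
}
```

Folding the fields of an aggregate through `mergePatterns`, starting from `anyPattern()`, mirrors the loops over `ConstantArray`, `ConstantStruct`, and `ConstantDataSequential` elements in the patch. An undef or padding element leaves the running pattern unchanged.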

test/CodeGen/init.c

[134 lines not shown] void test10(int X) {
   bar(&S);
 
   // CHECK-LABEL: @test10(
   // CHECK: call void @llvm.memset
   // CHECK-NOT: store i32 0
   // CHECK: call void @bar
 }
 
+void nonzeroMemseti8() {
+  char arr[33] = { 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, };
+  // CHECK-LABEL: @nonzeroMemseti8(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 42, i32 33, i1 false)
+}
+
+void nonzeroMemseti16() {
+  unsigned short arr[17] = { 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, 0x4242, };
+  // CHECK-LABEL: @nonzeroMemseti16(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 66, i32 34, i1 false)
+}
+
+void nonzeroMemseti32() {
+  unsigned arr[9] = { 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, 0xF0F0F0F0, };
+  // CHECK-LABEL: @nonzeroMemseti32(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -16, i32 36, i1 false)
+}
+
+void nonzeroMemseti64() {
+  unsigned long long arr[7] = { 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAAAAAAA, };
+  // CHECK-LABEL: @nonzeroMemseti64(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -86, i32 56, i1 false)
+}
+
+void nonzeroMemsetf32() {
+  float arr[9] = { 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, 0x1.cacacap+75, };
+  // CHECK-LABEL: @nonzeroMemsetf32(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 101, i32 36, i1 false)
+}
+
+void nonzeroMemsetf64() {
+  double arr[7] = { 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, 0x1.4444444444444p+69, };
+  // CHECK-LABEL: @nonzeroMemsetf64(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 68, i32 56, i1 false)
+}
+
+void nonzeroPaddedUnionMemset() {
+  union U { char c; int i; };
+  union U arr[9] = { 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, 0xF0, };
+  // CHECK-LABEL: @nonzeroPaddedUnionMemset(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -16, i32 36, i1 false)
+}
+
+void nonzeroNestedMemset() {
+  union U { char c; int i; };
+  struct S { union U u; int i; };
+  struct S arr[5] = { { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, { {0xF0}, 0xF0F0F0F0 }, };
+  // CHECK-LABEL: @nonzeroNestedMemset(
+  // CHECK-NOT: store
+  // CHECK-NOT: memcpy
+  // CHECK: call void @llvm.memset.p0i8.i32(i8* {{.*}}, i8 -16, i32 40, i1 false)
+}
+
 // PR9257
 struct test11S {
   int A[10];
 };
 void test11(struct test11S *P) {
   *P = (struct test11S) { .A = { [0 ... 3] = 4 } };
   // CHECK-LABEL: @test11(
[41 lines not shown]
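The byte values in the CHECK lines can look surprising because LLVM IR prints `i8` constants as signed, and because the floating-point tests depend on the bit pattern rather than the numeric value. The sketch below reproduces a few of the expected values, assuming IEEE-754 single precision and two's-complement conversion; `asSignedI8`, `bitsOf`, and `testFloat` are hypothetical helpers, not part of the test file.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// LLVM IR text prints i8 values as signed, so 0xF0 shows up as -16.
// (Assumes two's-complement narrowing, which mainstream targets provide.)
int asSignedI8(uint8_t Byte) { return static_cast<int8_t>(Byte); }

// Raw bit pattern of a float, assuming a 32-bit IEEE-754 representation.
uint32_t bitsOf(float F) {
  static_assert(sizeof(uint32_t) == sizeof(float), "expects 32-bit float");
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// The value written as 0x1.cacacap+75 in the test, built without a hex float
// literal: 0x1cacaca * 2^(75 - 24).
float testFloat() { return std::ldexp(30067402.0f, 51); }
```

Under those assumptions `bitsOf(testFloat())` is `0x65656565`, and `0x65` is 101, which matches the `i8 101` in the `nonzeroMemsetf32` check. Likewise `0xF0` and `0xAA` print as -16 and -86.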