This is an archive of the discontinued LLVM Phabricator instance.

[GlobalOpt] Generalize malloc-to-global for any allocation function
ClosedPublic

Authored by reames on Jan 17 2022, 9:58 AM.

Details

Reviewers

nikic
lifted
Bryce-MW
durin42

Commits

rG26049b8ce376: [GlobalOpt] Generalize malloc-to-global for any allocation function

Summary

We can generalize the malloc-to-global transform for other allocation functions which are both a) removable, and b) have a known initialization value.
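The two conditions above, plus the existing requirement that the allocation size be statically known, can be sketched as a small decision function. This is a simplified model and not the pass's code: `AllocCall`, `InitKind`, `Action`, and `classify` are hypothetical names invented for illustration, and the sketch leaves out the other checks the pass performs (such as the size cap and the restrictions on how the pointer is used).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of what is known about an allocation call.
enum class InitKind { Unknown, Undef, Zero };

struct AllocCall {
  bool removable;                     // (a) the call can be deleted once unused
  InitKind init;                      // (b) contents of freshly allocated memory
  std::optional<std::uint64_t> size;  // allocation size in bytes, if known
};

enum class Action { Reject, PromoteNoMemset, PromoteWithMemset };

// Decide whether the store-of-allocation could be promoted to a global.
Action classify(const AllocCall &call) {
  if (!call.removable)
    return Action::Reject;  // must be able to remove the call afterwards
  if (call.init == InitKind::Unknown)
    return Action::Reject;  // must know what to initialize the storage with
  if (!call.size)
    return Action::Reject;  // the new global needs a fixed size
  // Undef contents need no initialization; anything else needs a memset
  // at the point of the original call.
  return call.init == InitKind::Undef ? Action::PromoteNoMemset
                                      : Action::PromoteWithMemset;
}

// Usage: a malloc-like call needs no memset, a calloc-like call does.
void check_classify() {
  assert(classify({true, InitKind::Undef, std::uint64_t{16}}) ==
         Action::PromoteNoMemset);
  assert(classify({true, InitKind::Zero, std::uint64_t{4}}) ==
         Action::PromoteWithMemset);
}
```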

One subtlety that I want to point out - mostly because I hadn't realized it was true until I took a closer look - is that the existing code doesn't prove that initialization/malloc happens only once. The initialization function can be called multiple times. This is correct without special handling for malloc, as undef can map to any value previously written, but for a non-undef initializing allocation it means we may end up memsetting the new global repeatedly. In particular, this means it's not legal to fold the memset into the initializer of the global.
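To make the subtlety concrete, here is a small C++ model of the two options. It does not use LLVM at all: `g_body` stands in for the promoted global, `init_with_memset` models re-initializing at the point of the original call, and `init_folded` models the (incorrect) alternative of relying on a zero initializer alone. All names are hypothetical. In the source program, a second call to the initialization function would hand back fresh zeroed memory from calloc, so a read after the second call must see 0.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for the promoted global: fixed storage replacing the heap block.
static unsigned char g_body[sizeof(std::int32_t)];
// Variant that relies only on a zero initializer and is never re-zeroed.
static unsigned char g_body_folded[sizeof(std::int32_t)] = {};

// Models the memset emitted at the point of the original calloc call.
void init_with_memset() { std::memset(g_body, 0, sizeof(g_body)); }
// Models folding the memset into the initializer: nothing runs at the call.
void init_folded() {}

// Initialize, write 1 through the "pointer", initialize again, then read.
std::int32_t observe(void (*init)(), unsigned char *body) {
  init();
  const std::int32_t one = 1;
  std::memcpy(body, &one, sizeof(one));
  init();  // second call to the initialization function
  std::int32_t seen;
  std::memcpy(&seen, body, sizeof(seen));
  return seen;
}

// Usage: only the re-initializing variant matches the source program.
void check_repeated_init() {
  assert(observe(init_with_memset, g_body) == 0);    // matches fresh calloc
  assert(observe(init_folded, g_body_folded) == 1);  // stale value leaks through
}
```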

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Jan 17 2022, 9:58 AM

Herald added subscribers: ormris, bollu, hiraditya, mcrosier. Jan 17 2022, 9:58 AM

reames requested review of this revision. Jan 17 2022, 9:58 AM

Herald added a project: Restricted Project. Jan 17 2022, 9:58 AM

A couple of side notes:

With both the new and old code, we might be losing some ability to prune loads from uninitialized state of the allocation. We could consider using lifetime markers at the point of the original allocation to preserve this. I'm a little leery of introducing them though as their semantics on globals are a bit vague. Thankfully, I think this is a minor opt quality issue at worst, and I don't have any motivating examples to justify exploring this further.

It would be interesting to explore whether we could prove the memory was allocated once. This might actually be easier for a nullable global than a non-null one (e.g. for the idiomatic pattern if (!g) g = malloc()). I don't have a strong motivating example for this, but it seems like we're leaving something on the floor here. If we do the init once thing, we also need to be careful about data section size and profitability. What's profitable for undef and zero (e.g. affecting bss size), and what's profitable for other init constants (e.g. affecting .data sizes) might not be the same - i.e. we have to account for load time initialization costs.
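For reference, the idiomatic pattern mentioned above looks roughly like the following. This is only an illustration of the source-level idiom, not something the patch handles specially. `get_g` and `allocation_count` are hypothetical names, and the sketch assumes nothing else ever stores to `g`: under that assumption the null check acts as an "already initialized" flag, so at most one allocation can succeed.

```cpp
#include <cassert>
#include <cstdlib>

static int *g = nullptr;          // nullable global holding the allocation
static int allocation_count = 0;  // hypothetical counter, for illustration only

// Lazily allocate on first use; later calls reuse the same block.
int *get_g() {
  if (!g) {  // the null check doubles as an initialization flag
    g = static_cast<int *>(std::calloc(1, sizeof(int)));
    if (g)
      ++allocation_count;
  }
  return g;
}

// Usage: repeated calls never allocate a second block.
void check_single_allocation() {
  int *first = get_g();
  int *second = get_g();
  assert(first == nullptr || first == second);
  assert(allocation_count <= 1);
}
```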

Harbormaster completed remote builds in B143836: Diff 400598. Jan 17 2022, 10:44 AM

LGTM

In D117503#3249054, @reames wrote:

A couple of side notes:

With both the new and old code, we might be losing some ability to prune loads from uninitialized state of the allocation. We could consider using lifetime markers at the point of the original allocation to preserve this. I'm a little leery of introducing them though as their semantics on globals are a bit vague. Thankfully, I think this is a minor opt quality issue at worst, and I don't have any motivating examples to justify exploring this further.

Yeah, definitely don't want lifetime intrinsics on globals!

It would be interesting to explore whether we could prove the memory was allocated once. This might actually be easier for a nullable global than a non-null one (e.g. for the idiomatic pattern if (!g) g = malloc()). I don't have a strong motivating example for this, but it seems like we're leaving something on the floor here. If we do the init once thing, we also need to be careful about data section size and profitability. What's profitable for undef and zero (e.g. affecting bss size), and what's profitable for other init constants (e.g. affecting .data sizes) might not be the same - i.e. we have to account for load time initialization costs.

This seems like a more general optimization -- what we currently do is optimize stored-once globals (one store, multiple reads) and arbitrary initialization in global ctors. We could try to determine that some initialization code can only be reached once (e.g. by detecting a "global initialization flag" pattern) and then treat that like global ctor initialization. Don't think this is worth bothering with at this point though.

llvm/test/Transforms/GlobalOpt/calloc-promote.ll, line 3: Duplicate run line

This revision is now accepted and ready to land. Jan 17 2022, 12:58 PM

reames mentioned this in rG30715365d45c: [test] precommit new test for D117503. Jan 17 2022, 3:01 PM

This revision was landed with ongoing or failed builds. Jan 17 2022, 3:07 PM

Closed by commit rG26049b8ce376: [GlobalOpt] Generalize malloc-to-global for any allocation function (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG26049b8ce376: [GlobalOpt] Generalize malloc-to-global for any allocation function.

Revision Contents

Path: llvm/lib/Transforms/IPO/GlobalOpt.cpp (65 lines)
Path: llvm/test/Transforms/GlobalOpt/calloc-promote.ll (16 lines)

Diff 400653

llvm/lib/Transforms/IPO/GlobalOpt.cpp

... 611 lines not shown ...	if (isa<LoadInst>(U)) {
isa<LoadInst>(U->getOperand(0)) &&		isa<LoadInst>(U->getOperand(0)) &&
isa<ConstantPointerNull>(U->getOperand(1))) {		isa<ConstantPointerNull>(U->getOperand(1))) {
assert(isa<GlobalValue>(cast<LoadInst>(U->getOperand(0))		assert(isa<GlobalValue>(cast<LoadInst>(U->getOperand(0))
->getPointerOperand()		->getPointerOperand()
->stripPointerCasts()) &&		->stripPointerCasts()) &&
"Should be GlobalVariable");		"Should be GlobalVariable");
// This and only this kind of non-signed ICmpInst is to be replaced with		// This and only this kind of non-signed ICmpInst is to be replaced with
// the comparing of the value of the created global init bool later in		// the comparing of the value of the created global init bool later in
// optimizeGlobalAddressOfMalloc for the global variable.		// optimizeGlobalAddressOfAllocation for the global variable.
} else {		} else {
//cerr << "NONTRAPPING USE: " << *U;		//cerr << "NONTRAPPING USE: " << *U;
return false;		return false;
}		}
}		}
return true;		return true;
}		}

... 201 lines not shown ...
}		}

/// This function takes the specified global variable, and transforms the		/// This function takes the specified global variable, and transforms the
/// program as if it always contained the result of the specified malloc.		/// program as if it always contained the result of the specified malloc.
/// Because it is always the result of the specified malloc, there is no reason		/// Because it is always the result of the specified malloc, there is no reason
/// to actually DO the malloc. Instead, turn the malloc into a global, and any		/// to actually DO the malloc. Instead, turn the malloc into a global, and any
/// loads of GV as uses of the new global.		/// loads of GV as uses of the new global.
static GlobalVariable *		static GlobalVariable *
OptimizeGlobalAddressOfMalloc(GlobalVariable *GV, CallInst *CI,		OptimizeGlobalAddressOfAllocation(GlobalVariable *GV, CallInst *CI,
uint64_t AllocSize, const DataLayout &DL,		uint64_t AllocSize, Constant *InitVal,
		const DataLayout &DL,
TargetLibraryInfo *TLI) {		TargetLibraryInfo *TLI) {
LLVM_DEBUG(errs() << "PROMOTING GLOBAL: " << GV << " CALL = " << CI		LLVM_DEBUG(errs() << "PROMOTING GLOBAL: " << GV << " CALL = " << CI
<< '\n');		<< '\n');

// Create global of type [AllocSize x i8].		// Create global of type [AllocSize x i8].
Type *GlobalType = ArrayType::get(Type::getInt8Ty(GV->getContext()),		Type *GlobalType = ArrayType::get(Type::getInt8Ty(GV->getContext()),
AllocSize);		AllocSize);

// Create the new global variable. The contents of the malloc'd memory is		// Create the new global variable. The contents of the allocated memory is
// undefined, so initialize with an undef value.		// undefined initially, so initialize with an undef value.
GlobalVariable *NewGV = new GlobalVariable(		GlobalVariable *NewGV = new GlobalVariable(
*GV->getParent(), GlobalType, false, GlobalValue::InternalLinkage,		*GV->getParent(), GlobalType, false, GlobalValue::InternalLinkage,
UndefValue::get(GlobalType), GV->getName() + ".body", nullptr,		UndefValue::get(GlobalType), GV->getName() + ".body", nullptr,
GV->getThreadLocalMode());		GV->getThreadLocalMode());

// If there are bitcast users of the malloc (which is typical, usually we have		// Initialize the global at the point of the original call. Note that this
// a malloc + bitcast) then replace them with uses of the new global. Update		// is a different point from the initialization referred to below for the
// other users to use the global as well.		// nullability handling. Sublety: We have not proven the original global was
		// only initialized once. As such, we can not fold this into the initializer
		// of the new global as may need to re-init the storage multiple times.
		if (!isa<UndefValue>(InitVal)) {
		IRBuilder<> Builder(CI->getNextNode());
		// TODO: Use alignment above if align!=1
		Builder.CreateMemSet(NewGV, InitVal, AllocSize, None);
		}

		// Update users of the allocation to use the new global instead.
BitCastInst *TheBC = nullptr;		BitCastInst *TheBC = nullptr;
while (!CI->use_empty()) {		while (!CI->use_empty()) {
Instruction *User = cast<Instruction>(CI->user_back());		Instruction *User = cast<Instruction>(CI->user_back());
if (BitCastInst *BCI = dyn_cast<BitCastInst>(User)) {		if (BitCastInst *BCI = dyn_cast<BitCastInst>(User)) {
if (BCI->getType() == NewGV->getType()) {		if (BCI->getType() == NewGV->getType()) {
BCI->replaceAllUsesWith(NewGV);		BCI->replaceAllUsesWith(NewGV);
BCI->eraseFromParent();		BCI->eraseFromParent();
} else {		} else {
... 75 lines not shown ...	OptimizeGlobalAddressOfAllocation(GlobalVariable *GV, CallInst *CI,
// If the initialization boolean was used, insert it, otherwise delete it.		// If the initialization boolean was used, insert it, otherwise delete it.
if (!InitBoolUsed) {		if (!InitBoolUsed) {
while (!InitBool->use_empty()) // Delete initializations		while (!InitBool->use_empty()) // Delete initializations
cast<StoreInst>(InitBool->user_back())->eraseFromParent();		cast<StoreInst>(InitBool->user_back())->eraseFromParent();
delete InitBool;		delete InitBool;
} else		} else
GV->getParent()->getGlobalList().insert(GV->getIterator(), InitBool);		GV->getParent()->getGlobalList().insert(GV->getIterator(), InitBool);

// Now the GV is dead, nuke it and the malloc..		// Now the GV is dead, nuke it and the allocation..
GV->eraseFromParent();		GV->eraseFromParent();
CI->eraseFromParent();		CI->eraseFromParent();

// To further other optimizations, loop over all users of NewGV and try to		// To further other optimizations, loop over all users of NewGV and try to
// constant prop them. This will promote GEP instructions with constant		// constant prop them. This will promote GEP instructions with constant
// indices into GEP constant-exprs, which will allow global-opt to hack on it.		// indices into GEP constant-exprs, which will allow global-opt to hack on it.
for (auto *CE : RepValues)		for (auto *CE : RepValues)
ConstantPropUsersOf(CE, DL, TLI);		ConstantPropUsersOf(CE, DL, TLI);
... 40 lines not shown ...	for (const Use &VUse : V->uses()) {

return false;		return false;
}		}
}		}

return true;		return true;
}		}

/// If we have a global that is only initialized with a fixed size malloc,		/// If we have a global that is only initialized with a fixed size allocation
/// transform the program to use global memory instead of malloc'd memory.		/// try to transform the program to use global memory instead of heap
/// This eliminates dynamic allocation, avoids an indirection accessing the		/// allocated memory. This eliminates dynamic allocation, avoids an indirection
/// data, and exposes the resultant global to further GlobalOpt.		/// accessing the data, and exposes the resultant global to further GlobalOpt.
static bool tryToOptimizeStoreOfMallocToGlobal(GlobalVariable *GV, CallInst *CI,		static bool tryToOptimizeStoreOfAllocationToGlobal(GlobalVariable *GV,
		CallInst *CI,
AtomicOrdering Ordering,		AtomicOrdering Ordering,
const DataLayout &DL,		const DataLayout &DL,
TargetLibraryInfo *TLI) {		TargetLibraryInfo *TLI) {
// TODO: This can be generalized to calloc-like functions by using		if (!isAllocRemovable(CI, TLI))
// getInitialValueOfAllocation() for the global initialization.		// Must be able to remove the call when we get done..
assert(isMallocLikeFn(CI, TLI) && "Must be malloc-like call");		return false;

		Type *Int8Ty = Type::getInt8Ty(CI->getFunction()->getContext());
		Constant *InitVal = getInitialValueOfAllocation(CI, TLI, Int8Ty);
		if (!InitVal)
		// Must be able to emit a memset for initialization
		return false;

uint64_t AllocSize;		uint64_t AllocSize;
if (!getObjectSize(CI, AllocSize, DL, TLI, ObjectSizeOpts()))		if (!getObjectSize(CI, AllocSize, DL, TLI, ObjectSizeOpts()))
return false;		return false;

// Restrict this transformation to only working on small allocations		// Restrict this transformation to only working on small allocations
// (2048 bytes currently), as we don't want to introduce a 16M global or		// (2048 bytes currently), as we don't want to introduce a 16M global or
// something.		// something.
... 11 lines not shown ...	static bool tryToOptimizeStoreOfAllocationToGlobal(GlobalVariable *GV,

// We can't optimize this if the malloc itself is used in a complex way,		// We can't optimize this if the malloc itself is used in a complex way,
// for example, being stored into multiple globals. This allows the		// for example, being stored into multiple globals. This allows the
// malloc to be stored into the specified global, loaded, gep, icmp'd.		// malloc to be stored into the specified global, loaded, gep, icmp'd.
// These are all things we could transform to using the global for.		// These are all things we could transform to using the global for.
if (!valueIsOnlyUsedLocallyOrStoredToOneGlobal(CI, GV))		if (!valueIsOnlyUsedLocallyOrStoredToOneGlobal(CI, GV))
return false;		return false;

OptimizeGlobalAddressOfMalloc(GV, CI, AllocSize, DL, TLI);		OptimizeGlobalAddressOfAllocation(GV, CI, AllocSize, InitVal, DL, TLI);
return true;		return true;
}		}

// Try to optimize globals based on the knowledge that only one value (besides		// Try to optimize globals based on the knowledge that only one value (besides
// its initializer) is ever stored to the global.		// its initializer) is ever stored to the global.
static bool		static bool
optimizeOnceStoredGlobal(GlobalVariable *GV, Value *StoredOnceVal,		optimizeOnceStoredGlobal(GlobalVariable *GV, Value *StoredOnceVal,
AtomicOrdering Ordering, const DataLayout &DL,		AtomicOrdering Ordering, const DataLayout &DL,
... 13 lines not shown ...	if (GV->getInitializer()->getType()->isPointerTy() &&
GV->getInitializer()->getType()->getPointerAddressSpace())) {		GV->getInitializer()->getType()->getPointerAddressSpace())) {
if (Constant *SOVC = dyn_cast<Constant>(StoredOnceVal)) {		if (Constant *SOVC = dyn_cast<Constant>(StoredOnceVal)) {
if (GV->getInitializer()->getType() != SOVC->getType())		if (GV->getInitializer()->getType() != SOVC->getType())
SOVC = ConstantExpr::getBitCast(SOVC, GV->getInitializer()->getType());		SOVC = ConstantExpr::getBitCast(SOVC, GV->getInitializer()->getType());

// Optimize away any trapping uses of the loaded value.		// Optimize away any trapping uses of the loaded value.
if (OptimizeAwayTrappingUsesOfLoads(GV, SOVC, DL, GetTLI))		if (OptimizeAwayTrappingUsesOfLoads(GV, SOVC, DL, GetTLI))
return true;		return true;
} else if (isMallocLikeFn(StoredOnceVal, GetTLI)) {		} else if (isAllocationFn(StoredOnceVal, GetTLI)) {
if (auto *CI = dyn_cast<CallInst>(StoredOnceVal)) {		if (auto *CI = dyn_cast<CallInst>(StoredOnceVal)) {
auto TLI = &GetTLI(CI->getFunction());		auto TLI = &GetTLI(CI->getFunction());
if (tryToOptimizeStoreOfMallocToGlobal(GV, CI, Ordering, DL, TLI))		if (tryToOptimizeStoreOfAllocationToGlobal(GV, CI, Ordering, DL, TLI))
return true;		return true;
}		}
}		}
}		}

return false;		return false;
}		}

... 1,414 lines not shown ...

llvm/test/Transforms/GlobalOpt/calloc-promote.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=globalopt -S < %s | FileCheck %s			; RUN: opt -passes=globalopt -S < %s | FileCheck %s

				nikic: Duplicate run line
	@g = internal global i32* null, align 8			@g = internal global i32* null, align 8

	define signext i32 @f() local_unnamed_addr {			define signext i32 @f() local_unnamed_addr {
	; CHECK-LABEL: @f(			; CHECK-LABEL: @f(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CALL:%.*]] = call i8* @calloc(i64 1, i64 4)			; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @g.body, i32 0, i32 0), i8 0, i64 4, i1 false)
	; CHECK-NEXT: [[B:%.*]] = bitcast i8* [[CALL]] to i32*			; CHECK-NEXT: store i16 -1, i16* bitcast ([4 x i8]* @g.body to i16*), align 2
	; CHECK-NEXT: store i32* [[B]], i32** @g, align 8
	; CHECK-NEXT: [[B2:%.*]] = bitcast i8* [[CALL]] to i16*
	; CHECK-NEXT: store i16 -1, i16* [[B2]], align 2
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%call = call i8* @calloc(i64 1, i64 4)			%call = call i8* @calloc(i64 1, i64 4)
	%b = bitcast i8* %call to i32*			%b = bitcast i8* %call to i32*
	store i32* %b, i32** @g, align 8			store i32* %b, i32** @g, align 8
	%b2 = bitcast i8* %call to i16*			%b2 = bitcast i8* %call to i16*
	store i16 -1, i16* %b2			store i16 -1, i16* %b2
	ret i32 0			ret i32 0
	}			}

	define signext i32 @main() {			define signext i32 @main() {
	; CHECK-LABEL: @main(			; CHECK-LABEL: @main(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CALL:%.*]] = call signext i32 @f()			; CHECK-NEXT: [[CALL:%.*]] = call signext i32 @f()
	; CHECK-NEXT: call void @f1()			; CHECK-NEXT: call void @f1()
	; CHECK-NEXT: [[V0:%.*]] = load i32*, i32** @g, align 8			; CHECK-NEXT: store i32 1, i32* bitcast ([4 x i8]* @g.body to i32*), align 4
	; CHECK-NEXT: store i32 1, i32* [[V0]], align 4
	; CHECK-NEXT: call void @f1()			; CHECK-NEXT: call void @f1()
	; CHECK-NEXT: [[V1:%.*]] = load i8*, i8** bitcast (i32** @g to i8**), align 8			; CHECK-NEXT: store i8 2, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @g.body, i32 0, i32 0), align 4
	; CHECK-NEXT: store i8 2, i8* [[V1]], align 4
	; CHECK-NEXT: call void @f1()			; CHECK-NEXT: call void @f1()
	; CHECK-NEXT: [[V2:%.*]] = load i32*, i32** @g, align 8			; CHECK-NEXT: [[RES:%.*]] = load i32, i32* bitcast ([4 x i8]* @g.body to i32*), align 4
	; CHECK-NEXT: [[RES:%.*]] = load i32, i32* [[V2]], align 4
	; CHECK-NEXT: ret i32 [[RES]]			; CHECK-NEXT: ret i32 [[RES]]
	;			;
	entry:			entry:
	%call = call signext i32 @f()			%call = call signext i32 @f()
	call void @f1()			call void @f1()
	%v0 = load i32*, i32** @g, align 8			%v0 = load i32*, i32** @g, align 8
	store i32 1, i32* %v0, align 4			store i32 1, i32* %v0, align 4
	call void @f1()			call void @f1()
	... 10 lines not shown ...