This is an archive of the discontinued LLVM Phabricator instance.

A new HeapToStack allocation promotion pass
Needs ReviewPublic

Authored by hfinkel on Sep 23 2013, 2:00 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This adds a new optimization pass that can 'promote' malloc'd memory to stack-allocated memory when the lifetime of the allocation can be determined to be bounded by the execution of the function.

To be specific, consider the following three cases:

void bar(int *x);

void foo1() {

int *x = malloc(16);
bar(x);
free(x);

}

In this case the malloc can be replaced by an alloca, and the free removed. Note that this is true even though the pointer 'x' is definitely captured (and may be recorded in global storage, etc.).

The pairing between the malloc and the free need not be 1:1, for example:

void foo2(int a) {

int *x;

if (a == 0)
  x = malloc(16);
else
  x = malloc(32);

bar(x);
free(x);

}

where both mallocs and the free can be removed, and, finally, the free might not exclusively be used to free stack-promotable allocations, as in:

void foo3(int *y) {

if (y == 0)
  y = malloc(48);

bar(y);
free(y);

}

to handle this last case, which is referred to as an 'ambiguous' free in the code, a new boolean value is introduced to track the source of the allocation. The malloc is removed, but the free is not. Instead, the pointer passed to the free is conditionally set to null to prevent freeing a stack address.

I talked briefly to Chandler about this offline, and we might want to turn the core functionality here into a utility function so that it can be used directly by SROA. Nevertheless, I think that the basic functionality can be reviewed in this form, and I'd like to discuss any other desired refactoring as part of the review process.

Note: Using this pass causes a miscompile in SingleSource/Benchmarks/Shootout/objinst -- the transformation performed by HeapToStack looks valid, and bugpoint attributes the failure to some combination of later transformations. I'll open a separate bug report to track this issue.

Please review and thanks again!

Diff Detail

Event Timeline

Does there need to be special frontend logic to keep this out of e.g. kernel builds? Kernel code typically is running on a very small stack and deliberately calls malloc/kalloc/whatever even for a value that is live for a single block. Replacing those calls with stack allocation might result in some nasty kernel panic debugging for some poor soul.

Yes, I think that we'd like to thread some enabling flag (as well as the thresholds) through as PassManagerBuilder options. Regarding kernel builds, don't then use -fno-builtins anyway? That would disable this transformation as well.

-Hal

  • Original Message -----
Does there need to be special frontend logic to keep this out of
e.g. kernel builds? Kernel code typically is running on a very
small stack and deliberately calls malloc/kalloc/whatever even for
a value that is live for a single block. Replacing those calls
with stack allocation might result in some nasty kernel panic
debugging for some poor soul.

http://llvm-reviews.chandlerc.com/D1745

joey added inline comments.Sep 24 2013, 2:08 AM
lib/Transforms/Scalar/HeapToStack.cpp
276

Seems odd to use c-style casts?

  • Original Message -----

Comment at: lib/Transforms/Scalar/HeapToStack.cpp:275
@@ +274,3 @@
+
+ if (VerifiedMallocs.count((Instruction *) FAddr)) {

+ DEBUG(dbgs() << "H2S: malloc " << *FAddr <<

Seems odd to use c-style casts?

FAddr might not be an instruction, but it probably is. I could write:

if (isa<Instruction>(FAddr) && VerifiedMallocs.count(cast<Instruction>(FAddr)))

or even

if (VerifiedMallocs.count(dyn_cast<Instruction>(FAddr)))

but both of those pessimize the common case. Pointers to non-instruction will never be in the map.

It is just a micro-optimization (which happens to lead to fewer source-code characters). I'm fine with removing it for clarity.

Thanks,
Hal

http://llvm-reviews.chandlerc.com/D1745

joey added a comment.Sep 24 2013, 8:39 AM

FAddr might not be an instruction, but it probably is. I could write:

if (isa<Instruction>(FAddr) && VerifiedMallocs.count(cast<Instruction>(FAddr))) or even

if (VerifiedMallocs.count(dyn_cast<Instruction>(FAddr)))
but both of those pessimize the common case. Pointers to non-instruction will never be in the map.

It is just a micro-optimization (which happens to lead to fewer source-code characters). I'm fine with removing it for clarity.

Does it actually make any noticeable difference?
I personally prefer the use of dyn_cast, but I won't push too hard for it.

void added a comment.Sep 24 2013, 11:05 AM

I haven't reviewed the patch itself just yet. But what you're doing here seems very similar to how the ARC Optimizer works (in lib/Transformations/ObjCARC). It may be beneficial to look at how it does things. In particular, it removes "retains" and "releases" from the code. There is a lot of analysis that needs to go into this, of course.

Anyway, just a suggestion.

Bill,

Thanks for the pointer.

Just noticed this, in ProvenanceAnalysis.h it says:

/ This file declares a special form of Alias Analysis called ``Provenance
/ Analysis''. The word ``provenance'' refers to the history of the ownership
/ of an object. Thus ``Provenance Analysis'' is an analysis which attempts to
/ use various techniques to determine if locally

if locally what? I think something was truncated here.

-Hal

  • Original Message -----
I haven't reviewed the patch itself just yet. But what you're doing
here seems very similar to how the ARC Optimizer works (in
lib/Transformations/ObjCARC). It may be beneficial to look at how
it does things. In particular, it removes "retains" and "releases"
from the code. There is a lot of analysis that needs to go into
this, of course.

Anyway, just a suggestion.

http://llvm-reviews.chandlerc.com/D1745

  • Original Message -----

Hi chandlerc,

This adds a new optimization pass that can 'promote' malloc'd memory
to stack-allocated memory when the lifetime of the allocation can be
determined to be bounded by the execution of the function.

To be specific, consider the following three cases:

void bar(int *x);

void foo1() {

int *x = malloc(16);
bar(x);
free(x);

}

In this case the malloc can be replaced by an alloca, and the free
removed. Note that this is true even though the pointer 'x' is
definitely captured (and may be recorded in global storage, etc.).

The pairing between the malloc and the free need not be 1:1, for
example:

void foo2(int a) {

int *x;

if (a == 0)
  x = malloc(16);
else
  x = malloc(32);

bar(x);
free(x);

}

where both mallocs and the free can be removed, and, finally, the
free might not exclusively be used to free stack-promotable
allocations, as in:

void foo3(int *y) {

if (y == 0)
  y = malloc(48);

bar(y);
free(y);

}

to handle this last case, which is referred to as an 'ambiguous' free
in the code, a new boolean value is introduced to track the source
of the allocation. The malloc is removed, but the free is not.
Instead, the pointer passed to the free is conditionally set to null
to prevent freeing a stack address.

I talked briefly to Chandler about this offline, and we might want to
turn the core functionality here into a utility function so that it
can be used directly by SROA. Nevertheless, I think that the basic
functionality can be reviewed in this form, and I'd like to discuss
any other desired refactoring as part of the review process.

Note: Using this pass causes a miscompile in
SingleSource/Benchmarks/Shootout/objinst -- the transformation
performed by HeapToStack looks valid, and bugpoint attributes the
failure to some combination of later transformations. I'll open a
separate bug report to track this issue.

This seems to be a bug in DSE. I've opened PR17405 on this.

-Hal

Please review and thanks again!

http://llvm-reviews.chandlerc.com/D1745

Files:

include/llvm-c/Transforms/Scalar.h
include/llvm/IR/DataLayout.h
include/llvm/InitializePasses.h
include/llvm/LinkAllPasses.h
include/llvm/Transforms/Scalar.h
lib/Transforms/IPO/PassManagerBuilder.cpp
lib/Transforms/Scalar/CMakeLists.txt
lib/Transforms/Scalar/HeapToStack.cpp
lib/Transforms/Scalar/Scalar.cpp
test/Transforms/HeapToStack/basic.ll
hfinkel updated this revision to Unknown Object (????).Oct 2 2013, 7:31 AM

This is an updated implementation, with a few major changes:

  • A major bug has been fixed (which has been causing a miscompile in SingleSource/Benchmarks/Shootout/objinst): the transformation had been failing to remove the 'tail' attribute on call sites that might modify the now-stack-allocated memory. It seems important not to remove 'tail' except when absolutely necessary because doing so pessimizes AA, but likely does not affect any actual tail-call formation (because there must have been a free after the call originally, and so the original call was not in tail-call position).
  • New instructions are created using IRBuilder, and lifetime intrinsics are added (for constant-sized allocations, the new allocas are moved to the beginning of the entry block).
  • By default, we now branch around conditionally-elided free calls (instead of nulling out the pointer argument). I'm assuming that the branch is generally more efficient than the call in the common case.
  • The new alloca call now should properly take the name of the original malloc.

Thanks again everyone!

chandlerc resigned from this revision.Mar 29 2015, 11:24 AM
chandlerc removed a reviewer: chandlerc.