Page MenuHomePhabricator

[Attributor] Heap-To-Stack Conversion
ClosedPublic

Authored by sstefan1 on Jul 29 2019, 10:40 AM.

Details

Summary

D53362 gives a prototype heap-to-stack conversion pass. With addition of new attributes in the attributor, this can now be revisted and improved. This will place it in the Attributor to make it easier to use new attributes (eg. nofree, nosync, willreturn, etc.) and other attributor features.

I'm starting with some test cases (more will be added along the way). For now they are in FunctionAttrs, as most other attributor tests, but that can be changed.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Providing an implementation for the conversion. This is few commits behind fromt the latest Attributor, will be rebased.

sstefan1 updated this revision to Diff 215803.Aug 18 2019, 2:51 PM
  • minor fix
jdoerfert added inline comments.Mon, Aug 19, 11:18 AM
llvm/lib/Transforms/IPO/Attributor.cpp
2073 ↗(On Diff #215803)

Not needed, see my comment further below.

2078 ↗(On Diff #215803)

Not needed, we can use the InfoCache to look only at calls.

2084 ↗(On Diff #215803)

Let's collect all valid h2s candidate malloc calls and then only "materialize" the ones we deem profitable. Knowing a malloc could be converted to the stack is already a nice piece information.

2301 ↗(On Diff #215803)

If you want to start with a small patch, just check the function for willreturn, nounwind, and nofree.

2328 ↗(On Diff #215803)

I dislike this exploration. We should not need to traverse all paths all the time. (This does not track visited blocks btw. so it is probably an endless loop if there are loops in the function).

We should (later) follow the uses of the malloc only. The logic I have in mind is as follows (IMHO much easier than this (=the old logic)):

If all (transitive) users of a malloc are either nocapture uses or free uses, we can do heap2stack. In fact, we should probably generalize the capture AA to collect capturing uses instead so we can analyze them later, e.g., here.

llvm/test/Transforms/FunctionAttrs/heap_to_stack.ll
17 ↗(On Diff #215803)

I think this is a bug, see below.

I'll create a patch to *not* annotate free with nocapture.

25 ↗(On Diff #215803)

I'd argue, "freeing" a pointer is "capturing" by nature so this combination "nocapture" + "free" cannot exist. If it doesn't, "nocapture" is marked and implies "nofree" on the pointer. Thus, we can actually allocate %1 on the stack here.

sstefan1 updated this revision to Diff 216967.Fri, Aug 23, 3:20 PM

Taking a different approach using nocapture and nofree.

sstefan1 updated this revision to Diff 216968.Fri, Aug 23, 3:22 PM
remove llvm_debug
Harbormaster completed remote builds in B37224: Diff 216968.

We need tests where stuff is read/written as well.

llvm/lib/Transforms/IPO/Attributor.cpp
2719 ↗(On Diff #216968)

Mallocs can become live later so we shouldn't remove them but only ignore them for the moment. In fact, the checkForAllCallLikeInstructions call will already filter "currently assumed dead" ones out so you have to invoke it in every update or collect all calls through the InfoCache explicitly in the initialize method.

2724 ↗(On Diff #216968)

This is quadratic in the number of malloc calls.

If you want to exclude malloc calls from the tests below do:
if (MallocCalls.count(&I)) return true;

2729 ↗(On Diff #216968)

No need to check, just put it in FreesForMalloc (though I'm unclear why we need it at the moment)

2773 ↗(On Diff #216968)

I was hoping we follow and check uses:

UsesCheck = (Instruction &I) {
  SmallPtrSet<const Use *, 8> Visisted;
  SmallVector<const Use *, 8> Worklist;
  Worklist.append(I.use_begin(), I.use_end());
  while (!Worklist.empty()) {
    const Use *U = Worklist.pop_back_val();
    if (!Visited.insert(U).second) continue;
    const auto *UserI = U->getUser();
    if (isa<LoadInst>(UserI) || isa<StoreInst>(UserI))
     continue;
    if (auto *CB = dyn_cast<CallBase>(UserI)) {
      // TODO: Check use once we have nofree on call site args.
      const auto &NoFreeAA = A.getAAFor<AANoFree>(..., IRPosition::call_site(CB));
      if (!NoFreeAA.isAssumedNoFree()) return false;
      // same for nocapture
      // if both are good, continue.
    }
    if (isa<GetElementOperator>(UserI) || isa<BitCastInst>(UserI)) {
      // Add users to worklist and continue
    }
    // unknown user
    return false;
  }
  return true;
}

MalloCheck = (Instruction &I) {
  if (!isaMalloc(I)) return true;
  if (BadMallocs.count(&I)) return true;
  // assumed h2s malloc I
  if (UsesCheck(I))
    Mallocs.insert(&I);
  else
    BadMallocs.insert(&I);
  return true;
}

checkForAllCallLikeInstructions(MallocCheck)
2787 ↗(On Diff #216968)

Is this "Number of mallocs converted to allocs"?

2792 ↗(On Diff #216968)

I doubt it makes sense to have a H2S call site AA. We should probably only allow the function one.

3213 ↗(On Diff #216968)

Still use the Whitelist and group it with the other function AAs please

llvm/test/Transforms/FunctionAttrs/heap_to_stack.ll
40 ↗(On Diff #216968)

We need a test, sync_will_return before free which should (later) be OK.

120 ↗(On Diff #216968)

This should be h2s as the free is dead and there are no users that are nofree.

sstefan1 updated this revision to Diff 217286.Mon, Aug 26, 6:02 PM

addressing comments

sstefan1 updated this revision to Diff 217287.Mon, Aug 26, 6:07 PM
  • remove mistake
sstefan1 marked an inline comment as done.Mon, Aug 26, 6:20 PM
sstefan1 added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2729 ↗(On Diff #216968)

The reason for this is if for example a malloc becomes 'bad' then all the frees that use it can be easily disregarded.

sstefan1 updated this revision to Diff 217518.Tue, Aug 27, 3:52 PM
  • Small corrections.

There are some leftover comments but this is almost ready.

llvm/lib/Transforms/IPO/Attributor.cpp
129 ↗(On Diff #217518)

I'd enable this by default but for a smaller size, let's say 128?

2652 ↗(On Diff #217518)

AI = new BitCastInst(AI, MallocCall->getType(), "malloc_bc", AI->getNextNode());

might do the trick.

2673 ↗(On Diff #217518)

remove the const here then you can avoid the const casts

2719 ↗(On Diff #217518)

If no-free is set nocaputre is not checked anymore. We need to check for nofree and nocapture.

2724 ↗(On Diff #217518)

If this is not the arg operand we should give up. (we may allow it at the callee position but not in the operand bundle)

llvm/test/Transforms/FunctionAttrs/heap_to_stack.ll
58 ↗(On Diff #217518)

outdated comment.

68 ↗(On Diff #217518)

maybe elaborate why h2s is still OK.

103 ↗(On Diff #217518)

You want branch %6 or you would have a double free (not necessarily a problem for h2s but still).

121 ↗(On Diff #217518)

leftover comment.

127 ↗(On Diff #217518)

We need this test again with an nounwind @foo which should allow h2s.

Negative tests:
Malloc in loop body
Irreducible cfg

?

Still missing.

+ support for calloc shouldn’t be very hard to add.

There are some leftover comments but this is almost ready.

I'd like to see tests with https://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic

@xbolva00 I'm sorry this has dragged on a bit.

I will address all the comments, probably tomorrow.

sstefan1 updated this revision to Diff 219196.Fri, Sep 6, 4:16 PM

addressing comments

sstefan1 updated this revision to Diff 219202.Fri, Sep 6, 4:43 PM

small fix

Is this in a state to be reviewed or do you plan to go over it again?

llvm/test/Transforms/FunctionAttrs/heap_to_stack.ll
251 ↗(On Diff #219204)

check lines missing.

jdoerfert accepted this revision.Thu, Sep 12, 5:47 PM

Can you rebase and remove the artifacts in the diff? I also provided final comments. I think once they are fixed it is fine. Feel free to commit a fixed version.

llvm/lib/Transforms/IPO/Attributor.cpp
2664 ↗(On Diff #219204)

Leftover comment and braces around single stmt.

2745 ↗(On Diff #219204)

this has to lead to invalidation. use in a operand bundle could free even if the function is nofree. Maybe move the check to the beginning of the CallBase conditional.

2765 ↗(On Diff #219204)

Allow lifetime.start/end as users here as well. That should unbreak one of the test cases.unbreak

2780 ↗(On Diff #219204)

You have to put them into the "Bad" instruction set here. Also, move the "is bad" check to the beginning.
There is a test missing for a malloc and a calloc that have to large sizes. (calloc only the multiplication should be too large).
We need to check that the multiplication does not overflow. APInt has a method to do overflow checked multiplication.

llvm/test/Transforms/FunctionAttrs/heap_to_stack.ll
1 ↗(On Diff #219204)

When we run this with the old PM, what will happen? No h2s? We should do it and make sure it doesn't crash etc.

202 ↗(On Diff #219204)

This should be fine. Also a lifetime.end user is fine.

This revision is now accepted and ready to land.Thu, Sep 12, 5:47 PM
sstefan1 marked 3 inline comments as done.Sat, Sep 14, 2:21 PM
sstefan1 added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2745 ↗(On Diff #219204)

It makes sense to me to move the check to the beginning.

use in a operand bundle could free even if the function is nofree

I'm not I understand this, though.

2780 ↗(On Diff #219204)

Also, move the "is bad" check to the beginning.

If you mean in initialize, then I'd say this is a nicer approach.

llvm/test/Transforms/FunctionAttrs/heap_to_stack.ll
1 ↗(On Diff #219204)

Yes, no h2s and doesn't crash.

Closed by commit rL371942: [Attributor] Heap-To-Stack Conversion (authored by sstefan, committed by ). · Explain WhySun, Sep 15, 2:46 PM
This revision was automatically updated to reflect the committed changes.

I would prefer to use update_test_checks.py instead of hand written checks.

Quite questionable is “stackification” of mallocs in loops, we should avoid it I think. @efriedma, what do you think?

xbolva00 added a comment.EditedSun, Sep 15, 3:10 PM

IsMallocLikeFn returns true for C++ operator new which does throw on an allocation failure.

So probably you stackify:

int *p = new int;
free(p);

It is a stupid code and buggy, but please add it as test case and the following fix:

For now, just check if (isMallocLike && !isOpNewLike).

I think it is worth to add support for c++’s new - delete pairs in the future..

Quite questionable is “stackification” of mallocs in loops, we should avoid it I think. @efriedma, what do you think?

Please can you be more specific?

IsMallocLikeFn returns true for C++ operator new which does throw on an allocation failure.

Likewise, please can you be more specific in your messages? This doesn't actually explain what the problem is.

xbolva00 added inline comments.Sun, Sep 15, 3:17 PM
llvm/trunk/lib/Transforms/IPO/Attributor.cpp
3229

Check for overflow is missing

3263

Builder.CreateMemset

I think the current implementation needs some important fixes to avoid miscompilation/overflow, can you make it off by default, fix issues, and enable it again?

Thanks.

xbolva00 added a comment.EditedSun, Sep 15, 3:41 PM

Quite questionable is “stackification” of mallocs in loops, we should avoid it I think. @efriedma, what do you think?

Please can you be more specific?

Yes :)
Heap to stack idea started some time ago and @efriedma raised some concerns: https://reviews.llvm.org/D47345#1112573.
Imagine

for (int i = 0; i < BIG_NUM; ++i) {

void *p = malloc(128);
....
free(p);

}

->

for (int i = 0; i < BIG_NUM; ++i) {

void *p = alloca(128);
....

}

Nice stack overflow :)

And yes, we shouldn't do this on irreducible cfgs (also noted by Eli).

IsMallocLikeFn returns true for C++ operator new which does throw on an allocation failure.

Likewise, please can you be more specific in your messages? This doesn't actually explain what the problem is.

%1 = tail call dereferenceable(4) i8* @_Znwm(i64 4)
...some code...
tail call void @free(i8* nonnull %1)

is probably converted to alloca. I have no strong opinion what should be done here, ideally warning! :) Anyway, I would like to see some tests with C++'s new and various combinations (there are many variants of 'new', 'delete').

sstefan1 marked an inline comment as done.Sun, Sep 15, 11:07 PM

I think the current implementation needs some important fixes to avoid miscompilation/overflow, can you make it off by default, fix issues, and enable it again?

Thanks.

Attributor is turned if by dedault right now, so h2s is also turned off. I will revisit this again this week and try to address most of the concerns you raised in a new patch.

llvm/trunk/lib/Transforms/IPO/Attributor.cpp
3229

If there is an overflow, this will not be executed.

xbolva00 added inline comments.Sun, Sep 15, 11:12 PM
llvm/trunk/lib/Transforms/IPO/Attributor.cpp
3229

Ok, great

sstefan1 marked an inline comment as done.Sun, Sep 15, 11:14 PM
sstefan1 added inline comments.
llvm/trunk/lib/Transforms/IPO/Attributor.cpp
3363

This was supposed to be if(Overflow). I'll fix this.

I think the current implementation needs some important fixes to avoid miscompilation/overflow, can you make it off by default, fix issues, and enable it again?

The Attributor is off by default (even though we should now pass LLVM-TS + SPEC2006 tests!!).

IsMallocLikeFn returns true for C++ operator new which does throw on an allocation failure.

[...] is probably converted to alloca. I have no strong opinion what should be done here, ideally warning! :) Anyway, I would like to see some tests with C++'s new and various combinations (there are many variants of 'new', 'delete').

I fail to see the problem in converting a possible throwing new to a non-throwing alloca.
If you think there is one could you please try to explain why that should not be valid?

I think it is worth to add support for c++’s new - delete pairs in the future..

We can add this (the delete part) when we actually use free/delete to reason about h2s validity.

Quite questionable is “stackification” of mallocs in loops, we should avoid it I think. @efriedma, what do you think?

Heap to stack idea started some time ago and @efriedma raised some concerns: https://reviews.llvm.org/D47345#1112573.

From a pure standard point of view, I would argue the current implementation is not wrong in doing h2s for mallocs in loops.
I do however not intent to turn it on and let people deal with the aftermath. As mentioned by @sstefan1, this is not on by default.
My plan was to get the logic in, add the "must-be-freed"-based logic that @hfinkel used in the original version, and then look into heuristics.
The reason is that the "must-be-freed" actually allows different placement of the alloca and we want to see what is out there wrt. sizes etc.
Does that make sense and does my plan sound OK to you?

llvm/trunk/lib/Transforms/IPO/Attributor.cpp
3363

Add a test for this please.

If you think there is one could you please try to explain why that should not be valid?

Ok, now I think it should be okay.

Does that make sense and does my plan sound OK to you?

Ok.