This is an archive of the discontinued LLVM Phabricator instance.

[CFLAA] Teach cfl-aa to understand heap memory allocation
ClosedPublic

Authored by grievejia on May 28 2016, 2:32 PM.

Details

Summary

Cfl-aa used to bail out and make the most conservative assumption on any external library functions (any input or output parameter may alias just about anything). This patch tries to make it less conservative when seeing an external function that allocate/deallocate heap memory, e.g. malloc(), calloc(), and free().

Diff Detail

Repository
rL LLVM

Event Timeline

grievejia updated this revision to Diff 58906.May 28 2016, 2:32 PM
grievejia retitled this revision from to [CFLAA] Teach cfl-aa to understand heap memory allocation.
grievejia updated this object.
grievejia added a subscriber: llvm-commits.

Tests would be appreciated here, too, please. :)

lib/Analysis/CFLAliasAnalysis.cpp
387 ↗(On Diff #58906)

This looks a bit sketchy, because isAllocLikeFn returns true for strdup/strndup, both of which copy arbitrary bits from a source to a destination. It also looks like BasicAA (line 766) opts to be conservative with strdup/strndup.

That said, given how we currently handle external function calls + conversions of pointers to ints (and back), I'm not sure if allowing strdup here can break anything that has a reasonable chance of success in the first place.

...So, I'm really tempted to say "please do the same thing as BasicAA here and add a TODO that we'll address when external values/other CFLAA features are in, so this doesn't come back to subtly bite us in the future."

Unless i'm missing something, this is not the correct handling. Happy to be
convinced.

The normal behavior (and that specified in the papers) is to treat malloc
as if it output the address of a unique variable.

This is because while it introduces no aliases itself, it does introduce a
variable that can be aliased.
If that variable is reached through other mechanisms, it may be an alias :)

If you say they do *nothing*, as you've done here, nothing will ever alias
the result of malloc, even something that *should*.

Am i missing something?

This is because while it introduces no aliases itself, it does introduce a
variable that can be aliased.
If that variable is reached through other mechanisms, it may be an alias :)

I agree.

If you say they do *nothing*, as you've done here, nothing will ever alias
the result of malloc, even something that *should*.

Am i missing something?

"Nothing" is currently what happens to AllocaInsts. I just tried to be consistent with how AllocaInst is treated.

Also, "nothing" only applies to the malloc() inst/AllocaInst itself. If any other instructions use it, the allocated value should get linked to other part of the function.

grievejia updated this revision to Diff 58973.May 30 2016, 10:08 AM
grievejia edited edge metadata.

Replaced isAllocLikeFn() with isMallocLikeFn() || isCallocLikeFn(), just to be consistent with BasicAA.

Also, added a test case.

Danny - ...Good point. I forgot that we were super conservative when something isn't in the sets.

Jia - Can we please change this to just add an edge from the malloc/calloc result (and free argument) to itself with zero stratified attributes, so that we have the relevant pointer in our sets? :)

We need to do this because, if the result of the malloc is otherwise unused, we'll not see it in the stratified sets. If something's not in the stratified sets, we'll assume it was added after we ran on the function, so we're conservative and always just say MayAlias. If we're not doing the same thing with otherwise-unused allocas, we probably should be.

grievejia updated this revision to Diff 58986.EditedMay 30 2016, 12:29 PM
grievejia marked an inline comment as done.

Added additional edges in case the return value of malloc() is not used.

I'm not sure about free(), though. How could an actual argument becomes "unused"? Isn't the argument itself a use?

We need to do this because, if the result of the malloc is otherwise unused, we'll not see it in the stratified sets. If something's not in the stratified sets, we'll assume it was added after we ran on the function, so we're conservative and always just say MayAlias. If we're not doing the same thing with otherwise-unused allocas, we probably should be.

You are right. I think we are doing this for unused allocas, outside GetEdgeVisitor.

Any reason for (1) not adding unused value to StratifiedSets, and (2) treating values not in StratifiedSets conservatively?

We treat values that don't exist in StratifiedSets conservatively because the IR can be modified after we run CFLAA on it. Imagine that we started with:

define void @foo() {
  %1 = alloca [2 x i32], align 8
  ; some code here
  ret void
}

CFLAA ran, etc. and a later pass introduced a GEP of %1, like so:

define void @foo() {
  %1 = alloca [2 x i32], align 8
  %2 = GEP %1, i32 0, i32 0
  ; some code here
  ret void
}

...And a later pass queries CFLAA for whether %1 and %2 alias for whatever reason. Because %2 wasn't there at build-time, we can't offer an accurate answer without doing extra work, so we always answer conservatively.

As for why we don't add unused pointers, probably bugs. If you can find a case where we're not adding pointers, please say so. :)

lib/Analysis/CFLAliasAnalysis.cpp
392 ↗(On Diff #58986)

I think we need to do the same edge hack with free calls, as well :)

Consider:

@g = ; some global ptr
define void @foo() {
  %a = alloca i32
  ; code that only uses %a
  call void @free(i8* @g)
}

If we don't, @g won't be in the sets, so we'll consider it to alias anything, which is too conservative.

grievejia updated this revision to Diff 58989.May 30 2016, 12:53 PM
grievejia marked an inline comment as done.

Addressed free() argument.

grievejia added a comment.EditedMay 30 2016, 1:04 PM

We treat values that don't exist in StratifiedSets conservatively because the IR can be modified after we run CFLAA on it.

Hmm, this is strange. To me the one and only right thing to do after IR changes is to invalidate the cfl-aa pass and re-run the analysis. In general it is not safe to persist alias analysis result across IR modifications. In your example, what if, instead of someone adding a new value to the IR, a new store instruction is added and the points-to set of some existing value gets altered? The analysis result can't be used anyway.

grievejia updated this revision to Diff 58999.May 30 2016, 2:08 PM

Updated test case.

The previous test case makes alias queries on poisoned pointers and alloca-ed pointers. We'll always get back NoAlias result for those queries even before applying the patch. The updated version tests for the right thing.

george.burgess.iv edited edge metadata.

LGTM'ed this over email. Doing so here, too, to make life easier for code historians.

This revision is now accepted and ready to land.Jun 1 2016, 11:46 AM
This revision was automatically updated to reflect the committed changes.