CFLAA's support for interprocedural analysis was rather limited. For each function argument/return value, it only records whether if the given value is somehow related (i.e. can be obtained through assignments/reference/derefence) to other argument/return values. If the relation is found, an assignment edge is added (i.e. merge the two values into the same StratifiedSet). For example, if it sees a call instruction like this:
call void @f(i32** %x, i32* %y)
and if function @f is defined like this:
define void @f(%i32** %a, i32* %b) { store i32* %b, i32** %a ret void }
Then the analysis will try to say, at the callsite, that %x and %y are aliases. However, doing interprocedural analysis in this way is neither sound nor precise. It is imprecise because it is obvious that callee @f does not do anything to make its arguments alias each other, yet the analysis conclude that they may. It is unsound because if we add another instruction after the callsite:
%z = load i32*, i32** %x
, the analysis will not be able to detect %z and %y may-alias each other.
It seems to me that the core issue here is that our function summaries are not detailed enough to provide adequate information to the caller. Given a function @f, we can't just say "the first argument is somehow related to the second argument, but yeah we don't know exactly what the relationship is". What we really want instead is something more concrete, such as "the first argument is related to the second argument, and here's the relationship: the dereference of the first argument may-alias with the second argument". Summaries in the latter form give precise instructions on how to deal with actual arguments/actual return value at the callsite where the summary is instantiated.
This patch implements the idea illustrated in the last paragraph. Previously, the "RetParamRelation" struct inside FunctionInfo only includes related-pairs of parameters/return values. Now we refine the pairs by extending each parameters/return values with an additional DerefLevel N, which allows us to represent things like "derefrence the first argument N times". For instance, for the function @f mentioned before we will put [(param0, DerefLevel = 1), (param1, DerefLevel=0)] into the summary. At callsite where this summary is instantiated, we will create a new StratifiedSet below the set of %x, and merge it with the StratifiedSet of %y. To support this kind of operation where the DerefLevels of both elements of the pair are non-zero, a new interface to StratifiedSetBuilder needs to be added.
I've also changed the algorithm we used to generate function summaries. Here is how we checked whether two parameters %a and %b are related in the past:
- Find the StratifiedSet that %a belongs to
- Find all StratifiedSets that come above %a, and see if %b is in there
- Find all StratifiedSets that come below %a, and see if %b is in there
The algorithm is correct but if we were to do this for all pairs of parameters, it becomes O(n^2). Here is the approach I used in this patch:
- Declare a hashmap that maps from StratifiedSets to lists of parameters
- Scan the parameter list. For each parameter p, get the corresponding StratifiedSets s, and add (s->p) to the hashmap
- Scan the hashmap. For each StratifiedSet, if the mapped list has more than one element, then we know that every parameters in the list alias every other one
I think my approach should do less work and has a linear complexity (I could be wrong about this, though). One drawback of this approach (or at least in the way I'm doing it) is that we may end up with redundant items in the summary. Take @f as an example again: the summary not only contains the aforementioned entry [(param0, DerefLevel = 1), (param1, DerefLevel=0)] , we may end up with another entry [(param0, DerefLevel = 2), (param1, DerefLevel=1)] as well. The second entry is redundant here because it can be implied from the first entry. That being said, the issue does not affect soundness, and it may be fixable, so I am not very concerned about it.
With this patch ready, I am finally able to remove some xfailed test cases I introduced in an earlier patch. There are still two xfailed left untouched, since fixing them requires me putting even more information (e.g. which InterfaceValue(s) may be tagged with AttrUnknown) into the summary. They will be handled by subsequent patches.