This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Improve pointer arithmetic checker.
ClosedPublic

Authored by xazax.hun on Oct 30 2015, 8:38 AM.

Details

Summary

This patch is intended to improve pointer arithmetic checker.

From now on it tries to only warn, when the pointer arithmetic is likely to cause an error. For example when the pointer points to a single object, or an array of derived types.

Note that this check does not free the stored information right now, because it caused some trouble when I was checking the following code.

struct trie {
  struct trie* next;
};

struct kwset {
  struct trie *trie;
  unsigned char y[10];
  struct trie* next[10];
  int d;
};

typedef struct trie trie_t;
typedef struct kwset kwset_t;

void f(kwset_t *kws, char const *p, char const *q) {
  struct trie const *trie;
  struct trie * const *next = kws->next;
  register unsigned char c;
  register char const *end = p;
  register char const *lim = q;
  register int d = 1;
  register unsigned char const *y = kws->y;

  d = y[c = (end+=d)[-1]];
  trie = next[c]; // Here the analyzer tought that kws->next is a dead region, so the stored information was unavailable for the array. adding a kws = 0 or similar line to the end of the function fixed the problem. Is this a bug in liveness analysis fo regions?
}

Diff Detail

Repository
rL LLVM

Event Timeline

xazax.hun updated this revision to Diff 38813.Oct 30 2015, 8:38 AM
xazax.hun retitled this revision from to [analyzer] Improve pointer arithmetic checker..
xazax.hun updated this object.
xazax.hun added reviewers: zaks.anna, dcoughlin.
xazax.hun updated this object.
xazax.hun added subscribers: cfe-commits, dkrupp.
dcoughlin edited edge metadata.Nov 11 2015, 10:45 AM

Gabor,

This is an alpha checker. Do you anticipate turning it on by default?

Comments inline.

lib/StaticAnalyzer/Checkers/PointerArithChecker.cpp
28 ↗(On Diff #38813)

Is it necessary to distinguish so many cases here? For example, why do we need to distinguish between PlacementNew and OverloadedNew?

Another thought: given the expense of tracking stuff in the GDM could we instead track whether pointer arithmetic is explicitly disallowed for a given region? Then we wouldn't have to track any data for the "good" pointers.

Also, how does AllocKind relate to the AllocationFamily from MallocChecker? Could that checker's state be used so we don't have to track any additional information here?

If you do keep AllocKind, I think it would be good to add a comment describing how this enum is used and the intended meaning of each element.

29 ↗(On Diff #38813)

I'm not sure "singleton" is the right term here. I associate "singleton" with the design pattern of only having a single instance of a class allocated at a given time (a form of global shared state). Also, SingletonMalloc doesn't appear to be used anywhere.

88 ↗(On Diff #38813)

I think it would be better to use llvm::SmallSet here.

137 ↗(On Diff #38813)

I think it would be good to add a doc comment for this function describing what the function does and its parameters as well as whether they are input or output parameters.

I also wonder if this logic is better expressed as a loop rather than recursion. What do you think?

150 ↗(On Diff #38813)

In general, I think it is better to avoid default in cases like these so that when an enum case is added the compiler issues a warning and thus forces the person adding the change to think about what the behavior of the new case should be.

198 ↗(On Diff #38813)

I think "polymorphic" is a bit jargony. Can this diagnostic be explained in terms of base and derived classes?

test/Analysis/ptr-arith.cpp
47 ↗(On Diff #38813)

I think it would be good to add tests showing p = i*0 + p' and p = p + i*0' don't alarm. Also `p += i*0'.

Gabor,

This is an alpha checker. Do you anticipate turning it on by default?

Comments inline.

I could see two kinds of false positives with this checker when running this on the LLVM codebase.

When objects are allocated in a way that one object is allocated together with an array (the array is after the original object in), and the object contains getter code like:

reinterpret_cast<Elements*>(this + 1);

When a function is written like this:

Obj* f() {
if (opaqueCond)
    return singleObject;
return array;
}

And used like this:

f()[5];

Other then these two cases I think the results are good.

xazax.hun added inline comments.Nov 12 2015, 1:37 AM
lib/StaticAnalyzer/Checkers/PointerArithChecker.cpp
28 ↗(On Diff #38813)

I think this way it is more explicit that we are not going to reason about PlacementNew and OverLoadedNew. Turning this information into comments and reduce the number of elements of the enum is fine. I am going to do that.

Storing information only for singletons or only for arrays seems like a good idea for me. However it is possible that there are more pointers for single objects than arrays? In this case it would be better to store data only for "good" pointers.

AllocationFamily does not distinguish placement and overloaded new for instance. The current approach to this checker is to be conservative and do not try to reason about them. Other than that in the future I planned to add heuristics when a malloc call is likely to allocate and array and when it is likely to allocate a single object (heuristics on the AST). The AllocationFamily do not distinguish these cases.

29 ↗(On Diff #38813)

What about SingleObjectNew? Currently this checker does not reason about malloced pointers, because it does not distinguish the case where the malloc is used to allocate an array and where it is used to allocate a single object. I was planning to do this classification based on some heuristics on AST matching.

88 ↗(On Diff #38813)

I will change that, thanks.

137 ↗(On Diff #38813)

I will rewrite it without recursion and check the results but I suspect that you are right.

150 ↗(On Diff #38813)

I will enumerate the rest of the kinds here.

198 ↗(On Diff #38813)

Sure, I will try to come up with something.

xazax.hun updated this revision to Diff 41022.Nov 24 2015, 4:20 AM
xazax.hun edited edge metadata.
xazax.hun marked 11 inline comments as done.
  • Fixed some of the review comments.
  • Updated to latest trunk.
xazax.hun added inline comments.Nov 24 2015, 4:23 AM
lib/StaticAnalyzer/Checkers/PointerArithChecker.cpp
28 ↗(On Diff #41022)

In case, there is a pointer to the stack allocated variable, nothing will be stored in the GDM.
The result of a new is a symbolic region, but there are other ways to get a symbolic region, e.g. a pointer as an argument to a top level function. In order to distinguish these cases I think I need to store something to the GDM. But I will think a bit more, whether there is a way to reduce the GDM usage.

150 ↗(On Diff #38813)

I checked and there are more kinds than what I think is worth enumerating.

dcoughlin accepted this revision.Feb 22 2016, 9:30 AM
dcoughlin edited edge metadata.

Other than a suggested diagnostic rewording to consider, looks good to me. Thanks Gábor!

lib/StaticAnalyzer/Checkers/PointerArithChecker.cpp
198 ↗(On Diff #41022)

Here is a suggestion to reword: "Pointer arithmetic on non-array variables relies on memory layout, which is dangerous."

This revision is now accepted and ready to land.Feb 22 2016, 9:30 AM
This revision was automatically updated to reflect the committed changes.