This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] ArrayBoundCheckerV2 should disallow forming lvalues to out-of-bounds locations
AbandonedPublic

Authored by steakhal on Aug 29 2023, 8:21 AM.

Details

Summary

After this patch, we no longer need the check::Location callback, as we
caught bugs earlier when forming bad references - aka. before loading or
binding anything to it.

(CWE-122 Heap Based Buffer Overflow: CWE-805-class-loop)

Depends on D159106

Diff Detail

Event Timeline

steakhal created this revision.Aug 29 2023, 8:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 8:21 AM
steakhal requested review of this revision.Aug 29 2023, 8:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 8:21 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Good direction of development, this will be useful for providing better bug reports (in addition to ensuring correct behavior some situations).

Note that it's also possible to dereference pointers with the operator ->, which is represented by MemberExprs in the AST; we should probably handle that as if it was a UO_Deref.

There is also a small corner case that for an array some_type arr[N] it's well-defined to form the past-the-end pointer as &arr[N] (instead of arr + N) -- while any other use of arr[N] is undefined behavior. If this occurs in practice, then we'll probably need some additional logic to handle it. (Note that the check::Location implementation dodged this question, because it didn't report anything when the program formed &arr[N], but later created a bug report when this pointer value was dereferenced.)

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
34

Which testcase would break without the check::Bind callback? (Not action needed, I'm just curious.)

steakhal abandoned this revision.Aug 31 2023, 1:09 AM

Good direction of development, this will be useful for providing better bug reports (in addition to ensuring correct behavior some situations).
Note that it's also possible to dereference pointers with the operator ->, which is represented by MemberExprs in the AST; we should probably handle that as if it was a UO_Deref.

+1, +1

There is also a small corner case that for an array some_type arr[N] it's well-defined to form the past-the-end pointer as &arr[N] (instead of arr + N) -- while any other use of arr[N] is undefined behavior. If this occurs in practice, then we'll probably need some additional logic to handle it. (Note that the check::Location implementation dodged this question, because it didn't report anything when the program formed &arr[N], but later created a bug report when this pointer value was dereferenced.)

Ah, this disappoints me. You are right. This delayed mechanism definitely needs a more careful approach.

Given these concerns and the quality of the diagnostics, I don't think it's something easy to fix. I'll abandon this for now, to focus on the low-hanging patches to upstream.

Thanks for the thoughtful review! @donat.nagy

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
34

Good question. TO my surprise, no tests would be broken if we removed this callback.
Nice catch.
I thought it would break the new test added in D159106, but it would still pass.

I don't think that the &arr[N] issue is too serious: we can just increment the array extent when the parent expression of the array subscript operator is the unary operator &. If the past-the-end pointer ends up dereferenced later, the current code is sufficient to report it as a bug (as the checker monitors all dereferences).

I'd be happy to see (a slightly extended variant of) this commit merged, because I could provide much better warning messages if I can access the concrete subscript/dereference operations. Of course if you don't have time to work on this I can put this up for review myself (probably after your other commits are handled).

I don't think that the &arr[N] issue is too serious: we can just increment the array extent when the parent expression of the array subscript operator is the unary operator &. If the past-the-end pointer ends up dereferenced later, the current code is sufficient to report it as a bug (as the checker monitors all dereferences).

My instinct suggests that there is more behind the curtain.
For example, on an expression &arr[N] (of type int arr[10]), we would constrain N. We could say that we allow the one before and one after elements, thus introduce a constraint: N: [-1, 11]. This -1 later could participate in casts, and end up interpreted as a really large unsigned number. I should also think about how would we detect off-by-one errors, which is a common category.

I'd be happy to see (a slightly extended variant of) this commit merged, because I could provide much better warning messages if I can access the concrete subscript/dereference operations. Of course if you don't have time to work on this I can put this up for review myself (probably after your other commits are handled).

The thing is, that I only have this week, aka. 1.5 days before I leave for vacation. :D
In the future, (no ETA), we likely come back to the Juliet improvements.

I don't think that the &arr[N] issue is too serious: we can just increment the array extent when the parent expression of the array subscript operator is the unary operator &. If the past-the-end pointer ends up dereferenced later, the current code is sufficient to report it as a bug (as the checker monitors all dereferences).

My instinct suggests that there is more behind the curtain.

As a rough solution we can simply say that this checker ignores &arr[idx] (because it's just a different way of writing arr + idx) and only checks expressions where "real" dereference happens. This way the checker would won't emit false positives on past-the-end pointers and still report every out of bound memory access (including off-by-one errors).

This can be refined by adding a check that which validates that in an expression like &arr[idx] the index satisfies 0 <= idx && idx <= array_size. This is conceptually independent (but can use the same implementation as the real memory access check) and would add some useful reports and constraints without breaking anything.

For example, on an expression &arr[N] (of type int arr[10]), we would constrain N. We could say that we allow the one before and one after elements, thus introduce a constraint: N: [-1, 11]. This -1 later could participate in casts, and end up interpreted as a really large unsigned number. I should also think about how would we detect off-by-one errors, which is a common category.

Pointer arithmetic is very restricted in the standard, e.g. http://eel.is/c++draft/basic.compound says that

Every value of pointer type is one of the following:
(3.1) a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) a pointer past the end of an object ([expr.add]), or
(3.3) the null pointer value for that type, or
(3.4) an invalid pointer value.

As this explicitly rules out before-first-element pointers and they are not too common (I don't recall any code that used them), I don't think that they deserve a special case (just report them if we see them).

The thing is, that I only have this week, aka. 1.5 days before I leave for vacation. :D
In the future, (no ETA), we likely come back to the Juliet improvements.

Of course, it's completely reasonable to focus on a limited set of changes before the vacation.

In the meantime, I'll probably work on one or more commits that would (1) switch to using the concrete check::PostStmt callbacks instead of check::Location (and Bind) like this commit (2) improve the diagnostics, add more details to the messages. When will you return from the vacation? Should I wait for you with the review of these changes (if I can write and test them before your return)?

I don't think that the &arr[N] issue is too serious: we can just increment the array extent when the parent expression of the array subscript operator is the unary operator &. If the past-the-end pointer ends up dereferenced later, the current code is sufficient to report it as a bug (as the checker monitors all dereferences).

My instinct suggests that there is more behind the curtain.

As a rough solution we can simply say that this checker ignores &arr[idx] (because it's just a different way of writing arr + idx) and only checks expressions where "real" dereference happens. This way the checker would won't emit false positives on past-the-end pointers and still report every out of bound memory access (including off-by-one errors).

This can be refined by adding a check that which validates that in an expression like &arr[idx] the index satisfies 0 <= idx && idx <= array_size. This is conceptually independent (but can use the same implementation as the real memory access check) and would add some useful reports and constraints without breaking anything.

It makes sense, but it would need to special case on the parent or child AST node, which isn't really clean. But I think we think the same on this subject.
(Note that the & is a parent of the arr[N], where the checker would trigger, thus we would need to check the parent node to ignore such cases; and that is not really supported in the AST, and using the ParentMap for it is expensive. Remember, that this would happen for any SubscriptExpr as many times they are visited.)

For example, on an expression &arr[N] (of type int arr[10]), we would constrain N. We could say that we allow the one before and one after elements, thus introduce a constraint: N: [-1, 11]. This -1 later could participate in casts, and end up interpreted as a really large unsigned number. I should also think about how would we detect off-by-one errors, which is a common category.

Pointer arithmetic is very restricted in the standard, e.g. http://eel.is/c++draft/basic.compound says that

Every value of pointer type is one of the following:
(3.1) a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) a pointer past the end of an object ([expr.add]), or
(3.3) the null pointer value for that type, or
(3.4) an invalid pointer value.

As this explicitly rules out before-first-element pointers and they are not too common (I don't recall any code that used them), I don't think that they deserve a special case (just report them if we see them).

Yes. They are really restricted, and AFAIK only past-the-end pointers are allowed, unlike before-the-first element. Such code might be written for reverse iterations.
But we know that hardware would work like that, and people might rely on it.

The thing is, that I only have this week, aka. 1.5 days before I leave for vacation. :D
In the future, (no ETA), we likely come back to the Juliet improvements.

Of course, it's completely reasonable to focus on a limited set of changes before the vacation.

In the meantime, I'll probably work on one or more commits that would (1) switch to using the concrete check::PostStmt callbacks instead of check::Location (and Bind) like this commit (2) improve the diagnostics, add more details to the messages. When will you return from the vacation? Should I wait for you with the review of these changes (if I can write and test them before your return)?

I don't think you should wait for me. I'll be off for a week though, but I might not have the bandwidth to do reviews even after that.
Post it once you are fine with it. Prove it "behaves well", and just wait for someone to review it. It might be noq, xazax or me. We will see.