Page MenuHomePhabricator

[Sema] Fix -Warray-bounds false negative when casting an out-of-bounds array item
Needs ReviewPublic

Authored by ilya on Dec 19 2019, 9:17 AM.

Details

Reviewers
rsmith
Summary

Event Timeline

ilya created this revision.Dec 19 2019, 9:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2019, 9:17 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ilya retitled this revision from [Sema] NFC: Remove trailing spaces and fix a typo in a test file to [Sema] Fix -Warray-bounds false negative when casting an out-of-bounds array item.Dec 19 2019, 9:28 AM
ilya edited the summary of this revision. (Show Details)
ilya added a reviewer: rsmith.
ilya added a subscriber: ebevhan.

These are not the only AST nodes representing cast expressions (there is also CXXFunctionalCastExpr, ...). What about replacing the IgnoreParenImpCasts() above by IgnoreParenCasts() ? Incidentally doing this uncovers another test (Parser/cxx-ambig-decl-expr.cpp ) where this diagnostic is triggered.

In addition to @riccibruno's comment, I found a couple of other suspicious things nearby (unrelated to the fix in this patch). No need to address them in this patch unless you feel motivated :)

clang/lib/Sema/SemaChecking.cpp
13370–13371

Hmm, don't we need to do different things for dot and arrow in this case?

13370–13371

Doesn't this need to preserve the AllowOnePastEnd value to avoid a false positive for &(cond ? arr1[N] : arr2[N])?

To be rigorous, we should perform "pointer" checking for every operation that performs pointer arithmetic. Then we should perform "lvalue" checking (which doesn't allow pointers one past the end) in the following places:

  1. When we take the address of an lvalue.
  2. When we form a reference to an lvalue.
  3. When we perform an lvalue-to-rvalue conversion.
  4. When we perform an assignment to an lvalue.

This sort of piecemeal approach of recursively looking through arbitrary expressions seems likely to miss cases. For example, we currently don't perform checks inside compound literals.

To be rigorous, we should perform "pointer" checking for every operation that performs pointer arithmetic. Then we should perform "lvalue" checking (which doesn't allow pointers one past the end) in the following places:

  1. When we take the address of an lvalue.
  2. When we form a reference to an lvalue.
  3. When we perform an lvalue-to-rvalue conversion.
  4. When we perform an assignment to an lvalue.

    This sort of piecemeal approach of recursively looking through arbitrary expressions seems likely to miss cases. For example, we currently don't perform checks inside compound literals.

I agree that this approach is not good. I'm concerned that your direction might still miss things, though (and it seems to involve a lot of AST traversal, which would be nice to avoid). For example, __builtin_bitcast performs a load without an lvalue-to-rvalue conversion, and to be thorough we'd need to special-case it. Perhaps we could instead:

  • warn immediately when indexing outside [0, N] inclusive
  • produce a deferred warning when indexing with index N, and diagnose at the end of the expression evaluation context
  • remove elements from the list of deferred warnings when handling an & operator

In C at least, that should be correct in all cases. I think it's correct in C++ as well; there are lots more forms of lvalue to walk into in the third step (eg, &(cond ? x[n] : y[n]) shouldn't warn), but it seems feasible to enumerate. (This would lose the warnings on *&x[n], but I don't think that's a disaster.)

and it seems to involve a lot of AST traversal

I was thinking we'd just call into SemaChecking in appropriate places. I guess there's a little AST traversal to figure whether an expression forms an array address. Your idea seems simpler.

remove elements from the list of deferred warnings when handling an & operator

For C++, I think you might also need to handle discarded-value expressions? Maybe it's okay if we warn anyway in that case. :)

This would lose the warnings on *&x[n], but I don't think that's a disaster

And more generally *(x+n), although I guess that isn't implemented now anyway.

ilya marked an inline comment as done.Dec 23 2019, 1:19 PM

These are not the only AST nodes representing cast expressions (there is also CXXFunctionalCastExpr, ...). What about replacing the IgnoreParenImpCasts() above by IgnoreParenCasts() ? Incidentally doing this uncovers another test (Parser/cxx-ambig-decl-expr.cpp ) where this diagnostic is triggered.

Thanks, using IgnoreParenCasts is even better.

clang/lib/Sema/SemaChecking.cpp
13370–13371

There are several test cases for an out of bounds access on an array member using dot and arrow operators in array-bounds.cpp. Do you have a specific test case for which you think the code is broken?

ilya updated this revision to Diff 235184.Dec 23 2019, 4:27 PM
  1. Refactor CheckArrayAccess() to take AllowPastTheEnd parameter to avoid a false positive array out-of-bounds warning for &(cond ? arr1[N] : arr2[N]).
  2. Use IgnoreParenCasts() instead of IgnoreParenImpCasts() in CheckArrayAccess() to avoid false negatives when explicit cast is involved.
ilya marked an inline comment as done.Dec 23 2019, 4:33 PM

Implemented @riccibruno's and @rsmith's comments.
While I agree with the general notion about the flaws with the current implementation, I feel that the more major refactoring proposed in this review is out of the scope of this minor fix.

ilya added a comment.Jan 6 2020, 9:59 AM

Kind ping

rsmith added inline comments.Jan 6 2020, 1:34 PM
clang/lib/Sema/SemaChecking.cpp
13370–13371

There are several test cases for an out of bounds access on an array member using dot and arrow operators in array-bounds.cpp. Do you have a specific test case for which you think the code is broken?

Sure. There's a false negative for this:

struct A { int n; };
A *a[4];
int *n = &a[4]->n;

... because we incorrectly visit the left-hand side of the -> with AllowOnePastEnd == 1. The left-hand side of -> is subject to lvalue-to-rvalue conversion, so can't be one-past-the-end regardless of the context in which the -> appears.

clang/test/SemaCXX/array-bounds.cpp
331

This case should warn; dynamic_cast will access the object's vptr. Please at least add a FIXME.

rsmith added a comment.Jan 6 2020, 1:36 PM

While I agree with the general notion about the flaws with the current implementation, I feel that the more major refactoring proposed in this review is out of the scope of this minor fix.

I'm OK with making the minor fix here and deferring the larger rework.

ilya updated this revision to Diff 238106.Jan 14 2020, 2:38 PM

Address rsmith's comments.

ilya marked 4 inline comments as done.Jan 14 2020, 2:48 PM
ilya added inline comments.
clang/lib/Sema/SemaChecking.cpp
13370–13371

... because we incorrectly visit the left-hand side of the -> with AllowOnePastEnd == 1. The left-hand side of -> is subject to lvalue-to-rvalue conversion, so can't be one-past-the-end regardless of the context in which the -> appears.

Thanks for clarifying. So basically we don't want to allow one-past-end for MemberExpr?

ebevhan added inline comments.Jan 16 2020, 7:18 AM
clang/lib/Sema/SemaChecking.cpp
13370–13371

Thanks for clarifying. So basically we don't want to allow one-past-end for MemberExpr?

I think the point is rather that when the MemberExpr isArrow, we want to think of it as performing an implicit dereference first, and thus we should do AllowOnePastEnd-- before recursing if that is the case.

ilya marked 2 inline comments as done.Jan 21 2020, 10:10 AM
ilya added inline comments.
clang/lib/Sema/SemaChecking.cpp
13370–13371

I think the point is rather that when the MemberExpr isArrow, we want to think of it as performing an implicit dereference first, and thus we should do AllowOnePastEnd-- before recursing if that is the case.

@ebevhan and I have discussed this offline, and we believe that there's no reason to allow an address-of of a member of a one past-the-end value, using either '.' or '->' operators, regardless of the fact that referencing a member of a one past-the-end value using '.' is merely a pointer arithmetic and doesn't imply dereferencing, as with operator '->'.

We couldn't find any reference in the C++ 2017 N4659 regarding member access of a one past-the-end value. Only mentions of one past-the-end in the context of iterators (27.2.1 p. 7) and pointer arithmetic (8.3.1 p. 3).

ISO C99 describes in 6.5.3.2 p. 3 that if the operand to the '&' operator is the result of an unary '*' operator (or array subscript, implictly), the two operators "cancel" each other, but it says nothing about the case when there's a member expression "in-between", (the member expression section, 6.5.2.3, says nothing about that either).

@rsmith, what do you think?