This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Fix -Warray-bounds false negative when casting an out-of-bounds array item
ClosedPublic

Authored by ilya on Dec 19 2019, 9:17 AM.

Event Timeline

ilya created this revision.Dec 19 2019, 9:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2019, 9:17 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ilya retitled this revision from [Sema] NFC: Remove trailing spaces and fix a typo in a test file to [Sema] Fix -Warray-bounds false negative when casting an out-of-bounds array item.Dec 19 2019, 9:28 AM
ilya edited the summary of this revision. (Show Details)
ilya added a reviewer: rsmith.
ilya added a subscriber: ebevhan.

These are not the only AST nodes representing cast expressions (there is also CXXFunctionalCastExpr, ...). What about replacing the IgnoreParenImpCasts() above by IgnoreParenCasts() ? Incidentally doing this uncovers another test (Parser/cxx-ambig-decl-expr.cpp ) where this diagnostic is triggered.

In addition to @riccibruno's comment, I found a couple of other suspicious things nearby (unrelated to the fix in this patch). No need to address them in this patch unless you feel motivated :)

clang/lib/Sema/SemaChecking.cpp
13370–13371

Hmm, don't we need to do different things for dot and arrow in this case?

13370–13371

Doesn't this need to preserve the AllowOnePastEnd value to avoid a false positive for &(cond ? arr1[N] : arr2[N])?

To be rigorous, we should perform "pointer" checking for every operation that performs pointer arithmetic. Then we should perform "lvalue" checking (which doesn't allow pointers one past the end) in the following places:

  1. When we take the address of an lvalue.
  2. When we form a reference to an lvalue.
  3. When we perform an lvalue-to-rvalue conversion.
  4. When we perform an assignment to an lvalue.

This sort of piecemeal approach of recursively looking through arbitrary expressions seems likely to miss cases. For example, we currently don't perform checks inside compound literals.

To be rigorous, we should perform "pointer" checking for every operation that performs pointer arithmetic. Then we should perform "lvalue" checking (which doesn't allow pointers one past the end) in the following places:

  1. When we take the address of an lvalue.
  2. When we form a reference to an lvalue.
  3. When we perform an lvalue-to-rvalue conversion.
  4. When we perform an assignment to an lvalue.

This sort of piecemeal approach of recursively looking through arbitrary expressions seems likely to miss cases. For example, we currently don't perform checks inside compound literals.

I agree that this approach is not good. I'm concerned that your direction might still miss things, though (and it seems to involve a lot of AST traversal, which would be nice to avoid). For example, __builtin_bitcast performs a load without an lvalue-to-rvalue conversion, and to be thorough we'd need to special-case it. Perhaps we could instead:

  • warn immediately when indexing outside [0, N] inclusive
  • produce a deferred warning when indexing with index N, and diagnose at the end of the expression evaluation context
  • remove elements from the list of deferred warnings when handling an & operator

In C at least, that should be correct in all cases. I think it's correct in C++ as well; there are lots more forms of lvalue to walk into in the third step (eg, &(cond ? x[n] : y[n]) shouldn't warn), but it seems feasible to enumerate. (This would lose the warnings on *&x[n], but I don't think that's a disaster.)

and it seems to involve a lot of AST traversal

I was thinking we'd just call into SemaChecking in appropriate places. I guess there's a little AST traversal to figure whether an expression forms an array address. Your idea seems simpler.

remove elements from the list of deferred warnings when handling an & operator

For C++, I think you might also need to handle discarded-value expressions? Maybe it's okay if we warn anyway in that case. :)

This would lose the warnings on *&x[n], but I don't think that's a disaster

And more generally *(x+n), although I guess that isn't implemented now anyway.

ilya marked an inline comment as done.Dec 23 2019, 1:19 PM

These are not the only AST nodes representing cast expressions (there is also CXXFunctionalCastExpr, ...). What about replacing the IgnoreParenImpCasts() above by IgnoreParenCasts() ? Incidentally doing this uncovers another test (Parser/cxx-ambig-decl-expr.cpp ) where this diagnostic is triggered.

Thanks, using IgnoreParenCasts is even better.

clang/lib/Sema/SemaChecking.cpp
13370–13371

There are several test cases for an out of bounds access on an array member using dot and arrow operators in array-bounds.cpp. Do you have a specific test case for which you think the code is broken?

ilya updated this revision to Diff 235184.Dec 23 2019, 4:27 PM
  1. Refactor CheckArrayAccess() to take AllowPastTheEnd parameter to avoid a false positive array out-of-bounds warning for &(cond ? arr1[N] : arr2[N]).
  2. Use IgnoreParenCasts() instead of IgnoreParenImpCasts() in CheckArrayAccess() to avoid false negatives when explicit cast is involved.
ilya marked an inline comment as done.Dec 23 2019, 4:33 PM

Implemented @riccibruno's and @rsmith's comments.
While I agree with the general notion about the flaws with the current implementation, I feel that the more major refactoring proposed in this review is out of the scope of this minor fix.

ilya added a comment.Jan 6 2020, 9:59 AM

Kind ping

rsmith added inline comments.Jan 6 2020, 1:34 PM
clang/lib/Sema/SemaChecking.cpp
13370–13371

There are several test cases for an out of bounds access on an array member using dot and arrow operators in array-bounds.cpp. Do you have a specific test case for which you think the code is broken?

Sure. There's a false negative for this:

struct A { int n; };
A *a[4];
int *n = &a[4]->n;

... because we incorrectly visit the left-hand side of the -> with AllowOnePastEnd == 1. The left-hand side of -> is subject to lvalue-to-rvalue conversion, so can't be one-past-the-end regardless of the context in which the -> appears.

clang/test/SemaCXX/array-bounds.cpp
331

This case should warn; dynamic_cast will access the object's vptr. Please at least add a FIXME.

rsmith added a comment.Jan 6 2020, 1:36 PM

While I agree with the general notion about the flaws with the current implementation, I feel that the more major refactoring proposed in this review is out of the scope of this minor fix.

I'm OK with making the minor fix here and deferring the larger rework.

ilya updated this revision to Diff 238106.Jan 14 2020, 2:38 PM

Address rsmith's comments.

ilya marked 4 inline comments as done.Jan 14 2020, 2:48 PM
ilya added inline comments.
clang/lib/Sema/SemaChecking.cpp
13370–13371

... because we incorrectly visit the left-hand side of the -> with AllowOnePastEnd == 1. The left-hand side of -> is subject to lvalue-to-rvalue conversion, so can't be one-past-the-end regardless of the context in which the -> appears.

Thanks for clarifying. So basically we don't want to allow one-past-end for MemberExpr?

ebevhan added inline comments.Jan 16 2020, 7:18 AM
clang/lib/Sema/SemaChecking.cpp
13370–13371

Thanks for clarifying. So basically we don't want to allow one-past-end for MemberExpr?

I think the point is rather that when the MemberExpr isArrow, we want to think of it as performing an implicit dereference first, and thus we should do AllowOnePastEnd-- before recursing if that is the case.

ilya marked 2 inline comments as done.Jan 21 2020, 10:10 AM
ilya added inline comments.
clang/lib/Sema/SemaChecking.cpp
13370–13371

I think the point is rather that when the MemberExpr isArrow, we want to think of it as performing an implicit dereference first, and thus we should do AllowOnePastEnd-- before recursing if that is the case.

@ebevhan and I have discussed this offline, and we believe that there's no reason to allow an address-of of a member of a one past-the-end value, using either '.' or '->' operators, regardless of the fact that referencing a member of a one past-the-end value using '.' is merely a pointer arithmetic and doesn't imply dereferencing, as with operator '->'.

We couldn't find any reference in the C++ 2017 N4659 regarding member access of a one past-the-end value. Only mentions of one past-the-end in the context of iterators (27.2.1 p. 7) and pointer arithmetic (8.3.1 p. 3).

ISO C99 describes in 6.5.3.2 p. 3 that if the operand to the '&' operator is the result of an unary '*' operator (or array subscript, implictly), the two operators "cancel" each other, but it says nothing about the case when there's a member expression "in-between", (the member expression section, 6.5.2.3, says nothing about that either).

@rsmith, what do you think?

ilya added a comment.Mar 13 2020, 7:30 PM

One last kind ping

Ping @rsmith . Are you ok if I rebase and push this change? Or do you have specific items that require attention? I've spoken to @ilya , he's ok if I clean this up and push with credit to him. Thanks

Rebase, commandeer this patch from Ilya.

vabridgers edited the summary of this revision. (Show Details)Jan 17 2021, 5:22 PM

Ping. Could someone pick up review of this patch again, please? Ilya left off with a question to @rsmith , and at that point all activity fell off. Thanks.

Gentle and polite ping :) Could someone have a look at this? @rsmith , or @lattner ? Thanks.

rsmith accepted this revision.Jan 28 2021, 4:00 PM
rsmith added inline comments.
clang/lib/Sema/SemaChecking.cpp
13370–13371

Let's give this a go. I'm a little concerned that people might write things like:

struct X { int n; };
X x[32];
int *end = &x[32].n;

I don't think that has defined behavior in C++, because there isn't an X object so there isn't an n member to refer to, but I think it's probably OK in C (hard to say in both cases, though, due to underspecification in both standards). Nonetheless, such code is at least questionable. I don't think we have a good way to determine how well this warning will work other than trying it, so let's do that.

Please be prepared to revert and do something more conservative if we find this fires a lot on somewhat-reasonable code :)

This revision is now accepted and ready to land.Jan 28 2021, 4:00 PM

Thanks @rsmith, will do. Best!

thakis added a subscriber: thakis.Feb 3 2021, 4:48 PM

Hi,

this causes a warning in harfbuzz on code that looks like so:

UChar decomposed[4];
...
int len = unorm2_getRawDecomposition(...decomposed, ARRAY_LENGTH (decomposed)...)
if (len == 1) {
  U16_GET_UNSAFE (decomposed, 0, *a);

where U16_GET_UNSAFE looks like so:

#define U16_GET_UNSAFE(s, i, c) UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            (c)=U16_GET_SUPPLEMENTARY((c), (s)[(i)+1]); \
        } else { \
            (c)=U16_GET_SUPPLEMENTARY((s)[(i)-1], (c)); \
        } \
    } \
} UPRV_BLOCK_MACRO_END

(The BLOCK_MACRO macros are just a way to spell do...while(false) as far as I understand.)

It's true that the else block in the macro evaluates to decomposed[0-1], which is out of bounds -- but it's also in a conditional, and in a macro. That seems like this can cause quite a few false positives. Maybe this warning should be suppressed in conditionals, or macros, or some such?

(50k/70k files of chromium built with this patch, this so far is the only instance it flagged, and that's a false positive.)

Thanks @thakis, per @rsmith 's suggestion in the review, I'll revert this. Apologies for the inconvenience.