This is an archive of the discontinued LLVM Phabricator instance.

[clang][Sema] Use correct array size for diagnostic
ClosedPublic

Authored by void on Oct 13 2022, 2:30 PM.

Details

Summary

The diagnostic was confusing and reporting that an array contains far
more elements than it is defined to have, due to casting.

For example, this code:

double foo[4096];
((char*)foo)[sizeof(foo)];

warns that the "index 32768 is past the end of the array (which contains
32768 elements)."

Diff Detail

Event Timeline

void created this revision.Oct 13 2022, 2:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 2:30 PM
void requested review of this revision.Oct 13 2022, 2:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 2:30 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
shafik added a subscriber: shafik.Oct 13 2022, 3:49 PM

Thank you for this fix, I have some comments.

clang/lib/Sema/SemaChecking.cpp
16074

index is also wrong as you can see from the diagnostic below array index 32768 is past.

I think what you want is index.getNumWords() and size.getNumWords().

16076

It is not clear what this is for.

If we look at warn_array_index_exceeds_bounds in DiagnosticSemaKinds.td we can see that element%s2 which I believe is a typo and means that this argument is lost.

void added inline comments.Oct 13 2022, 4:04 PM
clang/lib/Sema/SemaChecking.cpp
16074

I'm not so sure about the index part. The sizeof(foo) bit is 32768, so that part of the diagnostic is technically correct. (The best kind of correct! :-)

As for getNumWords(), won't that just default to the size of the array in int's?

16076

I think it's to (somehow) specify whether or not to add an s to the end of element...but I'm not 100% sure.

kees added inline comments.Oct 13 2022, 7:54 PM
clang/test/Sema/array-bounds-ptr-arith.c
13–14

Can this be changed to 8 from 80 to make sure it is also doing the math right on the check and not just the warning text?

void updated this revision to Diff 467941.Oct 14 2022, 3:12 PM

Update test to use a smaller index.

void marked an inline comment as done.Oct 14 2022, 3:12 PM
clang/test/SemaCXX/array-bounds.cpp
240

I find this new message quite confusing, because the array index is given in terms of char elements, while the array size is given in terms of double elements. I actually find the original message more consistent to that respect.

kees added inline comments.Oct 17 2022, 12:35 PM
clang/test/SemaCXX/array-bounds.cpp
240

Perhaps show both element count and byte offset?

array index $count is $(bytes - end_of_array_in_bytes) past the end of the array (which contains $end_of_array_in_bytes bytes of $array_max_index elements)

clang/test/SemaCXX/array-bounds.cpp
240

if we have no cast, I find the version without the patch just fine.

If we have a cast, I suggest we just express the result in bytes, something along those lines (inspired by your suggestion):

array index $count is $(bytes - end_of_array_in_bytes) bytes past the end of the allocated memory for that array (which contains $end_of_array_in_bytes bytes)
void added inline comments.Oct 17 2022, 3:23 PM
clang/test/SemaCXX/array-bounds.cpp
240

I so far have this. It looks not too bad even for the non-cast. Thoughts?

$ cat ab.c
void test_pr10771() {
    double foo[4096];  // expected-note {{array 'foo' declared here}}

    ((char*)foo)[sizeof(foo)] = '\0';  // expected-warning {{array index 32768 is past the end of the array (which contains 4096 elements of type 'double[4096])}}
    foo[4098] = '\0';  // expected-warning {{array index 32768 is past the end of the array (which contains 4096 elements of type 'double[4096])}}
}
$ clang++ -x c++ -o /dev/null -S -std=c++14 ~/llvm/ab.c
/home/morbo/llvm/ab.c:4:13: warning: array index 32768 is 0 bytes past the end of the array (that has type 'double[4096]') [-Warray-bounds]
    ((char*)foo)[sizeof(foo)] = '\0';  // expected-warning {{array index 32768 is past the end of the array (which contains 4096 elements of type 'double[4096])}}
            ^    ~~~~~~~~~~~
/home/morbo/llvm/ab.c:2:5: note: array 'foo' declared here
    double foo[4096];  // expected-note {{array 'foo' declared here}}
    ^
/home/morbo/llvm/ab.c:5:5: warning: array index 4098 is 16 bytes past the end of the array (that has type 'double[4096]') [-Warray-bounds]
    foo[4098] = '\0';  // expected-warning {{array index 32768 is past the end of the array (which contains 4096 elements of type 'double[4096])}}
    ^   ~~~~
/home/morbo/llvm/ab.c:2:5: note: array 'foo' declared here
    double foo[4096];  // expected-note {{array 'foo' declared here}}
    ^
2 warnings generated.
void updated this revision to Diff 468353.Oct 17 2022, 3:55 PM

Improve the error message to include how many bytes the index goes over

warning: the pointer incremented by 14 refers 1 byte past the end of the array (that has type 'const char[13]')
void updated this revision to Diff 468355.Oct 17 2022, 4:04 PM

Update more testcases.

void added a comment.Oct 17 2022, 4:09 PM

The messages that's a bit annoying are the ones reporting "0 bytes past the end of the array":

warning: array index 2 is 0 bytes past the end of the array (that has type 'int[2]')

I'm not sure what to do with these (except to leave them alone).

void updated this revision to Diff 468356.Oct 17 2022, 4:13 PM

Fix rogue test.

shafik requested changes to this revision.Oct 17 2022, 5:01 PM

The current approach of mixing bytes and indices in the same diagnostic is too confusing. I think we see some acceptable messages for out of bounds access by looking at three different implementations diagnostic for OOB access in a constant expression context:

int main() {
    constexpr int arr[1]{};
    constexpr int x = arr[3];
}

We have clang which produces:

<source>:3:23: note: cannot refer to element 3 of array of 1 element in a constant expression
    constexpr int x = arr[3];
                      ^

gcc produces:

<source>:3:28: error: array subscript value '3' is outside the bounds of array 'arr' of type 'const int [1]'
    3 |     constexpr int x = arr[3];
      |                       ~~~~~^

and MSVC produces:

<source>(3): note: failure was caused by out of range index 3; allowed range is 0 <= index < 1

I think any approach similar to this would be acceptable.

This revision now requires changes to proceed.Oct 17 2022, 5:01 PM
void added a comment.Oct 18 2022, 12:07 AM

The current approach of mixing bytes and indices in the same diagnostic is too confusing.

"Current" approach?

I think we see some acceptable messages for out of bounds access by looking at three different implementations diagnostic for OOB access in a constant expression context:

int main() {
    constexpr int arr[1]{};
    constexpr int x = arr[3];
}

We have clang which produces:

<source>:3:23: note: cannot refer to element 3 of array of 1 element in a constant expression
    constexpr int x = arr[3];
                      ^

gcc produces:

<source>:3:28: error: array subscript value '3' is outside the bounds of array 'arr' of type 'const int [1]'
    3 |     constexpr int x = arr[3];
      |                       ~~~~~^

and MSVC produces:

<source>(3): note: failure was caused by out of range index 3; allowed range is 0 <= index < 1

I think any approach similar to this would be acceptable.

The issue is when we get strange mixtures of byte indexes with non-byte arrays. The situations we're dealing with is this one:

void test() {
  double foo[4096];

  ((char*)foo)[sizeof(foo)] = '\0';
}

What's generated by ToT is a diagnostic saying that the index (in bytes) is past the array size (also in bytes):

array index 32768 is past the end of the array (which contains 32768 elements)

So there's a disconnect between the original size of the array (which is in doubles) and the array size being reported (which is in bytes). It seems that no matter what it's not going to be easy to give anything but "mixed" information (bytes and indices) to the user to ensure that we've fully described the issue at hand.

aaron.ballman added inline comments.
clang/test/SemaCXX/array-bounds.cpp
240

The part I struggle with is the "is 0 bytes past the end of the array" -- that reads as if it's not past the end of the array at all because it's saying "You've accessed right up to the end of the array; congratulations on your very careful use!".

void added inline comments.Oct 18 2022, 1:55 PM
clang/test/SemaCXX/array-bounds.cpp
240

Okay. How about this then?

Typical diagnostic:

  warning: array index %{index} is past the end of the array (that has type %{type})

Diagnostic if the array is cast to something else:

  warning: array index %{index} is past the end of the array (that has type %{type}, cast to ${cast_type})
void updated this revision to Diff 468728.Oct 18 2022, 3:59 PM

Update the error message to remove the reference to bytes, which can be confusing.

I really like the new message, thanks for taking account the diverging reviews!

aaron.ballman accepted this revision.Oct 19 2022, 5:52 AM

LGTM, I think the new diagnostic wording is an improvement. Can you please add a release note to mention the diagnostic wording was patched up?

void added a comment.Oct 19 2022, 12:01 PM

Can you please add a release note to mention the diagnostic wording was patched up?

Done.

void updated this revision to Diff 468997.Oct 19 2022, 12:02 PM

Add release note.

void updated this revision to Diff 469044.Oct 19 2022, 2:14 PM

Reformatting.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 20 2022, 11:57 AM
This revision was automatically updated to reflect the committed changes.

Apologies for the late reply but after the last changes to the diagnostic messages are much better.

void added a comment.Oct 20 2022, 4:29 PM

Apologies for the late reply but after the last changes to the diagnostic messages are much better.

Thanks! Sorry I jumped the gun and submitted before your reply...