This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix for incorrect handling of 0 length non-POD array construction
ClosedPublic

Authored by isuckatcs on Aug 9 2022, 8:39 AM.

Details

Summary

We current handle 0 size non-POD arrays as if they had 1 element, which leads to false positives.

See https://godbolt.org/z/6E98xY66W for one example.

This patch fixes this bug.

Diff Detail

Event Timeline

isuckatcs created this revision.Aug 9 2022, 8:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2022, 8:39 AM
isuckatcs requested review of this revision.Aug 9 2022, 8:39 AM

Thanks for the patch.
Please check if all the touched lines are covered by tests.
Other than a few nits it looks great!

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
593

Consider moving this comment to the condition where you check Size == 0.

596
1162

Why was the UndefinedVal inappropriate? In fact, that seems to be the right value.

1176–1184

You could be specific about that case and set the UndefinedVal initial value instead of relying on that's the previous value of InitVal.

clang/test/Analysis/array-init-loop.cpp
237

I didn't know this is a thing. I thought it was ill-formed c++.
Aha, they are diagnosed by -Wzero-length-array as warning: zero size arrays are an extension. Interesting.
And it seems like sizeof(arr) is zero in the backend; and we should in deed support this construct.

isuckatcs marked 5 inline comments as done.
  • Addressed comments
  • Added more testcases
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
1162

If InitVal is undefined, State = State->bindLoc(FieldLoc, InitVal, LocCtxt); will crash.

That method eventually calls RegionStoreManager::bindArray() where we are not handling if the value is Undefined, so when we reach const nonloc::CompoundVal& CV = Init.castAs<nonloc::CompoundVal>(); the cast fails and we have a crash.

I changed the default value and if the value is Undefined I don't bind it to the store at all.

It seems that in this situation it doesn't matter what the value is, as the captured 0 size array will only function as a pointer to the beginning of the lambda. See it on godbolt. That means if we read some value from the array it is not necessarily Undefined, so I assume there's no need for any special handling here.

Rebased and added more test cases

NoQ added a comment.Aug 14 2022, 12:05 AM

Brilliant catch as always!

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
1175–1176

This code is communicating a fairly specific, unusual, peculiar situation of a zero-size array through a very generic, uninformative, unexpressive means of object-under-construction not being set. I.e., if I see object-under-construction not being set, zero-size array is the last thing I'd suspect. It's very likely that someone else will try to communicate some other information by not setting object-under-construction, and then we'll hit a hard-to-debug cornercase interaction when you think it means one thing but it actually means the other thing.

Is there a more direct way to communicate that the array is zero size? Like, make a special marker specifically for that purpose, that definitely can't be confused with anything else?

xazax.hun added inline comments.Aug 14 2022, 9:42 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
599

Should this be an else if?

606–611

Note that some compilers treat zero size arrays as flexible array members, see https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html

I am not sure whether we actually would ever need special handling for those as the Ctors probably need to be invoked manually for those cases and we would see that invocation in the analyzer. But a simple TODO to test that out at some point might be handy.

isuckatcs marked 2 inline comments as done.

Rebase and addressed some comments.

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
1175–1176

In this situation I think it's the other way around. It's not like the array being zero length is communicated by not marking it as under construction, it's rather like when the array is zero length, the constructors are not invoked, so object under construction isn't set either.

But this snippet has changed over time, and the size of the array is explicitly checked instead.

isuckatcs added inline comments.Aug 22 2022, 11:21 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
606–611

I think that case requires no special handling. The declaration of a zero length array is like a no-op regardless of where it is, no assembly code is generated for it.

If you have a struct like struct S { int size; Aggregate arr[0]; }, and instantiate it on the stack, the constructor for Aggregate is not invoked, so I also ignore it in this patch.

If memory is allocated using malloc() or ::operator new()
(e.g.: S *s = (S*)::operator new(offsetof(S, arr) + sizeof(Aggregate) * 10);),
the constructors need to be called manually.

If the developer calls new that will either not compile (e.g.: S s; s.arr = new Aggregate[10];), or will lead to memory corruption (e.g.: S *s = (S*)new Aggregate[10];). In the later case the analyzer should simply take the symbolic region of s and construct everything there, though I haven't tested this yet.

I'm going to add a test case for this feature soon, I just wanted to explain how I think about it, in case I miss something.

isuckatcs updated this revision to Diff 454778.Aug 23 2022, 3:55 AM

Fixed an error prone line

xazax.hun accepted this revision.Aug 23 2022, 11:40 AM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
606–611

Sounds good, thanks!

This revision is now accepted and ready to land.Aug 23 2022, 11:40 AM
isuckatcs updated this revision to Diff 455146.Aug 24 2022, 3:53 AM
isuckatcs marked 2 inline comments as done.

@xazax.hun I added a test case for how I imagine flexible non-POD array members in C++

xazax.hun accepted this revision.Aug 24 2022, 7:47 AM

@xazax.hun I added a test case for how I imagine flexible non-POD array members in C++

Thanks! I think the test case could be simplified a bit, I am not sure whether we need offsetof, but otherwise looks good to me.

clang/test/Analysis/flexible-array-member.cpp
34 ↗(On Diff #455146)

I am not sure how idiomatic calling operator new is, I mostly see code examples with malloc around the internet. But feel free to leave this as is.

Most of the examples I see look like: (Flex*)malloc(sizeof(Flex) + sizeof(S) * size) which is admittedly a bit C-like, but I think you might not need the offsetof.

isuckatcs marked an inline comment as done.Aug 24 2022, 8:40 AM
isuckatcs added inline comments.
clang/test/Analysis/flexible-array-member.cpp
34 ↗(On Diff #455146)

I am not sure how idiomatic calling operator new is, I mostly see code examples with malloc around the internet. But feel free to leave this as is.

I wanted the test to be more "C++-like", and both of those functions do the same, so I chose ::operator new(). I'll leave it like this for now. We can change it anytime later if required.

I think you might not need the offsetof.

True, the size of a zero length array is 0, so the size of the struct is equivalent to the size of length. I was assuming some padding between length and contents, which might happen with a 1 element array (I've seen this construct used with a 1 element array, since that's accepted by every compiler). If a 1 element array is used, I think it's more expressive to do offsetof(Flex, contents) + sizeof(S) * size instead of sizeof(Flex) + sizeof(S) * (size - 1).

xazax.hun added inline comments.Aug 24 2022, 8:44 AM
clang/test/Analysis/flexible-array-member.cpp
34 ↗(On Diff #455146)

I was assuming some padding between length and contents

I am not very well versed in this area, but I'd expect the compiler to include that padding in sizeof(Flex). At least, that would be the least error prone solution.

isuckatcs marked 2 inline comments as done.Aug 24 2022, 9:33 AM
isuckatcs added inline comments.
clang/test/Analysis/flexible-array-member.cpp
34 ↗(On Diff #455146)

I'd expect the compiler to include that padding in sizeof(Flex).

It includes the padding. What I meant is that imagine you have a similar struct, with a one element array
(e.g.: struct S {int length; void *elements[1];};). Then you will probably have 4 padding bytes between length and elements.

If you want to allocate memory for an array with size elements, you would either do sizeof(S) + sizeof(void*) * (size - 1) or
offsetof(S, elements) + sizeof(void*) * size. The second one is more readable I think.

This was on my mind when I was writing the test case, hence the offsetof solution. It's true that you probably don't want to create a field like S arr[1], because that would also invoke a default constructor (if exists).

isuckatcs marked an inline comment as done.Aug 24 2022, 9:33 AM
isuckatcs retitled this revision from [analyzer] Fix for incorrect handling of 0 size arrays to [analyzer] Fix for incorrect handling of 0 length non-POD array construction.Aug 25 2022, 3:41 AM
isuckatcs edited the summary of this revision. (Show Details)
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2022, 3:42 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript