This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add C++ array delete checker
ClosedPublic

Authored by Discookie on Aug 17 2023, 1:14 AM.

Details

Summary

This checker reports cases where an array of polymorphic objects are deleted as their base class.
Deleting an array where the array's static type is different from its dynamic type is undefined.

Since the checker is similar to DeleteWithNonVirtualDtorChecker, I refactored that checker to support more detection types.

This checker corresponds to the SEI Cert rule EXP51-CPP: Do not delete an array through a pointer of the incorrect type.

Diff Detail

Event Timeline

Discookie created this revision.Aug 17 2023, 1:14 AM
Discookie requested review of this revision.Aug 17 2023, 1:14 AM

This checker is a straightforward, clean implementation of the SEI-CERT rule EXP51-CPP, it'll be a good addition to Static Analyzer. I'm satisfied with the implementation, although there might be others who have a more refined taste in C++ coding style. The only issue that I found is the misleading name of the "common ancestor" hidden checker that I mentioned in the inline comment.

The test suite nicely covers all the basic situations, but I'd be happy to see some corner cases like virtual and non-virtual diamond inheritance or even a degenerate example like

class Base { /*...*/ };
class Derived: public Base {
  Base basesAsMember[10];
  /*...*/
};
void f() {
  Derived *d = new Derived(...);
  delete[] (d->basesAsMember);
}
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
757–760

The name and help text of this hidden dependency is very misleading: it's not particularly connected to arrays (the connection between the two child checkers is that they both handle the "delete" operator) and it's not a modeling checker (which would provide function evaluation, constraints and transitions) but an "empty shell" checker that does nothing by itself, but provides a singleton checker object where the registration of the child checkers can enable the two analysis modes.

Please rename this to CXXDeleteHelper or something similar.

clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
88

This logic (which is inherited from the earlier checker) is a bit surprising because it strips all Element, Field, BaseObject and DerivedObject layers from the MemRegion object.

I think it works in practice because
(1) there is an `isDerivedFrom() check after it and
(2) it's rare to write (buggy) code like

struct Inner { /*...*/ };
struct Outer { Inner inner; /*...*/ };
void f() {
  Outer *o = new Outer(...);
  delete(o->inner);
}

Perhaps this could be explained by a comment if you feel that it's warranted; but I mostly wrote this as a note for myself and other reviewers.

By the way, what would you think about putting the implementation of the two checkers (DeleteWithNonVirtualDtor, ArrayDelete) into two separate checker classes and placing the shared code into a common base class of these classes? I think this could be a significant simplification: we could avoid the manual management of the two checker names and bug types and register two regular checker objects instead of the one dummy prerequisite checker + two "fake" checker registrations that just toggle boolean flags.

Discookie planned changes to this revision.Aug 21 2023, 2:58 AM

Indeed it would make more sense to separate checkers into their own classes, but using a shared base. This checker doesn't have quite as advanced logic as MallocChecker, so there's not that much of a need for a single class managing everything. I'll change the checker to that system.

I found that this concept of "fake" checkers (that change only options of a common checker instance) has more problems (checker dependencies, callback order, separation of "modeling" and bug report generation independently), it is better to avoid this. It is really the same as having options of a single checker (separate checkers are not the same as options of a checker), otherwise a common base class or common code module is a better option.

I share the raised concerns. And I think we are aligned.

PS: sorry if any of my comments are dups, or outdated. I only had a little time last week, and things have progressed since then I noticed. I still decided to submit my possibly outdated and partial review comments. I hope you understand.
Its quite difficult to allocate time to do reviews from day to to day work. I unfortunately usually do this in my spare time, if I have.

clang/docs/analyzer/checkers.rst
1787–1804

I think you should probably mention EXP51-CPP CERT rule somehow here.

1789–1791
1793–1803

Make sure in the example the create is related (e.g. called/used).
Also, refrain from using sink in the docs. It's usually used in the context of taint analysis.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
757–760
clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
50–51
61
65–68

One more thing to mention. Its usually illadvised to rename files or do changes that would seriously impact git blame, unless we have a really good reason doing so.
Aesthetics usually don't count one, especially if the name is not user-facing. However, maintainability is another axis, thus as it's always, not black and white.

One more thing to mention. Its usually illadvised to rename files or do changes that would seriously impact git blame, unless we have a really good reason doing so.
Aesthetics usually don't count one, especially if the name is not user-facing. However, maintainability is another axis, thus as it's always, not black and white.

Git (e.g. git blame) can be smart enough to recognize renamed files if there aren't too much changes, but this change may be too large for that (at least Phabricator doesn't recognize it as a move, which confused me at first when I looked at this review). I'd suggest keeping the old filename in this commit; if you wish (and the community accepts it) you could rename the file in a separate followup NFC commit that doesn't do anything else.

PS: sorry if any of my comments are dups, or outdated. I only had a little time last week, and things have progressed since then I noticed. I still decided to submit my possibly outdated and partial review comments. I hope you understand.
Its quite difficult to allocate time to do reviews from day to to day work. I unfortunately usually do this in my spare time, if I have.

Thanks for reviewing our commits, it's very helpful for us.

Discookie updated this revision to Diff 555821.Sep 5 2023, 1:27 AM

Refactored the checkers to use a shared base class, otherwise I kept the same functionality.

I'll push this diff in a way where the blame transfers between the files, but I don't think keeping the file name the same makes sense, since this is definitely not a DeleteWithNonVirtualDtorChecker.

Discookie marked 8 inline comments as done.Sep 5 2023, 1:50 AM

Fixed the formatting issues as well.

clang/docs/analyzer/checkers.rst
1787–1804

Added a link to the CERT rules, similar to how others do it, thanks!

1793–1803

Changed the example - should I change the DeleteWithNonVirtualDtor example as well? That has the same issues as you said here.

Add a test where we have multiple imp derived to base casts but only one leads to a bug. Do it for both ways, once that the first is the offending, and once the second.

clang/docs/analyzer/checkers.rst
1793–1803

I have no opinion. You could.

clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
126

Is this transitive?

BTW inheritance can only be expressed if the class is a definition, right?
Thus passing this should imply has definition.

143

I think addVisitor already takes a template param, and it will call make unique for you.

192

Aren't you actually interested in N->getLocation().getAs<StmtPoint>().getStmt()?

The diag stmt can be fuzzy, but the PP is exact.

196

If you dyncast to ImplicitCastExpr, couldn't you have done it here?

202

How do you know that that this castexpr corresponds to the region for which you report the bug? To mez this might be some unrelated castexpr.
I was expecting the offending memregion being passed to the visitor, that it can check against.

217

There are so many early returns going on. I feel, we could miss the program point where it should have been marked satisfied. After this point, the visitor will never or should never emit a note.

224

Use named argument passing.

donat.nagy added inline comments.Sep 5 2023, 4:44 AM
clang/docs/analyzer/checkers.rst
1793–1803

I think you should definitely update it, for the sake of consistency and just improving things whenever we can. This commit already touches the code of that checker, there is no point in leaving bad documentation behind us...

Discookie updated this revision to Diff 556402.Sep 11 2023, 1:28 AM
Discookie marked 8 inline comments as done.

Added a test for taking last upcast only for the note tag. For now the visitor matches all explicit casts, because there are too many edge cases to count for now wrt. explicit upcasting.

Discookie added inline comments.Sep 11 2023, 1:29 AM
clang/docs/analyzer/checkers.rst
1793–1803

Updated.

clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
126

It's transitive indeed.

I thought there were some presumptions about hasDefinition() in isDerivedFrom(), but apparently not, and it didn't cause any crashes either. Removed the redundant checks.

192

As far as I can tell, getStmtForDiagnostics() does exactly that, but with a bit more edge case handling and a couple fallbacks.

196

I'm interested in all cast expressions, not just implicit ones. There can also be explicit upcasts as well.

202

We know it's the one we are looking for, because the checker marked it interesting earlier. Not sure how memregion equality check would differ from taking isInteresting(), but it can be changed to that if needed.

I have concerns mostly about the cast visitor.

clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
192

If they do the same, and it does not depend on the mentioned fallbacks, I think we should use the Stmt of the programpoint to be explicit.

194–201

Have you considered dyn_cast<CastExpr>() to make it work for both implicit and explicit casts?
BTW simular -> similar

220

We should assert that this visitor always adds a Note. In other words, that it must find the Stmt where the derived->base conversion happened. If ever that's not true, we have a bug.

clang/test/Analysis/ArrayDelete.cpp
85–88

Hmm, the static type of d3 doesn't tell us much about how it refers to an object of type DoubleDerived.
To me, it would make sense to have multiple Conversion from derived to base happened here, even telling us what static type it converted to what other static type in the message.
And it should add a new visitor of the same kind tracking the castee.

Derived *d3 = new DoubleDerived[10]; // note: `DoubleDerived` -> `Derived` here
Base *b3 = d3; // note: `Derived` -> `Base` here
delete[] b3; // warn: Deleting `Derived` objects as `Base` objects.

WDYT @donat.nagy ?

90–93

I think in such cases a static_cast should suffice; unless you intentionally wanted to test reinterpret_cast of course.

donat.nagy added inline comments.Sep 12 2023, 2:12 AM
clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
47

I like that yo switched to a more descriptive class name :)

clang/test/Analysis/ArrayDelete.cpp
85–88

I agree that it would be good to be good to mention the class names in the message.

90–93

Note that the effects of reinterpret_cast and static_cast are different [1]: static_cast<Derived*>(BasePtr) adjusts the pointer value to ensure that it points to the derived object whose Base ancestor object would be located at BasePtr, while a reinterpret_cast keeps the raw pointer value (which is complete nonsense from an object-oriented point-of-view, but could be relevant in some low-level hacks).

I'd guess that this part is directly testing the strange behavior of reinterpret_cast; but it'd be good to add a comment that clarifies this.

[1]: https://stackoverflow.com/questions/9138730/why-is-it-important-to-use-static-cast-instead-of-reinterpret-cast-here

steakhal added inline comments.Sep 12 2023, 3:23 AM
clang/test/Analysis/ArrayDelete.cpp
85–88

Do you also agree that we should have all steps where such a conversion happened?
Notice the 2 note: markers in my example. @donat.nagy

donat.nagy added inline comments.Sep 12 2023, 3:46 AM
clang/test/Analysis/ArrayDelete.cpp
85–88

It would be a nice addition if it wouldn't seriously complicate the implementation.

If we want to report multiple/all conversions, then we would need to create messages for back-and-forth conversions (e.g. allocating Derived, converting it to Base, back to Derived, back to Base, then deleting it illegally).

Discookie updated this revision to Diff 556933.Sep 18 2023, 1:32 AM

Updated the visitor to track all conversions, and have type names for clarity.

Discookie marked 12 inline comments as done.Sep 18 2023, 1:40 AM
Discookie added inline comments.
clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
192

I'd rather not, since this pattern isn't used anywhere else, while getStmtForDiagnostics is widely used across the codebase.

194–201

I did exactly that a little earlier :D but I reworked the logic to filter based on the availability of types instead of the CastType.

217

Since we're emitting notes for all casts, this isn't needed anymore.

clang/test/Analysis/ArrayDelete.cpp
85–88

Added, as requested! While casts of reference types are not supported, it already makes the flow of classes much clearer.

90–93

Sadly there isn't any deeper meaning to reinterpret_cast here, just a mistake on my part. Fixed and changed to static_cast.

Discookie marked 6 inline comments as done.Sep 18 2023, 1:42 AM
Discookie added inline comments.
clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
220

Can't do that just yet as casts in the form of Derived &d = ...; Base &b = d; is not supported.

Looks good, thanks for the improvements!

One minor remark: your commit uses backticks (`) around the class names in the bug reports, which is commonly used to mark code fragments e.g. in commit messages; however, in the bug reports IIRC most other checkers use apostrophes (').

@steakhal gentle ping for one last round of reviews

The code looks good now.
Let's have one last round.

Do you have real-world reports?

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
758–759

I thin these strings are concatenated like in C, thus it's gonna have "aredestructed" joined.

clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
174–177

Prefer single quotes over backticks.

199–201

What is the problem with this?
I thought getPointeeType() works for ReferenceTypes.

218–219

We use single apostrophes for quoting names in CSA.

Discookie updated this revision to Diff 557438.Sep 28 2023, 3:55 AM

Replaced backticks with single quotes.

Discookie updated this revision to Diff 557439.Sep 28 2023, 3:58 AM
Discookie marked 5 inline comments as done.Sep 28 2023, 4:06 AM

Ran the checker on a couple larger projects, but no real-world reports found so far (same as DeleteWithNonVirtualDtor). Can't tell if it's an engine limitation or just the bug being uncommon in general.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
758–759

Whoops, fixed.

clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
199–201

Apparently not, because references aren't ReferenceTypes but qualified Types. I could add support for it in a future commit, but I'd think casting and deleting array-references wrongly is even less common than deleting array-pointers.

218–219

Got it, thanks!

steakhal accepted this revision.Sep 28 2023, 4:24 AM

It's good enough already, given you apply my last suggestion.
References would be nice to support for notes, but not a blocker.

clang/lib/StaticAnalyzer/Checkers/CXXDeleteChecker.cpp
199–201

Nop, this is for tracking conversions. That might be common.
Such as in our codebase, we usually prefer references. However, in clang we have pointers all over the place, thus internally one might cast references, but delete or pass them back to a legacy API by taking the address of the reference.

clang/test/Analysis/ArrayDelete.cpp
103–110
This revision is now accepted and ready to land.Sep 28 2023, 4:24 AM