Page MenuHomePhabricator

[analyzer] MallocChecker: Add a visitor to leave a note on functions that could have, but did not change ownership on leaked memory
ClosedPublic

Authored by Szelethus on Jul 12 2021, 8:28 AM.

Details

Summary

This is a rather common feedback we get from out leak checkers: bug reports are really short, and are contain barely any usable information on what the analyzer did to conclude that a leak actually happened.

This happens because of our bug report minimizing effort. We construct bug reports by inspecting the ExplodedNodes that lead to the error from the bottom up (from the error node all the way to the root of the exploded graph), and mark entities that were the cause of a bug, or have interacted with it as interesting. In order to make the bug report a bit less verbose, whenever we find an entire function call (from CallEnter to CallExitEnd) that didn't talk about any interesting entity, we prune it (click here for more info on bug report generation). Even if the event to highlight is exactly this lack of interaction with interesting entities.

D105553 generalized the visitor that creates notes for these cases. This patch adds a new kind of NoStateChangeVisitor that leaves notes in functions that took a piece of dynamically allocated memory that later leaked as parameter, and didn't change its ownership status.

While there is some code to talk over in MallocChecker.cpp, the main thing to discuss in my mind are the test cases, where I display where I want to see this visitor end up. I hope to be able to reach a point sometime soon when I can run on this on some real projects and post screenshots about it!

Diff Detail

Event Timeline

Szelethus created this revision.Jul 12 2021, 8:28 AM
Szelethus requested review of this revision.Jul 12 2021, 8:28 AM
NoQ added a comment.Jul 12 2021, 8:08 PM

This is amazing, great progress!

I added the parent revision because it didn't compile on pre-merge checks otherwise.

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
803

Can you also comment on what's, generally, the default scenario / motivating example where this is necessary? What makes you hunt down store bindings that didn't cause an escape to happen (given that an escape would have been a state change)? IIUC this is for the situation when the callee stores the pointer in a caller-local variable and in this case you don't want to claim that ownership didn't change?

811–812

I suggest: "Returning without deallocating memory or storing the pointer for later deallocation". Still a bit flimsy but less jargony.

834–835

How difficult would it be to re-use this part as well?

clang/test/Analysis/NewDeleteLeaks.cpp
105–107

How do you think this is different from the very first test, memory_allocated_in_fn_call()? Is this just because of how the pointer is put into a variable first? This function is kinda still the only place that could free memory.

Szelethus updated this revision to Diff 358883.Jul 15 2021, 2:55 AM
  • Fixes according to reviewer comments!
  • Rebase on D105553.
  • Primitively check whether the allocated memory was actually passed into the function.
Szelethus marked 2 inline comments as done.Jul 15 2021, 2:55 AM
Szelethus added inline comments.
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
803

The comment above is meant to explain it.

// [...] Any pointers
// inside the call that pointed to the allocated memory are of little
// consequence if their lifetime ends before within the function
834–835

Ugh, I gave this a shot, and its worse then expected. I think the original patch that George made was intended to be used by NoStoreFuncVisitor, and nothing else. As a consequence, its practically impossible to factor out without rewriting the whole thing.

At this point, I'm more in favor of just biting the bullet, and reuse how pointees and casts are printed from D50506, and maybe some of the dereferencing technology as well. It might be sliiiiiightly slower, but the majority of the price is only paid when FieldChainInfo is printed, not when its constructed.

For the time being, I made a rather primitive implementation here, which should lean on the conservative side (we under approximate the set of function calls where the allocated memory was actually passed into).

clang/test/Analysis/NewDeleteLeaks.cpp
105–107

That was my idea, yes, but you have a point there...

Szelethus updated this revision to Diff 358927.Jul 15 2021, 5:07 AM
Szelethus marked an inline comment as done.
  • Fix testfiles.
  • Fix a typo in the comments.
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
803

Oh, speaking of which, that is a gramarr eror.

An ever so gentle ping :^)

NoQ accepted this revision.Jul 26 2021, 8:52 AM

This is amazing and I think we should land this and see how it goes.

clang/test/Analysis/self-assign.cpp
1 ↗(On Diff #358927)

Changes in this file seem to be formatting-only, maybe they could be committed separately?

This revision is now accepted and ready to land.Jul 26 2021, 8:52 AM
NoQ added inline comments.Jul 26 2021, 4:04 PM
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
841

Actually you know what, I'm curious about some IRL data on this.

With uninitialized variable warnings this whole thing worked out really well because there are very few reasons to pass a pointer to an uninitialized local variable into a function, other than to initialize it. There's a lot more reasons to pass allocated memory into a function than to free it though. So I'm curious whether this is going to emit unnecessary notes in such cases and we'll have to tighten this heuristic.

Thanks! Here are some results:

All runs can be found here: https://codechecker-demo.eastus.cloudapp.azure.com/Default/runs

Protobuf, Bitcoin, Xerces, TinyXML, PostgreSQL, FFMPEG, OpenSSL, Vim, Redis, Twin, curl:

Nothing changed.

libWebM:

Ignoring the fact that at both places where memory was allocated there is a // NOLINT comment, I think these bug reports have improved greatly. A function is highlighted where each of those memory region could have been added to a container, but were not. Not that I wrote any code to look for this, this is purely luck ;)

No1., No2.

SQLite:

This one isn't as great. I wouldn't expect a function called buildshifts to do any sort of dynamic memory management, and its contents support this claim. stp is only referencing to retrieve one of its members. More worrying is the fact that State_insert seems like the function (or macro, as they are not always capitalized in this project) to deal with this, yet it isn't noted at all in the bug report.

No1.

Maybe if I added this under an alpha checker option?

Szelethus updated this revision to Diff 366269.Aug 13 2021, 6:54 AM
Szelethus set the repository for this revision to rG LLVM Github Monorepo.
  • Add the new feature behind a new, off-by-default alpha checker option.
  • Restore test/Analysis/self-assign.cpp (to be formatted in a standalone commit).

If you don't mind, I'll commit this as is. We seem to agree on the general direction, and doesn't bother any users from now on.

NoQ closed this revision.Aug 17 2021, 12:30 PM

Committed as rG2d3668c997faac1f64cd3b8eb336af989069d135 (broken phabricator link).

Yes, great, thanks!, sounds like the right thing to do. I guess the next potential step is to set up syntactic analysis that would tell us whether the function *could have* deallocated memory or stored it for later deallocation. One trivial step would be to scan for free()/delete invocations with the parameter variable as an argument. That would probably turn the feature into a strict improvement that we can turn on by default. The next step would be to figure out what could constitute an escape; a binding to non-local storage might be a good first approximation but i'm not sure how to cover potentially-escaping functions which is probably the most beneficial part.