Page MenuHomePhabricator

[analyzer] Fix for the strdup family in unix.malloc checker
Needs ReviewPublic

Authored by baloghadamsoftware on Jun 1 2016, 6:54 AM.

Details

Reviewers
dcoughlin
Summary

The strdup family was only partially handled in the original checker. As a consequence it did not recognize leaks where a variable which received a result of an strdup was later overwritten. (The unix.cstring checkers did not recognize this leak either, but their task is also different.) This patch fixes this issue by simulating the memory allocation of strdup functions. Some new functions where taken from the unix.cstring checkers but they were also simplified and adapted for the purpose of the checker. Some other tests had to be fixed as well since they contained strdup caused memory leaks. (The leak was not fixed but the warning is now expected.)

Diff Detail

Event Timeline

baloghadamsoftware retitled this revision from to [analyzer] Fix for the strdup family in unix.malloc checker.
baloghadamsoftware updated this object.
baloghadamsoftware added a reviewer: dcoughlin.
NoQ added a subscriber: NoQ.Jun 6 2016, 5:10 AM
NoQ added a comment.Jun 6 2016, 6:01 AM

This patch is doing a very good thing - modeling contents of the copied string as a lazy compound value of the original string. For that, i guess it's worth adding some tests (ExprInspection-based(?)) that show that, say, first few characters of the copied string are same as the original. It might also be a good idea to move this modeling to CStringChecker completely, because right now it's easier to assume a range on SymbolExtent of the strdupped region, than transfer correct string length into MallocChecker. But then you'd need to enable CStringChecker in your builds so that they could work together to model strdup().

The effect you observe, however - finding leaks on overwriting variables that hold leaked symbols - is in fact not because you modeled the copy better, but because you managed to accidentally work around the "zombie symbols" bug, described in http://lists.llvm.org/pipermail/cfe-dev/2016-March/047922.html and D18860 (there's still no certainty as of how exactly to fix it). Because you model contents of the copied string by binding a value to it, the Store is eager to reap the symbol as soon as possible, and hence detect a leak. Before, only the checker knew about the symbol of the copied string, and it wasn't eager to reap that symbol after it disappeared from the rest of the program state through overwriting. The problem could as well be fixed with code as simple and unobvious as:

void MallocChecker::checkLiveSymbols(ProgramStateRef State,
                                     SymbolReaper &SR) const {
  for (const auto &Item: State->get<RegionState>())
    SR.maybeDead(Item.first);
}
lib/StaticAnalyzer/Checkers/MallocChecker.cpp
2134

I'm not sure if this is truly worth it. In CStringChecker, a lot of work is done in order to model the length of the string; however, currently MallocChecker does not have any access to that info, and a lot more code copy-paste'ing would be necessary to re-model all the stuff. And even if you do so, the symbol you create in that statement wouldn't be the same as the symbol used in CStringChecker, so you may never compare them to each other and see that they're equal - say, strlen() of a duplicated string, strlen() for the original string, and the value you produce on that statement, would still be three different symbols.

Ideally, we can leave this out for later, and wait until we allow checkers interact with each other and share such information. So that we could ask CStringChecker directly here, and obtain string length as CStringChecker sees it.

MTC added a subscriber: MTC.Sep 26 2017, 2:46 AM