This is an archive of the discontinued LLVM Phabricator instance.

[Patch] [Analyzer] false positive: Potential leak connected with memcpy (PR 22954)
ClosedPublic

Authored by pgousseau on Aug 7 2015, 7:09 AM.

Details

Summary

Dear All,

I would like to propose a patch to avoid the false positive memory leak warning kindly reported by krzysztof in https://llvm.org/bugs/show_bug.cgi?id=22954

The issue seems originates from the CString checker's handling of 'memcpy' (and string copy functions in general).
Given the below code snippet:


struct aa { char *s; char data[32];};
...
a.s = malloc(nbytes);
memcpy(a.data, source, len);
...


As the CString checker handles the memcpy call, it requests the invalidation of the 'a.data' region. But the invalidation worker marks the whole memory region of 'a' as to be invalidated. The Malloc checker is not made aware of this causing the false positive.

Following advices from Anton Yartsev and Gabor Horvath on cfe-dev (http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-July/043786.html), this patch introduces a new trait 'TK_DoNotInvalidateSuperRegion', for the invalidation worker to take into account, when invalidating a destination buffer of type 'FieldRegion'.

Please let me know if this is an acceptable change and if yes eventually commit it for me (as I do not have svn access) ?

Regards,

Pierre Gousseau
SN Systems - Sony Computer Entertainment

Diff Detail

Repository
rL LLVM

Event Timeline

pgousseau updated this revision to Diff 31510.Aug 7 2015, 7:09 AM
pgousseau retitled this revision from to [Patch] [Analyzer] false positive: Potential leak connected with memcpy (PR 22954).
pgousseau updated this object.
pgousseau updated this object.Aug 10 2015, 1:44 AM
pgousseau removed a reviewer: cfe-commits.
pgousseau added a subscriber: cfe-commits.
zaks.anna added inline comments.Aug 10 2015, 4:41 PM
lib/StaticAnalyzer/Core/RegionStore.cpp
1098 ↗(On Diff #31510)

What happens on early returns? Here and the one below. Are there tests for these cases?

ayartsev added inline comments.Aug 11 2015, 6:08 AM
lib/StaticAnalyzer/Core/RegionStore.cpp
2359 ↗(On Diff #31510)

Too much unnecessary passing around of RegionAndSymbolInvalidationTraits parameter. What about moving "RegionAndSymbolInvalidationTraits" member from "invalidateRegionsWorker" class to the base class "ClusterAnalysis"?

pgousseau added inline comments.Aug 11 2015, 9:58 AM
lib/StaticAnalyzer/Core/RegionStore.cpp
1098 ↗(On Diff #31510)

Oops yes returning here is wrong, if we return here we skip the code conjuring the default value at line 1122.
I will change it to a goto conjure_default for !NumElements == true.
Also a value of 0 for ElemSize is ok so no need to return actually.
I will add a test for 0 sized elements' array and 0 elements array and update the patch.
What do you think ?
Thanks !

2359 ↗(On Diff #31510)

Makes sense, I will update the patch.
Thanks !

pgousseau updated this revision to Diff 31940.Aug 12 2015, 8:12 AM

Removed 'return' statements and added tests for empty arrays following Anna review.
Refactored parameter passing of 'TK_DoNotInvalidateSuperRegion' following Anton review, by adding a new member function 'hasTrait' to 'ClusterAnalysis'.

Please let me know if this is acceptable and if yes eventually commit it for me ?
Regards,
Pierre

dcoughlin edited edge metadata.Aug 13 2015, 6:13 PM

I'm still looking at this. Higher-level comments coming soon.

lib/StaticAnalyzer/Core/RegionStore.cpp
1110 ↗(On Diff #31940)

R0.getOffset() will assert if R0 is a symbolic region offset. This can happen if the invalidated array is itself in an array (e.g., someOtherArray[i].array) or is in a union.

1118 ↗(On Diff #31940)

getOffset() here will assert also if there is any key with a symbolic offset in SuperR.

1119 ↗(On Diff #31940)

Should this be ROffset < UpperOffset?

ayartsev added inline comments.Aug 14 2015, 6:31 AM
lib/StaticAnalyzer/Core/RegionStore.cpp
746 ↗(On Diff #31940)

Hmm.. Either we completely encapsulate 'RegionAndSymbolInvalidationTraits' in the 'invalidateRegionsWorker' class or we move 'RegionAndSymbolInvalidationTraits' (maybe renamed to the more general name) to the base 'ClusterAnalysis' class.
In the suggested solution you on the one hand make processing of 'RegionAndSymbolInvalidationTraits' specific to 'invalidateRegionsWorker' class but on the other hand explicitly refer to 'RegionAndSymbolInvalidationTraits::InvalidationKinds' and 'RegionAndSymbolInvalidationTraits::TK_DoNotInvalidateSuperRegion' in the 'ClusterAnalysis' class.
It seems to me the proper way to encapsulate 'RegionAndSymbolInvalidationTraits' in the 'invalidateRegionsWorker' is to just override 'AddToWorkList()' in subclasses (as you done with 'hasTrait()').

You should consider what should happen when the memcpy may write past the end of the fixed-size array and add tests that specify correct behavior for these cases. An important example is:

struct Foo {
  char data[4];
  int i;
};

Foo f;
f.i = 10;

memcpy(f.data, someBuf, 100);

clang_analyzer_eval(f.i == 10); // What should this yield?

I think it is also important to add tests for regions at symbolic offsets, for bindings in the super region having keys with symbolic offsets, and for cases where there is potential aliasing and casting between regions with symbolic offsets.

Also, Jordan wrote up a description of the region store in docs/analyzer/RegionStore.txt that you might find helpful if you haven't already seen it.

You should consider what should happen when the memcpy may write past the end of the fixed-size array and add tests that specify correct behavior for these cases. An important example is:

struct Foo {
  char data[4];
  int i;
};

Foo f;
f.i = 10;

memcpy(f.data, someBuf, 100);

clang_analyzer_eval(f.i == 10); // What should this yield?

I think it is also important to add tests for regions at symbolic offsets, for bindings in the super region having keys with symbolic offsets, and for cases where there is potential aliasing and casting between regions with symbolic offsets.

Yes it seems I overlooked symbolic offsets in the test cases, will handle those.

Also, Jordan wrote up a description of the region store in docs/analyzer/RegionStore.txt that you might find helpful if you haven't already seen it.

Very usefull thanks !

lib/StaticAnalyzer/Core/RegionStore.cpp
746 ↗(On Diff #31940)

I agree yes overriding AddToWorkList would fit better, will upload new patch. Thanks !

1110 ↗(On Diff #31940)

Yes that definitely needs fixing. Thanks !

1118 ↗(On Diff #31940)

Will fix. Thanks !

1119 ↗(On Diff #31940)

Yes, will change to (ROffset < UpperOffset || (LowerOffset == UpperOffset && ROffset == LowerOffset)).
To handle arrays of 0 elements and arrays of 0 sized elements.
Thanks !

pgousseau updated this revision to Diff 32955.Aug 24 2015, 7:22 AM
pgousseau edited edge metadata.

Encapsulate 'RegionAndSymbolInvalidationTraits' by overriding 'AddToWorkList' following Anton's review.
Also as pointed out by Devin's review:
Add tests cases and handling for regions, bindings in super regions, and aliased/casted regions, with symbolic offsets.
Add test case and handling for writing past the array.
Replace 'ROffset <= UpperOffset' by 'ROffset < UpperOffset'.

Please let me know if this an acceptable change ?
Regards,
Pierre

Thanks for adding handling of the symbolic cases! Some more comments inline.

lib/StaticAnalyzer/Checkers/CStringChecker.cpp
825 ↗(On Diff #32955)

It seems odd that you return a ProgramStateRef here but don't use it in the only caller of this function. Could you just return a boolean instead? Alternatively, if you really want to add the assumption that the buffer is valid to the post-state, then you should thread the state returned from the call to this function through the rest of CStringChecker::InvalidateBuffer.

840 ↗(On Diff #32955)

You have a lot of early returns of the form:

if (!Foo)
  return state;

that don't seem to be exercised in your tests. Can these actually trigger? If so, you should add tests for these cases. If not, maybe asserts/cast<T>()/castAs<T>() would be more appropriate.

Also: should these early returns return state? Or should they return null?

851 ↗(On Diff #32955)

Can you rewrite this if/else to avoid the indentation? (See http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return).

866 ↗(On Diff #32955)

"CheckLocation" is copy pasta here.

lib/StaticAnalyzer/Core/RegionStore.cpp
1115 ↗(On Diff #32955)

This nesting is getting pretty deep. Can you invert some guards and change to goto/continue where appropriate.

For example:

if (!SuperR)
  goto conjure_default;

and

const ClusterBindings *C = B.lookup(SuperR)
if (!C)
  continue;
1125 ↗(On Diff #32955)

Please add a comment here to say that this is to handle arrays of 0 elements and arrays of 0-sized elements.

1132 ↗(On Diff #32955)

If you are not going to use the result, you can use isa<SymbolicRegion>(R) here instead of dyn_cast.

test/Analysis/pr22954.c
476 ↗(On Diff #32955)

Should this test m->s3[j] as well?

498 ↗(On Diff #32955)

Can you also add a test where the size argument to memcpy is the result of sizeof()?

pgousseau added inline comments.Aug 26 2015, 9:17 AM
lib/StaticAnalyzer/Checkers/CStringChecker.cpp
825 ↗(On Diff #32955)

Yes returning a bool would be better thanks, I did not meant for this function to add the assumption that the buffer is valid.

840 ↗(On Diff #32955)

Yes most of those checks seems redundant and/or might need to return nullptr/false, will double check thanks.

851 ↗(On Diff #32955)

Yes thanks for the link !

866 ↗(On Diff #32955)

Will fix !

lib/StaticAnalyzer/Core/RegionStore.cpp
1115 ↗(On Diff #32955)

Makes sense thanks.

1125 ↗(On Diff #32955)

Will do !

1132 ↗(On Diff #32955)

Will do !

test/Analysis/pr22954.c
476 ↗(On Diff #32955)

Yes that is what I meant thanks !

498 ↗(On Diff #32955)

Will do !

pgousseau updated this revision to Diff 33326.Aug 27 2015, 7:51 AM

Following Devin's review:

Modified 'IsFirstBufInBound()' to return a bool instead of a state.
Removed redundant checks in 'IsFirstBufInBound'.
Added tests to exercise 'IsFirstBufInBound' more.
Removed if/else to avoid indentation.
Removed copy paste error.
Inverted guards in 'VisitCluster' to decrease indentation.
Added comments regarding 0 elements/0-sized elements arrays.
Fixed unused var warning with isa instead of dyn_cast.
Fixed test typo and added sizeof test.

Please let me know if this is an acceptable change ?

Regards,

Pierre

Other than one teeny nit, looks good to me.

lib/StaticAnalyzer/Checkers/CStringChecker.cpp
863 ↗(On Diff #33326)

You can use cast<ElementRegion>(R) here, which will assert that R is an ElementRegion, and remove the assert below.

pgousseau added inline comments.Aug 28 2015, 1:59 AM
lib/StaticAnalyzer/Checkers/CStringChecker.cpp
863 ↗(On Diff #33326)

Will do ! Thanks.

pgousseau updated this revision to Diff 33402.Aug 28 2015, 2:08 AM

Replace 'dyn_cast' by 'cast' following Devin's review.

If all looks good could someone commit this patch for me ?

Regards,

Pierre

dcoughlin accepted this revision.Aug 28 2015, 11:00 AM
dcoughlin edited edge metadata.

I will commit. Thanks for your work on this, Pierre!

This revision is now accepted and ready to land.Aug 28 2015, 11:00 AM
This revision was automatically updated to reflect the committed changes.

I will commit. Thanks for your work on this, Pierre!

Much appreciated ! Thanks for your help.

xazax.hun edited edge metadata.EditedAug 31 2015, 12:04 PM

Hi!

With this patch committed I noticed a regression in the static analyzer.

I analyzed openssl-1.0.0d (using the test suite in utils/analyzer/SATestBuild.py).
I got the following assertion error:

(lldb) bt
* thread #1: tid = 0xa1fcb, 0x00007fff943e50ae libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff943e50ae libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff943f25fd libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x0000000100960106 clang`::abort() [inlined] raise(sig=6) + 18 at Signals.inc:504
    frame #3: 0x00000001009600f4 clang`::abort() + 4 at Signals.inc:521
    frame #4: 0x00000001009600e1 clang`::__assert_rtn(func=<unavailable>, file=<unavailable>, line=<unavailable>, expr=<unavailable>) + 81 at Signals.inc:517
    frame #5: 0x00000001018fc418 clang`(anonymous namespace)::CStringChecker::InvalidateBuffer(clang::ento::CheckerContext&, llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::Expr const*, clang::ento::SVal, bool, clang::Expr const*) [inlined] clang::ento::NonLoc clang::ento::SVal::castAs<clang::ento::NonLoc>() const + 1448 at SVals.h:76
    frame #6: 0x00000001018fc3f9 clang`(anonymous namespace)::CStringChecker::InvalidateBuffer(clang::ento::CheckerContext&, llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>, clang::Expr const*, clang::ento::SVal, bool, clang::Expr const*) [inlined] (anonymous namespace)::CStringChecker::IsFirstBufInBound(state=clang::ento::ProgramStateRef @ 0x0000000103bf2080, FirstBuf=0x0000000103a86768) at CStringChecker.cpp:842
    frame #7: 0x00000001018fc3f9 clang`(anonymous namespace)::CStringChecker::InvalidateBuffer(C=<unavailable>, state=<unavailable>, E=0x0000000103a86768, V=<unavailable>, IsSourceBuffer=<unavailable>, Size=<unavailable>) + 1417 at CStringChecker.cpp:920
    frame #8: 0x00000001018fadf7 clang`(anonymous namespace)::CStringChecker::evalCopyCommon(this=0x0000000103212fb0, C=0x00007fff5fbfc1a0, CE=<unavailable>, state=clang::ento::ProgramStateRef @ 0x00007fff5fbfc0c0, Size=0x0000000103a867b0, Dest=0x0000000103a86768, Source=<unavailable>, Restricted=<unavailable>, IsMempcpy=<unavailable>) const + 3991 at CStringChecker.cpp:1079
    frame #9: 0x00000001018f8ad8 clang`(anonymous namespace)::CStringChecker::evalMemcpy(this=0x0000000103212fb0, C=0x00007fff5fbfc1a0, CE=0x0000000103a86720) const + 248 at CStringChecker.cpp:1101
    frame #10: 0x00000001018f89b6 clang`bool clang::ento::eval::Call::_evalCall<(anonymous namespace)::CStringChecker>(void*, clang::CallExpr const*, clang::ento::CheckerContext&) [inlined] (anonymous namespace)::CStringChecker::evalCall(CE=0x0000000103a86720, C=0x00007fff5fbfc1a0) const + 655 at CStringChecker.cpp:2002
    frame #11: 0x00000001018f8727 clang`bool clang::ento::eval::Call::_evalCall<(anonymous namespace)::CStringChecker>(checker=0x0000000103212fb0, CE=0x0000000103a86720, C=0x00007fff5fbfc1a0) + 23 at Checker.h:438
    frame #12: 0x0000000101a0417d clang`clang::ento::CheckerManager::runCheckersForEvalCall(clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNodeSet const&, clang::ento::CallEvent const&, clang::ento::ExprEngine&) [inlined] clang::ento::CheckerFn<bool (clang::CallExpr const*, clang::ento::CheckerContext&)>::operator(this=<unavailable>, ps=<unavailable>)(clang::CallExpr const*, clang::ento::CheckerContext&) const + 653 at CheckerManager.h:58
    frame #13: 0x0000000101a0416b clang`clang::ento::CheckerManager::runCheckersForEvalCall(this=0x0000000103211950, Dst=0x00007fff5fbfc2d8, Src=<unavailable>, Call=0x0000000103ac2070, Eng=0x00007fff5fbfcd90) + 635 at CheckerManager.cpp:549
    frame #14: 0x0000000101a361af clang`clang::ento::ExprEngine::evalCall(this=0x00007fff5fbfcd90, Dst=0x00007fff5fbfc448, Pred=<unavailable>, Call=0x0000000103ac2070) + 383 at ExprEngineCallAndReturn.cpp:527
    frame #15: 0x0000000101a35ee0 clang`clang::ento::ExprEngine::VisitCallExpr(this=0x00007fff5fbfcd90, CE=0x0000000103a86720, Pred=<unavailable>, dst=0x00007fff5fbfc9b8) + 528 at ExprEngineCallAndReturn.cpp:499
    frame #16: 0x0000000101a1b4a0 clang`clang::ento::ExprEngine::Visit(this=0x00007fff5fbfcd90, S=0x0000000103a86720, Pred=<unavailable>, DstTop=<unavailable>) + 12224 at ExprEngine.cpp:1075
    frame #17: 0x0000000101a16c30 clang`clang::ento::ExprEngine::ProcessStmt(this=0x00007fff5fbfcd90, S=<unavailable>, Pred=<unavailable>) + 880 at ExprEngine.cpp:446
    frame #18: 0x0000000101a1681e clang`clang::ento::ExprEngine::processCFGElement(this=<unavailable>, E=<unavailable>, Pred=0x0000000103bf1be0, StmtIdx=<unavailable>, Ctx=0x00007fff5fbfcc98) + 190 at ExprEngine.cpp:295
    frame #19: 0x0000000101a0c128 clang`clang::ento::CoreEngine::HandlePostStmt(this=<unavailable>, B=<unavailable>, StmtIdx=<unavailable>, Pred=<unavailable>) + 136 at CoreEngine.cpp:503
    frame #20: 0x0000000101a0b71b clang`clang::ento::CoreEngine::ExecuteWorkList(this=0x00007fff5fbfcda8, L=<unavailable>, Steps=150000, InitState=clang::ento::ProgramStateRef @ 0x00007fff5fbfd120) + 491 at CoreEngine.cpp:223
    frame #21: 0x00000001012698a0 clang`(anonymous namespace)::AnalysisConsumer::ActionExprEngine(clang::Decl*, bool, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*> >*) [inlined] clang::ento::ExprEngine::ExecuteWorkList(L=0x00000001032c84a0, Steps=<unavailable>) + 35 at ExprEngine.h:109
    frame #22: 0x000000010126987d clang`(anonymous namespace)::AnalysisConsumer::ActionExprEngine(this=0x0000000103211090, D=0x00000001039b8418, ObjCGCEnabled=<unavailable>, IMode=<unavailable>, VisitedCallees=<unavailable>) + 973 at AnalysisConsumer.cpp:659
    frame #23: 0x000000010126931d clang`(anonymous namespace)::AnalysisConsumer::HandleCode(clang::Decl*, unsigned int, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*> >*) [inlined] (anonymous namespace)::AnalysisConsumer::RunPathSensitiveChecks(this=<unavailable>, D=<unavailable>, IMode=<unavailable>, Visited=<unavailable>) + 1501 at AnalysisConsumer.cpp:689
    frame #24: 0x00000001012692c9 clang`(anonymous namespace)::AnalysisConsumer::HandleCode(this=<unavailable>, D=<unavailable>, Mode=<unavailable>, IMode=Inline_Regular, VisitedCallees=<unavailable>) + 1417 at AnalysisConsumer.cpp:627
    frame #25: 0x000000010125bd31 clang`(anonymous namespace)::AnalysisConsumer::HandleTranslationUnit(clang::ASTContext&) + 743 at AnalysisConsumer.cpp:491
    frame #26: 0x000000010125ba4a clang`(anonymous namespace)::AnalysisConsumer::HandleTranslationUnit(this=0x0000000103211090, C=<unavailable>) + 650 at AnalysisConsumer.cpp:542
    frame #27: 0x0000000101274065 clang`clang::ParseAST(S=0x0000000103858a00, PrintStats=false, SkipFunctionBodies=<unavailable>) + 581 at ParseAST.cpp:168
    frame #28: 0x0000000100d96adb clang`clang::FrontendAction::Execute(this=<unavailable>) + 75 at FrontendAction.cpp:439
    frame #29: 0x0000000100d621eb clang`clang::CompilerInstance::ExecuteAction(this=0x0000000103208240, Act=0x0000000103209ae0) + 843 at CompilerInstance.cpp:830
    frame #30: 0x0000000100dd48bf clang`clang::ExecuteCompilerInvocation(Clang=0x0000000103208240) + 4047 at ExecuteCompilerInvocation.cpp:222
    frame #31: 0x000000010000608c clang`cc1_main(Argv=<unavailable>, Argv0="/Users/ghorvath/Documents/LLVM/build/bin/clang", MainAddr=0x0000000100001df0) + 1180 at cc1_main.cpp:116
    frame #32: 0x0000000100004cc9 clang`main [inlined] ExecuteCC1Tool(Tool=<unavailable>) + 83 at driver.cpp:380
    frame #33: 0x0000000100004c76 clang`main(argc_=<unavailable>, argv_=<unavailable>) + 11830 at driver.cpp:443
    frame #34: 0x00007fff881eb5ad libdyld.dylib`start + 1
    frame #35: 0x00007fff881eb5ad libdyld.dylib`start + 1

Could you look into this?

I reverted the commit until this assertion is fixed.

Steps to reproduce:
Download the following preprocessed file:


Execute the analyzer on that one: clang -cc1 -analyze -analyzer-checker=core -analyzer-checker=unix -fblocks clang_crash_7QnDaH.i

I reverted the commit until this assertion is fixed.

Steps to reproduce:
Download the following preprocessed file:


Execute the analyzer on that one: clang -cc1 -analyze -analyzer-checker=core -analyzer-checker=unix -fblocks clang_crash_7QnDaH.i

Thanks for the reproducible and revert, I will have a look !

I reverted the commit until this assertion is fixed.

Steps to reproduce:
Download the following preprocessed file:


Execute the analyzer on that one: clang -cc1 -analyze -analyzer-checker=core -analyzer-checker=unix -fblocks clang_crash_7QnDaH.i

Thanks for the reproducible and revert, I will have a look !

I have now uploaded a patch for this at http://reviews.llvm.org/D12571