This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][MallocChecker] PR46253: Correctly recognize standard realloc
ClosedPublic

Authored by Szelethus on Jun 12 2020, 9:11 AM.

Details

Summary

https://bugs.llvm.org/show_bug.cgi?id=46253

This is an obvious hack because realloc isn't any more affected than other functions modeled by MallocChecker (or any user of CallDescription really), but the nice solution will take some time to implement.

Diff Detail

Event Timeline

Szelethus created this revision.Jun 12 2020, 9:11 AM
NoQ accepted this revision.Jun 13 2020, 3:29 PM

Thanks! Yeah, that's a lot of annoying code to write that doesn't need to be imperative at all.

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
1074–1075

https://bugs.llvm.org/show_bug.cgi?id=46253#c1
Uh-oh, did we lose some sanity checking during CallDescription conversion?

While modeling of all functions is probably incorrect, the crash can be bisected down to D68165. I think other functions don't crash because they already have similar type checks in them, just more spread out around the code rather than concentrated in one place. It might still be worth it to try to figure out why exactly did D68165 cause it in order to double-check for more regressions.

This revision is now accepted and ready to land.Jun 13 2020, 3:29 PM
martong added inline comments.Jun 16 2020, 8:02 AM
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
1071–1072

Indeed. CallDescrtiption could be improved to do precise type checking. Also it could be matching FunctionDecls instead of names of functions (strings), we see how error prone this can be.

I think, the mechanisms in StdCLibraryFunctionChecker could be integrated into CallDescription, so all other checkers could benefit.

See also the discussion we had on this with @xazax.hun :

A high level comment.

Trying to match function signatures is not a unique problem! In fact, almost all of the checks the analyzer have is trying to solve the very some problem.
One of the methods we have at this point is called CallDescription, see here: https://github.com/llvm/llvm-project/blob/master/clang/lib/StaticAnalyzer/Core/CallEvent.cpp#L358

Moreover, I would assume something similar needs to be done for APINotes.

Do you think it would be possible to extend the CallDescription interface to match your needs? In that case all of the checks could profit from this work.
What do you think?

Yes, viewing from this angle, I am trying to solve a problem here that perhaps could be handled by CallDescription and CallDescriptionMap with some extensions.

Here are the differences between the two approaches so far:

  • CallDescription is evaluated for every call event (e.g. checkPreCall), the names - strings - are compared; containing declaration contexts are compared by names (see CallEvent::isCalled). Contrary to this, summaries are associated directly to FunctionDecls, so during a call event we compare two FunctionDecl pointers.
  • A CallDescription does not support overloading on different types if the number of parameters are the same. With summaries this works.
  • A CallDescriptionMap is a static map with fixed number of entries. Contrary to this, the FunctionSummaryMap contains an entry (a summary) for a function only if we can lookup a matching FunctionDecl in the TU. The initialization of the FunctionSummaryMap happens lazily during the first call event.

Except the lack of proper overloading support, these differences are in the implementation. So, yes, giving it more thought, maybe we could refactor CallDescriptionMap to behave this way, but that would be some very heavy refactoring :) Still, would be possible, I guess.

Moreover, I would assume something similar needs to be done for APINotes.

Yes, I agree, I could not find out (yet) how the do it, but somehow they must match a given FunctionDecl to the Name in the YAML.


I was thinking about an alternative method for a long time now.
Making one step backwards: what we need is to be able to associate some data to a given function. And we can perfectly identify the function with it's prototype written in C/C++.
So, what if we'd be able to write a CallDescriptionMap (or a FunctionSummaryMap) like this:

CallDescriptionMap<FnCheck> Callbacks = {
    {{"void *memcpy(void *dest, const void *src, size_t n)"}, &CStringChecker::evalMemcpy},
    ...

Actually, we could reuse the Parser and the Sema itself to solve this issue. In fact, we could achieve this by reusing the ExternalASTSource together with the ASTImporter to find precisely the existing redecl chain in the TU for memcpy.

This revision was automatically updated to reflect the committed changes.