Page MenuHomePhabricator

[analyzer][NFC] StdLibraryFunctionsChecker: Add empty Signatures
AbandonedPublic

Authored by martong on May 5 2020, 8:17 AM.

Details

Summary

In the C language, functions must not be overloaded. In this patch I am adding
empty Signatures. During adding a summary for a function with a given name,
if we provide an empty Signature then we assume that all visible function
declarations in the given TU share the same signature (i.e. their
FunctionProtoType is the same). This makes the administration of adding
summaries for the standard C library (and other C libraries) way easier.

Diff Detail

Unit TestsFailed

TimeTest
100 msLLVM.tools/llvm-isel-fuzzer::Unknown Unit Message ("")
Script: -- : 'RUN: at line 3'; c:\ws\workspace\beta_amd64_wi---3576c8c8\llvm-project\build\bin\opt.exe -o C:\ws\workspace\BETA_amd64_wi---3576c8c8\llvm-project\build\test\tools\llvm-isel-fuzzer\Output\aarch64-empty-bc.ll.tmp C:\ws\workspace\BETA_amd64_wi---3576c8c8\llvm-project\llvm\test\tools\llvm-isel-fuzzer\aarch64-empty-bc.ll
60 msLLVM.tools/llvm-isel-fuzzer::Unknown Unit Message ("")
Script: -- : 'RUN: at line 3'; echo > C:\ws\workspace\BETA_amd64_wi---3576c8c8\llvm-project\build\test\tools\llvm-isel-fuzzer\Output\aarch64-empty.ll.tmp
750 msLLVM.tools/llvm-isel-fuzzer::Unknown Unit Message ("")
Script: -- : 'RUN: at line 9'; echo > C:\ws\workspace\BETA_amd64_wi---3576c8c8\llvm-project\build\test\tools\llvm-isel-fuzzer\Output\aarch64-execname-options.ll.tmp.input
750 msLLVM.tools/llvm-isel-fuzzer::Unknown Unit Message ("")
Script: -- : 'RUN: at line 8'; echo > C:\ws\workspace\BETA_amd64_wi---3576c8c8\llvm-project\build\test\tools\llvm-isel-fuzzer\Output\execname-options.ll.tmp.input
100 msLLVM.tools/llvm-isel-fuzzer::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; c:\ws\workspace\beta_amd64_wi---3576c8c8\llvm-project\build\bin\opt.exe -o C:\ws\workspace\BETA_amd64_wi---3576c8c8\llvm-project\build\test\tools\llvm-isel-fuzzer\Output\missing-triple.ll.tmp C:\ws\workspace\BETA_amd64_wi---3576c8c8\llvm-project\llvm\test\tools\llvm-isel-fuzzer\missing-triple.ll
View Full Test Results (11 Failed)

Event Timeline

martong created this revision.May 5 2020, 8:17 AM

The empty signature is used to make add of C API functions easy because only the function name must be specified, not all types of parameters?

An empty signature means really that the whole signature is "irrelevant"? It is like all types and number of arguments is irrelevant. Probably that would be a better name, instead of empty use isIrrelevant.

I am a bit unsure about this change. C libraries might be consumed in C++ projects and C++ projects might be free to overload those functions. Does the effort needed to specify the signatures justify not supporting that (probably unintentional) name collisions in C++?

An empty signature means really that the whole signature is "irrelevant"? It is like all types and number of arguments is irrelevant. Probably that would be a better name, instead of empty use isIrrelevant.

Yes, that's right. I like what you propose for the naming, thanks!

I am a bit unsure about this change. C libraries might be consumed in C++ projects and C++ projects might be free to overload those functions.

I don't think that could be a concern.
Actually, redefinition of a reserved name either in the C or in the C++ standard library is undefined behavior:

C++11:
[reserved.names]

  1. The C ++ standard library reserves the following kinds of names:

— macros
— global names
— names with external linkage
...
Each name from the Standard C library declared with external linkage is reserved to the implementation
for use as a name with extern "C" linkage, both in namespace std and in the global namespace.

  1. If a program declares or defines a name in a context where it is reserved, other than as explicitly allowed by this Clause, its behavior is undefined.

C99 (7.1) has something similar. And from glibc manual:
The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names. All other library names are reserved if your program explicitly includes the header file that defines or declares them.

On the other hand, we may be able to discover the mentioned UB, and it may be worth to emit a bug-report in such cases.

Does the effort needed to specify the signatures justify not supporting that (probably unintentional) name collisions in C++?

No, we still support adding summaries to overloaded C++ functions. But then the user must specify the signature (possibly with Irrelevant types) that matches only one FunctionDecl. Hope this answers your question.

martong added inline comments.May 6 2020, 8:31 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
263

TODO rename to isIrrelevant.

I don't think that could be a concern.
Actually, redefinition of a reserved name either in the C or in the C++ standard library is undefined behavior:

I disagree. As you mentioned in another revision, we plan to model functions beyond the C and C++ standard library. We cannot prevent name collisions for those other libraries (and sometimes we cannot even prevent unintended name collisions with the standard libraries).
I think to reduce the risk of applying the wrong summary to a function worth the effort of spelling the signature out (since it only needs to be done once).

Probably something like IsCStdLibraryFunction can be included in the signature, and empty signature is allowed only for functions that are in a C standard library.

martong updated this revision to Diff 262620.May 7 2020, 5:11 AM
  • Rename 'empty' -> 'irrelevant'

I don't think that could be a concern.
Actually, redefinition of a reserved name either in the C or in the C++ standard library is undefined behavior:

I disagree. As you mentioned in another revision, we plan to model functions beyond the C and C++ standard library. We cannot prevent name collisions for those other libraries (and sometimes we cannot even prevent unintended name collisions with the standard libraries).
I think to reduce the risk of applying the wrong summary to a function worth the effort of spelling the signature out (since it only needs to be done once).

I see your point Gabor. I agree that it would be the best if we could give a precise signatures to every summary. But we can't.
There are two difficulties to provide proper signatures:

  1. Some function types contain non-builtin types. E.g. FILE*. We cannot get this type as easily as we do with builtin types (we can get builtins simply from the ASTContext). In case of such a compound type, we should be digging up the type from the AST, and that can be done by doing a lookup. But instead of looking up the type of one parameter, it is better to do the lookup for the function itself. So, in these cases we rather use the Irrelevant type in the Signature. Signatures with irrelevant types are not precise, they may match accidentally other functions.
  2. Cppcheck config files does not provide any signature, so it would require overly too much manual work to synthesize the signatures (that may not be precise anyway, see the above point).

However, not having a signature is not a problem in the case of LibC and POSIX. This is because LibC implementations do provide POSIX compatiblity, so they define the functions from POSIX (e.g. e.g. glibc). Besides they claim

All other library names are reserved if your program explicitly includes the header file that defines or declares them.

In my understanding this includes the names defined by POSIX.
Consequently, user code should not redefine - or add any overload in case of C++- to any functions defined in the standard C, C++ or POSIX library. This is also related to C++.

So, here is my suggestion.

  • LibC, STL and POSIX: We should allow empty/irrelevant signatures for standardized functions. If the user code provides any colliding function then we emit an error.
  • Other libraries: We should be more rigorous with these functions and we could require a signature for each function. If the user code provides any colliding function then we just emit a warning and the corresponding summary is not added and will not be used. One example is when the user enables summaries from OpenSSL where we have a summary for let's say void encrypt(FILE*), and the Signature is void(Irrelevant). If the user code defines encrypt(double) that seems to be okay as it overloads with encrypt(FILE*) but our signature matching mechanism cannot cope with this, so we rather back out and keep the conservative evaluation for the function.

TLDR; having empty signatures makes it possible to easily add summaries for standard functions that we know they must have only one definition and they cannot be overloaded.

Probably something like IsCStdLibraryFunction can be included in the signature, and empty signature is allowed only for functions that are in a C standard library.

Yeah, we could do something like that, but also with POSIX. Furthermore, I think we shall extend to that direction once we want to support libs other than libC, STL and POSIX.

  1. Some function types contain non-builtin types. E.g. FILE*. We cannot get this type as easily as we do with builtin types (we can get builtins simply from the ASTContext). In case of such a compound type, we should be digging up the type from the AST, and that can be done by doing a lookup.

I see the problem, but as far as I understand, this is not unique to libC or Posix. Any libraries that use non-primitive types (which is most of them) will suffer from the very same problem. So I anticipated that this type lookup from the ASTContext needs to be implemented regardless of the irrelevant signature functionality.

martong planned changes to this revision.May 14 2020, 7:02 AM

We had a separate discussion with @xazax.hun. Let me summarize the outcome. We agreed that we should try our best to provide signatures for all functions (this affects child patches, particularly the POSIX related D79433). Probably, we are not going to need this patch anymore, once we will have done that work.

martong abandoned this revision.Jun 22 2020, 2:34 AM

Since D80016 we are able to look up types in the TU, so this patch is no longer needed. We'd like to be as precise with the function signatures as possible.