This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] StdLibraryFunctionsChecker: Use platform dependent EOF and UCharMax
ClosedPublic

Authored by martong on Feb 12 2020, 2:47 AM.

Details

Summary

Both EOF and the max value of unsigned char is platform dependent. In this
patch we try our best to deduce the value of EOF from the Preprocessor,
if we can't we fall back to -1.

Diff Detail

Unit TestsFailed

Event Timeline

martong created this revision.Feb 12 2020, 2:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2020, 2:48 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
534

Debug message (to be removed)?

536

It would be good to have this function available generally to other checkers, the same functionality is needed in https://reviews.llvm.org/D72705 too.
It could work with any (specified) macro name, there are other special values in API calls. But there can be more difficult cases if the EOF (or other) is not a simple value but another macro or constructed from values some way. (The ULONG_MAX and similar can be get in the same way.)

martong planned changes to this revision.Feb 12 2020, 4:41 AM
martong marked 3 inline comments as done.
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
534

Thanks, good catch!

536

Ok, I am gonna put a generic version of this under CheckerHelpers.h, so all checkers can use it.

martong marked an inline comment as done.Feb 12 2020, 4:42 AM
martong updated this revision to Diff 244140.Feb 12 2020, 5:10 AM
  • Remove debug dump
  • Add TryExpandAsInteger to CheckerHelpers.h
martong marked an inline comment as done.Feb 12 2020, 5:10 AM
martong updated this revision to Diff 244141.Feb 12 2020, 5:13 AM
  • Remove PP declaration
martong updated this revision to Diff 244143.Feb 12 2020, 5:23 AM
  • EOFMacroIt -> MacroIt

Harbormaster failed remote builds in B46321: Diff 244141!

This is actually true, I have a test that crashes! Finally, a case where remote builds are proved to be useful!

martong updated this revision to Diff 244149.Feb 12 2020, 5:48 AM
  • Fix crash in TryExpandAsInteger
balazske added inline comments.Feb 12 2020, 5:54 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerHelpers.h
17

We do not need to include Preprocessor.h here (fwd declaration is enough).

70

Name should begin with lowercase letter?

Szelethus requested changes to this revision.Feb 12 2020, 6:00 AM
Szelethus added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerHelpers.h
67–69

While I agree that we shouldn't have to reinvent the Preprocessor every time we need something macro related, I doubt that this will catch anything. I checker my system's standard library, and this is what I found:

#ifndef EOF
# define EOF (-1)
#endif

Lets go just one step further and cover the probably majority of the cases where the definition is in parentheses.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
512–517

Hah, would've never thought of doing this with a lambda. Me likey.

571

Huh, what's happening here exactly? Doesn't Range take 2 RangeInts as ctor arguments? What does {EOFv, EOFv}, {0, UCharMax} mean in this context?

clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
114–117

This seems a bit clunky even for the Preprocessor -- how about

const auto *MacroII = PP.getIdentifierInfo(Macro);
if (!MacroII)
  return;
const MacroInfo *MI = PP.getMacroInfo(MacroII);
assert(MI);
clang/test/Analysis/std-c-library-functions-eof.c
24

So the test is about checking whether the analyzer correctly assigned -2 to y, right? Then let's check that too.

clang_analyzer_eval(y == 2);
This revision now requires changes to proceed.Feb 12 2020, 6:00 AM
martong updated this revision to Diff 244173.Feb 12 2020, 7:19 AM
martong marked 6 inline comments as done.
  • Move include of Preprocessor.h to CheckerHelpers.cpp
  • Try -> try
  • Use PP.getIdentifierInfo
  • Handle parens in tryExpandAsInteger
martong marked 2 inline comments as done.Feb 12 2020, 7:20 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
571

Range is a convenience function that creates a IntRangeVector from the two params. This way we can easily create a list of ranges from two ints. Range(-1, 255) results a list of ranges with one list: {{-1, 255}}.
Now, because EOF's value can be less than the minimum value of the next range we have to explicitly enlist the ranges.
E.g. if EOF is -2 then we will have a list of {-2,-2} and {0, 255}.

clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
114–117

Ok, but we cannot assert on MI, because there may be cases when the macro is not defined in a TU. In that case we should just return with None.

Szelethus accepted this revision.Feb 13 2020, 1:23 AM

LGTM! Very nice! I think you can commit as you please, granted you add core checkers to the test RUN: lines.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
313–321

I appreciated you sitting down with me with a piece of paper and a pen to explain what happens here, but for the less fortunate this code snippet should be decorated with some ASCII art.

I don't insist on you doing that within the scope of this patch, but if you did anyways, that would be great :)

clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
137

TIL Despite the analyzer having a half-a-decade old checker option that was defaulted to a hexadecimal value, nobody wrote tests for it, and it took 4-5 years to unearth that due to the incorrect parsing of integers (it was set to decimal instead of auto detect) it never ever worked.

clang/test/Analysis/std-c-library-functions-eof.c
2–6

Always enable core checkers for pathsensitive analyses.

This revision is now accepted and ready to land.Feb 13 2020, 1:23 AM
Szelethus added inline comments.Feb 13 2020, 1:24 AM
clang/test/Analysis/std-c-library-functions-eof.c
12

Maybe you could throw in a test where the definition isn't surrounded by parentheses?

martong updated this revision to Diff 244376.Feb 13 2020, 3:09 AM
martong marked 6 inline comments as done.
  • Enable core checkers explicitly in test
  • Add ASCII art to depict a range list example
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
313–321

Ok, I have added that below the existing comment of the enclosing if block.

clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
137

Well, I hope AutoSenseRadix will just detect the correct base.

This revision was automatically updated to reflect the committed changes.
NoQ added inline comments.Feb 13 2020, 9:36 AM
clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
114–117

What exactly happens when the macro is #undef-ined and redefined? We get the last redefinition that's valid at the end of the translation unit, right? Can we check whether there are multiple definitions and guard against that?

Szelethus added inline comments.Feb 13 2020, 9:49 AM
clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
114–117

Ugh, now that you say it that is a valid concern. I had to deal with that back in the day: https://reviews.llvm.org/D52794?id=171962#inline-476352

martong marked an inline comment as done.Feb 17 2020, 8:02 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
114–117

Solving this does not seem easy in my opinion. To handle #undefs we should build an infrastructure where summaries can reference callable objects. This is necessary, because in evalCall the value of EOF would depend on the souce location of the call expression of the function with the summary. Not impossible to solve, but certainly introduces complexity. Do you think that the redefinition of EOF is so common? I mean maybe it is too much hassle for a very rare edge case (?).

martong marked an inline comment as done.Feb 17 2020, 8:26 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
114–117

The standard library (libc or libc++) should define EOF consitently in stdio.h.
Now, if the application redefines the value of EOF then the code could be broken, or at least it would not be compatible with libc. Consider the following code that is perfectly legit if we don't redefine EOF, but if we do:

#include <stdio.h>
#include <stdlib.h>
#define EOF -2 // Here be dragons !!!
int main(void)
{
    FILE* fp = fopen("test.txt", "r");
    int c;
    while ((c = fgetc(fp)) != EOF) { // BOOM: Infinite loop !!!
       putchar(c);
    }
    fclose(fp);
}

So, redefinition of EOF (or any standard macro) results in broken code.
And this is also a warning:

) clang eof-inf.c
eof-inf.c:3:9: warning: 'EOF' macro redefined [-Wmacro-redefined]
#define EOF -2 // Here be dragons !!!
        ^
/usr/include/x86_64-linux-gnu/bits/libio.h:66:10: note: previous definition is here
# define EOF (-1)
         ^
1 warning generated.
balazske added inline comments.Feb 17 2020, 11:17 PM
clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
114–117

It should be possible to get the list of all macro definitions (of a macro). We can select the first item, or one that is inside a system header (by the SourceLocation).

martong marked an inline comment as done.Feb 18 2020, 8:52 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Core/CheckerHelpers.cpp
114–117

Actually, redefinition of a macro in a standard library is undefined behavior:

C++11:

17.6.4.3.1
[macro.names]
1) A translation unit that includes a standard library header shall not #define or #undef names declared in any standard library header.

and

[reserved.names]
1) The C ++ standard library reserves the following kinds of names:
— macros
— global names
— names with external linkage
2) If a program declares or defines a name in a context where it is reserved, other than as explicitly allowed by this Clause, its behavior is undefined.

Also, this is stated in C99 (7.1).
Plus[[ https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html | from libc manual ]]:

The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names. All other library names are reserved if your program explicitly includes the header file that defines or declares them.

Thus, I don't think we should come up with solutions to handle already broken code (i.e. programs with undefined behavior).