This is an archive of the discontinued LLVM Phabricator instance.

[clang][analyzer] Add modeling of 'errno'.
ClosedPublic

Authored by balazske on Feb 22 2022, 1:08 AM.

Details

Summary

Add a checker to maintain the system-defined value 'errno'.
The value is supposed to be set in the future by existing or
new checkers that evaluate errno-modifying function calls.

Diff Detail

Event Timeline

balazske created this revision.Feb 22 2022, 1:08 AM
balazske requested review of this revision.Feb 22 2022, 1:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2022, 1:08 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Documentation not added yet.

Good start!

However, I'm not a big fan of coupling the testing checker with the actual modeling checker.
IMO we should have a distinct checker, similarly to TaintTester. That way you could do even fancier things like:
define mylib_may_fail(), bifurcate and return true for the success case, on which it wouldn't touch errno; and on the failure case return false and set the errno to some concrete value that we could check against.

But I'm also fine with the current approach, providing the set_errno(SVal) introspection function.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
767–769

At this point, I'm not even sure if we should assert any of these.

1378
clang/lib/StaticAnalyzer/Checkers/Errno.h
22 ↗(On Diff #410471)

I think we can settle on something better. What about calling it simply errno?

24–26 ↗(On Diff #410471)

I don't think we need this.
In getErrnoValue() we should simply return UnknownVal if we don't have 'errno.
And in setErrnoValue() we should return the incoming State unmodified.

THat being said, only a top-level Check::BeginFunction callback could see an 'errno' uninitialized, which I don't think is a real issue.
All the rest of the callbacks would run after it's initialized, thus would behave as expected.
And in case the translation unit doesn't have nor 'errno' variable nor 'errno_location_ish' functions, ignoring these set/get functions is actually the expected behavior.

Having to check isErrnoAvailable() would be really an anti-pattern.

clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
32 ↗(On Diff #410471)

Why don't you use StringRef here as well?

35 ↗(On Diff #410471)

compiler-rt/lib/sanitizer_common/sanitizer_errno.h has a list of the possible names for this API.:
__error, __errno, ___errno, _errno or __errno_location

That being said, you should eval::Call all of these, and bind the return value to the memorized errno region.
At this point, this should be a CallDescriptionSet within the checker.

60 ↗(On Diff #410471)

For function definitions, we should use static instead of anonymous namespaces.
That being said, I don't think inline will do anything in this context.

66 ↗(On Diff #410471)

I would expect a comment stating that:

  • We inspect if we have a VarDecl naming "errno", it returns that Decl.
  • Otherwise, it will look for some common errno_location function names and return that Decl.
88 ↗(On Diff #410471)

I think you can hoist these lambdas into static functions.
This handler is already big enough.

99 ↗(On Diff #410471)

I don't think any of these actually return the canonic versions.

118 ↗(On Diff #410471)

Ah, I've seen and done these reinterpret casts.
Could you please check why don't we have a partial specialization allowing any const pointers besides void pointers?
That way it would look much better.

134 ↗(On Diff #410471)
146–149 ↗(On Diff #410471)

You should sink these lines to the corresponding branches.

180–181 ↗(On Diff #410471)

Fuse these two lines.

clang/test/Analysis/Inputs/errno_var.h
4
clang/test/Analysis/errno.c
23

Please call a different function, which is definitely not from glibc.
We might model this in the future using eval::Call in which case no invalidation will happen breaking the test.

clang/test/Analysis/global-region-invalidation.c
7

If you were eval::Calling the errno_location functions, and used the corresponding header, would the tests pass?

ormris removed a subscriber: ormris.Feb 22 2022, 9:59 AM
NoQ added inline comments.Feb 23 2022, 1:17 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
767–769

I mean, ideally this should be just UnknownSpaceRegion. For everything else we should have made a fresh MemRegion class. It is absurd that the same numeric address value (represented the symbol s) can correspond to two (now three) different locations in memory.

clang/lib/StaticAnalyzer/Checkers/Errno.h
24–26 ↗(On Diff #410471)

UnknownVal implies that it's an actual value but we don't know which one. If the value doesn't exist we shouldn't use it. And if the user doesn't include the appropriate header then the value really doesn't exist in the translation unit. So flavor-wise I think we shouldn't use UnknownVal here; I'd rather have an Optional<SVal>.

Other than that, yeah, I think this is a good suggestion.

clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
32 ↗(On Diff #410471)

There's that rule against static constructors/destructors. We should use a lot of constexpr in these situations and/or try to turn these structures into plain C arrays as much as possible.

66 ↗(On Diff #410471)

Ok so the first part of the function (identifying how the system headers represent errno) depends entirely on the AST right? In this case you can probably perform the AST scan only once during checkASTDecl<TranslationUnitDecl> and stash the result in a global variable. It shouldn't be re-run on every analysis. So whatever you do with isErrnoAvailable(), it shouldn't accept ProgramStateRef at all, it can simply query that global variable. You'll still need the state trait because SVal objects only make sense within a single analysis but the AST scan can be made only once.

134 ↗(On Diff #410471)

This is literally the first step of the analysis. The block count is known.

What I really want to see here is a non-trivial custom tag (i.e., the const void *symbolTag parameter on some of the overloads of conjureSymbolVal()) to make sure this symbol isn't treated as a duplicate of any other symbol some other checker conjures initially. Like all tags, you can initialize it with some address of some checker-static variable.

steakhal added inline comments.Feb 23 2022, 3:48 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
767–769

From my perspective, a symbolic region is just a region that we don't know what its parent region is - let it be part of an object or an entire memory space.
The memory space is what we use for communicating some constraints on where that symbolic region.
E.g. a symbolic region of a stack memory space should never alias with any memory region of the heap memory space.

It is absurd that the same numeric address value (represented the symbol s) can correspond to two (now three) different locations in memory.

IMO a given s should indeed refer to a single location in memory, but that's not a property we can enforce here in the constructor.
And my guess is that by removing this assertion, we would still have this invariant. What we should do instead, put an assertion into the SymbolManager, to enforce that with the same s symbol, one cannot construct two SymbolicRegions with different memory spaces.
Alternatively, we could remove the memory space from the Profile to map to the same entity in the Map; which would enforce this invariant on its own. WDYT?

clang/lib/StaticAnalyzer/Checkers/Errno.h
24–26 ↗(On Diff #410471)

Yeah, probably the Optional is a better alternative.

However, I'd like to explain my reasoning behind my previous suggestion:
Unknown can/should model any values that the analyzer cannot currently reason about: e.g. floating-point numbers.
In this case, we cannot reason about the errno, thus I recommended it.

My point is, that we shouldn't regress our API for a marginal problem. And according to my reasoning, this seems to be a marginal issue.
Using Optional here is better than the current implementation, but not by much.
It would still result in a suboptimal coding pattern, where we guard each access to get/set errno by an if statement. In the cases where we would use the getOr() getter, I'm expecting to see the users pass the UnknownVal() as the fallback value in most cases anyway.

clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
32 ↗(On Diff #410471)

Ah, I see, thanks!

134 ↗(On Diff #410471)

I would still use the C.blockCount() to make it more in-line with the rest of the uses of conjureSymbol().

balazske updated this revision to Diff 410824.Feb 23 2022, 7:39 AM
balazske marked 7 inline comments as done.

Addressed a part of the review comments.
The test checker is now a separate checker.
Initialization is simplified, checkASTDecl is used.

balazske added inline comments.Feb 23 2022, 7:44 AM
clang/lib/StaticAnalyzer/Checkers/Errno.h
22 ↗(On Diff #410471)

Just errno may not work because it collides with the "real" errno (that is a macro).

24–26 ↗(On Diff #410471)

These work now if no ErrnoRegion is available. Returning Optional seems not better than a required check before the call. I think the current version is not the best: It can be possible to make assumptions using assume on the returned SVal value, but this must not be done on a non-existing errno value. For this case probably isErrnoAvailable is required to be used.

The isErrnoAvailable can be useful for a checker that does special things only if there is a working errno modeling.

clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
35 ↗(On Diff #410471)

The list is needed to build CallDescriptions and to lookup the errno location function. A CallDescriptionSet can not be used to get the function names from it. Is there a way to build a initializer_list of CallDescription objects in compile time (from a string array)?

99 ↗(On Diff #410471)

ACtx.IntTy and value from getPointerType should be canonical (it is a CanQualType).

balazske marked 6 inline comments as done.Feb 23 2022, 7:56 AM
steakhal added inline comments.Feb 23 2022, 8:13 AM
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
351–353

I think ErrnoModeling would be more appropriate.

1564–1567

If this is the case, it should have a Dependency<Errno>

clang/lib/StaticAnalyzer/Checkers/Errno.h
22 ↗(On Diff #410471)

Ah, I see. Ugly macros!
Then why not simply leave it in the clang::ento namespace?

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
1155–1156

I would recommend removing this comment alltogether.
The header is already annotated, by the time this will get out-of-sync anyway.

clang/test/Analysis/errno.c
16–22

Fun fact, I think you can #include ERRNO_HEADER, if you pass the -DERRNO_HEADER=\"Inputs/errno_var.h\" :D

NoQ added inline comments.Feb 23 2022, 7:40 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
767–769

The memory space is what we use for communicating some constraints on where that symbolic region.

I agree that it's a constraint. And as such, I believe it shouldn't be made part of the region's identity. Just like "$x < 5" is not a field inside symbol $x but a separate state trait. Similarly we can turn SymbolicRegion's memory space constraint into a state trait. Then we can express more sophisticated constraints ("this symbolic region is either on the heap or a global variable but definitely not a local variable"), or we can have constraints become more precise over time ("this symbolic region was compared as "less than" another heap region, therefore it itself must be on the heap").

What we should do instead, put an assertion into the SymbolManager, to enforce that with the same s symbol, one cannot construct two SymbolicRegions with different memory spaces.

This could be a nice temporary solution.

balazske updated this revision to Diff 411131.Feb 24 2022, 7:40 AM

Improved the tests.

balazske marked 4 inline comments as done.Feb 24 2022, 7:46 AM
balazske added inline comments.
clang/test/Analysis/global-region-invalidation.c
7

It is now executed wit both versions.

balazske retitled this revision from [clang][analyzer] Add modeling of 'errno' (work-in-progress). to [clang][analyzer] Add modeling of 'errno'..Feb 24 2022, 7:46 AM
balazske edited the summary of this revision. (Show Details)
balazske updated this revision to Diff 411149.Feb 24 2022, 8:50 AM
balazske marked 3 inline comments as done.
balazske edited the summary of this revision. (Show Details)

Changed name of the checker to ErrnoModeling, other small cleanup.

NoQ accepted this revision.Feb 24 2022, 10:40 AM

The patch looks great to me now. As soon as you address comments about isErrnoAvailable() (at least, the parameter can now be removed), I think you can commit.

This revision is now accepted and ready to land.Feb 24 2022, 10:40 AM
balazske updated this revision to Diff 411340.Feb 25 2022, 12:27 AM

Removed isErrnoAvailable, added test for getErrnoValue.

balazske marked an inline comment as done.Feb 25 2022, 12:28 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/Errno.h
22 ↗(On Diff #410471)

It is better to have separate namespaces for the different inter-checker API's. Probably more functions will be added later to the errno_modeling.

24–26 ↗(On Diff #410471)

I have removed now isErrnoAvailable and the get function returns Optional. I am not sure if this is the best option, if more functions will be added (to get the errno location, and get and set an "errno state").

steakhal accepted this revision.Feb 25 2022, 12:36 AM

Looks great, thanks.

balazske updated this revision to Diff 411363.Feb 25 2022, 2:51 AM

Fixed remaining problems.

This revision was landed with ongoing or failed builds.Feb 25 2022, 3:44 AM
This revision was automatically updated to reflect the committed changes.
balazske reopened this revision.Feb 25 2022, 6:13 AM
This revision is now accepted and ready to land.Feb 25 2022, 6:13 AM
balazske updated this revision to Diff 411398.Feb 25 2022, 6:14 AM

Rename of "Errno.h", maybe fixes Windows build problem.

according to the pre-merge checks it still fails on windows and Debian.
https://buildkite.com/llvm-project/premerge-checks/builds/81130#ce8f3062-699e-4043-aed9-b891ec8ebee6
https://buildkite.com/llvm-project/premerge-checks/builds/81130#53beb4a2-2c53-4ff0-b60f-6130ed5d25cd

********************
Failed Tests (3):
  Clang :: Analysis/errno.c
  Clang :: Analysis/global-region-invalidation.c
  Clang :: Driver/hip-link-bundle-archive.hip
 
Testing Time: 836.49s
  Skipped          :     3
  Unsupported      :   201
  Passed           : 29682
  Expectedly Failed:    31
  Failed           :     3

The "hip-link-bundle-archive" test looks really unrelated, the others are fixed if we go back to -DERRNO_VAR (no " characters in command line, and probably / does not work too). There are Debian build errors but these look unrelated(?).

balazske updated this revision to Diff 411421.Feb 25 2022, 8:04 AM

Another try to fix the test failures, rebased to current main.

The "hip-link-bundle-archive" test looks really unrelated, the others are fixed if we go back to -DERRNO_VAR (no " characters in command line, and probably / does not work too). There are Debian build errors but these look unrelated(?).

I think its the forward slashes.

We could workaround the issue by adding this line to the test files: // REQUIRES: system-linux

balazske updated this revision to Diff 411750.Feb 28 2022, 12:49 AM

Removed the static variable.

The current build errors are probably not related, the Windows test failure shows up in other differential objects.

This revision was landed with ongoing or failed builds.Feb 28 2022, 11:22 PM
This revision was automatically updated to reflect the committed changes.