This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] add StoreToImmutable and ModelConstQualifiedReturn checkers
Needs ReviewPublic

Authored by zukatsinadze on Apr 22 2022, 3:58 AM.

Details

Summary

The patch adds two new checkers.
core.StoreToImmutable that warns binds on immutable memory and
alpha.security.cert.env.ModelConstQualifiedReturn that models return values
of some functions that should be considered const qualified. Based on SEI CERT
ENV30-C: https://wiki.sei.cmu.edu/confluence/x/79UxBQ

Diff Detail

Event Timeline

zukatsinadze created this revision.Apr 22 2022, 3:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2022, 3:58 AM
zukatsinadze requested review of this revision.Apr 22 2022, 3:58 AM

fix clang format

Looks great. I only have nits mostly for the docs.

clang/docs/analyzer/checkers.rst
2384–2386

IMO the list of functions should be mentioned separately. What if the list gets longer and longer.
A separate paragraph could list these instead. Also, wrap them with double backticks.

``getenv()``
2390

Use crossref links.

2393–2401

I don't like the fact that this example works only if the alpha.cert.pos.StoreToImmutableChecker is enabled, which is an alpha checker.

There are other places where such immutable memory regions arise, such as writable strings.
Anyway, at least let's mention that the given alpha checker has anything to do with this checker (+ crossref!).

2402

You should also embed an example of how to fix this issue: E.g. advertises making a copy of the immutable region to a mutable location and doing the mutation there instead. And use const as a preventive measure for the future for the original pointer.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
766–769

Please merge these into a single isa<T1, T2,...>().

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
68–70

Put this in the right alphabetical place.

clang/lib/StaticAnalyzer/Checkers/StoreToImmutableChecker.cpp
28–30

You could expose a pointer to this by creating a const BugType *getBugType() const; getter method, which could be used by other checkers.

52

We capitalize the static analyzer warnings and notes.

clang/lib/StaticAnalyzer/Checkers/cert/ModelConstQualifiedReturnChecker.cpp
2–3

Fix the comment.

11–12

Comments should be capitalized and punctuated as usual.

31–44

I can see a small issue by evaCall-ing these functions.
The problem is that we might not model all aspects of these functions, thus the modeling can cause harm to the analysis. E.g. by not doing invalidation where we actually would need invalidation to kick in, or anything else really.
Consequently, another problem is that no other checker can evaluate these, since we evaluate them here.
Thus, fixing such improper modeling could end up in further changes down the line.
Unfortunately, I don't have any better ATM, so let's go with this evalCall approach.

67

One should achieve this by comparing the BugType pointers.

clang/test/Analysis/cert/env30-c.cpp
3

Also enable the 'core' package.

NoQ added inline comments.Apr 28 2022, 2:14 PM
clang/lib/StaticAnalyzer/Checkers/StoreToImmutableChecker.cpp
44

This check catches a lot more regions than the ones produced by the other checker. In particular, it'll warn on all global constants. Did you intend this to happen? You don't seem to have tests for that.

53

I also recommend trackExpressionValue to make sure we have all reassignments highlighted as the value gets bounced between pointer variables. The user will need proof that the pointer actually points to const memory.

clang/lib/StaticAnalyzer/Checkers/cert/ModelConstQualifiedReturnChecker.cpp
31–44

Right, this definitely shouldn't be evalCall. Post-call seems to be sufficient for this checker.

The patch adds two new checkers.

I don't see any technical dependencies between the two, so, please split it into two independent patches.

Also, could you please update the Static Analyzer section of clang/docs/ReleaseNotes.rst with the new checkers and their description?

zukatsinadze marked 14 inline comments as done.
zukatsinadze added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
766–769

error: macro "assert" passed 4 arguments, but takes just 1
assert(isa<UnknownSpaceRegion, HeapSpaceRegion, GlobalSystemSpaceRegion, GlobalImmutableSpaceRegion>(sreg));

So I had to add extra ()-s around isa, I guess macro expands weirdly?

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
68–70

I think it is in the right alphabetical place not considering cert/ (other certs are in similar order), but maybe I should remove the cert directory, what do you think?
It was created by me a few years ago, but I don't see a point now anymore.

clang/lib/StaticAnalyzer/Checkers/StoreToImmutableChecker.cpp
28–30

I did something similar based on SmartPtrChecker.

44

Could you give an example? I tried with the following snippet and it didn't give a warning

int a = 1, b = 1;
void foo(int *p) {
  a = 2;
  p[b++] = 3;
  int *c = &a;
  *c = 5;
}
53

What expression should I use for tracking? I tried to simply cast S to Expr, but it didn't really work.
Then I came up with something ugly: cast S to BinaryOperator -> getLHS -> getSubExpr -> ignoreImpCasts.

zukatsinadze marked 4 inline comments as not done.May 17 2022, 4:04 PM

The patch adds two new checkers.

I don't see any technical dependencies between the two, so, please split it into two independent patches.

StoreToImmutable does not depend on the modeling checker, but I am using the BugType pointer from StoreToImmutable in the modeling checker for Note diagnostics.
Should it still be split and just not merge the second patch until the first one is committed?

Also, could you please update the Static Analyzer section of clang/docs/ReleaseNotes.rst with the new checkers and their description?

Sorry, I missed this comment. I will add release notes in the next round.

I found this report at vim/src/term.c (https://github.com/vim/vim.git at v8.2.1920):

I'm not sure what the exact type of BC global variable is but I think it's declared as either extern char *BC; or just char *BC;.
empty_option is declared as either extern unsigned char *empty_option; or unsigned char *empty_option = (unsigned char *)"";

Could you please have a look at this FP @zukatsinadze?

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
68–70

Leave it here.

clang/lib/StaticAnalyzer/Checkers/StoreToImmutableChecker.cpp
53

Transitive interestingness might be related to this: D125362

Any update on this. I'm really excited about this one. Let me know if something blocks you.

I found this report at vim/src/term.c (https://github.com/vim/vim.git at v8.2.1920):

I'm not sure what the exact type of BC global variable is but I think it's declared as either extern char *BC; or just char *BC;.
empty_option is declared as either extern unsigned char *empty_option; or unsigned char *empty_option = (unsigned char *)"";

Could you please have a look at this FP @zukatsinadze?

I've checked why we report this. It turns out mutable global variables were considered immutable (except the errno), which caused FPs in this proposed checker.
I made D127306 to keep mutable (system) globals as mutable memory.

NoQ added inline comments.Jun 9 2022, 3:26 PM
clang/lib/StaticAnalyzer/Checkers/StoreToImmutableChecker.cpp
44

I was thinking about constants, i.e. const int a etc.

clang/lib/StaticAnalyzer/Checkers/cert/ModelConstQualifiedReturnChecker.cpp
31–44

Yeah so the reason why this was made an evalCall is because we needed to replace the memory space for the return value. I think the right solution is to re-implement SymRegion memory spaces as traits, so that they could be updated during analysis at any time (under the natural condition that they only ever get narrower). So like dynamic type, just dynamic memory space.

Sorry for my late reply.

It feels like we have some serious obstacles.
The check::PostCall handler wants to mark some memory region immutable. Currently, the checker creates a new symbolic memregion, spawned into the immutable memory space. After this it simply re-binds the return value.
However, only eval::Call handler is supposed (must) to bind the return value, so the current implementation cannot land.

I played with the trait idea, which was this:

  1. Introduce a new program state trait with the REGISTER_SET_WITH_PROGRAMSTATE(ImmutableRegions, const ento::MemRegion *). One should not bind values to these regions. (*)
  2. The check::PostCall would simply insert into this set.
  3. The StoreToImmutableChecker, at the check::Bind handler, would verify that the region is not contained by the set - otherwise emit a report...
  4. Surface this new trait to be reachable by the Core infrastructure.
  5. Refactor all the Core functions introducing or expecting MemRegionManager::getGlobalsRegions to insert the region in question into this ImmutableRegions set associated with the current State, producing some new State.

The last point is the most critical. Now, by design, MemRegionManager does not refer to the ProgramState, hence we don't have one that we could mutate by inserting into the ImmutableRegions set. Passing a ProgramStateRef along all the functions is a nightmare. Trust me, I've tried it.
That being said, eradicating GlobalImmutableSpaceRegion seems to be challenging.

I've settled on using the custom trait, and the GlobalImmutableSpaceRegion memspace kind to detect if the store (bind) should be allowed or not.

WDYT @NoQ, how could we get rid of the GlobalImmutableSpaceRegion?


(*): Originally I wanted a set of const MemSpaceRegion *, but it turns out a default eval called function which returns a plain old mutable pointer is of SymRegion{conj{}} at the UnknownSpaceRegion. And we probably don't want to mark immutable the whole UnknownSpaceRegion xD.

Sorry for my late reply.

It feels like we have some serious obstacles.
The check::PostCall handler wants to mark some memory region immutable. Currently, the checker creates a new symbolic memregion, spawned into the immutable memory space. After this it simply re-binds the return value.
However, only eval::Call handler is supposed (must) to bind the return value, so the current implementation cannot land.

I played with the trait idea, which was this:

  1. Introduce a new program state trait with the REGISTER_SET_WITH_PROGRAMSTATE(ImmutableRegions, const ento::MemRegion *). One should not bind values to these regions. (*)
  2. The check::PostCall would simply insert into this set.
  3. The StoreToImmutableChecker, at the check::Bind handler, would verify that the region is not contained by the set - otherwise emit a report...
  4. Surface this new trait to be reachable by the Core infrastructure.
  5. Refactor all the Core functions introducing or expecting MemRegionManager::getGlobalsRegions to insert the region in question into this ImmutableRegions set associated with the current State, producing some new State.

The last point is the most critical. Now, by design, MemRegionManager does not refer to the ProgramState, hence we don't have one that we could mutate by inserting into the ImmutableRegions set. Passing a ProgramStateRef along all the functions is a nightmare. Trust me, I've tried it.
That being said, eradicating GlobalImmutableSpaceRegion seems to be challenging.

I've settled on using the custom trait, and the GlobalImmutableSpaceRegion memspace kind to detect if the store (bind) should be allowed or not.

WDYT @NoQ, how could we get rid of the GlobalImmutableSpaceRegion?


(*): Originally I wanted a set of const MemSpaceRegion *, but it turns out a default eval called function which returns a plain old mutable pointer is of SymRegion{conj{}} at the UnknownSpaceRegion. And we probably don't want to mark immutable the whole UnknownSpaceRegion xD.

@NoQ what do you think about steakhal's proposal?

Overall this seems to be a promising checker; I added a few minor remarks as inline comments. Unfortunately @steakhal's comment that

The check::PostCall handler wants to mark some memory region immutable. Currently, the checker creates a new symbolic memregion, spawned into the immutable memory space. After this it simply re-binds the return value.
However, only eval::Call handler is supposed (must) to bind the return value, so the current implementation cannot land.

highlights question that is as far as I see a blocking obstacle. I hope that you can find a solution for that technical difficulty.

Moreover, I agree with @martong's comment that you should probably create two commits for these two checkers. (One-way dependency between the checkers is not an obstacle: simply pay attention to merging the commits in the right order.)

clang/lib/StaticAnalyzer/Checkers/StoreToImmutableChecker.cpp
68–72

This code is a bit hard to understand, perhaps add a comment like
// If the binding statement looks like "*ptr = ...", then track the pointer "ptr"

clang/lib/StaticAnalyzer/Checkers/cert/ModelConstQualifiedReturnChecker.cpp
39

If you don't have concrete plans for extending this checker to cases where you need a different HandlerFn, then I strongly suggest replacing this map with a set (or a CallDescriptionMap<bool> where the bool is just a dummy placeholder). There is no reason to juggle member function pointers when they are always pointing to the same function.