This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Treat system globals as mutable if they are not const
ClosedPublic

Authored by steakhal on Jun 8 2022, 7:10 AM.

Details

Summary

Previously, system globals were treated as immutable regions, unless it
was the errno which is known to be frequently modified.

D124244 wants to add a check for stores to immutable regions.
It would basically turn all stores to system globals into an error even
though we have no reason to believe that those mutable sys globals
should be treated as if they were immutable. And this leads to
false-positives if we apply D124244.

In this patch, I'm proposing to treat mutable sys globals actually
mutable, hence allocate them into the GlobalSystemSpaceRegion, UNLESS
they were declared as const (and a primitive arithmetic type), in
which case, we should use GlobalImmutableSpaceRegion.

In any other cases, I'm using the GlobalInternalSpaceRegion, which is
no different than the previous behavior.


In the tests I added, only the last expected-warning was different, compared to the baseline.
Which is this:

void test_my_mutable_system_global_constraint() {
  assert(my_mutable_system_global > 2);
  clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{TRUE}}
  invalidate_globals();
  clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{UNKNOWN}} It was previously TRUE.
}
void test_my_mutable_system_global_assign(int x) {
  my_mutable_system_global = x;
  clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{TRUE}}
  invalidate_globals();
  clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{UNKNOWN}} It was previously TRUE.
}

Unfortunately, the taint checker and the taint propagation will be affected by this change.
Immutable regions were not invalidated, thus taint was preserved as an artifact of this.
Now, such memory regions do get invalidated, thus a new symbolic value will be bound to them; which won't be tainted anymore.

This caused a case in the test suite where the global stdin pointer got 'sanitized' by the default eval called fscanf() call.
In theory, the engine should propagate taint in default eval call. If that would be the case, we would still have this report.

The measurements are not yet complete, but the current results show minor report differences across the measured projects.
Almost all relate to this taint propagation regression, and the handful of others are probably an artifact of the missing sink nodes caused by those reports.
I'll come back with a more in-depth analysis if requested.

Diff Detail

Event Timeline

steakhal created this revision.Jun 8 2022, 7:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2022, 7:10 AM
steakhal requested review of this revision.Jun 8 2022, 7:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2022, 7:10 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
steakhal edited the summary of this revision. (Show Details)Jun 8 2022, 7:29 AM

In theory, the engine should propagate taint in default eval call. If that would be the case, we would still have this report.

How complex would it be to add the taint propagation to the engine/evalCall? Do you see that feasible as a parent patch of this?

In theory, the engine should propagate taint in default eval call. If that would be the case, we would still have this report.

How complex would it be to add the taint propagation to the engine/evalCall? Do you see that feasible as a parent patch of this?

I see your point, but I don't think we should delay D124244 because of that. By landing this, we could also land that. And the change you propose might take significantly longer to make and verify.
I could make some prototypes, but I'm not making promises.

In theory, the engine should propagate taint in default eval call. If that would be the case, we would still have this report.

How complex would it be to add the taint propagation to the engine/evalCall? Do you see that feasible as a parent patch of this?

I see your point, but I don't think we should delay D124244 because of that. By landing this, we could also land that. And the change you propose might take significantly longer to make and verify.
I could make some prototypes, but I'm not making promises.

Ok.

One concern. Can we decide for a global const system variable if that it is a system variable? Consider my_const_system_global, it will be in GlobalImmutableSpaceRegion, however, it is also a system variable. Would it make sense to have a GlobalImmutableSystemSpaceRegion?

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
979–981

Is this comment still meaningful? I don't think so because if the type is const, then it does not matter what sub-types it has.

clang/test/Analysis/globals-are-not-always-immutable.c
15

I am wondering if we should have another test for assert(errno == 2);. Because here we add a binding to the store, however, with the assert we add a constraint to the range based constraint manager and these are very much different cases.

The same applies to the below tests as well.

56

Please highlight that this is the essence of the change.

In theory, the engine should propagate taint in default eval call. If that would be the case, we would still have this report.

How complex would it be to add the taint propagation to the engine/evalCall? Do you see that feasible as a parent patch of this?

I see your point, but I don't think we should delay D124244 because of that. By landing this, we could also land that. And the change you propose might take significantly longer to make and verify.
I could make some prototypes, but I'm not making promises.

Ok.

One concern. Can we decide for a global const system variable if that it is a system variable? Consider my_const_system_global, it will be in GlobalImmutableSpaceRegion, however, it is also a system variable. Would it make sense to have a GlobalImmutableSystemSpaceRegion?

Well, there is more to this story. Constness and storage addressspace should be two different propery. Memspaces supposed to represent the latter. The mutability should have been designed as a GDM trait instead. It just happens to be that on some platforms there is a segment which is mapped as read only. As far as memspaces goes, it shouldn't be important.

Consider some addspace attributed pointers. We should be able to conclude aliasing properties from just the type. In general (modulo weird cases), different addspace pointers should not refer to the same lvalue. Hence, the corresponding memspaces should also differ.

In theory, the engine should propagate taint in default eval call. If that would be the case, we would still have this report.

How complex would it be to add the taint propagation to the engine/evalCall? Do you see that feasible as a parent patch of this?

I see your point, but I don't think we should delay D124244 because of that. By landing this, we could also land that. And the change you propose might take significantly longer to make and verify.
I could make some prototypes, but I'm not making promises.

Ok.

One concern. Can we decide for a global const system variable if that it is a system variable? Consider my_const_system_global, it will be in GlobalImmutableSpaceRegion, however, it is also a system variable. Would it make sense to have a GlobalImmutableSystemSpaceRegion?

Well, there is more to this story. Constness and storage addressspace should be two different propery. Memspaces supposed to represent the latter. The mutability should have been designed as a GDM trait instead. It just happens to be that on some platforms there is a segment which is mapped as read only. As far as memspaces goes, it shouldn't be important.

Okay, makes sense. Maybe we could go into that direction in a later patch stack.
So, I think this patch is indeed a step forward, nice work.

steakhal edited the summary of this revision. (Show Details)Jun 9 2022, 10:31 AM
steakhal added reviewers: xazax.hun, zukatsinadze.
steakhal updated this revision to Diff 435611.Jun 9 2022, 10:44 AM
steakhal edited the summary of this revision. (Show Details)
steakhal marked 2 inline comments as done.
  • Use assert macro for introducing constraints.
  • Add tests for store binding invalidations, besides constraint invalidations.
steakhal added inline comments.Jun 9 2022, 10:49 AM
clang/lib/StaticAnalyzer/Core/MemRegion.cpp
979–981

Makes sense. However, I'm not sure. Is it the case @xazax.hun?

martong accepted this revision.Jun 13 2022, 8:59 AM

Ok, LGTM

clang/test/Analysis/globals-are-not-always-immutable.c
71

Just remove these when you commit.

This revision is now accepted and ready to land.Jun 13 2022, 8:59 AM
xazax.hun accepted this revision.Jun 13 2022, 11:53 AM

Overall looks good to me, it is a step towards having a more faithful representation of the reality.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
978

Why do we need this extra condition? I see that this was the original behavior, just wondering if we actually know. I think there are more types than arithmetic that can be stored in the read only memory.

steakhal added inline comments.Jun 13 2022, 12:00 PM
clang/lib/StaticAnalyzer/Core/MemRegion.cpp
978

Well, seems reasonable. Do you think I should remove it here or in a separate patch?

clang/test/Analysis/globals-are-not-always-immutable.c
71

Okay!

xazax.hun added inline comments.Jun 13 2022, 12:14 PM
clang/lib/StaticAnalyzer/Core/MemRegion.cpp
978

Let's keep it separate, so it can be reverted individually. Just in case we find out the reason the hard way :)

steakhal updated this revision to Diff 436707.Jun 14 2022, 2:05 AM
steakhal marked an inline comment as done.
  • Modify the GenericTaintChecker::isStdin() to look through derived symbols, to mitigate the effect of invalidations.
steakhal requested review of this revision.Jun 14 2022, 2:06 AM

Please check this again @martong @xazax.hun.
I'll also conduct a measurement, investigating what report changes we experience with this change.

  • Modify the GenericTaintChecker::isStdin() to look through derived symbols, to mitigate the effect of invalidations.

So, the taint property is still not propagated by the engine after the invalidation. BUT, since we have the

static bool isTaintedOrPointsToTainted(const Expr *E, .... {
  if (isTainted(State, E, C.getLocationContext()) || isStdin(E, C))
    return true;

condition and the modified isStdin, now we consider the Expr* associated to stdin as tainted. Please confirm my understanding is correct.

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
98–101

This code yields to a virtual function call. And we fortunately have that virtual function in the base class.
Use SymExpr::getOriginRegion() and dyn_cast_or_null to DeclRegion.

steakhal updated this revision to Diff 436821.Jun 14 2022, 9:31 AM
steakhal marked an inline comment as done.

Use getOriginRegion().

  • Modify the GenericTaintChecker::isStdin() to look through derived symbols, to mitigate the effect of invalidations.

So, the taint property is still not propagated by the engine after the invalidation. BUT, since we have the

static bool isTaintedOrPointsToTainted(const Expr *E, .... {
  if (isTainted(State, E, C.getLocationContext()) || isStdin(E, C))
    return true;

condition and the modified isStdin, now we consider the Expr* associated to stdin as tainted. Please confirm my understanding is correct.

Exactly.

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
98–101

Nice catch. I did not even know about this.

steakhal marked 3 inline comments as done.Jun 14 2022, 10:32 AM
steakhal added inline comments.
clang/lib/StaticAnalyzer/Core/MemRegion.cpp
978
979–981
This revision is now accepted and ready to land.Jun 15 2022, 12:50 AM
steakhal marked an inline comment as done.Jun 15 2022, 7:37 AM

According to my measurements, this commit will change some reports, but nothing significant (<20) on our testset (except llvm which I did not include).
But more importantly, no taint reports disappeared nor were introduced.

This revision was landed with ongoing or failed builds.Jun 15 2022, 8:08 AM
This revision was automatically updated to reflect the committed changes.