This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] DynamicSize: Store the dynamic size
ClosedPublic

Authored by Charusso on Nov 1 2019, 11:57 AM.

Details

Summary

This patch introduces a way to store the size.
Also adds two debug checkers to dump out the size:
dumpExtent, dumpElementCount.

Diff Detail

Event Timeline

Charusso created this revision.Nov 1 2019, 11:57 AM
Charusso marked an inline comment as done.Nov 1 2019, 12:06 PM

I do not want to reverse engineer the MallocChecker to obtain the size on call basis. The current model without D68725 makes it enough difficult to obtain the size even with this generic map, but it is working fine.

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
1073 ↗(On Diff #227500)

That dual assumption made changes in the test files, and there is no other dual assumption.

NoQ added inline comments.Nov 1 2019, 1:12 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h
29–33

Why do we need this?

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
1073 ↗(On Diff #227500)

Wait, this code should not have been changed. It's not an allocation site, we don't receive any new information about the size of the region here.

clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
325–329

This gets rid of the assertion failure in https://bugs.llvm.org/show_bug.cgi?id=28450 by implementing my suggestion (2). Yay.

Charusso updated this revision to Diff 227534.Nov 1 2019, 2:06 PM
Charusso marked 6 inline comments as done.
  • Fix.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h
29–33

I think as the checkers are growing and we push more-and-more allocation modeling so that at some point the Git's 8-parameter allocator's size expression could be retrieved so easily. This is the full arsenal of my buffer-overflow checker's needs, so I have pushed it here. Also it made a meaning to have a helper-class with two fields (one would be lame).

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
1439

My problem was only that. It partially repeats the ExprEngine::bindReturnValue(), which is a wonky design. I will look into that later.

clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp
325–329

Cool!

Changes to MallocChecker really highlight the positive effects of this patch. Nice!

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
493

Is it possible to merge these parameters into a single DynamicSizeInfo object?

1147–1148

and here?

Charusso marked 3 inline comments as done.Nov 4 2019, 8:20 AM

Changes to MallocChecker really highlight the positive effects of this patch. Nice!

Thanks!

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
493

I want to do the opposite so I want to hide such constructors with the global getters and setters to make it easier to use. It turns out you only want to obtain one of its fields at a time, so that it is a bad idea to give you the entire object with multiple metadata. In my mind they will be in a dynamic namespace, and we could write that: dynamic::getSize(State, MR) or dynamic::getType(State, MR).

I like the idea of creating Contexts to store metadata, but this is not the use case of the dynamic information, I think. The Dynamic*Info is good for being stored in a map, and other than that it is just boilerplate to obtain the object and call one of its fields which I do not like anymore.

NoQ added inline comments.Nov 4 2019, 11:54 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h
29–33
In D68725#1722136, @NoQ wrote:

any path-sensitive checker for which such region is "interesting" would have to implement a bug visitor to display the allocation site. Such visitor automatically knows the location of the alloca() expression and can emit the necessary fixits.

Again, you will have to highlight the allocation site with a note. Therefore you will have to write a bug visitor that traverses the size expression at some point (or, equivalently, a note tag when the size expression is evaluated). Therefore you don't need to store the expression in the program state.

Charusso abandoned this revision.Jan 30 2020, 7:57 AM
Charusso marked an inline comment as done.

Let us say, we do not need this patch. In case we would need it, I will reopen.

Charusso updated this revision to Diff 241749.EditedJan 31 2020, 9:10 AM
Charusso edited the summary of this revision. (Show Details)
  • Let us reuse this patch.
  • Remove the expression storing feature.
  • Only store known sizes.
NoQ added a comment.Feb 20 2020, 2:02 PM

Hmm, has this patch not landed yet? I was so excited to finally have https://bugs.llvm.org/show_bug.cgi?id=28450 fixed :)

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
767 ↗(On Diff #241749)

This looks like an object count, we'll need to convert it to size in bytes because that's what extent is.

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
710

Same. I guess we should add a test for both cases, given that nothing failed when we screwed up the extent.

Charusso updated this revision to Diff 248097.Mar 3 2020, 8:46 PM
Charusso marked 4 inline comments as done.
Charusso edited the summary of this revision. (Show Details)
  • Set the size properly.
  • Add new debug.ExprInspection patterns: region, size, element count.
  • clang-format -i ExprInspectionChecker.cpp.
  • Having no idea what is the single regression in tests.
Charusso marked 2 inline comments as done.Mar 3 2020, 8:59 PM
Charusso added inline comments.
clang/lib/StaticAnalyzer/Core/DynamicSize.cpp
68

That one is interesting. Some of the checkers / SValBuilder(?) generate unsigned integers which should not happen, I believe. May we need a FIXME and an assertion about signedness. What do you think?

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
710

Well, it was equally wrong everywhere, so that it works... I have noticed it like 5 months ago, but I was lazy to fix.

clang/test/Analysis/explain-svals.cpp
57

Yea, that is the fact: The size is the size of the parameter, which is unknown.

clang/test/Analysis/memory-model.cpp
108 ↗(On Diff #248097)

Here I wanted to put more, but I am not that cool with other MemRegions. Some wise words about the followups of this test file:

Some day

clang/test/Analysis/misc-ps-region-store.m
1190 ↗(On Diff #248097)

That is the single regression which I do not get.

NoQ added inline comments.Mar 15 2020, 7:03 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h
25

Naming: I believe we should keep using the word "Extent". There's no need to introduce a new term for the existing concept; it makes it harder to explain on the mailing list :) Let's make a follow-up patch to change the naming back (so that not to screw the review).

41–44

This function is probably going to be used exactly once in the whole code. There's no need to turn it into a public API.

clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
99 ↗(On Diff #248097)

clang_analyzer_dump_extent()? Or just clang_analyzer_dump(clang_analyzer_getExtent()).

100 ↗(On Diff #248097)

clang_analyzer_dumpElementCount(). We need subjects in our sentences because they tend to vary quite a bit here and therefore cannot be implied (dump, return, express, ...).

183–185 ↗(On Diff #248097)
const Expr *Arg = getArgExpr(CE, C);
if (!Arg)
  return nullptr;

No need for gymnastics >.>

283 ↗(On Diff #248097)

So, how is it different from the existing clang_analyzer_dump()?

289–291 ↗(On Diff #248097)
const MemRegion *MR = getArgRegion(CE, C);
if (!MR)
  return;

... and so on.

310–311 ↗(On Diff #248097)

How is this better than getValueType()? Are you sure you're not getting a pointer type instead in the !TVR case? I.e., can you test this on a non-heap SymRegion?

clang/lib/StaticAnalyzer/Core/DynamicSize.cpp
68

SymbolExtent has type size_t and both malloc and operator new accept size_t as a parameter. Therefore everything needs to be unsigned and we need to assert this.

That said, array indexes are signed (as per implementation of ElementRegion).

clang/test/Analysis/misc-ps-region-store.m
1190 ↗(On Diff #248097)

Well, please debug. Like, look at the full report, dump egraph, see what changed. Try to creduce the example further under the condition "behavior changed with the patch", maybe that'll clear something up (though if it's a true positive after creduce it doesn't guarantee that it's a true positive before creduce).

Charusso updated this revision to Diff 253394.Mar 28 2020, 8:31 PM
Charusso marked 19 inline comments as done.
Charusso edited the summary of this revision. (Show Details)
  • Fix VLASizeChecker's multi-dimensional array early return.
  • So that fix the regression in test misc-ps-region-store.m.
  • Fix tests that need regex.
  • Add documentation about dumpExtent, dumpElementCount.
Charusso added inline comments.Mar 28 2020, 8:31 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h
25

Since then I have changed my mind when I have read about Environment and Store in a book. Sure, next up.

41–44

It is being used 3 times already, so I believe it is a cool API.

clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
99 ↗(On Diff #248097)

I like the shorter version, but of course I have seen the longer version.

283 ↗(On Diff #248097)

I wanted to make it for the derived regions, but then I have realized I am lazy to dig into the buggier parts of the Analyzer. Removed.

310–311 ↗(On Diff #248097)

How is this better than getValueType()?

Consistency. We get the static size extent by getting the desugared type which most likely just an extra overhead.

Are you sure you're not getting a pointer type instead in the !TVR case? I.e., can you test this on a non-heap SymRegion?

The issue was the var_region_simple_ptr test: we need to use regex for checking after the }} token where / 4 is the correct result. I did not really get why the test pass.

clang/lib/StaticAnalyzer/Core/DynamicSize.cpp
68

It was a premature optimization for consistency. I will leave the signedness as is, FIXME added.

clang/test/Analysis/misc-ps-region-store.m
1190 ↗(On Diff #248097)

Given that I have no access for rdar I did my hand-reduce and the solution is the following:

// RUN: %clang_analyze_cc1 -triple x86_64-apple-darwin9 \
// RUN:  -analyzer-checker=core,alpha.security.ArrayBound \
// RUN:  -verify %s

static void RDar8424269_B(int unknown) {
  unsigned char tmp2t[1][unknown];
  tmp2t[1][13] = 0; // expected-warning \
    {{Access out-of-bound array element (buffer overflow)}}
}

Looking at the master branch core.VLASize created the constraint extent_$1{tmp2} {[0, 0]} without any state splits. The new behavior does not create random constraints so that an alpha checker could detect we have no information about the unknown size extent.

Given that the alpha checker cannot catch the following: char foo[bar]; foo[13] = 0; is assumed that the problem is the VLASizeChecker's modeling which has a FIXME: Handle multi-dimensional VLAs. but does not handle this case.

An easy and appropriate fix to detect the multi-dimensional array. Here is the way to obtain the inner dimensions: http://clang-developers.42468.n3.nabble.com/Multi-dimensional-arrays-td4028727.html

So that the final solution is:

   // FIXME: Handle multi-dimensional VLAs.
+  if (VLA->getElementType()->getAsArrayTypeUnsafe())
+    return;

I hope with that I have fixed even more Bugzilla issues. May it would be interesting to see why the size extent of the array got a constraint of being 0, but I leave it as an exercise for the reader.

Charusso updated this revision to Diff 254051.Mar 31 2020, 5:30 PM
  • Remove the last gymnastic.
  • Rebase.

While this patch spans across a lot of files, it is actually rather straightforward. I'm stunned to see that we never really had a proper dynamic size support, especially that this patch has been out there for a good long time :) Oh well, better learn that now than never.

There sure is some rebasing nightmare waiting for you in VLASizeChecker, and I honestly lack the confidence to accept a patch with so much SVal-smithing going on. With that said, I do like the overall direction.

clang/lib/StaticAnalyzer/Core/DynamicSize.cpp
28–36

Well hold on for a second. Does this mean that we legit resolved getDynamicSize to getStaticSize??? That's crazy! I've been looking around in the code for how does your patch interact with our current dynamic size modeling, but I guess I never really realized that there is no such a thing. Odd to not even see a FIXME around.

29

getSuperRegionIfElementRegion?

63–65

Make this work as is "make sure that checkers actually obey this precondition" or does the assert just not work as-is? Could you specify this comment?

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
779–780 ↗(On Diff #254051)

I wonder what the rationale was here. Why do we explicitly avoid alignment new?

799–800 ↗(On Diff #254051)

Regardless of whether this is heap allocated or not, we can set a dynamic size here, correct? Like, all the align new expressions are standard, but they aren't replaceable (as documented in isReplaceableGlobalAllocationFunction()).

NoQ added a comment.Sep 30 2020, 9:30 AM

Thank you for reminding me of this patch. I still think it's a pretty important patch and i'm interested in getting it landed.

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
1461

What happened to this StripCasts()? I don't see it in the after-code and I vaguely remember having to be very careful with it.

I was referred to this patch from https://reviews.llvm.org/D86743. I pulled this patch under review, brought it up to date and pushed to github at https://github.com/vabridgers/llvm-project-dev.git, branch: vla-fam-fixes. Everything seems ok on this branch (LITs pass, reproducers from https://bugs.llvm.org/show_bug.cgi?id=47272 and https://bugs.llvm.org/show_bug.cgi?id=28450 no longer crash). I can continue and push a change to Phabricator for review, or @Charusso and/or @balazske could finish this? I didn't want to just push an update without asking first :/ Cheers!

I was referred to this patch from https://reviews.llvm.org/D86743. I pulled this patch under review, brought it up to date and pushed to github at https://github.com/vabridgers/llvm-project-dev.git, branch: vla-fam-fixes. Everything seems ok on this branch (LITs pass, reproducers from https://bugs.llvm.org/show_bug.cgi?id=47272 and https://bugs.llvm.org/show_bug.cgi?id=28450 no longer crash). I can continue and push a change to Phabricator for review, or @Charusso and/or @balazske could finish this? I didn't want to just push an update without asking first :/ Cheers!

Hi Vince and Artem,

I agree that this is an important patch. Unfortunately it got stuck. I see two possible solutions:

  1. Vince commandeers this patch. I am not sure about the policy of commandeering a patch and that might be too intrusive.
  2. Vince opens a new patch (a copy of this) and gives credit to @Charusso when he commits. I prefer this.
Charusso updated this revision to Diff 319940.Jan 28 2021, 12:43 PM
Charusso marked 6 inline comments as done.
  • Fix everything.

Hey, I am back.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h
41–44

Let us remove the setDynamicSize(CXXNewExpr) API from now on (was not that cool). This function will reside inside ExprEngine::bindReturnValue() which is the second phase of evaluating operator new: https://reviews.llvm.org/D40560#961694

clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
310–311 ↗(On Diff #248097)

Added a conjured SymbolicRegion test called user_defined_new and an assertion.

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
1461

I have reimplemented it by a mistake with the getSuperRegion(MR) helper routine. From now I will use StripCasts() instead. Nice catch!

clang/lib/StaticAnalyzer/Core/DynamicSize.cpp
28–36

I wanted to upstream both of the patches, but this one became dead:

[analyzer] DynamicSize: Remove 'getExtent()' from regions
Oct 29 2019, 01:43

[analyzer] DynamicSize: Store the dynamic size
Nov 1 2019, 19:57

Sorry for the inconvenience.

29

getSuperRegion(MR) was just a plain helper routine not restricted to return iff MR is an ElementRegion. It obtains the immediate region which we can measure in size, but it turns out the same idea as MemRegion::StripCasts(), so I have removed it.

63–65

This assertion was about the signedness, because I have seen something like unsigned(42) == signed(42) did not evaluate to true, and I really wanted to make the size equality to be true in such value-equality cases. I have added a test (signedness_equality) instead, which is working fine now (I hope, because I am not sure what was the original behavior).

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
779–780 ↗(On Diff #254051)

We avoid to argue about non-standard memory allocation/alignment and extent measurement, because we are unsure about the resulting memory block. (May this is not the best way to measure standardity, but I leave it as is.)

799–800 ↗(On Diff #254051)

May the replacability is not the same as being standard operator new, which is a problem, but I leave this area as is, because I have realized the CXXNewExpr evaluation is the last phase of evaluating operator new [1], so we do not need to create the dynamic size again. (Also as I have tested out the symVal is unknown iff we could not model the operator new so far (being last phase [1]), which means it is non-standard).

[1] https://reviews.llvm.org/D40560#961694

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
710

More tests are added.

@NoQ, could you upstream it and move this idea forward please?

NoQ accepted this revision.Apr 1 2021, 8:35 PM

I don't see any further bugs so I think this is good to go!

I think you should separate out the NFC / debugging facility changes before committing.

clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
294 ↗(On Diff #319940)

128 is too much imho :)

This revision is now accepted and ready to land.Apr 1 2021, 8:35 PM
This revision was automatically updated to reflect the committed changes.
Charusso marked an inline comment as done.
clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
294 ↗(On Diff #319940)

@Charusso
It looks like this patch introduced a some weird false positive on PostgreSQL


I'll try to look at it myself and minimize it, but maybe you can get an idea from a full report.

@Charusso
It looks like this patch introduced a some weird false positive on PostgreSQL


I'll try to look at it myself and minimize it, but maybe you can get an idea from a full report.

Could you supply us with all the divergence please? If there is only one case, may we can ignore it by appending it to @NoQ's extremely-weird-bugs.txt so people can focus on more important stuff.

What I see in the bug report is that: line 3290 - guc_malloc() returns null and it is a true positive.

@Charusso
It looks like this patch introduced a some weird false positive on PostgreSQL


I'll try to look at it myself and minimize it, but maybe you can get an idea from a full report.

Could you supply us with all the divergence please? If there is only one case, may we can ignore it by appending it to @NoQ's extremely-weird-bugs.txt so people can focus on more important stuff.

What I see in the bug report is that: line 3290 - guc_malloc() returns null and it is a true positive.

  1. It's the only bug in divergence.
  2. The analyzer doesn't explain why it thinks that guc_malloc returns null pointer. I find it alarming that it might assume it for all the wrong reasons.
  3. It is a false positive because guc_malloc is called with elevel == FATAL, which causes ereport to fail the whole execution.

I believe that this problem is important because it is definitely a regression.

NoQ added a comment.EditedApr 6 2021, 10:16 PM

Here's a reduced repro - a file that has different behavior before and after the patch (sorry, not perfectly reduced, my creduce is broken again):

// RUN: clang --analyze %s
typedef Oid;
typedef Pointer;
typedef text;
errstart(int elevel, filename, lineno, funcname, domain);
typedef Datum;
typedef struct {
  Oid elemtype;
} ArrayType;

*guc_malloc(elevel, size) {
  void *data;
  data = malloc(size);
  if (data == (0))
    (errstart(elevel, "c", 3298, __func__, (0))
         ? ((errcode((((('0') & 0x3F) < 24)))))
         : 0);
  return (errstart(elevel, "c", 3324, __func__, (0)) ? ((errcode(((24))))) : 0);
}

ParseLongOption(*string, *name, value) {
  *name = guc_malloc(21, 1);
  __builtin___strlcpy_chk(*name, string, 1, string[1]);
}

ProcessGUCArray(ArrayType *array, action) {
  int i = 0;
  { (((((array)->elemtype) = 25)), "c", 7599); }
  for (i = i + 1; ((((char *)(array)) + sizeof(ArrayType)))[0];) {
    Datum d;
    char s;
    char name;
    char value;
    d = array_ref();
    s = text_to_cstring((text)((Pointer)(d)));
    ParseLongOption(s, &name, &value);
      (errstart(19, "c", 7629, __func__, (0)) ? ((errcode(), errmsg(name)))
                                              : 0);
  }
}

It's most likely some budget heuristic acting up. By slightly changing stuff that has shouldn't have any effect you can easily eliminate the regression or even reverse it (so that the warning appeared before the patch but not after the patch). The patch definitely changes modeling so it's possible that it causes budgets to be spent differently.

One particular thing i noticed about the behavior after the patch (by observing exploded graph topologies) is that it causes states to be merged more often. Which is expected because where previously we've created a new extent symbol and added a new constraint, currently we simply re-use the existing symbol. This causes 10% improvement in the number of generated nodes (on the above toy example). I also didn't immediately notice any unintended state splits.

I think we should indeed move on. I'm curious which specific budget do we hit but this patch definitely didn't introduce the root cause of the problem, only accidentally surfaced it. We should still investigate the tracking bug though, i.e. why path in guc_malloc() isn't explained.

Thank you guys for investigating it!

  1. The analyzer doesn't explain why it thinks that guc_malloc returns null pointer. I find it alarming that it might assume it for all the wrong reasons.
In D69726#2673178, @NoQ wrote:

We should still investigate the tracking bug though, i.e. why path in guc_malloc() isn't explained.

What I have seen back in the days is that: uninitialized variables and variables storing NULL are not attached to regions so we cannot map notes to their origin region and we cannot track them. That is where trackExpressionValue() tries to attach notes based on changes in the Store and with full of heuristics. The nature of heuristics and the fight of note-creation and note-suppression Reporters what you see.

If we would prioritize to massage the trackExpressionValue() framework, count me in: I have half year of pai- programming in it, but I am out of the office.