Page MenuHomePhabricator

[analyzer] DynamicSize: Store the dynamic size
Needs ReviewPublic

Authored by Charusso on Nov 1 2019, 11:57 AM.



This patch introduces a way to store the size.
Also adds two debug checkers to dump out the size:
dumpExtent, dumpElementCount.

Diff Detail

Event Timeline

Charusso created this revision.Nov 1 2019, 11:57 AM
Charusso marked an inline comment as done.Nov 1 2019, 12:06 PM

I do not want to reverse engineer the MallocChecker to obtain the size on call basis. The current model without D68725 makes it enough difficult to obtain the size even with this generic map, but it is working fine.

1073 ↗(On Diff #227500)

That dual assumption made changes in the test files, and there is no other dual assumption.

NoQ added inline comments.Nov 1 2019, 1:12 PM

Why do we need this?

1073 ↗(On Diff #227500)

Wait, this code should not have been changed. It's not an allocation site, we don't receive any new information about the size of the region here.


This gets rid of the assertion failure in by implementing my suggestion (2). Yay.

Charusso updated this revision to Diff 227534.Nov 1 2019, 2:06 PM
Charusso marked 6 inline comments as done.
  • Fix.

I think as the checkers are growing and we push more-and-more allocation modeling so that at some point the Git's 8-parameter allocator's size expression could be retrieved so easily. This is the full arsenal of my buffer-overflow checker's needs, so I have pushed it here. Also it made a meaning to have a helper-class with two fields (one would be lame).


My problem was only that. It partially repeats the ExprEngine::bindReturnValue(), which is a wonky design. I will look into that later.



Changes to MallocChecker really highlight the positive effects of this patch. Nice!


Is it possible to merge these parameters into a single DynamicSizeInfo object?


and here?

Charusso marked 3 inline comments as done.Nov 4 2019, 8:20 AM

Changes to MallocChecker really highlight the positive effects of this patch. Nice!



I want to do the opposite so I want to hide such constructors with the global getters and setters to make it easier to use. It turns out you only want to obtain one of its fields at a time, so that it is a bad idea to give you the entire object with multiple metadata. In my mind they will be in a dynamic namespace, and we could write that: dynamic::getSize(State, MR) or dynamic::getType(State, MR).

I like the idea of creating Contexts to store metadata, but this is not the use case of the dynamic information, I think. The Dynamic*Info is good for being stored in a map, and other than that it is just boilerplate to obtain the object and call one of its fields which I do not like anymore.

NoQ added inline comments.Nov 4 2019, 11:54 AM
In D68725#1722136, @NoQ wrote:

any path-sensitive checker for which such region is "interesting" would have to implement a bug visitor to display the allocation site. Such visitor automatically knows the location of the alloca() expression and can emit the necessary fixits.

Again, you will have to highlight the allocation site with a note. Therefore you will have to write a bug visitor that traverses the size expression at some point (or, equivalently, a note tag when the size expression is evaluated). Therefore you don't need to store the expression in the program state.

Charusso abandoned this revision.Jan 30 2020, 7:57 AM
Charusso marked an inline comment as done.

Let us say, we do not need this patch. In case we would need it, I will reopen.

Charusso updated this revision to Diff 241749.EditedJan 31 2020, 9:10 AM
Charusso edited the summary of this revision. (Show Details)
  • Let us reuse this patch.
  • Remove the expression storing feature.
  • Only store known sizes.
NoQ added a comment.Feb 20 2020, 2:02 PM

Hmm, has this patch not landed yet? I was so excited to finally have fixed :)


This looks like an object count, we'll need to convert it to size in bytes because that's what extent is.


Same. I guess we should add a test for both cases, given that nothing failed when we screwed up the extent.

Charusso updated this revision to Diff 248097.Mar 3 2020, 8:46 PM
Charusso marked 4 inline comments as done.
Charusso edited the summary of this revision. (Show Details)
  • Set the size properly.
  • Add new debug.ExprInspection patterns: region, size, element count.
  • clang-format -i ExprInspectionChecker.cpp.
  • Having no idea what is the single regression in tests.
Charusso marked 2 inline comments as done.Mar 3 2020, 8:59 PM
Charusso added inline comments.

That one is interesting. Some of the checkers / SValBuilder(?) generate unsigned integers which should not happen, I believe. May we need a FIXME and an assertion about signedness. What do you think?


Well, it was equally wrong everywhere, so that it works... I have noticed it like 5 months ago, but I was lazy to fix.


Yea, that is the fact: The size is the size of the parameter, which is unknown.


Here I wanted to put more, but I am not that cool with other MemRegions. Some wise words about the followups of this test file:

Some day

1190 ↗(On Diff #248097)

That is the single regression which I do not get.

NoQ added inline comments.Mar 15 2020, 7:03 PM

Naming: I believe we should keep using the word "Extent". There's no need to introduce a new term for the existing concept; it makes it harder to explain on the mailing list :) Let's make a follow-up patch to change the naming back (so that not to screw the review).


This function is probably going to be used exactly once in the whole code. There's no need to turn it into a public API.


clang_analyzer_dump_extent()? Or just clang_analyzer_dump(clang_analyzer_getExtent()).


clang_analyzer_dumpElementCount(). We need subjects in our sentences because they tend to vary quite a bit here and therefore cannot be implied (dump, return, express, ...).

const Expr *Arg = getArgExpr(CE, C);
if (!Arg)
  return nullptr;

No need for gymnastics >.>


So, how is it different from the existing clang_analyzer_dump()?

const MemRegion *MR = getArgRegion(CE, C);
if (!MR)

... and so on.


How is this better than getValueType()? Are you sure you're not getting a pointer type instead in the !TVR case? I.e., can you test this on a non-heap SymRegion?


SymbolExtent has type size_t and both malloc and operator new accept size_t as a parameter. Therefore everything needs to be unsigned and we need to assert this.

That said, array indexes are signed (as per implementation of ElementRegion).

1190 ↗(On Diff #248097)

Well, please debug. Like, look at the full report, dump egraph, see what changed. Try to creduce the example further under the condition "behavior changed with the patch", maybe that'll clear something up (though if it's a true positive after creduce it doesn't guarantee that it's a true positive before creduce).

Charusso updated this revision to Diff 253394.Mar 28 2020, 8:31 PM
Charusso marked 19 inline comments as done.
Charusso edited the summary of this revision. (Show Details)
  • Fix VLASizeChecker's multi-dimensional array early return.
  • So that fix the regression in test misc-ps-region-store.m.
  • Fix tests that need regex.
  • Add documentation about dumpExtent, dumpElementCount.
Charusso added inline comments.Mar 28 2020, 8:31 PM

Since then I have changed my mind when I have read about Environment and Store in a book. Sure, next up.


It is being used 3 times already, so I believe it is a cool API.


I like the shorter version, but of course I have seen the longer version.


I wanted to make it for the derived regions, but then I have realized I am lazy to dig into the buggier parts of the Analyzer. Removed.


How is this better than getValueType()?

Consistency. We get the static size extent by getting the desugared type which most likely just an extra overhead.

Are you sure you're not getting a pointer type instead in the !TVR case? I.e., can you test this on a non-heap SymRegion?

The issue was the var_region_simple_ptr test: we need to use regex for checking after the }} token where / 4 is the correct result. I did not really get why the test pass.


It was a premature optimization for consistency. I will leave the signedness as is, FIXME added.

1190 ↗(On Diff #248097)

Given that I have no access for rdar I did my hand-reduce and the solution is the following:

// RUN: %clang_analyze_cc1 -triple x86_64-apple-darwin9 \
// RUN:  -analyzer-checker=core, \
// RUN:  -verify %s

static void RDar8424269_B(int unknown) {
  unsigned char tmp2t[1][unknown];
  tmp2t[1][13] = 0; // expected-warning \
    {{Access out-of-bound array element (buffer overflow)}}

Looking at the master branch core.VLASize created the constraint extent_$1{tmp2} {[0, 0]} without any state splits. The new behavior does not create random constraints so that an alpha checker could detect we have no information about the unknown size extent.

Given that the alpha checker cannot catch the following: char foo[bar]; foo[13] = 0; is assumed that the problem is the VLASizeChecker's modeling which has a FIXME: Handle multi-dimensional VLAs. but does not handle this case.

An easy and appropriate fix to detect the multi-dimensional array. Here is the way to obtain the inner dimensions:

So that the final solution is:

   // FIXME: Handle multi-dimensional VLAs.
+  if (VLA->getElementType()->getAsArrayTypeUnsafe())
+    return;

I hope with that I have fixed even more Bugzilla issues. May it would be interesting to see why the size extent of the array got a constraint of being 0, but I leave it as an exercise for the reader.

Charusso updated this revision to Diff 254051.Mar 31 2020, 5:30 PM
  • Remove the last gymnastic.
  • Rebase.

While this patch spans across a lot of files, it is actually rather straightforward. I'm stunned to see that we never really had a proper dynamic size support, especially that this patch has been out there for a good long time :) Oh well, better learn that now than never.

There sure is some rebasing nightmare waiting for you in VLASizeChecker, and I honestly lack the confidence to accept a patch with so much SVal-smithing going on. With that said, I do like the overall direction.




Well hold on for a second. Does this mean that we legit resolved getDynamicSize to getStaticSize??? That's crazy! I've been looking around in the code for how does your patch interact with our current dynamic size modeling, but I guess I never really realized that there is no such a thing. Odd to not even see a FIXME around.


Make this work as is "make sure that checkers actually obey this precondition" or does the assert just not work as-is? Could you specify this comment?


I wonder what the rationale was here. Why do we explicitly avoid alignment new?


Regardless of whether this is heap allocated or not, we can set a dynamic size here, correct? Like, all the align new expressions are standard, but they aren't replaceable (as documented in isReplaceableGlobalAllocationFunction()).

NoQ added a comment.Wed, Sep 30, 9:30 AM

Thank you for reminding me of this patch. I still think it's a pretty important patch and i'm interested in getting it landed.


What happened to this StripCasts()? I don't see it in the after-code and I vaguely remember having to be very careful with it.