This patch introduces a way to store the size.
Also adds two debug checkers to dump out the size:
dumpExtent, dumpElementCount.
Details
- Reviewers
NoQ vsavchenko - Commits
- rGdf64f471d1e2: [analyzer] DynamicSize: Store the dynamic size
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I do not want to reverse engineer the MallocChecker to obtain the size on call basis. The current model without D68725 makes it enough difficult to obtain the size even with this generic map, but it is working fine.
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
---|---|---|
1073 ↗ | (On Diff #227500) | That dual assumption made changes in the test files, and there is no other dual assumption. |
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h | ||
---|---|---|
29–33 | Why do we need this? | |
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
1073 ↗ | (On Diff #227500) | Wait, this code should not have been changed. It's not an allocation site, we don't receive any new information about the size of the region here. |
clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp | ||
325–329 | This gets rid of the assertion failure in https://bugs.llvm.org/show_bug.cgi?id=28450 by implementing my suggestion (2). Yay. |
- Fix.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h | ||
---|---|---|
29–33 | I think as the checkers are growing and we push more-and-more allocation modeling so that at some point the Git's 8-parameter allocator's size expression could be retrieved so easily. This is the full arsenal of my buffer-overflow checker's needs, so I have pushed it here. Also it made a meaning to have a helper-class with two fields (one would be lame). | |
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp | ||
1439 | My problem was only that. It partially repeats the ExprEngine::bindReturnValue(), which is a wonky design. I will look into that later. | |
clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp | ||
325–329 | Cool! |
Thanks!
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp | ||
---|---|---|
493 | I want to do the opposite so I want to hide such constructors with the global getters and setters to make it easier to use. It turns out you only want to obtain one of its fields at a time, so that it is a bad idea to give you the entire object with multiple metadata. In my mind they will be in a dynamic namespace, and we could write that: dynamic::getSize(State, MR) or dynamic::getType(State, MR). I like the idea of creating Contexts to store metadata, but this is not the use case of the dynamic information, I think. The Dynamic*Info is good for being stored in a map, and other than that it is just boilerplate to obtain the object and call one of its fields which I do not like anymore. |
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h | ||
---|---|---|
29–33 |
- Let us reuse this patch.
- Remove the expression storing feature.
- Only store known sizes.
Hmm, has this patch not landed yet? I was so excited to finally have https://bugs.llvm.org/show_bug.cgi?id=28450 fixed :)
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
---|---|---|
767 ↗ | (On Diff #241749) | This looks like an object count, we'll need to convert it to size in bytes because that's what extent is. |
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp | ||
710 | Same. I guess we should add a test for both cases, given that nothing failed when we screwed up the extent. |
- Set the size properly.
- Add new debug.ExprInspection patterns: region, size, element count.
- clang-format -i ExprInspectionChecker.cpp.
- Having no idea what is the single regression in tests.
clang/lib/StaticAnalyzer/Core/DynamicSize.cpp | ||
---|---|---|
68 | That one is interesting. Some of the checkers / SValBuilder(?) generate unsigned integers which should not happen, I believe. May we need a FIXME and an assertion about signedness. What do you think? | |
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp | ||
710 | Well, it was equally wrong everywhere, so that it works... I have noticed it like 5 months ago, but I was lazy to fix. | |
clang/test/Analysis/explain-svals.cpp | ||
57 | Yea, that is the fact: The size is the size of the parameter, which is unknown. | |
clang/test/Analysis/memory-model.cpp | ||
108 ↗ | (On Diff #248097) | Here I wanted to put more, but I am not that cool with other MemRegions. Some wise words about the followups of this test file:
|
clang/test/Analysis/misc-ps-region-store.m | ||
1190 ↗ | (On Diff #248097) | That is the single regression which I do not get. |
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h | ||
---|---|---|
25 | Naming: I believe we should keep using the word "Extent". There's no need to introduce a new term for the existing concept; it makes it harder to explain on the mailing list :) Let's make a follow-up patch to change the naming back (so that not to screw the review). | |
41–44 | This function is probably going to be used exactly once in the whole code. There's no need to turn it into a public API. | |
clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp | ||
99 ↗ | (On Diff #248097) | clang_analyzer_dump_extent()? Or just clang_analyzer_dump(clang_analyzer_getExtent()). |
100 ↗ | (On Diff #248097) | clang_analyzer_dumpElementCount(). We need subjects in our sentences because they tend to vary quite a bit here and therefore cannot be implied (dump, return, express, ...). |
183–185 ↗ | (On Diff #248097) | const Expr *Arg = getArgExpr(CE, C); if (!Arg) return nullptr; No need for gymnastics >.> |
283 ↗ | (On Diff #248097) | So, how is it different from the existing clang_analyzer_dump()? |
289–291 ↗ | (On Diff #248097) | const MemRegion *MR = getArgRegion(CE, C); if (!MR) return; ... and so on. |
310–311 ↗ | (On Diff #248097) | How is this better than getValueType()? Are you sure you're not getting a pointer type instead in the !TVR case? I.e., can you test this on a non-heap SymRegion? |
clang/lib/StaticAnalyzer/Core/DynamicSize.cpp | ||
68 | SymbolExtent has type size_t and both malloc and operator new accept size_t as a parameter. Therefore everything needs to be unsigned and we need to assert this. That said, array indexes are signed (as per implementation of ElementRegion). | |
clang/test/Analysis/misc-ps-region-store.m | ||
1190 ↗ | (On Diff #248097) | Well, please debug. Like, look at the full report, dump egraph, see what changed. Try to creduce the example further under the condition "behavior changed with the patch", maybe that'll clear something up (though if it's a true positive after creduce it doesn't guarantee that it's a true positive before creduce). |
- Fix VLASizeChecker's multi-dimensional array early return.
- So that fix the regression in test misc-ps-region-store.m.
- Fix tests that need regex.
- Add documentation about dumpExtent, dumpElementCount.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h | ||
---|---|---|
25 | Since then I have changed my mind when I have read about Environment and Store in a book. Sure, next up. | |
41–44 | It is being used 3 times already, so I believe it is a cool API. | |
clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp | ||
99 ↗ | (On Diff #248097) | I like the shorter version, but of course I have seen the longer version. |
283 ↗ | (On Diff #248097) | I wanted to make it for the derived regions, but then I have realized I am lazy to dig into the buggier parts of the Analyzer. Removed. |
310–311 ↗ | (On Diff #248097) |
Consistency. We get the static
The issue was the var_region_simple_ptr test: we need to use regex for checking after the }} token where / 4 is the correct result. I did not really get why the test pass. |
clang/lib/StaticAnalyzer/Core/DynamicSize.cpp | ||
68 | It was a premature optimization for consistency. I will leave the signedness as is, FIXME added. | |
clang/test/Analysis/misc-ps-region-store.m | ||
1190 ↗ | (On Diff #248097) | Given that I have no access for rdar I did my hand-reduce and the solution is the following: // RUN: %clang_analyze_cc1 -triple x86_64-apple-darwin9 \ // RUN: -analyzer-checker=core,alpha.security.ArrayBound \ // RUN: -verify %s static void RDar8424269_B(int unknown) { unsigned char tmp2t[1][unknown]; tmp2t[1][13] = 0; // expected-warning \ {{Access out-of-bound array element (buffer overflow)}} } Looking at the master branch core.VLASize created the constraint extent_$1{tmp2} {[0, 0]} without any state splits. The new behavior does not create random constraints so that an alpha checker could detect we have no information about the unknown Given that the alpha checker cannot catch the following: char foo[bar]; foo[13] = 0; is assumed that the problem is the VLASizeChecker's modeling which has a FIXME: Handle multi-dimensional VLAs. but does not handle this case. An easy and appropriate fix to detect the multi-dimensional array. Here is the way to obtain the inner dimensions: http://clang-developers.42468.n3.nabble.com/Multi-dimensional-arrays-td4028727.html So that the final solution is: // FIXME: Handle multi-dimensional VLAs. + if (VLA->getElementType()->getAsArrayTypeUnsafe()) + return; I hope with that I have fixed even more Bugzilla issues. May it would be interesting to see why the |
While this patch spans across a lot of files, it is actually rather straightforward. I'm stunned to see that we never really had a proper dynamic size support, especially that this patch has been out there for a good long time :) Oh well, better learn that now than never.
There sure is some rebasing nightmare waiting for you in VLASizeChecker, and I honestly lack the confidence to accept a patch with so much SVal-smithing going on. With that said, I do like the overall direction.
clang/lib/StaticAnalyzer/Core/DynamicSize.cpp | ||
---|---|---|
28–36 | Well hold on for a second. Does this mean that we legit resolved getDynamicSize to getStaticSize??? That's crazy! I've been looking around in the code for how does your patch interact with our current dynamic size modeling, but I guess I never really realized that there is no such a thing. Odd to not even see a FIXME around. | |
29 | getSuperRegionIfElementRegion? | |
63–65 | Make this work as is "make sure that checkers actually obey this precondition" or does the assert just not work as-is? Could you specify this comment? | |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
779–780 ↗ | (On Diff #254051) | I wonder what the rationale was here. Why do we explicitly avoid alignment new? |
799–800 ↗ | (On Diff #254051) | Regardless of whether this is heap allocated or not, we can set a dynamic size here, correct? Like, all the align new expressions are standard, but they aren't replaceable (as documented in isReplaceableGlobalAllocationFunction()). |
Thank you for reminding me of this patch. I still think it's a pretty important patch and i'm interested in getting it landed.
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp | ||
---|---|---|
1461 | What happened to this StripCasts()? I don't see it in the after-code and I vaguely remember having to be very careful with it. |
I was referred to this patch from https://reviews.llvm.org/D86743. I pulled this patch under review, brought it up to date and pushed to github at https://github.com/vabridgers/llvm-project-dev.git, branch: vla-fam-fixes. Everything seems ok on this branch (LITs pass, reproducers from https://bugs.llvm.org/show_bug.cgi?id=47272 and https://bugs.llvm.org/show_bug.cgi?id=28450 no longer crash). I can continue and push a change to Phabricator for review, or @Charusso and/or @balazske could finish this? I didn't want to just push an update without asking first :/ Cheers!
Hi Vince and Artem,
I agree that this is an important patch. Unfortunately it got stuck. I see two possible solutions:
- Vince commandeers this patch. I am not sure about the policy of commandeering a patch and that might be too intrusive.
- Vince opens a new patch (a copy of this) and gives credit to @Charusso when he commits. I prefer this.
Hey, I am back.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h | ||
---|---|---|
41–44 | Let us remove the setDynamicSize(CXXNewExpr) API from now on (was not that cool). This function will reside inside ExprEngine::bindReturnValue() which is the second phase of evaluating operator new: https://reviews.llvm.org/D40560#961694 | |
clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp | ||
310–311 ↗ | (On Diff #248097) | Added a conjured SymbolicRegion test called user_defined_new and an assertion. |
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp | ||
1461 | I have reimplemented it by a mistake with the getSuperRegion(MR) helper routine. From now I will use StripCasts() instead. Nice catch! | |
clang/lib/StaticAnalyzer/Core/DynamicSize.cpp | ||
28–36 | I wanted to upstream both of the patches, but this one became dead: [analyzer] DynamicSize: Remove 'getExtent()' from regions [analyzer] DynamicSize: Store the dynamic size Sorry for the inconvenience. | |
29 | getSuperRegion(MR) was just a plain helper routine not restricted to return iff MR is an ElementRegion. It obtains the immediate region which we can measure in size, but it turns out the same idea as MemRegion::StripCasts(), so I have removed it. | |
63–65 | This assertion was about the signedness, because I have seen something like unsigned(42) == signed(42) did not evaluate to true, and I really wanted to make the size equality to be true in such value-equality cases. I have added a test (signedness_equality) instead, which is working fine now (I hope, because I am not sure what was the original behavior). | |
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp | ||
779–780 ↗ | (On Diff #254051) | We avoid to argue about non-standard memory allocation/alignment and extent measurement, because we are unsure about the resulting memory block. (May this is not the best way to measure standardity, but I leave it as is.) |
799–800 ↗ | (On Diff #254051) | May the replacability is not the same as being standard operator new, which is a problem, but I leave this area as is, because I have realized the CXXNewExpr evaluation is the last phase of evaluating operator new [1], so we do not need to create the dynamic size again. (Also as I have tested out the symVal is unknown iff we could not model the operator new so far (being last phase [1]), which means it is non-standard). |
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp | ||
710 | More tests are added. |
I don't see any further bugs so I think this is good to go!
I think you should separate out the NFC / debugging facility changes before committing.
clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp | ||
---|---|---|
294 ↗ | (On Diff #319940) | 128 is too much imho :) |
Cool, thanks!
Debug facility NFC: https://reviews.llvm.org/rG89d210fe1a7a1c6cbf926df0595b6f107bc491d5
size -> extent conversion: https://reviews.llvm.org/rG9b3df78b4c2ab7a7063e532165492e1ffa38d401
clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp | ||
---|---|---|
294 ↗ | (On Diff #319940) | I am totally fine with 128, but let us use 64 then: |
@Charusso
It looks like this patch introduced a some weird false positive on PostgreSQL
I'll try to look at it myself and minimize it, but maybe you can get an idea from a full report.
Could you supply us with all the divergence please? If there is only one case, may we can ignore it by appending it to @NoQ's extremely-weird-bugs.txt so people can focus on more important stuff.
What I see in the bug report is that: line 3290 - guc_malloc() returns null and it is a true positive.
- It's the only bug in divergence.
- The analyzer doesn't explain why it thinks that guc_malloc returns null pointer. I find it alarming that it might assume it for all the wrong reasons.
- It is a false positive because guc_malloc is called with elevel == FATAL, which causes ereport to fail the whole execution.
I believe that this problem is important because it is definitely a regression.
Here's a reduced repro - a file that has different behavior before and after the patch (sorry, not perfectly reduced, my creduce is broken again):
// RUN: clang --analyze %s typedef Oid; typedef Pointer; typedef text; errstart(int elevel, filename, lineno, funcname, domain); typedef Datum; typedef struct { Oid elemtype; } ArrayType; *guc_malloc(elevel, size) { void *data; data = malloc(size); if (data == (0)) (errstart(elevel, "c", 3298, __func__, (0)) ? ((errcode((((('0') & 0x3F) < 24))))) : 0); return (errstart(elevel, "c", 3324, __func__, (0)) ? ((errcode(((24))))) : 0); } ParseLongOption(*string, *name, value) { *name = guc_malloc(21, 1); __builtin___strlcpy_chk(*name, string, 1, string[1]); } ProcessGUCArray(ArrayType *array, action) { int i = 0; { (((((array)->elemtype) = 25)), "c", 7599); } for (i = i + 1; ((((char *)(array)) + sizeof(ArrayType)))[0];) { Datum d; char s; char name; char value; d = array_ref(); s = text_to_cstring((text)((Pointer)(d))); ParseLongOption(s, &name, &value); (errstart(19, "c", 7629, __func__, (0)) ? ((errcode(), errmsg(name))) : 0); } }
It's most likely some budget heuristic acting up. By slightly changing stuff that has shouldn't have any effect you can easily eliminate the regression or even reverse it (so that the warning appeared before the patch but not after the patch). The patch definitely changes modeling so it's possible that it causes budgets to be spent differently.
One particular thing i noticed about the behavior after the patch (by observing exploded graph topologies) is that it causes states to be merged more often. Which is expected because where previously we've created a new extent symbol and added a new constraint, currently we simply re-use the existing symbol. This causes 10% improvement in the number of generated nodes (on the above toy example). I also didn't immediately notice any unintended state splits.
I think we should indeed move on. I'm curious which specific budget do we hit but this patch definitely didn't introduce the root cause of the problem, only accidentally surfaced it. We should still investigate the tracking bug though, i.e. why path in guc_malloc() isn't explained.
Thank you guys for investigating it!
What I have seen back in the days is that: uninitialized variables and variables storing NULL are not attached to regions so we cannot map notes to their origin region and we cannot track them. That is where trackExpressionValue() tries to attach notes based on changes in the Store and with full of heuristics. The nature of heuristics and the fight of note-creation and note-suppression Reporters what you see.
If we would prioritize to massage the trackExpressionValue() framework, count me in: I have half year of pai- programming in it, but I am out of the office.
Naming: I believe we should keep using the word "Extent". There's no need to introduce a new term for the existing concept; it makes it harder to explain on the mailing list :) Let's make a follow-up patch to change the naming back (so that not to screw the review).