What had been done: - Passing a proper QualType for the initlist conversion - Compatibility fixes for unit tests introduced. - Various contingencies added to make sure the new code will be invoked only where it is needed. - New tests added to cover some cases in which new code will work properly. The cases where it will fail were not added intentionally as proper solution requires less localized changes. - Account for the case when source type info unavailable. NB: Negative test cases were not added as problem was not fully fixed and interlaced combinations of compound structures will still cause crash, like union in array or array in struct containing array with struct.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Hi, looks very interesting! We definitely have troubles with some initializer lists.
It looks like you're primarily interested in the alpha/unsupported checker, but your patch probably affects behavior of the default setup as well. If you've found any difference in results for the default setup, please include a test case! These are much more valuable. You can often use tools like creduce/cvise to reduce entire real-world translation units to tiny test cases, under the test predicate that "two compiler versions produce different output".
clang/lib/StaticAnalyzer/Core/RegionStore.cpp | ||
---|---|---|
564–565 | There's a bit of a confusing tradition here, mostly documented in my unsorted long rants in earlier patches, let me try to explain. RegionStore was built to recognize that type punning is a thing. You can write a value while treating the target location/region R as if it's supposed to hold a value of type T₁ and then load the value by treating the same memory location as if it holds a value of type T₂≠T₁. You can achieve it by having unions or by reinterpret-casting pointers. RegionStore needs to handle this reasonably well, no matter what T₁ and T₂ are. Even though strict aliasing rules put some restrictions on what T₁ and T₂ could be, the analyzer is still required to work correctly when the programmer actively relies on -fno-strict-aliasing. On top of that, every MemRegion R can have a type T(R) associated with it (accessed as R->getValueType()). This was probably a bad idea from the start, for at least three reasons:
This makes the type of MemRegion largely immaterial and hard to rely upon. Which is why State->getSVal(R) accepts an optional parameter T₂ which may be different from both T₁ and T(R). However, after some back-and-forth bugfixing, we've settled on agreeing that when T(R) exists, it doesn't make sense to pass any T₂ into State->getSVal(R, T₂) other than T₂=T(R). Basically, if R already carries a type, then you don't have a good reason to pass a different type as T₂, given that you can always pass a different region R' instead, that represents the same memory location as R but has the desired value type T₂. So T₂ is only required when it's fundamentally impossible to discover the type from the region. So in order to eliminate the confusion, getSVal() tries to get rid of the extra parameter T₂ as early as possible, and have a single source of truth going further down the call stack. I honestly no longer think this is a healthy strategy long-term. Ideally I think we should get rid of region types entirely; they cause more problems than they solve. But that's a much bigger change, and I'm worried that a partial step in this direction, not supported by any strategic vision, will only multiply confusion. So, long story short, given that ElementRegions are always typed, I'm wondering if it's easier to simply recast the region in your case as well, instead of reintroducing such duplication of types. | |
1920 | The lifetime of InitList is the same as lifetime of factory. Because factory is a local variable, this is going to be a use-after-free. You need to use the factory that lives inside svalBuilder instead, accessible through methods getEmptySValList () and prependSVal() of svalBuilder.getBasicValueFactory(). |
clang/lib/StaticAnalyzer/Core/RegionStore.cpp | ||
---|---|---|
1787 | I assume targetType is the type we want to interpret the region as. Below this condition we seem to work with arrays. If targetType is an array, then we return something here instead of going further and returning something else we probably want. Why aren't we going further in that case? | |
1869 | What about classes? class A { public: int x; }; struct B { int x; } A and B are technically the same, but A will fall into the true branch, B will fall into the false branch. | |
1891 | This crashes if ElemExpr is a nullptr. | |
1914 | Consider moving this into the for loop to avoid confusion. |
clang/lib/StaticAnalyzer/Core/RegionStore.cpp | ||
---|---|---|
564–565 | You are right. | |
1920 | Thank you for the great pointer. It makes sense to make changes this way. |
clang/lib/StaticAnalyzer/Core/RegionStore.cpp | ||
---|---|---|
1787 | A good comment and the answer would be because i wanted to limit my changes to the case of the struct/array of structs/nested structs only. To properly implement support of all cases the changes should be more global and more intrusive. My goal was to limit a scope and make baby step improvement. | |
1869 | Classes are currently not supported and were not tested. But adding class support would be no brainier here. However i wanted to limit the scope of the changes. | |
1891 | Accepted. nullptr should never pop up with correctly constructed initialization list in the line 1890, but nobody promised me to have a correct initialization list here. | |
1914 | If i am to do it i'll need to move line 1883 into the for loop as well for consistency making for loop bulky and cumbersome. Do you really think it will make code easier to understand? |
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
---|---|---|
432 | I did some digging into the other parts of the checker and I'm not sure that we guarantee that every time For example if you look at CStringChecker::evalMemcmp(), it has the following content: // int memcmp(const void *s1, const void *s2, size_t n); CurrentFunctionDescription = "memory comparison function"; AnyArgExpr Left = {CE->getArg(0), 0}; AnyArgExpr Right = {CE->getArg(1), 1}; SizeArgExpr Size = {CE->getArg(2), 2}; ... State = CheckBufferAccess(C, State, Right, Size, AccessKind::read, CK); State = CheckBufferAccess(C, State, Left, Size, AccessKind::read, CK); CStringChecker::CheckBufferAccess() will eventually call your code and in the first case, you will deduce the type | |
clang/lib/StaticAnalyzer/Core/RegionStore.cpp | ||
564–565 |
I think you can only do this outside RegionStore. The problem is that RegionStore only knows about the beginning of Consider the following example: int arr[5] = {1, 2, 3, 4, 5}; You can see that the array has 5 elements and the initializer list also has 5 elements, so the array is initialized properly. What you might try doing here is that you create like a set that tracks every region that has been initialized and use it E.g.: void false_positive() { int src[] = {1, 2, 3, 4}; <- add `src` to the set, so you mark it initialized int dst[5] = {0}; <- add `dst` to the set, so you mark it initialized memcpy(dst, src, 4 * sizeof(int)); // false-positive: // The 'src' buffer was correctly initialized, yet we cannot conclude // that since the analyzer could not see a direct initialization of the // very last byte of the source buffer. ^ Analyzer fails to recognize that the buffer is initialized, so it attempts to report a warning. Before you let that happen you look up `src` in the set that contains initialized buffers and decide if the warning is valid or not. } You might need to store more information, like the length of the array, but this approach might still improve the false positive rate. ASTMatchers will be helpful if you think it worths pursuing. The same approach can be used for structs but then you compare init list elements with the number of fields | |
1891 | You checked it on line 1848 too, that's why I thought I point this out. | |
1914 | We do something similar to this loop in ExprEngine::VisitLambdaExpr(), I think it is perfectly readable but you can leave it like this if you want. |
clang-format not found in user’s local PATH; not linting file.