Page MenuHomePhabricator

[analyzer][AST] Retrieve value by direct index from list initialization of constant array declaration.
Needs ReviewPublic

Authored by ASDenysPetrov on Jun 15 2021, 3:32 AM.

Details

Summary

We can use direct index to retrieve a value from multi-dimensional array relying on a fact that static constant array is a solid block of memory. Iterate through a list initialization using a raw index to get the value.

Example:

const int arr[2][2] = {{1}, {3, 4}};
const int *ptr = (int *)arr;
ptr[0]; // 1
ptr[1]; // 0
ptr[2]; // 3
ptr[3]; // 4

Fixes: https://bugs.llvm.org/show_bug.cgi?id=50604

Diff Detail

Unit TestsFailed

TimeTest
3,140 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
2,780 msx64 debian > libarcher.parallel::parallel-simple.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp -latomic && env OMP_TOOL_VERBOSE_INIT=stderr env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp.log 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c --check-prefixes CHECK,TSAN_ON
3,060 msx64 debian > libarcher.parallel::parallel-simple2.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c
2,890 msx64 debian > libarcher.races::critical-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c
2,970 msx64 debian > libarcher.races::lock-nested-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c
View Full Test Results (15 Failed)

Event Timeline

ASDenysPetrov created this revision.Jun 15 2021, 3:32 AM
ASDenysPetrov requested review of this revision.Jun 15 2021, 3:32 AM

I'm not sure about whether or not this patch would only work for constant arrays with initializer lists. If it does only work for such arrays, then I wonder whether the fix is broad enough -- I haven't tested (yet), but I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings about garbage/undefined values. I'll try it tomorrow morning...

I'm not sure about whether or not this patch would only work for constant arrays with initializer lists. If it does only work for such arrays, then I wonder whether the fix is broad enough -- I haven't tested (yet), but I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings about garbage/undefined values. I'll try it tomorrow morning...

I tested with an unpatched build using a reproducer without an initializer list, and didn't get the spurious warnings -- so your approach seems correct to me now. I've also tested your patch and I believe it gives correct behavior.

clang/lib/AST/Type.cpp
149
clang/include/clang/AST/Expr.h
4971

I think in most (all?) other methods in this class, array indices are unsigned in the API. If the array index itself comes from an expression that is negative (i.e., a literal negative integer, or an constant expression that evaluates to a negative number), that has to be handled correctly, but I'm not sure this is the right place to do it. As this code stands, if an integer literal used used, which is greater than LONG_MAX, but less than ULONG_MAX, it will be end up being treated as invalid in this method, won't it?

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1670–1679

I see where you got the int64_t from -- that's what getSExtValue() returns. So, if the literal const index value in the expr is greater than LONG_MAX (but less than ULONG_MAX, of course), this would assert. That seems undesirable....

I've looked this over and tested it locally, and I'm pretty sure it's a good patch. If it were solely up to me, I'd accept this patch as-is. I don't think I should assume I have enough experience in this area though... @NoQ , could you take a look over this, and accept it if you think it's safe and reasonable?

@chrish_ericsson_atx

Sorry for the late reply. Thank you for reveiwing this.

I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings

Could you please provide some test cases that you think will uncover other issues. I'll add them to the test set.

I also have to mention one point of what this patch do more. Consider next:

int const arr[2][2] = {{1, 2}, {3, 4}}; // global space
int const *ptr = &arr[0][0];
ptr[3]; // represented as ConcreteInt(0) 
arr[1][1]; // represented as reg_$0<int Element{Element{arr,1 S64b,int [2]},1 S64b,int}>

As you can see, now the access through the raw pointer is more presice as through the multi-level indexing. I didn't want to synchronyze those retrieved values both to be reg_$0. I've seen a way to handle it more sophisticatedly.
I'm gonna do the same for the multi-level indexing (aka arr[1][2][3]).

clang/lib/AST/Type.cpp
149

+1

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1670–1679

That's a great catch! I'll make changes soon.

I think the presence of the initializer list in the test case is not necessary to trigger the spurious warnings

Could you please provide some test cases that you think will uncover other issues. I'll add them to the test set.

I tested locally with this patch and found that my guess was incorrect-- I couldn't trigger the incorrect behavior without an initializer list. So I think your code and testcases are good as they are!

I also have to mention one point of what this patch do more. Consider next:

int const arr[2][2] = {{1, 2}, {3, 4}}; // global space
int const *ptr = &arr[0][0];
ptr[3]; // represented as ConcreteInt(0) 
arr[1][1]; // represented as reg_$0<int Element{Element{arr,1 S64b,int [2]},1 S64b,int}>

As you can see, now the access through the raw pointer is more presice as through the multi-level indexing. I didn't want to synchronyze those retrieved values both to be reg_$0. I've seen a way to handle it more sophisticatedly.
I'm gonna do the same for the multi-level indexing (aka arr[1][2][3]).

I don't understand -- probably I don't have enough experience with analyzer state dumps to know what I should find surprising or better in this example.

@chrish_ericsson_atx

I don't understand -- probably I don't have enough experience with analyzer state dumps to know what I should find surprising or better in this example.

Simply saying, now ptr[3] returns value 4 as expected, but arr[1][1] still returns unknown symbol.
Previously ptr[3] also returned unknown symbol, basically matched with what arr[1][1] returns.
After my patch ptr[3] started to return 'undefined' and therefore asserted.
I've made a fix which improved the behavior for ptr[3] but remained arr[1][1] as is.

chrish_ericsson_atx added a comment.EditedJun 25 2021, 1:00 PM

While this patch resolves the issue captured in https://bugs.llvm.org/show_bug.cgi?id=50604, it actually introduces a *new* bug. Perhaps this is what you were alluding to? Here's a reproducer which doesn't fail on main (with or without the problematic b30521c28a4d commit), but it *does* fail with this proposed patch:

eahcmrh@seroius03977[21:50][repo/eahcmrh/ltebb]$ cat ~/pr50604-newbug.c 
static float const dt[12] =
{
  0.0000, 0.0235, 0.0470, 0.0706, 0.0941, 0.1176,
  0.1411, 0.1646, 0.1881, 0.2117, 0.2352, 0.2587
};
void foo(float s)
{
  (void)( s + dt[0]) ;
}
eahcmrh@seroius03977[21:57][repo/eahcmrh/ltebb]$ /proj/bbi/eahcmrh/arcpatch-D104285/compiler-clang/bin/clang -Xanalyzer -analyzer-werror --analyze ~/pr50604-newbug.c 
/home/eahcmrh/pr50604-newbug.c:8:13: error: The right operand of '+' is a garbage value [core.UndefinedBinaryOperatorResult]
  (void)( s + dt[0]) ;
            ^ ~~~~~
1 error generated.

I'll upload this reproducer to the bug report as well.

To be clear, neither this new reproducer nor the one I originally posted fail if commit b30521c28a4d is reverted. Is it worth considering reverting that commit until a patch that addresses the original problem and doesn't introduce these new regressions is available?

@chrish_ericsson_atx
Thanks for the new test case. I'll handle it ASAP.

To be clear, neither this new reproducer nor the one I originally posted fail if commit b30521c28a4d is reverted. Is it worth considering reverting that commit until a patch that addresses the original problem and doesn't introduce these new regressions is available?

I don't think we should revert b30521c28a4d because it corrects symbol representation in CSA and fixes two bugs: https://bugs.llvm.org/show_bug.cgi?id=37503 and https://bugs.llvm.org/show_bug.cgi?id=49007. Another point of non-revert is that your cases were previously hidden in CSA core and that's good to find them. I'm afraid it's a dubious idea to return back old bugs in favor of not seeing new ones.

I don't think we should revert b30521c28a4d because it corrects symbol representation in CSA and fixes two bugs: https://bugs.llvm.org/show_bug.cgi?id=37503 and https://bugs.llvm.org/show_bug.cgi?id=49007. Another point of non-revert is that your cases were previously hidden in CSA core and that's good to find them. I'm afraid it's a dubious idea to return back old bugs in favor of not seeing new ones.

Valid point. Thanks for considering the question. I agree it's better to move forward and get this patch working. :) I just wish I had the bandwidth to help more...

Fixed a case mentioned by @chrish_ericsson_atx. Added the cases to the common bunch.

@chrish_ericsson_atx
OK. I think I found the issue. Could you please check whether it works for you?

Fixed concern about index type being either int64_t or uint64_t.

I like the idea and I think this is a valuable patch. However, because of the changes under lib/AST we need to add other reviewers who are responsible for those parts (e.g. aaronballman or rsmith). Is there really no way to workaround those changes? E.g. could we have a free function outside of the InitListExpr to implement getExprForConstArrayByRawIndex?

ASDenysPetrov retitled this revision from [analyzer] Retrieve value by direct index from list initialization of constant array declaration. to [analyzer][AST] Retrieve value by direct index from list initialization of constant array declaration..
ASDenysPetrov added a comment.EditedThu, Jul 29, 3:32 PM

@martong
I've added new reviewers, thanks for the prompt.

E.g. could we have a free function outside of the InitListExpr to implement getExprForConstArrayByRawIndex

It is possible, but I think this is more natural for the instance of InitListExpr to be responsible for such traversion.