Page MenuHomePhabricator

[analyzer] Add SufficientSizeArrayIndexingChecker
Needs ReviewPublic

Authored by gamesh411 on Oct 22 2019, 12:15 PM.



Checks for indexing situations, where an array is indexed with a
variable of type not sufficiently large to cover the whole range of the
arrays possible index set. This check is uses path-sensitivity in order
to reason about the possible size of the array being checked.

Diff Detail

Unit TestsFailed

60 mslinux > Clang.Sema::Unknown Unit Message ("")
Script: -- : 'RUN: at line 2'; not /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc /mnt/disks/ssd0/agent/llvm-project/clang/test/Sema/builtin-amdgcn-atomic-inc-dec-failure.cpp -x hip -fcuda-is-device -o - -emit-llvm -triple=amdgcn-amd-amdhsa 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/clang/test/Sema/builtin-amdgcn-atomic-inc-dec-failure.cpp
790 mslinux > SanitizerCommon-asan-x86_64-Linux.Linux::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=address -m64 -ldl -std=c++11 -O0 -g /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/asan-x86_64-Linux/Linux/Output/protoent.cpp.tmp
550 mslinux > SanitizerCommon-lsan-x86_64-Linux.Linux::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=leak -m64 -ldl -std=c++11 -O0 -g /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/lsan-x86_64-Linux/Linux/Output/protoent.cpp.tmp
700 mslinux > SanitizerCommon-msan-x86_64-Linux.Linux::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=memory -m64 -ldl -std=c++11 -O0 -g /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/protoent.cpp.tmp
950 mslinux > SanitizerCommon-tsan-x86_64-Linux.Linux::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=thread -m64 -ldl -std=c++11 -O0 -g /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/protoent.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/sanitizer_common/tsan-x86_64-Linux/Linux/Output/protoent.cpp.tmp
View Full Test Results (9 Failed)

Event Timeline

gamesh411 created this revision.Oct 22 2019, 12:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 22 2019, 12:15 PM

Please feel free to add more reviewers.

Please feel free to add more reviewers.

Here you go!

NoQ added a comment.Oct 29 2019, 2:31 PM

I suspect that with this check by choosing path-sensitivity you gain less than you lose. That is, checking how the array pointer is passed around may be nice, but if you sacrificed that, you could have turned this check into a compiler warning which would have had an effect on much larger audience.


What makes this checker C++-specific? All your tests are in plain C.


This is always true (we should probably tell SVal.getAsRegion() to return a const SubRegion *).


This never happens. The chain of superregions never has loops; it is always finite.


This deserves comments on what kinds of regions do you expect to see here. Do i understand correctly that you expect BaseMemRegion to be an ElementRegion and its superregion to be the whole array? 'Cause the former is super unobvious and most likely shouldn't be that way.


I recommend also adding trackExpressionValue() for the array expression. In your "symbolic index handling" tests it would hopefully* allow you to see the execution path within f() as part of the report, which is essential.

*If it doesn't work out of the box, it's most likely a bug in trackExpressionValue(). But generally such tracking needs to be implemented in order to move the checker out of alpha, because path-sensitive reports are worthless when you don't see visitor notes on all the interesting events on the path.

NoQ added inline comments.Oct 29 2019, 2:41 PM

The whole point of bug types is to remain the same regardless of the specific message. They're more like a category. Please use only one bug type for the whole checker (unless you really find different categories of bugs).

steakhal added inline comments.Oct 30 2019, 3:14 AM

Shoudn't we use isa<> if we don't use the resulting pointer?


I think its better to stay consistent regards naming.
Use either number literals expressing numeric values, or spell out its name.
I would prefer the latter like: ConstantOne with type SVal.

gamesh411 updated this revision to Diff 275605.Jul 6 2020, 1:53 AM

change license

gamesh411 updated this revision to Diff 275611.Jul 6 2020, 2:08 AM

move to core.alpha

gamesh411 updated this revision to Diff 275625.Jul 6 2020, 2:47 AM

make single bug-type
fix checker registration

gamesh411 updated this revision to Diff 275678.Jul 6 2020, 5:11 AM

modernize the memory modeling code, still WIP

gamesh411 updated this revision to Diff 275938.Jul 7 2020, 1:31 AM

fix detection logic
fix license header
add missing warning to test

gamesh411 marked 8 inline comments as done.Jul 7 2020, 1:35 AM
gamesh411 added inline comments.

I have removed the whole BugType factory think, that was really stupid in retrospect.


I feel a bit more familiar now with the infrastructure now, and I used getDynamicElementCount as it now an available utility. Thanks for the comments, they were helpful for discovering how things work! :D

gamesh411 marked 4 inline comments as done.Jul 7 2020, 1:36 AM
gamesh411 added inline comments.

Very true, I have moved to core.alpha.

gamesh411 marked an inline comment as done.Jul 7 2020, 1:36 AM

The checker has been updated, the comments and the logic polished. I would still take a stab at this being a ClangSa checker (as opposed to being a Tidy check). What do you think?

balazske added inline comments.Jul 7 2020, 6:48 AM

The bug type can be initialized here:
BT{this, "...", "..."}
How did this compile with only one text argument to constructor? I think the first is a short name of the bug, second is a category that is not omittable. The text that is used here should be passed to the BugReport. BugType::getDescription returns the "name" of the bug that is the first argument. But I am not sure what matters from these texts, the code design looks confusing.


if (isa<IntegerLiteral>(Index)) should be used. IntegerLiteral is a subclass of Expr, not a QualType.


I would use SVB.getBasicValueFactory().getMaxValue(IndexType) to get the max value.


SValBuilder::getArrayIndexType should be better than the UnsignedLongLongTy.


I think SVB.getConditionType() should be used for condition result.


I am not sure if this markInteresting call is correct.

37 ↗(On Diff #275938)

I do not know if it is safe to make such assumptions about sizeof.

gamesh411 marked 7 inline comments as done.Jul 8 2020, 1:43 AM
gamesh411 added inline comments.

I think because it is initialized with a BuiltinBug, which must be a subclass of BugType. I don't really know what should be preferable nowadays, as this code was actually written more than a year ago. Thanks for pointing out that it can be initialized there, I think lazy initialization seems not that important with this particular checker.


The way I have structured the code is very misleading, sorry for that, I will move the type extraction if lower, where it is actually used.


Good point, thanks for pointing out such a method existed in BasicValueFactory!


Seems reasonable :)


I will use that, thanks.


What I wanted to do conceptually is to mark the array itself interesting. I am however not sure whether this is the best way. I will try this and FirstElement itself and maybe look at whether there are some additional notes emitted along the way.

37 ↗(On Diff #275938)

You are definitely right! However it is common as per:

Data models

The choices made by each implementation about the sizes of the fundamental types are collectively known as data model. Four data models found wide acceptance:

32 bit systems:

        LP32 or 2/4/4 (int is 16-bit, long and pointer are 32-bit) 

            Win16 API 

        ILP32 or 4/4/4 (int, long, and pointer are 32-bit); 

            Win32 API
            Unix and Unix-like systems (Linux, macOS) 

64 bit systems:

        LLP64 or 4/4/8 (int and long are 32-bit, pointer is 64-bit) 

            Win64 API 

        LP64 or 4/8/8 (int is 32-bit, long and pointer are 64-bit) 

            Unix and Unix-like systems (Linux, macOS) 

Other models are very rare. For example, ILP64 (8/8/8: int, long, and pointer are 64-bit) only appeared in some early 64-bit Unix systems (e.g. Unicos on Cray).

Only ILP32 has 16 bit ints.
Next idea would be to use fixed-width integer types from cstdint. But tests should not use system headers, and there are mentions in test files to int32_t, howevery they are just typedefs for int. And I think we maintaining a whole standard library headers is a bit too much a hassle.

gamesh411 updated this revision to Diff 276346.Jul 8 2020, 2:19 AM

apply review suggestions

gamesh411 marked 5 inline comments as done.Jul 8 2020, 2:31 AM
balazske added inline comments.Jul 9 2020, 3:06 AM

I had even a plan to remove the BuiltinBug because it looks confusing and does not add real value. New checkers should use BugType.


Probably function SVB.getConstantVal can be used to test if there is a (compile-time) constant passed to the index. But it may be value of a (const) variable.

48 ↗(On Diff #276346)

Does this work in [32 + 1] case?

106 ↗(On Diff #276346)

choice can be here only 1. If it could be 1 or 2 we should get no warning because the array size may be good or bad. But to test that it is enough that choice can have any value, like in test_symbolic_index_handling4.

120 ↗(On Diff #276346)

Here "is a chance that indexing is correct". So no warning should occur?

37 ↗(On Diff #275938)

Still it would be good to check if the test passes on all the buildbots.

1 ↗(On Diff #276346)

I could not find difference between this and the previous test file (except the multi-dimensional arrays are omitted). It would be better to have a single test file without repetition. (Multiple RUN lines in a single file should work).

gamesh411 marked 4 inline comments as done.Jul 17 2020, 1:36 AM

I am now experimenting with the suggestions. The other issues are on the worklist. Thanks for these so far.


Yes, that is a design question, should the checker warn if the index is not a literal but a const variable like:

constexpr char SpecialIndex = 12;
int Array[300] = make_array();
Array[SpecialIndex]; // should we warn in this case? because now we do,
                     // as the indexing expression is not a literal.
48 ↗(On Diff #276346)

It should, I will add a case for it.

106 ↗(On Diff #276346)

Yes, this test is a bit redundant, should I remove it in your opinion?

37 ↗(On Diff #275938)

I'm afraid we can only test that by committing, as there is no other way to submit a build-job ( not that I am aware of, any information in that regard is welcome :) ).

gamesh411 updated this revision to Diff 278703.Jul 17 2020, 3:10 AM

move tests to one file

gamesh411 updated this revision to Diff 278727.Jul 17 2020, 5:00 AM
gamesh411 marked 3 inline comments as done.

extend test cases
add comments to non-obvious cases

gamesh411 added inline comments.Jul 17 2020, 5:07 AM
120 ↗(On Diff #276346)

I think as there is a branch where it is provably wrong, this is correct. It is just not obvious from the diag message which branch is taken, and consequently which array is being indexed. I think there is no way to address this from inside the checker.