This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] CERT STR rule checkers: STR30-C
Needs RevisionPublic

Authored by Charusso on Dec 6 2019, 4:21 PM.

Details

Summary

This checker is implemented based on the following rule:
https://wiki.sei.cmu.edu/confluence/display/c/STR30-C.+Do+not+attempt+to+modify+string+literals

It warns on misusing the following functions: strpbrk(), strchr(),
strrchr(), strstr(), memchr().

Diff Detail

Event Timeline

Charusso created this revision.Dec 6 2019, 4:21 PM

Examples generated by CodeChecker from the curl project:

NoQ added inline comments.Dec 9 2019, 7:23 PM
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2109 ↗(On Diff #232674)

Why does it matter whether the region is symbolic or not?

2110 ↗(On Diff #232674)

Mmm, that's not a correct return value for these functions. These functions don't simply pass through their first argument.

2122 ↗(On Diff #232674)

Why "heap"?

Charusso marked 2 inline comments as done.Dec 9 2019, 7:41 PM
Charusso added inline comments.
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2109 ↗(On Diff #232674)

Hm, I think you are right, we could omit that if-statement. I wanted to specify to reuse conjured symbols.

2110 ↗(On Diff #232674)

Yes, but we need some index here. It requires a NonLoc, so I just randomly picked the first index, but I like the idea of an unknown index. Would we like to introduce UnknownVal for indices?

2122 ↗(On Diff #232674)

Well, a string which length is at least 16 characters long is going to be allocated on the heap. I have to conjure the string here to create its element.

NoQ added inline comments.Dec 9 2019, 7:48 PM
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2110 ↗(On Diff #232674)

Use the correct region but conjure the index.

2122 ↗(On Diff #232674)

o.o

void foo() {
  // This string is 20 characters long
  // but it's clearly on the stack.
  char str[] = "12345678901234567890";
  // This one is therefore also on the stack.
  char *ptr = strchr(str, '0');
}
Charusso updated this revision to Diff 233905.Dec 13 2019, 6:53 PM
Charusso marked 6 inline comments as done.
  • Try to conjure the index of the 'ElementRegion'.
Charusso retitled this revision from [analyzer] CERT: StrChecker: 30.c to [analyzer] CERT: STR30-C.EditedDec 13 2019, 7:01 PM
Charusso marked 3 inline comments as done.

In order to bypass [1] the CK_LValueToRValue evalCast() we have to create en ElementRegion as a return-value of the problematic function call. In that case for a mythical reason we miss the fact the pointer is nullable. I have not figured out yet why, but tried to create an appropriate return-value using a VarRegion so we could refer to the underlying constant string's name.

[1] bypass: If I tried to evaluate the call by returning a pointer to the element it immediately wraps the SymbolicRegion into an ElementRegion, like with HeapSpaceRegions it is useful, but for us it is bad.

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2122 ↗(On Diff #232674)

Well, a string which length is at least 16 characters long is going to be allocated on the heap. I have to conjure the string here to create its element.

I really felt that the std::string should behave like the C-strings, but C-strings are on the stack whatever it takes, yes, my bad. Thanks for pointing that out!

clang/test/Analysis/cert/str30-c-notes.cpp
30

Needs to be an assumption.

NoQ added a comment.Feb 3 2020, 7:46 AM

Let's separate CStringChecker improvements into a separate patch and have a separate set of tests for it.

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2097–2098 ↗(On Diff #233905)

Ok, so this whole thing is trying to evaluate strchr-like functions, right? I've no idea what it does but the problem is much more trivial that what you're trying to do here.

Branch 1:

  1. Conjure the offset.
  2. Add it to the original pointer (using evalBinOp).
  3. Bind the result.

Branch 2:

  1. Bind null.

And you probably should drop branch 2 completely because usually there's no indication that the character may in fact be missing in the string, and i don't want more null dereference false alarms. So it's just branch 1.

So the whole function should be 3 lines of code (maybe with a couple line breaks).

Well, maybe you should also handle the case where the original pointer is null (not sure if it's an immediate UB or not).

This could be improved by actually taking into account the contents of the string, but you don't seem to be trying to do this here.

2101–2103 ↗(On Diff #233905)

So, like, StateNull is the state in which it is assumed that 0 is non-zero and State is the state in which it is assumed that 0 is zero?

I mean, apart from the naming flip-flop - i can tell you in advance that 0 is zero, it's not a matter of assumptions.

2127–2128 ↗(On Diff #233905)

This is guaranteed to return the string region that you already have as the value of that expression.

Charusso updated this revision to Diff 242159.Feb 3 2020, 12:36 PM
Charusso marked 8 inline comments as done.
  • Remove the modeling from CStringChecker.

Thanks! I have felt that may I create a too complex symbolic value. Sorry for stealing your time.

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2097–2098 ↗(On Diff #233905)

I've no idea what it does

I wanted to represent the root string's VarDecl in the symbolic value so I could refer to it. I think it became too complex and printing out the root string's variable name does not worth that complexity. But here it is from str30-c-explain.cpp:

char *slash;
slash = strrchr(pathname, '/');

clang_analyzer_dump(slash);
// CHECK: &Element{pathname,conj_$1{long long, LC1, S{{[0-9]+}}, #1},char}

clang_analyzer_explain(slash);
// CHECK: pointer to element of type 'char' with index 'symbol of type 'long long' conjured at statement 'pathname'' of parameter 'pathname'

This could be improved by actually taking into account the contents of the string, but you don't seem to be trying to do this here.

At first the name of the string would be cool to print out and we need dataflow support to make it cool. Given that it is too complex I have reverted the modeling part. Also I wanted to create a simpler modeling with your modeling solution for my name-storing purposes, but I cannot.

2101–2103 ↗(On Diff #233905)

I wanted to force out the state split of an unknown indexing: null and non-null but may it was too explicit and useless, yes.

2127–2128 ↗(On Diff #233905)

Whoops. The idea was that to handle string regions explicitly and may calculate the returning index as well.

clang/test/Analysis/cert/str30-c-notes.cpp
30

Let us say it is non-null.

Charusso updated this revision to Diff 265963.May 24 2020, 9:32 PM
Charusso retitled this revision from [analyzer] CERT: STR30-C to [analyzer] CERT STR rule checkers: STR30-C.
  • Refactor.
In D71155#1854908, @NoQ wrote:

Let's separate CStringChecker improvements into a separate patch and have a separate set of tests for it.

I was thinking about creating tests, but I cannot figure out any better testing than a live checker that uses such features of CStringChecker. Also I have not found any real world issues with the checker, so I could not really test its real behavior.

baloghadamsoftware requested changes to this revision.Jun 29 2020, 9:35 AM
baloghadamsoftware added inline comments.
clang/docs/analyzer/checkers.rst
1942

STR30-C is more general: Do not attempt to modify string literals. You should not check for these functions specifically, just model that they return string literals. However, you can also declare a string literal yourself. The checker should look for modifications of any string literals, whether returned by these functions or not.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
312

I do not see V used in this function anywhere. We are not interested to what it is attempted to be modified. We are just interested that it is attempted to be modified.

This revision now requires changes to proceed.Jun 29 2020, 9:35 AM