This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] NFCi: Refactor CStringChecker: use strongly typed internal API
ClosedPublic

Authored by steakhal on Feb 18 2020, 4:25 PM.

Details

Summary

CStringChecker is a huge beast.

My effort in improving the analyzer regarding taint analysis is humbled by multiple factors.
I wanted to extend the diagnostics of the CStringChecker with taintedness.

In the long run, the diagnostic emitting parts of the GenericTaintChecker would be migrated to multiple checkers, leaving it's responsibility only to *model* taint propagation.
Eg. the GenericTaintChecker::checkTaintedBufferSize functionality will be mostly part of the CStringChecker.

This plan requires the CStringChecker to be refactored to support a more flexible reporting mechanism.

This patch does only refactorings, such:

  • eliminates always false parameters (like WarnAboutSize)
  • reduces the number of parameters
  • makes strong types differentiating *source* and *destination* buffers (same with size expressions)
  • binds the argument expression and the index, making diagnostics accurate and easy to emit
  • removes a bunch of default parameters to make it more readable
  • remove random const char * warning message parameters, making clear where and what is going to be emitted

Note that:

  • CheckBufferAccess now checks *only* one buffer, this removed about 100 LOC code duplication
  • not every function was refactored to use the /new/ strongly typed API, since the CString related functions are really closely coupled monolithic beasts, I will refactor them separately
  • all tests are preserved and passing; only *the message changed at some places*. In my opinion, these messages are holding the same information.

I would also highlight that this refactoring caught a bug in clang/test/Analysis/string.c:454 where the diagnostic did not reflect reality. This catch backs my effort on simplifying this monolithic CStringChecker.

Diff Detail

Event Timeline

steakhal created this revision.Feb 18 2020, 4:25 PM
steakhal updated this revision to Diff 245300.Feb 18 2020, 4:27 PM
steakhal edited the summary of this revision. (Show Details)

Upload the right diff.

NoQ accepted this revision.Feb 19 2020, 3:24 AM

This is fantastic, i love it!

all tests are preserved and passing; only *the message changed at some places*. In my opinion, these messages are holding the same information.

You make it sound like an accident; i encourage you to have a closer look to make sure that there are no other effects than the ones observed on tests.

clang/test/Analysis/string.c
454

Nice catch indeed!~

This revision is now accepted and ready to land.Feb 19 2020, 3:24 AM
steakhal marked an inline comment as done.Feb 19 2020, 4:32 AM
In D74806#1882300, @NoQ wrote:

This is fantastic, i love it!

all tests are preserved and passing; only *the message changed at some places*. In my opinion, these messages are holding the same information.

You make it sound like an accident; i encourage you to have a closer look to make sure that there are no other effects than the ones observed on tests.

How should I look for differences/regressions without extending and validating systematically the test cases we have for this checker? That seems to be a huge work since bstring.c has 500+ and the string.c has 1600+ lines.

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
37

I'm not marking it const in this patch since the 496th line swaps two instances of this class, so it cannot be done without refactoring that function.

494–498

It is really interesting that it swaps the arguments in some sense.

I'm pretty sure that now if a diagnostic is emitted after this swap, the argument indexes used to construct the diagnostic message would mismatch. I will dig into this later, probably refactor the whole function to make the intentions clear. What do you think? @NoQ

1580–1587

This part is really interesting.

Using strong types highlighted this particular code. Since SizeArgExpr is conceptually different than a SourceArgExpr this code did not compile without additional effort.

Previously it passed the size argument if it were given, but if the function were unbounded than it passed the source expression where size was expected, which does not make any sense.

Since I plan to refactor the whole cstring related functions, not just the memset, memcpy etc. I tried to leave it as it were, to preserve the current (broken) semantics of the function.

This revision was automatically updated to reflect the committed changes.