This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][NFC] Use std::optional instead of custom "empty" state
ClosedPublic

Authored by donat.nagy on Apr 26 2023, 7:34 AM.

Details

Summary

This commit eliminates the uninitialized error state from the class RegionRawOffsetV2 (which is locally used by the Clang Static Analyzer checker alpha.security.ArrayBoundV2) and replaces its use with std::optional.

Motivated by https://reviews.llvm.org/D148355#inline-1437928

Moreover, the code of RegionRawOffsetV2::computeOffset() is rearranged to clarify its behavior. The helper function getValue() was eliminated by picking a better initial value for the variable Offset; two other helper functions were replaced by the lambda function Calc() because this way it doesn't need to take the "context" objects as parameters.

This reorganization revealed some surprising (but not outright buggy) behavior that's marked by a FIXME and will be revisited in a separate commit.

Diff Detail

Event Timeline

donat.nagy created this revision.Apr 26 2023, 7:34 AM
Herald added a project: Restricted Project. · View Herald Transcript
donat.nagy requested review of this revision.Apr 26 2023, 7:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2023, 7:34 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
steakhal accepted this revision.Apr 26 2023, 8:04 AM

Approved, assuming we prefer std::nullopt over the default construction of std::optional and you clang-format the affected hunks, such as the declaration of computeOffset.
Make sure the commit message complies with:

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
305–306

I'd recommend using the more explicit version for such cases.

This revision is now accepted and ready to land.Apr 26 2023, 8:04 AM
donat.nagy edited the summary of this revision. (Show Details)

As I was finalizing this commit, I noticed that the logic of RegionRawOffsetV2::computeOffset() is presented in a crazy unconventional way; so I rewrote it (while preserving the original behavior).

The most convoluted part was that the old code initialized the offset variable to UnknownVal(), then compensated for this choice by introducing a function getValue that always substituted zero when it encountered this UnknownVal() (no other effect placed UnknownVal() into that variable without immediately returning afterwards). I suspect that this initial value was introduced to avoid the analysis of load expressions that do not involve any array indexing (i.e. don't have ElementRegion layers), but that logic was broken by a commit in 2011.

@steakhal Can you provide another review for this extended version of the patch?

Sure, Ill look at this on Tuesday.

donat.nagy marked an inline comment as done.Apr 28 2023, 9:12 AM

Thanks!

@steakhal Could you review this change when you have some time for it?

I'm planning to stabilize and de-alpha this checker with a series of incremental changes; and it'd be good have this commit and the unrelated tweak https://reviews.llvm.org/D149460 merged before I start working on the next steps. My plans for the near future:

  • Replace the prototype-level error reporting with real, detailed, useful messages. Implementing this may require some code reorganization to preserve and pass around the details information.
  • Perform an early return when there is no real indexing (there are no ElementRegion layers in the load operation).
  • Reconsider the handling of multidimensional arrays: the current code (in fact the function computeOffset that I'm refactoring there) says that arr[0][10] is a valid way to index int arr[4][5] which not what the users would expect (in this simple case, even clang-tidy knows that "array index 10 is past the end of the array", but ArrayBoundsV2 implements complex logic to compare the index with the whole memory region allocated for the array). I'm leaning towards switching to "normal" behavior here.
  • Report situations like struct foo {int field;} arr[5]; arr[10].field = 123; -- the current implementation doesn't "see" these because this is writing a FieldRegion (which happens to be inside an ElementRegion, but the checker doesn't look for that that).

I'm also open to feedback/suggestions connected to these plans. It's likely that the code added by the current NFC commit will be rewritten by the followup changes, but I think it's still useful to merge it, because it documents the switch from the old implementation.

@steakhal Could you review this change when you have some time for it?

I didn't forget about this one. It's in progress, but it's difficult to squeeze this in. The ETA is probably tomorrow.

I'm planning to stabilize and de-alpha this checker with a series of incremental changes; and it'd be good have this commit and the unrelated tweak https://reviews.llvm.org/D149460 merged before I start working on the next steps.

Good luck with moving this out. It's gonna be bumpy.

My plans for the near future:

  • Replace the prototype-level error reporting with real, detailed, useful messages. Implementing this may require some code reorganization to preserve and pass around the details information.

IMO this is the most important step, yes.

  • Perform an early return when there is no real indexing (there are no ElementRegion layers in the load operation).
  • Reconsider the handling of multidimensional arrays: the current code (in fact the function computeOffset that I'm refactoring there) says that arr[0][10] is a valid way to index int arr[4][5] which not what the users would expect (in this simple case, even clang-tidy knows that "array index 10 is past the end of the array", but ArrayBoundsV2 implements complex logic to compare the index with the whole memory region allocated for the array). I'm leaning towards switching to "normal" behavior here.

In fact, I believe it's UB to index an int arr[4][5] multi-dimensional array like arr[0][10].

  • Report situations like struct foo {int field;} arr[5]; arr[10].field = 123; -- the current implementation doesn't "see" these because this is writing a FieldRegion (which happens to be inside an ElementRegion, but the checker doesn't look for that that).

+1

I'm also open to feedback/suggestions connected to these plans. It's likely that the code added by the current NFC commit will be rewritten by the followup changes, but I think it's still useful to merge it, because it documents the switch from the old implementation.

I'm looking forward to these change. Count me for the reviews.
Also, note that this is in production here, so it will take me time to verify the patches. So, I'd probably prefer to have a patch stack, and do the evaluation for some selected patches - basically bundling the stack into a few (1-2) checkpoints.
It would mean that the reviews would be done for each revision, but the verification (prior to landing anything) would be done at the selected stages.
WDYT?

donat.nagy added a comment.EditedMay 3 2023, 10:43 AM

It's gonna be bumpy.

Yup :) but it's not impossible...

Also, note that this is in production here, so it will take me time to verify the patches. So, I'd probably prefer to have a patch stack, and do the evaluation for some selected patches - basically bundling the stack into a few (1-2) checkpoints.
It would mean that the reviews would be done for each revision, but the verification (prior to landing anything) would be done at the selected stages.
WDYT?

It would be effective to adopt a workflow where the individual commits are quickly reviewed for design directions and visible code quality issues (to avoid dead ends and keep the rebasing on a manageable level); but the complete verification and the upstreaming is done later, in bundles (to avoid wasting time on repeated verifications). Feel free to give "Seems fine for now; start working on the next commit; this will be verified and merged later." reviews after handling the bulk of obvious issues. (The "Seems fine for now" declarations don't need to be completely final, I'll fix code quality issues even if you only spot them during the final verification.)

steakhal accepted this revision.May 3 2023, 11:43 PM

We only had a single new issue after this patch. This is well within what I would be comfortable with. I'll investigate the case anyway. It's likely something flaky.
Feel free to land this in the meantime.

We only had a single new issue after this patch. This is well within what I would be comfortable with. I'll investigate the case anyway. It's likely something flaky.

It did not reproduce, so it was a fluke.

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
343–345

FYI, you can unconditionally cast it. The assertion wouldn't fire - at least it didn't for us.

This revision was landed with ongoing or failed builds.May 4 2023, 3:58 AM
This revision was automatically updated to reflect the committed changes.
donat.nagy added inline comments.May 4 2023, 4:04 AM
clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
343–345

I only noticed this after merging my change, but this won't be relevant, because the followup commits will completely restructure this part of the code.