This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add path note tags to standard library function summaries.
ClosedPublic

Authored by NoQ on Mar 22 2022, 10:00 PM.

Details

Summary

This is a solution to the issue with getenv() (https://github.com/llvm/llvm-project/issues/53276) but I covered a few more functions just because I could.

The patch is straightforward except the tiny fix in BugReporterVisitors.cpp that suppresses a default note for "Assuming pointer value is null" when a note tag from the checker is present. This is probably the right thing to do but also definitely not a complete solution to the problem of different sources of path notes being unaware of each other, which is a large and annoying issue that we have to deal with. Note tags really help there because they're nicely introspectable. The problem is demonstrated by the newly added getenv() test; I did not investigate why doesn't the original buggy report have the same note but I agree that this might be interesting to figure out.

The notes are currently optional but I think we should eventually implement all of them and then make them mandatory.

The notes are prunable, i.e. they won't bring-in entire stack frames worth of notes just because they're there, but they will be always visible regardless of whether the value is of interest to the bug report. I think this is debatable, the arguably better solution is to make them non-prunable but conditional to the value being tracked back to the call, which would probably need a better tracking infrastructure.

Diff Detail

Event Timeline

NoQ created this revision.Mar 22 2022, 10:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 10:00 PM
NoQ requested review of this revision.Mar 22 2022, 10:00 PM
NoQ added inline comments.Mar 22 2022, 10:02 PM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
41

This comment has been out of date for years and I don't think it makes sense to have it in the first place.

1215–1217

I'm not adding notes to these ranges because these days I think they were a mistake. A three-way (sometimes four-way) state split is pretty hard to justify. It's much safer to drop the case and stick to the two ASCII cases.

The notes are prunable, i.e. they won't bring-in entire stack frames worth of notes just because they're there, but they will be always visible regardless of whether the value is of interest to the bug report. I think this is debatable, the arguably better solution is to make them non-prunable but conditional to the value being tracked back to the call, which would probably need a better tracking infrastructure.

I was thinking of passing a lambda and doing the rest there. We could have lambda factories to make it less cumbersome to define - and also reuse code.

IMO this is a nice increment.

LGTM on my end, this is awesome!

The notes are prunable, i.e. they won't bring-in entire stack frames worth of notes just because they're there, but they will be always visible regardless of whether the value is of interest to the bug report. I think this is debatable, the arguably better solution is to make them non-prunable but conditional to the value being tracked back to the call, which would probably need a better tracking infrastructure.

I was thinking of passing a lambda and doing the rest there. We could have lambda factories to make it less cumbersome to define - and also reuse code.

Could you think of a scenario where a lambda would be required? It sure is more general, but I don't immediately see the gain.

NoQ planned changes to this revision.Mar 23 2022, 8:15 PM

Ok there's actually a huge bug in this patch, namely we can't say "Assuming..." if there's no state split (i.e., when we know from the start which branch is taken so we don't have to assume). I'll fix.

The notes are prunable, i.e. they won't bring-in entire stack frames worth of notes just because they're there, but they will be always visible regardless of whether the value is of interest to the bug report. I think this is debatable, the arguably better solution is to make them non-prunable but conditional to the value being tracked back to the call, which would probably need a better tracking infrastructure.

I was thinking of passing a lambda and doing the rest there. We could have lambda factories to make it less cumbersome to define - and also reuse code.

Yes sure, that's exactly how note tags are designed to work, but that lambda still needs to know whether the value is tracked. I guess I'll take a look at how well interestingness performs in these examples, maybe it's worth having from the start.

NoQ updated this revision to Diff 420701.Apr 5 2022, 9:23 PM
  • Eliminate notes on known paths.
  • Add some documentation for the new class.
    • Mostly moved from existing documentation for the entire summary class.
    • Fix incorrect assessment that Exploded Graph is a tree. It's not even acyclic!
  • Fix clang-format warnings.
NoQ added a comment.Apr 5 2022, 9:27 PM

I think interestingness definitely requires more work. In particular, in null dereference from getenv(), the note should be unprunable. But we can't control prunability dynamically yet, so it requires a bit more work. So I think this patch is ok to go and I'll hopefully follow up with more interestingness logic.

steakhal added inline comments.Apr 6 2022, 1:50 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
872–884

I thought we mostly have one or two cases. If we have two cases, then the constraint should be the opposite of the other one.

If this is the case, when does NewState && NewState != State not imply Node->succ_size() > 1 given that Summary.getCases().size() >= 2?

1326–1330

Why don't we have notes for these cases?

clang/test/Analysis/std-c-library-functions-path-notes.c
4

This is the default value. What's the benefit of passing this?

balazske added inline comments.Apr 6 2022, 2:52 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
400

It would be good to test how this looks in doxygen output. If it is for generated documentation formatting tags can be used, otherwise this can be a // comment.

879

Can other checkers not add successor nodes (that make this check not always correct)?

879

Is there a reason against add the note without the word "Assuming" instead of no note?

NoQ added inline comments.Apr 12 2022, 12:24 PM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
872–884

Typically this is going to be the case but I wouldn't rely on that. The constraint solver may get confused (inb4 "system is over constrained"), there may be other updates to the state in the branch definition which would cause the state to change.

879

Can other checkers not add successor nodes (that make this check not always correct)?

They'll be chained to the newly created node, not to the same predecessor. Two checkers adding transitions on the same post-call won't (and shouldn't) suddenly cause a state split.

Is there a reason against add the note without the word "Assuming" instead of no note?

We generally never have event notes in such cases.

We sometimes have control flow notes such as "aking true branch" (arrow pointing to the branch in plist).

Then, separately from that, we have tracking notes such as "an interesting value appeared in this variable from this assignment, which also makes the assigned value interesting" (eg. "null pointer value assigned to 'p'") - we could have a similar note "an interesting value was returned from that function because that other value had a certain range, which also makes that other value interesting" - which we should definitely consider.

But simply saying "this function call was completely predictable and we don't even care what the value is in the first place" doesn't sound useful.

879

*Taking

1326–1330

Like I mentioned in D122285#inline-1169194, I'm not sure these cases should exist at all. In particular, explaining them to the user is fairly problematic because justifying them in the first place is problematic.

clang/test/Analysis/std-c-library-functions-path-notes.c
4

I'm not sure what the default value should be. I guess I can remove this and/or move non-getenv tests into another file.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 28 2022, 5:33 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 5:33 PM
NoQ marked 5 inline comments as done.Apr 28 2022, 5:35 PM
NoQ added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
400

It doesn't! Classes in .cpp files don't seem to show up in doxygen at all. So it doesn't matter which style do we use. I think I'll keep it doxygenized just in case we decide to move these classes out to a public header at some point.