This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][StdLibraryFunctionsChecker] Add NoteTags for applied arg constraints
ClosedPublic

Authored by martong on Apr 29 2021, 5:45 AM.

Details

Summary

In this patch I add a new NoteTag for each applied argument constraint.
This way, any other checker that reports a bug - where the applied
constraint is relevant - will display the corresponding note. With this
change we provide more information for the users to understand some
bug reports easier.

Diff Detail

Event Timeline

martong created this revision.Apr 29 2021, 5:45 AM
martong requested review of this revision.Apr 29 2021, 5:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2021, 5:45 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
steakhal added inline comments.Apr 29 2021, 7:43 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
969–970

This way each and every applied constraint will be displayed even if the given argument does not constitute to the bug condition.
I recommend you branching within the lambda, on the interestingness of the given argument constraint.

clang/test/Analysis/std-c-library-functions-arg-constraints.c
241–243

nit.
BTW I raised my related concerns about this elsewhere.

253–262

I was puzzled for a moment on why do you have two notes here.
By checking the definition of __arg_constrained_twice(), I can see that it has two ArgConstraints.

Although, it shouldn't be a problem as on the UI we would visualize notes for only a single bugreport. So it is probably clear that both of the notes are valid and correspond to the given statement. It might look clunky in the LIT test, but should be 'somewhat' readable in real life.

You should take no action here. I leave this comment just for the record.

NoQ added inline comments.Apr 29 2021, 4:45 PM
clang/test/Analysis/std-c-library-functions-arg-constraints-note-tags.cpp
17

This has to be a user-friendly message.

  • "Constraints" is compiler jargon.
  • We cannot afford shortening "argument" to "arg".
  • Generally, the less machine-generated it looks the better (":" is definitely robotic).
clang/test/Analysis/std-c-library-functions-arg-constraints.c
42

This isn't part of this patch but what do you think about {-1} U [0, 255]? Or, you know, [-1, 255].

martong planned changes to this revision.Apr 30 2021, 2:46 AM
martong marked 4 inline comments as done.
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
969–970

Okay, good point, thanks for the feedback! I am planning to change to this direction.

clang/test/Analysis/std-c-library-functions-arg-constraints-note-tags.cpp
17

Okay, thanks for your comment. I can make it to be more similar to the other notes we already have. What about this?

Assuming the 1st argument is within the range [1, 1]

We cannot afford shortening "argument" to "arg".

I'd like to address this in another following patch if you don't mind.

clang/test/Analysis/std-c-library-functions-arg-constraints.c
42

Yeah, good idea, {-1} U [0, 255] would be indeed nicer and shorter. However, [-1, 255] is hard(er) so I am going to start with the former in a follow up patch.

NoQ added inline comments.May 2 2021, 6:31 PM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
969–970

Excellent catch @steakhal!

I think you can always emit the note but only mark it as unprunable when the argument is interesting. This way it'd work identically to our normal "Assuming..." notes.

clang/test/Analysis/std-c-library-functions-arg-constraints-note-tags.cpp
17

This sounds good for a generic message. I still think that most of the time these messages should be part of the summary. Eg.,

Assuming the 1st argument is within range [33, 47] U [58, 64] U [91, 96] U [123, 125]

ideally should be rephrased as

Assuming the argument is a punctuation character

in the summary of ispunct().

martong marked 4 inline comments as done.May 3 2021, 2:14 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
969–970

I think you can always emit the note but only mark it as unprunable when the argument is interesting. This way it'd work identically to our normal "Assuming..." notes.

IsPrunable is a const member in NoteTag. So, we have to decide about prunability when we call getNoteTag. To follow your suggestion, we should decide the prunability dynamically in TagVisitor::VisitNode. This would require some infrastructural changes in NoteTag. We could add e.g. another Callback member that would be able to decide the prunability with the help of a BugReport&. I am okay to go into that direction, but it should definitely be separated from this patch (follow-up). I am not sure if it is an absolutely needed dependency for this change, is it? (If yes then I am going to create the dependent patch first).

martong marked 2 inline comments as done.May 3 2021, 2:20 AM
martong added inline comments.
clang/test/Analysis/std-c-library-functions-arg-constraints-note-tags.cpp
17

Yes, absolutely, good idea. It makes sense to provide another member for the Summary that could specifically describe the function specific assumptions (or violations). However, before we would be able to go through all functions manually to create these specific messages we need a generic solution to have something that is more descriptive than the current solution.

martong updated this revision to Diff 342342.May 3 2021, 2:58 AM
martong marked an inline comment as done.
  • Add the description to the note tag only if the SVal is interesting
  • Use 'Assuming the nth arg is in ...' form for the descriptions
steakhal added inline comments.May 3 2021, 4:28 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
805–820

I don't know. Report message construction always seemed clunky.
Clang's or ClangTidy's approach seems superior in this regard.

Do we have anything better for this @NoQ?
Maybe llvm::format() could be an option.

Regarding this patch: It's fine. Better than it was before!

971
974

Ah, there is a slight issue.
You should mark some stuff interesting here, to make this interestingness propagate back transitively.

Let's say ArgSVal is x + y which is considered to be out of range [42,52]. We should mark both x and y interesting because they themselves could have been constrained by the StdLibChecker previously. So, they must be interesting as well.

On the same token, IMO PathSensitiveBugReport::markInteresting(symbol) should be transitive. So that all SymbolData in that symbolic expression tree are considered interesting. What do you think @NoQ?
If we were doing this, @martong - you could simply acquire the assumption you constructed for the given ValueConstraint, and mark that interesting. Then all SymbolDatas on both sides of the logic operator would become implicitly interesting.

Szelethus added inline comments.Jul 21 2021, 7:40 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
148–149

How about we turn this into a print-like function and instead of returning with a string, we take an llvm::raw_ostream object as argument? SmallString + raw_svector_stream is how we construct most of our checker message strings.

974

On the same token, IMO PathSensitiveBugReport::markInteresting(symbol) should be transitive. So that all SymbolData in that symbolic expression tree are considered interesting. What do you think @NoQ?

Thats how I'd expect this to work. This shouldn't be a burden on the checker developer (certainly not this kind of a checker), but rather be handled by PathSensitiveBugReport.

So I think this is fine as it is.

NoQ added inline comments.Jul 21 2021, 10:35 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
974

Interestingness isn't a thing on its own; its meaning is entirely tied to the nature of the specific bug report. An interesting pointer in the null dereference checker is absolutely not the same thing as an interesting mutex pointer in PthreadLockChecker or a (de)allocated pointer in MallocChecker.

I currently treat interestingness as a "GDM for visitors". It's their own way of communicating with themselves and with each other, a state they keep track of and update as they visit the bug report (initially populated during construction of the bug report). But the meaning of this state is entirely specific to the visitors. It is visitors who give interestingness a meaning (and the visitors are, naturally, also hand-picked during construction of the bug report).

So I think the right question to ask is "what do you want interestingness to mean in your checker?" and build your visitors accordingly.

Your visitors should provide enough information for the user to be able to understand the bug report. When the report says "$x + $y is in range [42, 52] and no values in that range are a valid input to that function", the user asks "why do you think $x + $y is in range [42, 52]?" and we'll have to answer that.

For example, in

if (x + y >= 42 && x + y <= 52)
  foo(x + y);

there's no need to track ranges for $x and $y separately; it is sufficient to point the user to the constraint over $x + $y obtained from the if-statement. On the other hand, in

if (x >= 44 && x <= 50)
  if (y >= -2 && y <= 2)
    foo(x + y);

you'll have to explain both $x and $y in order for the user to understand that $x + $y is indeed in range [42, 52].

There are also other funny edge cases depending on the nature of the arithmetic, such as

int z = x * y;
if (x == 0)
  return 1 / z;

where in order to explain the division by the zero value $x * $y it is sufficient to explain $x which makes $y redundant. And if they both are zero then we should probably flip a coin?

So I think this is a non-trivial problem. Even on the examples above (let alone other bug types!) it's easy to see that interestingness of $x and $y doesn't always follow from the interestingness of $x + $y (but sometimes it indeed does). I think the answer lies somewhere in the underneathies of the constraint solver: we have to follow its logic in order to find out how the range was inferred (which it probably should annotate with more note tags so that we didn't have to reverse-engineer it in the visitor?) (@vsavchenko you may find this discussion particularly peculiar).

martong marked 7 inline comments as done.Sep 15 2022, 3:09 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
148–149

I don't see how that change would be relevant. Would we have a better run-time, or code that is easier to understand? please elaborate.

971

Thanks, changed it.

974

Guys, the thing is, we should have our discussion about "transitive interestingness" somewhere else.

This patch is orthogonal to that problem. Actually, in this patch I add extra notes to already interesting SVals. Those SVals are marked to be interesting by other checkers. As @NoQ describes, I agree, that a checker could be responsible to describe how it wants to handle transitivity. But, once again, it is not the StdLibraryFunctionChecker that is marking the SVals interesting. Please take a look at the newly added test file: We have a division by zero there, thus the divisor is marked interesting by the DivZeroChecker (and transitivity is handled by trackExpressionValue). What we do in this patch is if we know that a value to be interesting and we know that had been constrained by an argument constraint, then we attach a note that describes this fact.

Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 3:09 AM
martong updated this revision to Diff 460345.Sep 15 2022, 3:09 AM
martong marked 2 inline comments as done.
  • Rebase
  • move Msg into the lambda

Gentle ping @steakhal @NoQ
Trying to revive this after a year :) I am sorry it took so long to get back to this.

NoQ added a comment.Sep 20 2022, 5:31 PM

Hi, looks great! I found a couple of typos and the amount of changes in tests is suspiciously low. And I want to make sure that the promise to change "arg" -> "argument" isn't lost (but I'll be happy if it's addressed in a follow-up patch).

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
139
719
755

I suspect this needs to be covered by tests.

martong updated this revision to Diff 467772.Oct 14 2022, 7:12 AM
martong marked 3 inline comments as done.
  • Rebase
martong updated this revision to Diff 467773.Oct 14 2022, 7:22 AM
  • Changed to be more verbose: "arg" -> "argument"
  • Fixed "less than" to "greater than"
  • Added new tests
  • Fixed typos

Hi, looks great! I found a couple of typos and the amount of changes in tests is suspiciously low. And I want to make sure that the promise to change "arg" -> "argument" isn't lost (but I'll be happy if it's addressed in a follow-up patch).

Ok, I've changed "arg" to "argument" in the latest update, plus added new test cases. Thanks for the review!

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
755

Okay, I've added further tests for the "not-null" and the "buffer-size-constraint" cases.

NoQ accepted this revision.Oct 25 2022, 3:21 PM

Looks great to me, thanks!!

clang/test/Analysis/std-c-library-functions-arg-constraints-notes.cpp
32–33

The warning is the same as the note here right?

Our warnings traditionally describe the problem (the 1st argument *is* less than 10, and this *is* bad because...), not how things "should" be. I guess we can think more about that later.

This revision is now accepted and ready to land.Oct 25 2022, 3:21 PM
This revision was landed with ongoing or failed builds.Oct 26 2022, 7:34 AM
This revision was automatically updated to reflect the committed changes.
martong marked an inline comment as done.

Looks great to me, thanks!!

Thanks for the review!

clang/test/Analysis/std-c-library-functions-arg-constraints-notes.cpp
32–33

No, actually, the warning is different, it does not contain the text "should be". In this case this is it:

Line 31: Function argument constraint is not satisfied, constraint: BufferSize [alpha.unix.StdCLibraryFunctionArgs]

And then the notes basically further explain how the constraint is not satisfied.

I did not put the check for the warnings here because this test file is responsible for checking the notes only, hence it has the name std-c-library-functions-arg-constraints-notes.cpp.
The warnings are directly tested in std-c-library-functions-arg-constraints.c, however, I have to admit, probably we should have even more specific checks for the warning messages there.