This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] StdLibraryFunctionsChecker: Add better diagnostics
ClosedPublic

Authored by martong on May 5 2020, 9:55 AM.

Details

Summary

Title says it, but there is still place for further improvements.

Diff Detail

Event Timeline

martong created this revision.May 5 2020, 9:55 AM

Sure, this is an improvement because we display more information, but I'd argue that it isn't really a more readable warning message :) How about

<argno>th argument to the call to <function name>

  • cannot be represented with a character
  • is a null pointer
  • ...

, which violates the function's preconditions.

WDYT?

Also, I see no tests :)

Sure, this is an improvement because we display more information, but I'd argue that it isn't really a more readable warning message :) How about

<argno>th argument to the call to <function name>

  • cannot be represented with a character
  • is a null pointer
  • ...

, which violates the function's preconditions.

WDYT?

+1 Warning messages should be as clear as possible. Although our users are programmers, they do not necessarily understand a message "Range constraint not satisfied.". Maybe the error messages could be stored together with the function summaries.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
506

Instead of std::string we usually use llvm::SmallString and an llvm::raw_svector_ostream to print into it.

balazske added inline comments.Jul 2 2020, 12:28 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
506

This would look better?
"Function argument constraint is not satisfied, constraint: <Name>, ArgN: <ArgN>"

Could you add some tests demonstrating the refined diagnostics of the checker?
It would help to validate the quality of the emitted diagnostics.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
87

Shouldn't we use using instead?

506

Maybe llvm::Twine would be a better choice to llvm::raw_svector_ostream for string concatenations.
docs:

Twine - A lightweight data structure for efficiently representing the concatenation of temporary values as strings.

martong updated this revision to Diff 280368.Jul 24 2020, 1:53 AM
  • Rebase to master
martong marked 4 inline comments as done.Jul 24 2020, 2:31 AM

Sure, this is an improvement because we display more information, but I'd argue that it isn't really a more readable warning message :) How about

<argno>th argument to the call to <function name>

  • cannot be represented with a character
  • is a null pointer
  • ...

, which violates the function's preconditions.

WDYT?

I'd rather prefer @balazske's approach (see below) as that is similarly expressive but sparser.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
87

Yes, we should. But it's legacy code from the times long before.

506

We can use a Twine to create the owning std::string without creating interim objects. But then we must pass an owning object to the ctor of PathSensitiveBugReport, otherwise we might end up with dangling pointers.
https://llvm.org/docs/ProgrammersManual.html#dss-twine

martong updated this revision to Diff 280376.Jul 24 2020, 2:34 AM
martong marked 3 inline comments as done.
  • Use Twine
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
506

Yes, I like this.

Szelethus requested changes to this revision.Aug 13 2020, 4:55 AM

Tests c:

I'm still not a huge fan of the warning message. Now it describes what kind of constraint was violated, but not how (too large? too small?). Also, while I respect that you didn't prefer my suggestion, I think the warning message there is still a lot more welcoming, even if it doesn't convey more information. But, I don't necessarily insist on it.

This revision now requires changes to proceed.Aug 13 2020, 4:55 AM

There is a TODO comment at the error message, it needs improvement. The current form is still something for developers, not for end users. For final version I would accept textual descriptions (as mentioned above), not names like "BufferSize" and words like "ArgN" inside it.

Szelethus added a comment.EditedAug 13 2020, 6:22 AM

Ah, okay, silly me. Didn't catch that. Then the message is OK for now.

edit: Describing how the violation happened might be a valuable for development purposes as well.

Ah, okay, silly me. Didn't catch that. Then the message is OK for now.

edit: Describing how the violation happened might be a valuable for development purposes as well.

What if we'd add a toString method to the constraints and we'd add this to Msg? This way we'd know the contents of the constraint, thus we we'd know how the constraint is violated.

Ah, okay, silly me. Didn't catch that. Then the message is OK for now.

edit: Describing how the violation happened might be a valuable for development purposes as well.

What if we'd add a toString method to the constraints and we'd add this to Msg? This way we'd know the contents of the constraint, thus we we'd know how the constraint is violated.

I mean we'd know what is not satisfied. But, to know why exactly that is not satisfied we should dump the whole State but that's obviously not an option. Perhaps we could track which symbols and expressions are participating in the assumption related to the constraint and we could dump only those, but this seems to be a very complex approach.

Szelethus accepted this revision.Sep 10 2020, 12:41 AM

What if we'd add a toString method to the constraints and we'd add this to Msg? This way we'd know the contents of the constraint, thus we we'd know how the constraint is violated.

I mean we'd know what is not satisfied. But, to know why exactly that is not satisfied we should dump the whole State but that's obviously not an option. Perhaps we could track which symbols and expressions are participating in the assumption related to the constraint and we could dump only those, but this seems to be a very complex approach.

I realize that the how and why phrases in this context a bit too vague :) What do you mean under having to dump the whole State? I didn't mean to compress a bug path into a warning message, only what I mentioned in D79431#2020951. In any case, I think its okay to just move on with this patch. LGTM!

This revision is now accepted and ready to land.Sep 10 2020, 12:41 AM

What if we'd add a toString method to the constraints and we'd add this to Msg? This way we'd know the contents of the constraint, thus we we'd know how the constraint is violated.

I mean we'd know what is not satisfied. But, to know why exactly that is not satisfied we should dump the whole State but that's obviously not an option. Perhaps we could track which symbols and expressions are participating in the assumption related to the constraint and we could dump only those, but this seems to be a very complex approach.

I realize that the how and why phrases in this context a bit too vague :) What do you mean under having to dump the whole State? I didn't mean to compress a bug path into a warning message, only what I mentioned in D79431#2020951. In any case, I think its okay to just move on with this patch. LGTM!

First, thanks for the review!

Second, "dumping the whole State" is an overstatement. What I meant is this: to properly explain to the user how the constraint is failed is a hard task. Because we should dig up all symbols and expressions that are related to the constraint and we should dump their values. E.g. consider the violation of a range constraint here,

ssize_t send(int socket, const void *buffer, size_t length, int flags);
void foo() {
  if (socket < 0)
    send(socket, buf, buflen, flags); // constraint violation for 'socket' it should be in [0, IntMax]
}

We should print something like this: 'socket' it should be in [0, IntMax] but it is in [IntMin, -1]
In this example, the related symbol is only socket but this could be more complex, e.g. if the constraint is related to a buffer size ...

This revision was landed with ongoing or failed builds.Sep 10 2020, 3:44 AM
This revision was automatically updated to reflect the committed changes.