This is an archive of the discontinued LLVM Phabricator instance.

[clang][analyzer] Add and change NoteTags in StdLibraryFunctionsChecker.
ClosedPublic

Authored by balazske on Jun 23 2023, 12:28 AM.

Details

Summary

Change 1: ErrnoChecker notes show only messages related to errno,
not to assumption of success or failure of functions.
Change 2: StdLibraryFunctionsChecker adds its own note about success
or failure of functions, and the errno related note, independently.
Change 3: Every modeled function in StdLibraryFunctionsChecker
should have a note tag message in all "cases". This is not implemented yet,
only for file (stream) related functions.

Diff Detail

Event Timeline

balazske created this revision.Jun 23 2023, 12:28 AM
Herald added a reviewer: NoQ. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
balazske requested review of this revision.Jun 23 2023, 12:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 12:28 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
balazske added inline comments.Jun 23 2023, 2:46 AM
clang/test/Analysis/errno-stdlibraryfunctions-notes.c
17–20

This is the type of note that looks not necessary and even confusing. It could be any case of failure or success, the failure is chosen. This does not matter for the end result but can be confusing for users (one may think that there is a connection to the found bug).

clang/test/Analysis/std-c-library-functions-path-notes.c
72

Here no note is shown. Probably because the summary of islower has cases without note, these notes should be added.

clang/test/Analysis/stream-errno-note.c
25

At this place a note 'Assuming that the call is successful' should be displayed. But this is not working because StreamChecker is enabled. StreamChecker makes a state split before StdCLibraryFunctionsChecker for tmpfile failure and success, then in StdCLibraryFunctionsChecker the successor count is 1 and the note is not added. Probably the logic can be improved by finding the first node that belongs to the CallEvent. Or count how many cases are applied before adding the note tags.

balazske updated this revision to Diff 534487.Jun 26 2023, 3:40 AM

Add note tag always if function is not evaluated as "pure".
Reformat the code.

This is a good and useful commit; but I have some questions connected to the state transition handling code that you had to modify. (The State/ExplodedNode APIs of the engine are really counter-intuitive... :( )

clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.cpp
316–318

Consider using llvm:formatv() here and in other similar messages, it is much less verbose.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1300

Can you explain why is it safe to use const_cast here? (I don't see any concrete issue, but the engine has lots of invariants / unwritten rules and I fear that this might break one of them.)

donat.nagy added inline comments.Jun 26 2023, 9:00 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1310

It's surprising to see this check inside the lambda, as its result does not depend on BR. My best guess is that it's performed here because the successors of Node will appear between the execution of the surrounding code and the execution of this lambda.

However, CheckerContext.h line 69-70 claims that "checkers should not retain the node in their state since the nodes might get invalidated." which would imply that the captured Node might be invalid when the lambda is called.

balazske added inline comments.Jun 27 2023, 12:02 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1300

The node Pred should be modified only later when a successor is added (addTransition has non-const parameter).

1310

This check is to decide if multiple cases could be applied, the same as if we count how many times this place in the loop is executed (add a transition for a case, constraints could be applied). This check is problematic because other checkers can apply state splits before this checker is executed or after it, in this way StreamChecker interferes with this code (it has a state split for success/failure cases of same function, and here we see only that a single case is applied on one branch). This is why this check is only used in the EvalCallAsPure case (theoretically still another checker can make a state split in PostCall before this where the same constraint is applied, then the problem occurs again).

I made a solution that does not have this check but has 2 case loops instead, but the mentioned problem (which exists when if (Summary.getInvalidationKd() == EvalCallAsPure) is not used) did not go away. And it may not work to search backwards for the first node with the same statement, because maybe not the first one is where a state split is done.

I only think that if this lambda is called with the saved node, that node is not invalid because it is part of a bug report call sequence.

donat.nagy added inline comments.Jun 27 2023, 1:57 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1300

I understood that you need a non-const ExplodedNode * because addTransition expects it; I want to understand why you are allowed to const_cast it (why doesn't this confuse the engine logic).

Equivalent question from the other direction: Why did the author of CheckerContext::getPredecessor() specify that its return value is a const pointer to ExplodedNode?

If we can conclude that const_cast is valid in this kind of situation, then I'd also consider simply removing the "const" from the return type of getPredecessor.

1310

This check is problematic because [...details...]

Thanks for the explanation, now I see why this roundabout solution is needed.

I only think that if this lambda is called with the saved node, that node is not invalid because it is part of a bug report call sequence.

This is a good point, I think we can keep this "check successors in the lambda" solution. Please add a comment like "This check is performed inside the lambda, after other checkers had a chance to add other successors. Dereferencing the saved node object is valid because it's part of a bug report call sequence." to record the the reasoning that we discussed.

balazske updated this revision to Diff 534991.Jun 27 2023, 8:02 AM
balazske marked 5 inline comments as done.

Fixed review issues

balazske added inline comments.Jun 27 2023, 8:07 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1300

The const_cast is not needed at all if Pred and Node is made non-const, and getPredecessor has a non-const version. The Node is saved because we want to add transitions to it, it makes no sense to have it (a pointer to) const. (Probably the const comes from a time when the Node was used only for the lambda? In the lambda it could be const, if it matters.)

The source code changes LGTM. I'll soon check the results on the open source projects and give a final approval based on that.

By the way, I think this commit and the "display notes only if interesting" follow-up change (D153776) should be merged at the same time (but I'd guess that you already planned to do it that way).

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1300

Sorry, it seems that I badly misread the source code of CheckerContext.h (I probably looked at the wrong line when I briefly jumped to the definition of getPredecessor). In fact, getPredecessor has only one version and it returns non-const ExplodedNode *.

This means that your code is (of course) completely valid.

donat.nagy requested changes to this revision.Jun 28 2023, 4:39 AM

As I started to review the code of the follow-up commit D153776, I noticed a dangling StringRef bug which belongs to this review.

Moreover, as a minor remark I'd note that LLVM has a dedicated class for storing string literal constants (btw I learned this from @Szelethus yesterday).

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1304

This variable will be captured and stored by the lambda functions, so it should own the memory area of the string.

In the current code this did not cause problems because this StringRef points to the memory area of a string literal (which stays valid); but if later changes introduced dynamically generated note tags, then this code would dump random memory garbage.

By the way, a very similar issue survived almost four years in CheckerContext.h, I'm fixing it in D153889.

1321

Also update this when you change the type of Note to std::string.

1690–1691

Consider using the StringLiteral subclass of StringRef which is designed for this kind of application (and determines the length of the string in compile time instead of calling strlen during a runtime const char * → StringRef conversion):
https://llvm.org/doxygen/classllvm_1_1StringLiteral.html
(After the suggested change, also update the code that uses these variables!)

This revision now requires changes to proceed.Jun 28 2023, 4:39 AM
balazske updated this revision to Diff 535655.Jun 29 2023, 12:26 AM
balazske marked 3 inline comments as done.

Fixed review issues.
Note tag is added for fread.
Notes contain now the function name.

Thanks for the update! I have two minor remarks (string handling is complicated in C++...) but the code looks good otherwise.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1305

Consider using the method StringRef::data() which directly converts a StringRef to a const char *. Your two-step conversion has the advantage that it adds a \0 terminator even if the StringRef isn't null-terminated, but I cannot imagine a "natural" code change that would introduce references to non-null-terminated char arrays as note message templates.

1321

The return type of this lambda is std::string, so I think it's pointless to convert std::string Note to a C string (which will be used to implicitly construct another std::string).

balazske updated this revision to Diff 535800.Jun 29 2023, 7:46 AM

Fixed review issues.

donat.nagy accepted this revision.Jun 29 2023, 8:09 AM

LGTM, but please wait until you can merge this together with D153776.

This revision is now accepted and ready to land.Jun 29 2023, 8:09 AM
Szelethus added a comment.EditedJun 29 2023, 8:29 AM

This is the quintessential example of a patch where while the test files look promising, we need some evaluation on the real world. I understand that its a chore -- but this is simply the nature of the beast.

While I like how this looks like, I'd be more pleased if some real world data would back it up (I understand that we shared a link around internally, but that kind of defeats the purpose of an open source review).

See results and their discussion on the review of the tightly connected follow-up commit D153776.

Please, let me have a look at this stack tomorrow or the day after prior landing it.

steakhal added inline comments.Jul 11 2023, 6:05 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1305

I would prefer not to rely on that StringRefs (here) are expected to be null-terminated, unless benchmarks demonstrate that this is important. If that would turn out to be the case, then we would need to enforce this using some sort of assert expression.

1337

Previously, we had a cast<>(Call.getDecl()), thus we should only have a dyn_cast here.

clang/test/Analysis/stream-note.c
61–62

Why are these notes doubled?

balazske marked 3 inline comments as done.Jul 12 2023, 1:39 AM
balazske added inline comments.
clang/test/Analysis/stream-note.c
61–62

There are 2 cases of resource leak reported (for F1 and F2) and a note tag is there for both of these.

steakhal added inline comments.Jul 12 2023, 7:10 AM
clang/test/Analysis/stream-note.c
61–62

I still think we should only have a single note there given that F2 is completely unrelated to the initialization of F1.
Do I miss something?

steakhal added inline comments.Jul 12 2023, 7:15 AM
clang/test/Analysis/stream-note.c
61–62

Oo, now I see. D153776 is supposed to fix limit the notes to be displayed only for the interesting values. nvm.

(Just added some side remarks about string handling annoyances.)

@balazske Please (1) handle the dyn_cast request of @steakhal and also (2) do something with this "how to convert StringRef to char *" question that we're bikeshedding. I hope that after those we could finally merge this commit sequence.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1305

I think the way of expressing this concrete conversion is a very low priority question (a.k.a. bikeshedding), so returning to the previously used two-step conversion is also fine for me.

The primary goal of this ".data()" shortcut was not performance, but clarity: those chained conversions triggered my "detect suspicious code" instincts and this was the best alternative that I could find. In general I feel that the LLVM codebase has way too many string types, and the back-and-forth conversions between them add lots of unnecessary noise to the code.

In this particular case I think it's the fault of llvm::formatv that it doesn't accept StringRef (the most commonly used non-owning string) and in general it's problematic design that there is no "good" conversion between StringRef and C-style strings in either direction.

balazske updated this revision to Diff 540015.Jul 13 2023, 7:13 AM
balazske marked 6 inline comments as done.

Changed format string back to str().c_str(),
changed dyn_cast_or_null.

No major problems were indicated with the patch stack, I plan to merge all of these soon. Small problems can be still corrected before or when the checker is put out from the alpha package.

No major problems were indicated with the patch stack, I plan to merge all of these soon. Small problems can be still corrected before or when the checker is put out from the alpha package.

Make sure features are consistent as clang-17 will branch off next Tuesday if you want these to be part of that release.
Also, consider fine-tuning the proposed release-notes for the features you developed. See D155445.