Change 1: ErrnoChecker notes show only messages related to errno,
not to assumption of success or failure of functions.
Change 2: StdLibraryFunctionsChecker adds its own note about success
or failure of functions, and the errno related note, independently.
Change 3: Every modeled function in StdLibraryFunctionsChecker
should have a note tag message in all "cases". This is not implemented yet,
only for file (stream) related functions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/test/Analysis/errno-stdlibraryfunctions-notes.c | ||
---|---|---|
17–20 | This is the type of note that looks not necessary and even confusing. It could be any case of failure or success, the failure is chosen. This does not matter for the end result but can be confusing for users (one may think that there is a connection to the found bug). | |
clang/test/Analysis/std-c-library-functions-path-notes.c | ||
72 | Here no note is shown. Probably because the summary of islower has cases without note, these notes should be added. | |
clang/test/Analysis/stream-errno-note.c | ||
25 | At this place a note 'Assuming that the call is successful' should be displayed. But this is not working because StreamChecker is enabled. StreamChecker makes a state split before StdCLibraryFunctionsChecker for tmpfile failure and success, then in StdCLibraryFunctionsChecker the successor count is 1 and the note is not added. Probably the logic can be improved by finding the first node that belongs to the CallEvent. Or count how many cases are applied before adding the note tags. |
This is a good and useful commit; but I have some questions connected to the state transition handling code that you had to modify. (The State/ExplodedNode APIs of the engine are really counter-intuitive... :( )
clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.cpp | ||
---|---|---|
316–318 | Consider using llvm:formatv() here and in other similar messages, it is much less verbose. | |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
1299 | Can you explain why is it safe to use const_cast here? (I don't see any concrete issue, but the engine has lots of invariants / unwritten rules and I fear that this might break one of them.) |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
1309 | It's surprising to see this check inside the lambda, as its result does not depend on BR. My best guess is that it's performed here because the successors of Node will appear between the execution of the surrounding code and the execution of this lambda. However, CheckerContext.h line 69-70 claims that "checkers should not retain the node in their state since the nodes might get invalidated." which would imply that the captured Node might be invalid when the lambda is called. |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
1299 | The node Pred should be modified only later when a successor is added (addTransition has non-const parameter). | |
1309 | This check is to decide if multiple cases could be applied, the same as if we count how many times this place in the loop is executed (add a transition for a case, constraints could be applied). This check is problematic because other checkers can apply state splits before this checker is executed or after it, in this way StreamChecker interferes with this code (it has a state split for success/failure cases of same function, and here we see only that a single case is applied on one branch). This is why this check is only used in the EvalCallAsPure case (theoretically still another checker can make a state split in PostCall before this where the same constraint is applied, then the problem occurs again). I made a solution that does not have this check but has 2 case loops instead, but the mentioned problem (which exists when if (Summary.getInvalidationKd() == EvalCallAsPure) is not used) did not go away. And it may not work to search backwards for the first node with the same statement, because maybe not the first one is where a state split is done. I only think that if this lambda is called with the saved node, that node is not invalid because it is part of a bug report call sequence. |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
1299 | I understood that you need a non-const ExplodedNode * because addTransition expects it; I want to understand why you are allowed to const_cast it (why doesn't this confuse the engine logic). Equivalent question from the other direction: Why did the author of CheckerContext::getPredecessor() specify that its return value is a const pointer to ExplodedNode? If we can conclude that const_cast is valid in this kind of situation, then I'd also consider simply removing the "const" from the return type of getPredecessor. | |
1309 |
Thanks for the explanation, now I see why this roundabout solution is needed.
This is a good point, I think we can keep this "check successors in the lambda" solution. Please add a comment like "This check is performed inside the lambda, after other checkers had a chance to add other successors. Dereferencing the saved node object is valid because it's part of a bug report call sequence." to record the the reasoning that we discussed. |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
1299 | The const_cast is not needed at all if Pred and Node is made non-const, and getPredecessor has a non-const version. The Node is saved because we want to add transitions to it, it makes no sense to have it (a pointer to) const. (Probably the const comes from a time when the Node was used only for the lambda? In the lambda it could be const, if it matters.) |
The source code changes LGTM. I'll soon check the results on the open source projects and give a final approval based on that.
By the way, I think this commit and the "display notes only if interesting" follow-up change (D153776) should be merged at the same time (but I'd guess that you already planned to do it that way).
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
1299 | Sorry, it seems that I badly misread the source code of CheckerContext.h (I probably looked at the wrong line when I briefly jumped to the definition of getPredecessor). In fact, getPredecessor has only one version and it returns non-const ExplodedNode *. This means that your code is (of course) completely valid. |
As I started to review the code of the follow-up commit D153776, I noticed a dangling StringRef bug which belongs to this review.
Moreover, as a minor remark I'd note that LLVM has a dedicated class for storing string literal constants (btw I learned this from @Szelethus yesterday).
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
1303 | This variable will be captured and stored by the lambda functions, so it should own the memory area of the string. In the current code this did not cause problems because this StringRef points to the memory area of a string literal (which stays valid); but if later changes introduced dynamically generated note tags, then this code would dump random memory garbage. By the way, a very similar issue survived almost four years in CheckerContext.h, I'm fixing it in D153889. | |
1320 | Also update this when you change the type of Note to std::string. | |
1688–1689 | Consider using the StringLiteral subclass of StringRef which is designed for this kind of application (and determines the length of the string in compile time instead of calling strlen during a runtime const char * → StringRef conversion): |
Fixed review issues.
Note tag is added for fread.
Notes contain now the function name.
Thanks for the update! I have two minor remarks (string handling is complicated in C++...) but the code looks good otherwise.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
1304 | Consider using the method StringRef::data() which directly converts a StringRef to a const char *. Your two-step conversion has the advantage that it adds a \0 terminator even if the StringRef isn't null-terminated, but I cannot imagine a "natural" code change that would introduce references to non-null-terminated char arrays as note message templates. | |
1320 | The return type of this lambda is std::string, so I think it's pointless to convert std::string Note to a C string (which will be used to implicitly construct another std::string). |
This is the quintessential example of a patch where while the test files look promising, we need some evaluation on the real world. I understand that its a chore -- but this is simply the nature of the beast.
While I like how this looks like, I'd be more pleased if some real world data would back it up (I understand that we shared a link around internally, but that kind of defeats the purpose of an open source review).
See results and their discussion on the review of the tightly connected follow-up commit D153776.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
1304 | I would prefer not to rely on that StringRefs (here) are expected to be null-terminated, unless benchmarks demonstrate that this is important. If that would turn out to be the case, then we would need to enforce this using some sort of assert expression. | |
1336 | Previously, we had a cast<>(Call.getDecl()), thus we should only have a dyn_cast here. | |
clang/test/Analysis/stream-note.c | ||
61–62 | Why are these notes doubled? |
clang/test/Analysis/stream-note.c | ||
---|---|---|
61–62 | There are 2 cases of resource leak reported (for F1 and F2) and a note tag is there for both of these. |
clang/test/Analysis/stream-note.c | ||
---|---|---|
61–62 | I still think we should only have a single note there given that F2 is completely unrelated to the initialization of F1. |
(Just added some side remarks about string handling annoyances.)
@balazske Please (1) handle the dyn_cast request of @steakhal and also (2) do something with this "how to convert StringRef to char *" question that we're bikeshedding. I hope that after those we could finally merge this commit sequence.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
1304 | I think the way of expressing this concrete conversion is a very low priority question (a.k.a. bikeshedding), so returning to the previously used two-step conversion is also fine for me. The primary goal of this ".data()" shortcut was not performance, but clarity: those chained conversions triggered my "detect suspicious code" instincts and this was the best alternative that I could find. In general I feel that the LLVM codebase has way too many string types, and the back-and-forth conversions between them add lots of unnecessary noise to the code. In this particular case I think it's the fault of llvm::formatv that it doesn't accept StringRef (the most commonly used non-owning string) and in general it's problematic design that there is no "good" conversion between StringRef and C-style strings in either direction. |
No major problems were indicated with the patch stack, I plan to merge all of these soon. Small problems can be still corrected before or when the checker is put out from the alpha package.
Make sure features are consistent as clang-17 will branch off next Tuesday if you want these to be part of that release.
Also, consider fine-tuning the proposed release-notes for the features you developed. See D155445.
Consider using llvm:formatv() here and in other similar messages, it is much less verbose.