Bug reports of resource leak are now improved.
If there are multiple resource leak paths for the same stream,
only one wil be reported.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
If there are multiple resource leak paths for the same stream, only one wil be reported.
What is the rationale behind this? Do all leaks from the same opening describe the same kind of error? Is this based on observations on a codebase? I'm not against it -- but nor am I immediately for it. Otherwise LGTM.
The code was taken from FuchsiaHandleChecker. I do not know which approach to use, it may be that reporting all leak paths is better, these are really different problems. But I did not like getting many similar looking bug reports at a function that opens a file and then in various error cases stops the program by a "noreturn" function. At every such case a false positive resource leak is reported (I think it is OK to not close the file at stopping the program specially if there is some kind of error). Or the checker can be improved to find at a checkDeadSymbols if the program execution is stopping and do not report resource leak in that case.
Alright, I'm sold. How about we add a checker option for it? I don't actually insist, just an idea. @xazax.hun, how has this feature played out?
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
406 | How about getAcquisitionSite, and a line of comment:
| |
410–411 | I see what you mean, but I'd phrase this differently, and place it...
| |
974–1005 | ...here! | |
clang/test/Analysis/stream-note.c | ||
2 | core package! Also, we don't specify -analyzer-store region explicitly, we even wondered whether we should just remove the option altogether. Fun fact, clang_analyze_cc1 actually expands to an invocation that contains it anyways. |
- Report every path of resource leak.
- Do not report if non-returning function was encountered.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
376–377 | Another thing you might want to check is that the warning is coming from your checker. The symbol may be marked as interesting by another checker for a completely unrelated reason. The easiest way to check that is usually to compare the report's bug type to your checker's bug type. (we should absolutely automate this) |
Do i understand correctly that the checker is no longer "missing limbs" and we should consider turning it on by default? If so, @balazske could you prioritize hunting down the remaining false positives above adding new checks / hunting down false negatives, so that users could finally start taking advantage of the checker?
I'd still like to see more NoteTags such as "File read failed, end-of-file indicator set on 'F'", and a final evaluation would be nice, but otherwise this checker looks amazing.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
406 | Ok, so this is a tiny auxiliary visitor. I wish we could set uniqueing location post-factum from within the note tag. Unfortunately we can't because notes are generated after uniqueing :( I'd like you to experiment with tracking this information as part of the state instead so that you didn't need to perform an additional scan. I'd be happy if it helps you avoid performing this additional pass. I.e., when you're opening the file, add all the information you need for building your uniqueing location into the stream state (not the node though, just the program point or something like that). Then retrieve it in O(1) when you're emitting the report. |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
406 | Another thing we could try is to implement such post-processing pass globally for all checkers. I.e., instead of an optional uniqueing location accept an optional lambda that looks at a node and answers whether this node should be used as a uniqueing location. Then before report deduplication scan each report that supplies such lambda and update its uniqueing location. That'll probably require some work on BugReporter but that sounds like a nice way to avoid all the boilerplate. |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
406 | Storing the "acquisition site" in the state is the natural way of doing this. Probably I should not think that existing checkers do things the best way, this function is taken from FuchsiaHandleChecker (or here it has a specific reason?). And not storing the data in the state is a bit less memory consumption if this matters. | |
406 | We should check how many checkers can benefit from such a "uniqueing location callback". Normally the checker should know what the uniqueing location is, at least when the bug report is created. The location is naturally obtained from the state that the checker maintains, at least if we want to avoid scans like getAcquisitionSite. |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
406 |
Well, if you ever find yourself copy-pasting a large chunk of code from a different checker and using it almost unchanged, it's a good indication that the checker API needs to be improved. You can't do things "the best way" in a single checker in isolation, it's a collective effort. You're a programmer, you can change everything, no need to be confined in your checker.
No, i don't think it ever happens automagically. It's either tracked in the state specifically for that purpose or scanned backwards. So i'd rather believe that every checker that has non-default uniqueing locations will benefit from such facility. |
Balázs, could you please add the checker option within this patch? If we find that the option works well (removes a lot of useless reports) I'd be happy to help implement that uniqueing pass.
CmdLineOption<Boolean, "UniqueLeaks", "Only display a single report for each leaked stream object, rather than a report for each path of execution on which the same stream was leaked", "false", InAlpha>,
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
406 |
I've lately found the uniqueing location quite wonky, because its hardly ever the location we want to unique by, but it is the only way to pull it off with the existing infrastructure. More often then not the uniqueing point is a variable, a stream object, or something that isn't a location, but can sort of be tied to a location. So, lets go with this suggestion! @balazske As long as this is an off-by-default hidden checker option, I think we can commit this code for experimentation purposes. I believe it is correct for 99% of the cases. But I agree, we need to tie its enabling to a more robust solution. Lets add some comments: |
Balázs, could you please add the checker option within this patch?
I'd rather have this decision made globally. Like, for all leaks, or something like that. Our behavior should be consistent.
I do not understand fully this "globally". A new option should be added that affects all checkers that detect some kind of resource leak? And then implement that kind of report uniqueness in all checkers that detect resource leak.
Other possible solution: Leave the current way of checker specific options, and add a kind of "meta-option" that can set multiple (checker) options in a batch. For our case this would set the report uniqueing option for every checker that supports it. Still it remains possible to set options separately for each checker.
I see where you're coming from @NoQ. What do you think, @balazske? I think there is is still value in this implementation as a debug option to gather data, so that we don't invest a lot of time creating a robust infrastructure for an idea that might not work out.
Yup, its a fair point that all leaks describe the same kind of bug, even if the root cause of that bug may come from different kinds of programming errors, so it makes sense to unique them all the same way.
And then implement that kind of report uniqueness in all checkers that detect resource leak.
That could be helped additionally by creating a distinct LeakBugReport, derived from PathSensitiveBugReport, that would take non-optional uniqueing lambda to find the ExplodedNode responsible for the resource acquisition. Or the actual ExplodedNode itself.
Other possible solution: Leave the current way of checker specific options, and add a kind of "meta-option" that can set multiple (checker) options in a batch. For our case this would set the report uniqueing option for every checker that supports it. Still it remains possible to set options separately for each checker.
That could be achieved with Artem's proposed package system (or hashtags): D77866#2069144 (Package options are a thing even today). However, if we had a LeakBugReport class, we could implement the option with regular analyzer configs.
That's an awesome idea, i'm speechless :)
Yes, that's probably the best approach. If you want to experiment a lot with this stuff, you probably want data from more different checkers than just yours (i expect your checker to be relatively quiet compared to, say, MallocChecker that'll provide a lot more input to your experiment). I'd only go for an ability to configure checkers individually if we have any signal at all that they *need* to be configured individually; otherwise enforcing consistent user experience is a good thing.
So for this patch it would be OK to have the uniqueing location as it is now. A next large change can be to add the global resource leak report uniqueing feature, this changes anyway more existing checkers (including this one). (Still I want to finish other improvements in the StreamChecker.)
clang-format not found in user's PATH; not linting file.