This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][StreamChecker] Add note tags for file opening.
ClosedPublic

Authored by balazske on Jun 8 2020, 8:41 AM.

Details

Summary

Bug reports of resource leak are now improved.
If there are multiple resource leak paths for the same stream,
only one wil be reported.

Diff Detail

Event Timeline

balazske created this revision.Jun 8 2020, 8:41 AM

If there are multiple resource leak paths for the same stream, only one wil be reported.

What is the rationale behind this? Do all leaks from the same opening describe the same kind of error? Is this based on observations on a codebase? I'm not against it -- but nor am I immediately for it. Otherwise LGTM.

If there are multiple resource leak paths for the same stream, only one wil be reported.

What is the rationale behind this? Do all leaks from the same opening describe the same kind of error? Is this based on observations on a codebase?

The code was taken from FuchsiaHandleChecker. I do not know which approach to use, it may be that reporting all leak paths is better, these are really different problems. But I did not like getting many similar looking bug reports at a function that opens a file and then in various error cases stops the program by a "noreturn" function. At every such case a false positive resource leak is reported (I think it is OK to not close the file at stopping the program specially if there is some kind of error). Or the checker can be improved to find at a checkDeadSymbols if the program execution is stopping and do not report resource leak in that case.

Alright, I'm sold. How about we add a checker option for it? I don't actually insist, just an idea. @xazax.hun, how has this feature played out?

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
393

How about getAcquisitionSite, and a line of comment:

Searches for the ExplodedNode where the file descriptor was acquired for Sym.

397–398

I see what you mean, but I'd phrase this differently, and place it...

Resource leaks can result in multiple warning that describe the same kind of programming error:

void f() {
  FILE *F = fopen("a.txt");
  if (rand()) // state split
    return; // warning
} // warning

While this isn't necessarily true (leaking the same stream could result from a different kinds of errors), the reduction in redundant reports makes this a worthwhile heuristic.

940–950

...here!

clang/test/Analysis/stream-note.c
2

core package! Also, we don't specify -analyzer-store region explicitly, we even wondered whether we should just remove the option altogether. Fun fact, clang_analyze_cc1 actually expands to an invocation that contains it anyways.

balazske updated this revision to Diff 270140.Jun 11 2020, 7:41 AM
  • Report every path of resource leak.
  • Do not report if non-returning function was encountered.
balazske updated this revision to Diff 270161.Jun 11 2020, 9:00 AM
  • Added tests.
  • Report every path of resource leak.

I thought we agreed on the uniqueing being great?

NoQ added a subscriber: NoQ.Jun 11 2020, 1:15 PM
NoQ added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
376–377

Another thing you might want to check is that the warning is coming from your checker. The symbol may be marked as interesting by another checker for a completely unrelated reason. The easiest way to check that is usually to compare the report's bug type to your checker's bug type.

(we should absolutely automate this)

NoQ added a reviewer: NoQ.Jun 14 2020, 3:32 AM

Do i understand correctly that the checker is no longer "missing limbs" and we should consider turning it on by default? If so, @balazske could you prioritize hunting down the remaining false positives above adding new checks / hunting down false negatives, so that users could finally start taking advantage of the checker?

I'd still like to see more NoteTags such as "File read failed, end-of-file indicator set on 'F'", and a final evaluation would be nice, but otherwise this checker looks amazing.

balazske updated this revision to Diff 270974.Jun 16 2020, 12:42 AM
  • Rebase
  • Added check for checker in NoteTag function.
balazske updated this revision to Diff 271007.Jun 16 2020, 2:54 AM
balazske marked an inline comment as done.
  • Re-added the location uniqueing feature.
balazske updated this revision to Diff 271291.Jun 17 2020, 12:55 AM
balazske marked 2 inline comments as done.

Corrected command line arguments in tests.

balazske marked an inline comment as done.Jun 17 2020, 12:55 AM
Szelethus accepted this revision.Jun 17 2020, 3:25 AM
Szelethus marked 2 inline comments as done.

Yay! Getting so close to enabling this by default. I'm a big fan of your work on this checker.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
378

I think is is going to be good enough until we automate things.

clang/test/Analysis/stream.c
1

Nice catch.

This revision is now accepted and ready to land.Jun 17 2020, 3:25 AM
NoQ added inline comments.Jun 17 2020, 7:45 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
393

Ok, so this is a tiny auxiliary visitor.

I wish we could set uniqueing location post-factum from within the note tag. Unfortunately we can't because notes are generated after uniqueing :(

I'd like you to experiment with tracking this information as part of the state instead so that you didn't need to perform an additional scan. I'd be happy if it helps you avoid performing this additional pass.

I.e., when you're opening the file, add all the information you need for building your uniqueing location into the stream state (not the node though, just the program point or something like that). Then retrieve it in O(1) when you're emitting the report.

NoQ added inline comments.Jun 17 2020, 7:51 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
393

Another thing we could try is to implement such post-processing pass globally for all checkers. I.e., instead of an optional uniqueing location accept an optional lambda that looks at a node and answers whether this node should be used as a uniqueing location. Then before report deduplication scan each report that supplies such lambda and update its uniqueing location. That'll probably require some work on BugReporter but that sounds like a nice way to avoid all the boilerplate.

balazske marked 3 inline comments as done.Jun 18 2020, 12:18 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
393

Storing the "acquisition site" in the state is the natural way of doing this. Probably I should not think that existing checkers do things the best way, this function is taken from FuchsiaHandleChecker (or here it has a specific reason?). And not storing the data in the state is a bit less memory consumption if this matters.

393

We should check how many checkers can benefit from such a "uniqueing location callback". Normally the checker should know what the uniqueing location is, at least when the bug report is created. The location is naturally obtained from the state that the checker maintains, at least if we want to avoid scans like getAcquisitionSite.

NoQ added inline comments.Jun 18 2020, 2:37 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
393

Probably I should not think that existing checkers do things the best way

Well, if you ever find yourself copy-pasting a large chunk of code from a different checker and using it almost unchanged, it's a good indication that the checker API needs to be improved. You can't do things "the best way" in a single checker in isolation, it's a collective effort. You're a programmer, you can change everything, no need to be confined in your checker.

We should check how many checkers can benefit from such a "uniqueing location callback". Normally the checker should know what the uniqueing location is, at least when the bug report is created.

No, i don't think it ever happens automagically. It's either tracked in the state specifically for that purpose or scanned backwards. So i'd rather believe that every checker that has non-default uniqueing locations will benefit from such facility.

Balázs, could you please add the checker option within this patch? If we find that the option works well (removes a lot of useless reports) I'd be happy to help implement that uniqueing pass.

CmdLineOption<Boolean,
              "UniqueLeaks",
              "Only display a single report for each leaked stream object, rather than a report for each path of execution on which the same stream was leaked",
              "false",
              InAlpha>,
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
393

Another thing we could try is to implement such post-processing pass globally for all checkers. I.e., instead of an optional uniqueing location accept an optional lambda that looks at a node and answers whether this node should be used as a uniqueing location. Then before report deduplication scan each report that supplies such lambda and update its uniqueing location. That'll probably require some work on BugReporter but that sounds like a nice way to avoid all the boilerplate.

I've lately found the uniqueing location quite wonky, because its hardly ever the location we want to unique by, but it is the only way to pull it off with the existing infrastructure. More often then not the uniqueing point is a variable, a stream object, or something that isn't a location, but can sort of be tied to a location. So, lets go with this suggestion!

@balazske As long as this is an off-by-default hidden checker option, I think we can commit this code for experimentation purposes. I believe it is correct for 99% of the cases. But I agree, we need to tie its enabling to a more robust solution.

Lets add some comments:
// HACK: This is essentially a tiny bugreporter visitor, but before the trimming of the exploded graph (so it may contain directed cycles still). We use it to acquire a uniqueing location, but it would be better if we could unique by the actual ExplodedNode instead. We should probably implement a pass before bug report generation that takes a lambda that looks at a node and answers whether this node should be used as a uniqueing node in favor of the currect location-based technique. Note that FuchsiaHandleChecker does something very similar as well.

NoQ added a comment.Jun 18 2020, 6:43 AM

Balázs, could you please add the checker option within this patch?

I'd rather have this decision made globally. Like, for all leaks, or something like that. Our behavior should be consistent.

I do not understand fully this "globally". A new option should be added that affects all checkers that detect some kind of resource leak? And then implement that kind of report uniqueness in all checkers that detect resource leak.

Other possible solution: Leave the current way of checker specific options, and add a kind of "meta-option" that can set multiple (checker) options in a batch. For our case this would set the report uniqueing option for every checker that supports it. Still it remains possible to set options separately for each checker.

I see where you're coming from @NoQ. What do you think, @balazske? I think there is is still value in this implementation as a debug option to gather data, so that we don't invest a lot of time creating a robust infrastructure for an idea that might not work out.

I do not understand fully this "globally". A new option should be added that affects all checkers that detect some kind of resource leak?

Yup, its a fair point that all leaks describe the same kind of bug, even if the root cause of that bug may come from different kinds of programming errors, so it makes sense to unique them all the same way.

And then implement that kind of report uniqueness in all checkers that detect resource leak.

That could be helped additionally by creating a distinct LeakBugReport, derived from PathSensitiveBugReport, that would take non-optional uniqueing lambda to find the ExplodedNode responsible for the resource acquisition. Or the actual ExplodedNode itself.

Other possible solution: Leave the current way of checker specific options, and add a kind of "meta-option" that can set multiple (checker) options in a batch. For our case this would set the report uniqueing option for every checker that supports it. Still it remains possible to set options separately for each checker.

That could be achieved with Artem's proposed package system (or hashtags): D77866#2069144 (Package options are a thing even today). However, if we had a LeakBugReport class, we could implement the option with regular analyzer configs.

NoQ added a comment.Jun 19 2020, 7:28 AM

That could be helped additionally by creating a distinct LeakBugReport, derived from PathSensitiveBugReport, that would take non-optional uniqueing lambda to find the ExplodedNode responsible for the resource acquisition. Or the actual ExplodedNode itself.

That's an awesome idea, i'm speechless :)

I do not understand fully this "globally". A new option should be added that affects all checkers that detect some kind of resource leak? And then implement that kind of report uniqueness in all checkers that detect resource leak.

Yes, that's probably the best approach. If you want to experiment a lot with this stuff, you probably want data from more different checkers than just yours (i expect your checker to be relatively quiet compared to, say, MallocChecker that'll provide a lot more input to your experiment). I'd only go for an ability to configure checkers individually if we have any signal at all that they *need* to be configured individually; otherwise enforcing consistent user experience is a good thing.

So for this patch it would be OK to have the uniqueing location as it is now. A next large change can be to add the global resource leak report uniqueing feature, this changes anyway more existing checkers (including this one). (Still I want to finish other improvements in the StreamChecker.)

NoQ accepted this revision.Jun 19 2020, 10:20 AM

Yup, i think so!

This revision was automatically updated to reflect the committed changes.