This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Prune calls to functions with linear CFGs that return a non-zero constrained value
ClosedPublic

Authored by Szelethus on Jul 4 2019, 5:08 PM.

Details

Summary

During the evaluation of D62883, I noticed a bunch of totally meaningless notes with the pattern of "Calling 'A'" -> "Returning value" -> "Returning from 'A'", which added no value to the report at all.

This patch (not only affecting tracked conditions mind you) prunes diagnostic messages to functions that return a value not constrained to be 0, and are also linear.

Diff Detail

Repository
rL LLVM

Event Timeline

Szelethus created this revision.Jul 4 2019, 5:08 PM
NoQ added a comment.Jul 4 2019, 8:03 PM

I guess this makes sense. The results look good. I'm slightly worried that we're fighting the symptoms rather than the root cause here: why were these values tracked that far in the first place when we already have no interest in tracking them at the end of the function? I.e., i suspect that your "mild tracking mode" would get rid of a lot of those automagically.

clang/test/Analysis/uninit-vals.c
181 ↗(On Diff #208101)

Huh, so there's not even a note in getHalfPoint(), just calling..returning? This definitely needs some attention from NoStoreFuncVisitor.

Generally, i think this is probably the single place where we do really want some info about what happens in getHalfPoint(). The report that consists only of "p is initialized..." and "...p is uninitialized" is pretty weird. Btw, could you write down the full warning text in this test? How bad this actually is?

Szelethus marked 2 inline comments as done.Jul 5 2019, 8:08 AM
In D64232#1570938, @NoQ wrote:

I'm slightly worried that we're fighting the symptoms rather than the root cause here: why were these values tracked that far in the first place when we already have no interest in tracking them at the end of the function?

Could you please elaborate? Which of the modified test cases (or any other) do you think falls under "being tracked too far" and why? Whenever the CFG where the value isn't linear, I think the information could be valuable, see the inline.

I.e., i suspect that your "mild tracking mode" would get rid of a lot of those automagically.

A lot of those, correct, all of them, nope. I've been slacking on publishing the moderate tracking I've been working on, I'll get that done during the day.

clang/test/Analysis/track-control-dependency-conditions.cpp
185 ↗(On Diff #208101)

This note is meaningful, the bug would not have occurred if coin() wasn't assumed to be false.

clang/test/Analysis/uninit-vals.c
181 ↗(On Diff #208101)

Wild idea: UninitializedObjectChecker exposes it's main logic through a header file, how about we use it when we find a read of an uninitialized region?

Szelethus marked an inline comment as done.Jul 14 2019, 3:14 PM
Szelethus added inline comments.
clang/test/Analysis/uninit-vals.c
181 ↗(On Diff #208101)

Passed-by-value struct argument contains uninitialized data (e.g., field: 'y')

Quite good imo.

NoQ added inline comments.Jul 14 2019, 10:37 PM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1052 ↗(On Diff #208276)

This time i'd rather go for "has exactly 3 blocks", because we're mostly interested in the CFG being "syntactically linear" rather than "semantically linear". I.e., if the CFG has an if-statement with a compile-time-constant condition, i'd rather show the path, because who knows what does the programmer think about this condition.

clang/test/Analysis/track-control-dependency-conditions.cpp
185 ↗(On Diff #208101)

Mmmmm. Mmmmm. Dunno, this is getting complicated very quickly. Say, if you replace return true with return false on the previous line, the note becomes meaningless again. I don't see a direct connection between meaningfulness and linearness.

The note in this example is relatively meaningful indeed, but i'm not sure it's so much meaningful that it's worth the noise. It's still not surprising for me that flipCoin() occasionally returns false. I do believe that it may be sometimes surprising that flipCoin() may return false on the *current path*, which would make the note meaningful. However, note that we don't prove that it may return false, we only push the assumption one step deeper, i.e. put the blame on coin() instead of flipCoin(), but we still fully trust our assumption about coin() which may have the same problem. And if we had an actual proof that it may return false, we would have had a concrete-false return value, so the patch wouldn't apply.

I guess some experimental data would be great to have.

clang/test/Analysis/uninit-vals.c
181 ↗(On Diff #208101)

What specific logic are you talking about? You mean it'd scan the structure for uninitialized fields and present the results of the scan to the user in a note?

Szelethus marked 2 inline comments as done.Jul 15 2019, 11:43 AM
Szelethus added inline comments.
clang/test/Analysis/track-control-dependency-conditions.cpp
185 ↗(On Diff #208101)

Oh yea, I see where you are going. As I understand correctly, these are two separate problems: trying to somehow argue about other execution paths and whether dropping all constraints on a symbol is a good approach.

I should have all the results by tomorrow if all goes according to plan.

clang/test/Analysis/uninit-vals.c
181 ↗(On Diff #208101)

What specific logic are you talking about? You mean it'd scan the structure for uninitialized fields and present the results of the scan to the user in a note?

Nvm, looked at the code, realized that what I said made no sense. What we are really missing here is a trackRegionValue() function :^)

Btw, I wasted soooo much time on figuring out that you don't get ANY notes whatsoever if you make this a cpp file rather than a c file, only the warning... Is this intended?

Szelethus updated this revision to Diff 210330.Jul 17 2019, 8:09 AM

Rebase on top master. Putting this on the bottom of the patch stack because this really deserves it's own analysis. (Side note, I completely messed up like ~40 hrs worth of analysis because I didn't check which branches do I have stacked on top of each other, so this might take a while...)

xazax.hun added inline comments.Jul 17 2019, 11:44 AM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1052 ↗(On Diff #208276)

Yeah, this is a hard question, both sides can have its own reasons.

One of my concern is that functions that does seem to be linear to people are actually not linear CFG wise. One example is single one liner functions with an assert preceding the line in question.

Other interesting note is that once we add exception support and edges for exception handling in the CFG, all the analyzer budgets and CFG node count based heuristics need to be redone. But this is not something we should worry about at this point.

Szelethus added a comment.EditedJul 20 2019, 6:50 AM

LLVM+Clang (A total of 207 reports changed)

Bitcoin (only 1 changed)

with "track-conditions" set to true (bug length)with "track-conditions" set to true and this patch applied (bug length)
net_processing.cpp (27)net_processing.cpp (21)

Xerces (A total of 11 reports changed):

with "track-conditions" set to true (bug length)with "track-conditions" set to true and this patch applied (bug length)
XSAXMLScanner.cpp ( 14)XSAXMLScanner.cpp (10)
ReaderMgr.cpp (12)ReaderMgr.cpp (8)
SGXMLScanner.cpp (20)SGXMLScanner.cpp (16)
DFAContentModel.cpp (16)DFAContentModel.cpp (12)
List of files that changedWhere the results can be checked by changing the run the left corner

I like how things look like with this patch.

Szelethus updated this revision to Diff 211744.Jul 25 2019, 7:11 AM
  • Be even more strict on the rule: mark the note as prunable even if the note message says .*(loaded from '<varname>'). This is okay, because ReturnVisitor will track the return value anyways, and if at least a single non-prunable note is added to the function, it won't be pruned (see the test case in the namespace important_returning_pointer_loaded_from).
  • Add test cases for when the return value is function-local (unimportant_returning_pointer_loaded_from), and when it's the parameter (unimportant_returning_pointer_loaded_from_through_cast, creduced and then manually reduced from an ASTImporter.cpp report).
Szelethus updated this revision to Diff 212801.Aug 1 2019, 7:02 AM

Use size() == 3 instead if isLinear(), add a related test case.

NoQ accepted this revision.Aug 1 2019, 1:56 PM

Sry, should have approved this ages ago >.<

clang/test/Analysis/uninit-vals.c
181 ↗(On Diff #208101)

Btw, I wasted soooo much time on figuring out that you don't get ANY notes whatsoever if you make this a cpp file rather than a c file, only the warning... Is this intended?

At a glance, the behavior of this code in C and C++ is ridiculously different. The way structures are returned-by-value is one of the glaring differences between C and C++. Even at runtime / ABI / calling convention level: in C it's acceptable to simply pass the structure through registers (or put it on the stack, wherever the return value normally lives), however in C++ the calling convention usually requres you to pass return address as a separate hidden parameter so that to use it as "this" in the structure's constructor (and then later use it for RVO and NRVO which may span multiple stack frames) . And the Static Analyzer also deals with completely different ASTs. So i'm not surprised that there's a difference.

But i would also probably not want there to be a difference. If you could turn it into a sensible bug report i guess it could be helpful.

This revision is now accepted and ready to land.Aug 1 2019, 1:56 PM
This revision was automatically updated to reflect the committed changes.
Szelethus marked 5 inline comments as done.
Herald added a project: Restricted Project. · View Herald TranscriptAug 13 2019, 4:21 PM