This is an archive of the discontinued LLVM Phabricator instance.

[StaticAnalyzer] Completely unrolling specific loops with known bound option
ClosedPublic

Authored by szepet on Jun 15 2017, 4:15 PM.

Details

Summary

This feature allows the analyzer to consider loops to completely unroll. New requirements/rules (for unrolling) can be added easily via ASTMatchers.

Right now it is hidden behind a flag, the aim is to find the correct heuristic and create a solution which results higher coverage % and more precise analysis, thus can be enabled by default.

Right now the blocks which belong to an unrolled loop are marked by the LoopVisitor which adds them to the ProgramState.
Then whenever we encounter a CFGBlock in the processCFGBlockEntrance which is marked then we skip its investigating. That means, it won't be considered to be visited more than the maximal bound for visiting since it won't be checked.

Diff Detail

Event Timeline

szepet created this revision.Jun 15 2017, 4:15 PM
xazax.hun edited edge metadata.Jun 15 2017, 11:50 PM

Nice work! Some comments inline.

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
572

Maybe the name should be more specific to express that it is not applied to every loop?

lib/StaticAnalyzer/Core/ExprEngine.cpp
1523

This change is not relevant for this patch.

lib/StaticAnalyzer/Core/LoopUnrolling.cpp
42

Commented code. Maybe you can transform this to an assert.

55

Prefer StringRef as argument.

127

We do not try to check the bound? E.g.: if it is larger than a big number (according to the analyzer).

177

I wonder if State is the right place to store this information. Usually, we store path sensitive information in the state, but this information tha way compute now is not path sensitive. Maybe there could be a map that is independent of the program state? Or a property of the basic blocks?

szepet updated this revision to Diff 102791.Jun 16 2017, 2:19 AM
szepet marked 3 inline comments as done.
szepet retitled this revision from [StaticAnalyzer] Completely unrolling specific loopes with known bound option to [StaticAnalyzer] Completely unrolling specific loops with known bound option .

Small update based on comments.

Hello Peter,

While enabling this by default looks dangerous, I think is nice to have some option for analyzer. I have some comments inline.

lib/StaticAnalyzer/Core/LoopUnrolling.cpp
45

Our team has encountered some problems with plain static matchers recently - they can randomly crash while destructing. We're investigating it right now. Possibly, ManagedStatic should be used for them, but we don't know this exactly and nobody answered in cfe-dev about this.

50

It also seems harmless to support casts from integer literals here. Example: unsigned i = 0; i < 9; ++i) // 9 should have a cast. Same for init matcher.

138

Maybe we should just use anyOf() matcher instead of this condition chain?

173

This can be nicely C++11-fied.

177

+1. I wonder if this can harm merging of ExplodedNodes due to difference in state. Maybe it should be removed at all? Nobody reads it anyway.

test/Analysis/analyzer-config.cpp
45

Missed newline.

test/Analysis/loop-unrolling.cpp
102

I see some inconsistency in formatting of this file. Could you clang-format it?

NoQ edited edge metadata.Jun 16 2017, 12:41 PM

Uhm, it just striked me where this is going. So, generally, we're bound to track when we're entering the loop and exiting the loop. And we have troubles with that, because just by looking at the block we can't say in what loops we are.

We technically can find out which loop we are in until IPA comes in (by ascending through parent statements, or the way Peter does it, which is the same thing "inside out" - by listing the affected blocks). It gets worse when IPA comes in, especially when recursion comes in.

Now to the eureka - am i the only one here who totally forgot about D16403/D19979?

NoQ added a comment.Jun 16 2017, 1:16 PM

A historical retrospective for Peter: there have been quite a few people, at different moments of time, who wanted a checker callback for entering and exiting "scopes". For example, StackAddrEscapeChecker finds local variables that escape a function by reference, but doesn't find local variables that escape an if-branch or a loop body by reference. The proposed approach involved creating a ScopeContext, a counterpart of StackFrameContext that gets constructed and put on the stack of location contexts when a scope is entered and popped when the scope is exited, similarly to how StackFrameContext is put on the stack of location contexts to represent the function call. Because a lot of events occur at scope boundaries, that'd be a great addition. However, the patches i mentioned are not yet done because some corner-cases were not handled. Whether you should complete these patches, as part of your GSoC project - i do not know, it may be harder than it seems. But they'd certainly serve as a good source of inspiration.

a.sidorin added inline comments.Jun 19 2017, 2:20 AM
lib/StaticAnalyzer/Core/LoopUnrolling.cpp
45

Oops, sorry - I have missed that it is a function. Please ignore my previous comment.

Now to the eureka - am i the only one here who totally forgot about D16403/D19979?

We're working on bringing it back to life right now. But it takes time.

NoQ added a comment.EditedJun 19 2017, 10:05 AM

Now to the eureka - am i the only one here who totally forgot about D16403/D19979?

We're working on bringing it back to life right now. But it takes time.

That's totally cool.

Do you think it'd be possible to split up ScopeContext support into small patches that handle different kinds of scopes? Like, ifs in one patch, loops in another patch, anything that involves weird gotos in the last patch. Have some obvious scopes generate the scope context, and the rest of them behave as before, and slowly grow the set of kinds of scopes that have their scope context. Then we'd be able to catch our loop entrances/exits in simple loops without needing to wait for the corner cases to be fully polished. Or do the corner cases stick themselves everywhere and spoil all the fun?

The problem here is not in ScopeContext - it isn't going to be a huge patch. The problem is CFG-related mostly - there are many corner cases. We'll also need some help with fixing Objective-C checkers (RetainCount, for example).

NoQ added a comment.Jun 19 2017, 10:18 AM

Yeah, right, i guess it's mostly about the CFG patch; i was just saying that if, in the end, we don't get scope contexts for all the scopes, this is probably fine - as long as we have *some* of them.

Hmm, could you remind me what was wrong with RetainCountChecker? (better in the other review^^)

Thanks for looking at this.

lib/StaticAnalyzer/Core/LoopUnrolling.cpp
46

You might want to account for != e.g. for (x = 10; x != 0; --x)

55

I think there is an edge case here, what happens if the loop variant is incremented or decremented multiple times in the body?:

while (x > 0) {
  ++x;
  // ...
  --x;
}
test/Analysis/loop-unrolling.cpp
36

Should this be // no-warning as the loop should not be unrolled? And the same for simple_no_unroll2?

zaks.anna edited edge metadata.Jun 20 2017, 1:28 PM

The problem here is not in ScopeContext - it isn't going to be a huge patch. The problem is CFG-related mostly - there are many corner cases. We'll also need some help with
fixing Objective-C checkers (RetainCount, for example).

@a.sidorin, please, send patches for review in small increments to make sure the patches get immediate feedback, before they get too big.
Thank you for looking into this!

szepet updated this revision to Diff 103501.Jun 21 2017, 6:00 PM
szepet marked 10 inline comments as done.
szepet added a reviewer: a.sidorin.

Answered comments and made changes according to them.

Thank you for every comment and notes/guidance.

Now to the eureka - am i the only one here who totally forgot about D16403/D19979?

We're working on bringing it back to life right now. But it takes time.

As I understand these patches could possibly solve all of the problems which are correlated to the scopes of the loops. Thanks in advance for your work! As I see it will take a long time to finish it but great to know it is still developed!

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
572

Could not come up with a more specific but short (at least not 1 km long) name. Do you might have a great idea for that?

lib/StaticAnalyzer/Core/LoopUnrolling.cpp
55

Yes, there is a problem (thought about that) and I have to say here, that these are only initial patterns which must be improved to make the feature useful. Right now in the first patch the aim was only to have some patterns which can show the community that there are cases when it is worth to unroll. So - according to my plans - edge cases like this will be covered in a follow-up patch if it is alright.

127

Yepp, that is an important thing to work on. Evaluating these integer literals and even known integer variables will be the main part of the next patch on unrolling.

138

Sure! My idea was to use a MatchFinder but it provides me only analyzing the results via a callback function and it did not really fit into this model as I see. So if you have any other idea how to make these matchers more organized please let me know.

177

That is not exactly true that nobody reads it, the isUnrolledLoopBlock() function returns its value depending on this information. However, you (both) are right that it would be nice to use less memory and instead of storing we could look it up based on the stored LoopStmt (so I have rewritten it that way).

test/Analysis/loop-unrolling.cpp
36

As far as I see understand the Analyzer behavior it should warn since variable i is escaped (because the body of the function foo() is not known and passed it by ref). So the (symbolic) value i becomes unknown and the Analyzer inspects both possibilities when evaluating the condition. (Same below.)

However, in case simple_no_unroll3()the analyzer is able to model every step and track the value of i.

seaneveson added inline comments.Jun 22 2017, 3:03 AM
lib/StaticAnalyzer/Core/LoopUnrolling.cpp
55

OK, sounds good.

test/Analysis/loop-unrolling.cpp
36

Ah I didn't think of that. Maybe you could also explicitly check in the test that the loop is not unrolled?

foo(i);
if (i == 8)
  int c = 22 / (k - 42); // no-warning

Or something similar?

szepet added inline comments.Jun 22 2017, 4:43 AM
test/Analysis/loop-unrolling.cpp
36

It raises the same problem. Since in the second iteration the value of i will be unknown the analyzer will consider the case in which its value is 8. So that branch will be evaluated in the second iteration.
I don't really see a way to check this explicitly like that (maybe I miss something trivial and in that case please let me know!). The best I could think about was to check these implicitly through the statistics (which check can be found at the end of the file).

szepet updated this revision to Diff 103578.Jun 22 2017, 8:16 AM

Just ran clang-format on LoopUnrolling.cpp.

szepet updated this revision to Diff 104313.Jun 27 2017, 4:50 PM

Updated the check of the blocks of inline analyzed functions using StackFrameContext.

zaks.anna added inline comments.Jun 27 2017, 6:17 PM
lib/StaticAnalyzer/Core/ExprEngine.cpp
1519

Would this work if you have a loop with a known bound that calls a function, which itself has a loop? (I assume that the analyzer has the body for the function and it can be inlined.)

I am concerned that even though the nested loops are outlawed by the LoopBlockVisitor, they could sneak in indirectly through an inlined function.

lib/StaticAnalyzer/Core/LoopUnrolling.cpp
155

Maybe add a comment saying "No support for nested loops right now."

szepet updated this revision to Diff 104636.Jun 29 2017, 7:00 AM
szepet marked an inline comment as done.

Bug with nested loops in inlined functions fixed.
Test cases added.

lib/StaticAnalyzer/Core/ExprEngine.cpp
1519

Yes, and that case was not handled. I came up with the most simple but a little bit hacky solution to that. Added tests for this case.

lib/StaticAnalyzer/Core/LoopUnrolling.cpp
155

I could but I think it is not true, though. Exactly that line is the support for nested loops. I mean that in case of nested loops we only mark the outer loop as unrolled. I think this is the way how the feature should work. It is quite simple but in case of a complex inner loop we fall back to the conservative way in order to not simulate unnecessary things and avoid false positives.

NoQ added a comment.Jun 29 2017, 7:42 AM

So how much of this would be thrown away once we have scopes?

The matchers will remain.
isUnrolledLoopBlock() would be rewritten.
LoopBlockVisitor would most likely be replaced with ascending location contexts, which would also cover IPA-related tricks.
processCFGBlockEntrance is already reworked in the next patch.

So this patch is adding some matchers and some proof-of-concept code to make them work.
Sounds good to me, i think we can land this. I'd have a closer look at the matchers and then accept.

xazax.hun added inline comments.Jul 4 2017, 2:50 AM
include/clang/StaticAnalyzer/Core/PathSensitive/LoopUnrolling.h
25

I think the formatting is off in this file. Do you need all these includes in this header?

lib/StaticAnalyzer/Core/CMakeLists.txt
59

What do you use DynamicASTMatchers for? Are you sure you need that?

lib/StaticAnalyzer/Core/ExprEngine.cpp
1502

The block is terminated by a loop? I think this comment is confusing. Maybe you wanted to say something like the blocks of loops are registered based on the terminator statement?

lib/StaticAnalyzer/Core/LoopUnrolling.cpp
55

What about unary operators like ++/--? I think if you do not consider this deliberately, you should add a comment why.

96

Do you cover aliases?

E.g.:

for(int i = 0; i < 10; ++i) {
  int &j = i;
  --j;
}
106

I can see some code duplication e.g. the unless part of the loops.

172

Do you use this member?

szepet updated this revision to Diff 105832.EditedJul 10 2017, 4:00 AM
szepet marked 7 inline comments as done.

Updated the patch according to the comments. Matchers updated.
Measured the number of matches of the pattern on various projects and it turns out that strict while/do-while patterns - which I'm looking for - are much less common than forStmts.
Detailed statistics:

Project# of ForLoop# of While/DoWhile Loop
Curl41911
Xerces1002
LibPNG600120
FFmpeg30550130

These are not the # of the different loops but the loops which was encountered by the analyzers and got unrolled. So I removed the while and do-while matchers from this initial patch.

Project# of ForLoop# of While/DoWhile Loop
Curl41911
Xerces1002
LibPNG600120
FFmpeg30550130

Just to be clear, both of the columns refer to the number of matches of your matchers right? So for loops that would not be unrolled are skipped.

Just to be clear, both of the columns refer to the number of matches of your matchers right? So for loops that would not be unrolled are skipped.

Yes.

NoQ added a comment.Jul 10 2017, 7:14 AM

Yeah, it seems that if the same loop is encountered on different paths, it gets counted twice, but if it's encountered multiple times on the same path, it's counted once. At least this is how the new NumTimesLoopUnrolled statistic seems to work.

lib/StaticAnalyzer/Core/ExprEngine.cpp
22

Is this still used?

lib/StaticAnalyzer/Core/LoopUnrolling.cpp
73

Hmm, does it make sense to handle escaping into brace initializers by reference?

73–77

NodeName unused.

test/Analysis/loop-unrolling.cpp
36

Would clang_analyzer_numTimesReached() be of any use?

szepet updated this revision to Diff 106414.Jul 13 2017, 5:36 AM
szepet marked 3 inline comments as done.

Updates based on comments.

szepet added inline comments.Jul 13 2017, 5:36 AM
lib/StaticAnalyzer/Core/LoopUnrolling.cpp
73

Yes, added the matcher and test case.

test/Analysis/loop-unrolling.cpp
36

That is exactly what I need for testing a feature like this. Unfortunately I could not really use it. Whenever I tried to call this in a function which is not the first defined function in the file then it just crashed. So right now I don't see if its a bug or a feature and I just misused it.

NoQ added inline comments.Jul 13 2017, 11:10 AM
test/Analysis/loop-unrolling.cpp
2

Hmm, just noticed this. Why does this require asserts?

36

I think we can definitely come up with a saner behavior =) If you post a crashing test, i'd gladly have a look.

szepet added inline comments.Jul 13 2017, 11:29 AM
test/Analysis/loop-unrolling.cpp
2

Since the statistics are enabled only if asserts are enabled. (At least that is how I think it is.)
http://llvm.org/doxygen/Statistic_8cpp_source.html#l0042 (Line42 says this and I think I already ran into that problem that I did not have any stat output since builded a release version.)

36

Just to make sure: to bugzilla or just leave one here?

NoQ added inline comments.Jul 13 2017, 11:36 AM
test/Analysis/loop-unrolling.cpp
2

Ok :0

36

Here's fine :)

szepet added inline comments.Jul 13 2017, 11:51 AM
test/Analysis/loop-unrolling.cpp
36

Alright! So the smallest examples I was able to came up are the following:

test1.cpp (works absolutely fine):

void clang_analyzer_numTimesReached();

void foo(){
 clang_analyzer_numTimesReached();
}

void bar(){
}

test2.cpp (crashes):

void clang_analyzer_numTimesReached();

void foo(){
}

void bar(){
 clang_analyzer_numTimesReached();
}

I have ran the following command:
clang -Xclang -analyze -Xclang -analyzer-checker=core,debug.ExprInspection -c test.cpp

(In more general I have experienced that it works iff it is used in the first defined function inside the TU.)

NoQ added inline comments.Jul 13 2017, 12:18 PM
test/Analysis/loop-unrolling.cpp
36

Ahaha, thanks! I guess you can just add ReachedStats.clear() at the end of ExprInspectionChecker::checkEndAnalysis().

It doesn't crash for the first function in the file because the first function in the file would be analyzed *last*, and the node pointer stored in that map would still be valid. But not for other functions.

szepet updated this revision to Diff 107294.Jul 19 2017, 6:12 AM

updated the test files

NoQ accepted this revision.Jul 19 2017, 10:18 AM

The new tests look great, thanks. I think it's time to land :) You do already have commit access, don't you?

This revision is now accepted and ready to land.Jul 19 2017, 10:18 AM
This revision was automatically updated to reflect the committed changes.
In D34260#814856, @NoQ wrote:

The new tests look great, thanks. I think it's time to land :) You do already have commit access, don't you?

Yes, and thank you for the fast review and the solution for the bug. Unfortunately I had to revert it since it failed on clang-ppc64be-linux. Crashed while running the tests.
I was not able to reproduce the bug, so is there any way to use the buildbot for test purposes?

szepet reopened this revision.Jul 20 2017, 8:02 AM
This revision is now accepted and ready to land.Jul 20 2017, 8:02 AM
szepet updated this revision to Diff 107608.EditedJul 20 2017, 4:55 PM

I suspect that the usage of the CFGStmtMap* caused the undefined behaviour since its lifetime was depending on its LocationContext.
In this version the isUnrolledLoopBlock creates the CFGStmtMap based on the CFG of the function which contains the unrolled loop. So the State stores FunctionDecl* instead of CFGStmtMap*.

Since clang_analyzer_numTimesReached() is a perfect tool to check the feature I removed the statistic checks.

Could you take a look?
(If you want to see a proper diff then skip Diff10. So compare Diff11 with Diff9.)

szepet updated this revision to Diff 107610.Jul 20 2017, 5:08 PM

Removed REQUIRE ASSERT from test file.

szepet updated this revision to Diff 107976.Jul 24 2017, 3:21 PM
szepet added a subscriber: cfe-commits.

Accidentally left typo removed.
OK so I am not sure but am I allowed to commit it again? I mean I made some notable changes. Not on the functionality of the feature but the stored data.
So should i wait for a quick review or could I commit it right now? (It is something that would be useful to know for future commits too.)

NoQ added a comment.Jul 25 2017, 1:27 AM

I suspect that the usage of the CFGStmtMap* caused the undefined behaviour since its lifetime was depending on its LocationContext.

Yeah, it lives in AnalysisDeclContextManager, which lives in AnalysisManager, which lives throughout a single analysis (top-level function) and then dies (when the next top-level function is picked that wasn't covered during inlining) to clean up the allocators.

However, you store it in the program state, right? And the new analysis has new program states, which don't contain references to the old CFGStmtMaps(?)

Also the way you construct CFGStmtMap manually sounds slow to me (it's done on every path), i guess the whole point of having it in AnalysisDeclContext was to avoid this.

So i don't think this is it, but i guess you should try speculatively relanding anyway, and add the extra run-line that'd show you the backtrace. We made changes, so the issue might have been wiped out accidentally (or maybe you're actually right!), and if it wasn't, at least we'd have the backtrace for the crash.

In D34260#819721, @NoQ wrote:

... and add the extra run-line that'd show you the backtrace. We made changes, so the issue might have been wiped out accidentally (or maybe you're actually right!), and if it wasn't, at least we'd have the backtrace for the crash.

Hmm I removed the statistics checks so I guess it should print the stacktrace in case of a crash anyway. Am I right? (The whole point of the statistics check was to count the number of loops unrolled but 'clang_analyzer_numTimesReached();' is much more powerful.

NoQ added a comment.Jul 25 2017, 3:07 AM

Yeah, right, good point =)

If you can't reproduce, you should try running a debug build through valgrind. It points out this issue:

==29522== Invalid read of size 4
==29522==    at 0x16EBED0: clang::LocationContext::getCurrentStackFrame() const (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0x1686BA9: clang::ento::isUnrolledLoopBlock(clang::CFGBlock const*, clang::ento::ExplodedNode*, clang::ento::AnalysisManager&) (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0x164F009: clang::ento::ExprEngine::processCFGBlockEntrance(clang::BlockEdge const&, clang::ento::NodeBuilderWithSinks&, clang::ento::ExplodedNode*) (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0x162A0B7: clang::ento::CoreEngine::HandleBlockEdge(clang::BlockEdge const&, clang::ento::ExplodedNode*) (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0x162A455: clang::ento::CoreEngine::dispatchWorkItem(clang::ento::ExplodedNode*, clang::ProgramPoint, clang::ento::WorkListUnit const&) (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0x162A782: clang::ento::CoreEngine::ExecuteWorkList(clang::LocationContext const*, unsigned int, llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>) (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0xC6995E: (anonymous namespace)::AnalysisConsumer::ActionExprEngine(clang::Decl*, bool, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*> >*) [clone .part.4645] (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0xC6A1C9: (anonymous namespace)::AnalysisConsumer::HandleCode(clang::Decl*, unsigned int, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*> >*) (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0xC7577B: (anonymous namespace)::AnalysisConsumer::HandleTranslationUnit(clang::ASTContext&) [clone .part.4696] (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0xC7EB19: clang::ParseAST(clang::Sema&, bool, bool) (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0xA2CD65: clang::FrontendAction::Execute() (in /opt/clang/build/bin/clang-6.0)
==29522==    by 0x9F471D: clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (in /opt/clang/build/bin/clang-6.0)
==29522==  Address 0x10 is not stack'd, malloc'd or (recently) free'd

Try fixing this invalid read and the buildbots (and my builds :) ) should be working again.

Try fixing this invalid read and the buildbots (and my builds :) ) should be working again.

Yeah, thanks for the help! :)
I already sent a fix for that: https://reviews.llvm.org/rL309036 and another one: https://reviews.llvm.org/rL309061 which I think was not necessary but the buildbot failed on the first try (I still think it just haven't picked up the commit)
For me it seems that these fixes solved the problem as the address sanitizer tests passes as well on clang (and nobody reverted the commits^^).

szepet closed this revision.Jul 26 2017, 3:19 AM

So the fixes seem to work. The problem was the line 'StackFrame = StackFrame->getParent()->getCurrentStackFrame();' since I havent checked if the parent of the StackFrame is nullptr. (It is quite interesting that it just not crashed on my computer.)

NoQ added a comment.Jul 26 2017, 10:13 AM

Cool, i totally forgot that we have valgrind! :)

MTC added a subscriber: MTC.Mar 22 2018, 5:08 AM
MTC added inline comments.
lib/StaticAnalyzer/Core/LoopUnrolling.cpp
107

Hi Peter, sorry to bother you!

I have some confusion here. Whether hasRHS(integerLiteral()) needs to added with ignoringParenImpCasts?

The given code below does not unroll the loop because i = 0; has a ImplicitCastExpr for 0. But if I change unsigned i = 0 to int i = 0, loop unrolling will happen. So I don't know if this is intentional.

int func() {
  int sum = 0;
  unsigned i;
  for (i = 0; i < 10; ++i) {
    sum++;
  }
  return sum;
}

Hope you can help me, thank you!