This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Memoize complexity of SymExpr
ClosedPublic

Authored by mikhail.ramalho on Jul 12 2018, 6:07 AM.

Details

Summary

This patch introduces a new member to SymExpr, which stores the symbol complexity, avoiding recalculating it every time computeComplexity() is called.

Also, increase the complexity of conjured Symbols by one, so it's clear that it has a greater complexity than its underlying symbols.

Diff Detail

Repository
rC Clang

Event Timeline

NoQ added inline comments.Jul 12 2018, 10:07 AM
include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
97–100

I'd much rather return 1 here. There's nothing to iterate through in SymbolConjured or generally in SymbolData, those are pretty much atomic.

include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
97–100

Right, should I move it to SymbolData then?

Don't iterate though SymbolData's symbol as they're atomic.

NoQ added inline comments.Jul 16 2018, 8:13 AM
lib/StaticAnalyzer/Core/SymbolManager.cpp
161–162

Why not just sum up the complexity of direct sub-symbols? That'd be a bit more code (you'll need to override the function for SymIntExpr, IntSymExpr, SymSymExpr, SymbolCast), but it sounds a lot more efficient.

lib/StaticAnalyzer/Core/SymbolManager.cpp
161–162

Sure, I'll move this code to SymbolCast::computeComplexity() and BinarySymExpr::computeComplexity(), and check if it's faster.

NoQ added inline comments.Jul 16 2018, 9:10 AM
lib/StaticAnalyzer/Core/SymbolManager.cpp
161–162

Like, i mean, recursion instead of iterators.

Update how to calculate complexity.

Is that what you suggested, @NoQ?

This version seems to be 1% faster compared to the other one.

NoQ added a comment.Jul 16 2018, 1:17 PM
unsigned SymSymExpr::computeComplexity() {
  if (Complexity == 0)
    Complexity = LHS->computeComplexity() + RHS->computeComplexity();
  return Complexity;
}

etc.

Because symbol_iterator iterates over all sub-symbols, while we're only interested in direct sub-symbols as recursion takes care of the rest.

Updated complexity calculation by getting the complexity of sub-symbols + 1 (for the current wrapper).

NoQ added a comment.Jul 17 2018, 10:46 AM

Yeah, that's what i expected to work. Now the trick, of course, would be to avoid infinite recursion :P

include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
435

I don't thing this "1" thing is necessary in this specific case. We'll anyway have a bigger complexity than that of each operand.

include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
435

But the idea is that wrappers always increase the complexity by 1, right? That's why I left it there.

NoQ added inline comments.Jul 17 2018, 10:52 AM
include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
435

That's not much of an idea, all we need is to make sure that bigger symbols have larger complexity, so that we couldn't construct arbitrarily long symbolic expressions of complexity 1, which doesn't require any extra effort here.

george.karpenkov added inline comments.
include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
435

i'm happy either way, both approaches make sense.

This revision is now accepted and ready to land.Jul 17 2018, 11:07 AM

I just tried without the +1 and there's a regression:

error: 'warning' diagnostics expected but not seen: 
  File /home/mramalho/llvm/tools/clang/test/Analysis/taint-generic.c Line 268: Division by a tainted value, possibly zero
1 error generated.

which is strange, as removing the +1 actually decreases the complexity so it's not like the expression is being discarded.

NoQ added a comment.Jul 17 2018, 11:18 AM

Curious. Might there be an overflow?

@NoQ I think the analyzer would hang if we construct an expression with a complexity of 2^32

The complexity values are still quite small and, without the +1, a lot more expressions are below the 25 threshold.

NoQ added a comment.Jul 17 2018, 4:10 PM

The complexity values are still quite small and, without the +1, a lot more expressions are below the 25 threshold.

Yeah, i guess it's just about yielding the exact same values as the old algorithm. The regression disappears when i bump the limit from 25 to 33.

I'd suggest to remove the +1 and speculatively bump the limit to, say, 35, without spending much time evaluating the exact number; would that be scary?

In D49232#1165864, @NoQ wrote:

The complexity values are still quite small and, without the +1, a lot more expressions are below the 25 threshold.

Yeah, i guess it's just about yielding the exact same values as the old algorithm. The regression disappears when i bump the limit from 25 to 33.

I'd suggest to remove the +1 and speculatively bump the limit to, say, 35, without spending much time evaluating the exact number; would that be scary?

Sounds good to me.

Bumped the max symbol complexity threshold to 35.

NoQ added inline comments.Jul 18 2018, 2:20 PM
include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
273

Before i forget: these also need an override to satisfy our -Werror buildbots.

NoQ accepted this revision.Jul 18 2018, 2:20 PM

LGTM otherwise!

include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
273

Oh, thanks! I missed those.

Added missing overrides.

This revision was automatically updated to reflect the committed changes.