This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Reject incomplete matches in CallEvent::isCalled(), also introduce wildcards
AbandonedPublic

Authored by steakhal on Oct 14 2021, 5:12 AM.

Details

Summary

Previously, if all the parts of the CallDescription were matched, we
accepted the Call.
On the other hand, it would make sense to check if the Call has any
unmatched qualified name parts, that are not inline or anonymous
namespaces.

Consequently, now we reject the Call: my::std::container::data() when
matching against CallDescription: { "std", "container", "data" }.

To get all the tests to work, I needed to introduce wildcard matching.
The wildcard is the "?" string literal.

Using wildcards, now it's possible to match llvm::MyClass::getAs() by
the { "llvm", "?", "getAs" } CallDescription.
Check the related unittest for more examples.

About the semantics of the 'wildcard':
The qualified name parts are matched as usual:
If we mismatch, and the current part was an inline/anonymous namespace,
we skip this and continue by trying to match the rest of the segments.
Else, if we mismatched AND we are not trying to match against a wildcard,
we reject.
Otherwise, we either matched OR consumed a wildcard, so advance to the
next segment.

So, a wildcard can only match against a single name part. They will
match greedily except when it's an inline/anonymous namespace, which
won't consume the wildcard.

Diff Detail

Event Timeline

steakhal created this revision.Oct 14 2021, 5:12 AM

I think it would be really useful to copy the description and the usage of the "wildcard" you provide in the summary just above the definition of the CallDescription class or above the constructor.

clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
555

Perhaps we could have at least one test when the DeclContext is a CXXRecordDecl (and not a NamespaceDecl).

steakhal updated this revision to Diff 379780.Oct 14 2021, 11:10 AM
steakhal edited the summary of this revision. (Show Details)
steakhal marked an inline comment as done.Oct 14 2021, 11:15 AM

I think it would be really useful to copy the description and the usage of the "wildcard" you provide in the summary just above the definition of the CallDescription class or above the constructor.

Huh, it seems like I missed nameless structs as well. Check the WildcardNamelessStructs test case, it's sick.

Updates:

  • rebased
  • IsInlineOrAnonymousNamespace() -> SkippableNameless(), which detects inline/anonymous namespaces, nameless structs.
  • added doc comments for wildcards
  • added assertion that the function name cannot be a wildcard.
  • added FIXME stating that we will use function decl for comparisons
clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
555

Sure!

steakhal marked an inline comment as done.
martong added inline comments.Oct 14 2021, 12:25 PM
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
350

IsSkippable ? Being unnamed (the term nameless is not used) is just a detail, who knows, maybe in the future we want to have an exclusion for exotic but unnamed AST nodes.

355–359

Actually, there are two 'terms' here: unnamed and anonymous structs and they are different:

struct A { // unnamed structs
  struct { struct A *next; } entry0;
  struct { struct A *next; } entry1;
};

struct X { struct { int a; }; struct { int b; }; }; // anonymous structs

Why does it matter? Because I am not sure that e.g. FieldDecl::isAnonymousStructOrUnion() returns true for an unnamed struct, my guess it does not! (I have some vague memory that I've spent a few hours b/c of this when I worked with the ASTImporter).
Although, I think your code handles both cases, an additional test case would be beneficial. And unfortunately, there could be other anonymous members: unions.

clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
648–650

Yes, I think they would deserve another test case.

martong added inline comments.Oct 14 2021, 12:36 PM
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
355–359

...I am not sure that e.g. FieldDecl::isAnonymousStructOrUnion() returns true for an unnamed struct, my guess it does not!

Ahh, this is not a problem. Becuase it is impossible to call isAnonymousStructOrUnion on an unnamed struct because by definition there's no FieldDecl with that struct.
Still, I hope the rest of my previous comment makes sense.

steakhal updated this revision to Diff 379937.Oct 15 2021, 1:55 AM
steakhal marked 2 inline comments as done.

I think it would be really useful to copy the description and the usage of the "wildcard" you provide in the summary just above the definition of the CallDescription class or above the constructor.

Definitely! I'll update accordingly.


Changes:

  • SkippableNameless -> IsSkippable
  • add test covering unnamed unions
  • add comment why we don't test anonymous structs/unions
  • simplify the rejection loop
  • add const to a local variable
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
350

Okay, that should be enough.

355–359

Actually, there are two 'terms' here: unnamed and anonymous structs and they are different

Yes, exactly. This is why I decided calling the lambda as unnamed instead of anonymous xD

I have some vague memory that I've spent a few hours

Me too :D

Ahh, this is not a problem. Becuase it is impossible to call isAnonymousStructOrUnion on an unnamed struct because by definition there's no FieldDecl with that struct.

Exactly. I should have left a comment there about this!
You probably meant RecordDecl::isAnonymousStructOrUnion(), and it can actually contain FieldDecls, but no FunctionDecls.
I'm gonna also add a test for unnamed unions.

clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
648–650

One can not define an anonymous struct/union containing a method that we could match for.

Nice extension. Maybe it's worth to think about using regex in the future?

clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
69–71

As I can see this is unrelated changes. And IMO you should move this in the following patch with corresponding test coverage.

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
359

As I can see, implementation of getAsIdentifierInfo never returns nullptr. So we could get by with single Name.isIdentifier() here.

378–391

Nice :) I'm in favor of a such type of self-readable code (brought out const bools) instead of comments around.
The only suggestion here is to enclose the conditions in parentheses.

ASDenysPetrov added inline comments.Nov 10 2021, 9:41 AM
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
359

As I can see, implementation of getAsIdentifierInfo never returns nullptr. So we could get by with single Name.isIdentifier() here.

An amendment: getAsIdentifierInfo already has isIdentifier inside. So we can just use getAsIdentifierInfo only.

steakhal marked 3 inline comments as done.Nov 10 2021, 9:53 AM

Nice extension. Maybe it's worth to think about using regex in the future?

We could, but we would definitely pay the price. So, I'm not really a fan of doing that.

clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
69–71

This is a strongly related hunk. The cast-value-logic.cpp:145 would fail otherwise.
The clang::Shape::getAs<Circle>() function would not be matched without this hunk.

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
359

The CallDescription.WildcardUnnamedStructs unittest demonstrates exactly this.
In that case, even though Name.isIdentifier() is true, Name.getAsIdentifierInfo() returns nullptr.
To signify this fact, I'm using an explicit check against nullptr and the comment tries to explain the scenario when it might happen.

378–391

I think it's commonly known that the negation has precedence over the conjunction operator. So, I would insist.

We could, but we would definitely pay the price. So, I'm not really a fan of doing that.

That's fair. Performance of regex really is a weak place.

clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
69–71

OK, I see.

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
359

getAsIdentifierInfo never returns nullptr

My fault. I was wrong. It can return nullptr. Skip this comments :).

378–391

Oh, sorry, my obscure comment. I mean here:

const bool Mismatch =
          (cast<NamedDecl>(Ctx)->getName() != *QualifierPartsIt);
const bool MatchingAgainstWildcard = (*QualifierPartsIt == Wildcard);
steakhal marked 2 inline comments as done.Nov 10 2021, 10:58 AM
steakhal added inline comments.
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
378–391

I see. Makes sense.

steakhal updated this revision to Diff 386253.Nov 10 2021, 11:04 AM
steakhal marked an inline comment as done.

add parens


I also plan to update the title and the summary of this patch to emphasize that the major change is about wildcards.
Even though originally purposed as titled.

steakhal marked 3 inline comments as done.Nov 10 2021, 11:05 AM

@steakhal for the changes.
Please, add to summary several examples of the feature usage.

ASDenysPetrov accepted this revision.Nov 11 2021, 1:23 AM
This revision is now accepted and ready to land.Nov 11 2021, 1:23 AM

@steakhal for the changes.
Please, add to summary several examples of the feature usage.

Sure, I will!

I would encourage the rest of the community to have a look at this. I'm really curious about your opinions!
@xazax.hun @NoQ

xazax.hun requested changes to this revision.Nov 12 2021, 9:09 AM

I like the overall idea of being strict by default and use wild cards to get the previous behavior. I am not convinced, however, about the current matching strategy.

Previously, we could skip arbitrary number of namespaces. As far as I understand, the proposed wild card will match exactly one namespace. While using the new method we can express that there is a namespace that could be anything, we cannot express that there can be any number of namespaces that could be anything.

I think people would expect {"foo", "*", "bar"} to match foo:skipped1:skipped2:bar. We could introduce a separate symbol like ? if we want to support matching a single namespace.
So I think if we want to retain the features we had before this patch we might want to change the matching logic.

But let me know if I misunderstood something.

This revision now requires changes to proceed.Nov 12 2021, 9:09 AM

Alternatively, if we do not want the wildcard to match multiple names, I'd suggest using "?" which is more conventional to match a single item.

I like the overall idea of being strict by default and use wild cards to get the previous behavior. I am not convinced, however, about the current matching strategy.

Previously, we could skip arbitrary number of namespaces. As far as I understand, the proposed wild card will match exactly one namespace. While using the new method we can express that there is a namespace that could be anything, we cannot express that there can be any number of namespaces that could be anything.

It's true. But it turns out no checker explored this fact - according to the tests.
Do you have any use-case actually in the wild, that we would regress on by having this restriction?

I think people would expect {"foo", "*", "bar"} to match foo:skipped1:skipped2:bar. We could introduce a separate symbol like ? if we want to support matching a single namespace.

I think the ? is a great idea. We shall see if it won't conflict with some ObjC madness.

So I think if we want to retain the features we had before this patch we might want to change the matching logic.

I wouldn't mind, but it would make the matching even more difficult, potentially backtracking (?) which I'm not really a fan of.

I wouldn't mind, but it would make the matching even more difficult, potentially backtracking (?) which I'm not really a fan of.

I would be surprised if it would need backtracking.

balazske added inline comments.Nov 15 2021, 8:13 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
1270 ↗(On Diff #386253)

This documentation is not the most easy to understand, probably because meaning of "skip" in this and next context seems to be different. Better: like inline namespaces are ignored, and the "*" matches one non-inline namespace (if this is correct).

namespace X { inline namespace X { void f(); } }

Here {"X", "X", "f"} will match or not? And {"X", "*", "f"}, and {"X", "f"}?

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
350

I really do not like the "lambda inside lambda" constructs.

steakhal updated this revision to Diff 387340.Nov 15 2021, 11:42 AM
steakhal edited the summary of this revision. (Show Details)
  • Rebased, clang-format
  • Changed wildcard sign: "*" -> "?"
  • Added the InlineNamespaces test
  • Elaborate/rephrase the doc comment about the semantics of the wildcards
xazax.hun accepted this revision.Nov 15 2021, 11:43 AM
This revision is now accepted and ready to land.Nov 15 2021, 11:43 AM

Thanks @balazske, your point is really concerning. I'll need to think about it how to proceed.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
1270 ↗(On Diff #386253)

I see. This is an issue.
Inline namespaces will be consumed only if their name matches the expected qualifier name part. Otherwise, they will be skipped and the matching continues with the same expected qualifier name part.
You will have a match for {"X", "X", "f"}, but won't with {"X", "*", "f"} - which is really counter intuitive :(
Thanks for highlighting this.
You would still have a match for {"X", "f"} and {"*", "f"} though. Check the InlineNamespaces test.

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
350

Let's land D113590 first. After that, I will probably refactor this patch to use static functions instead.

Let's reach a wider consensus about this. I'm really worried.

xazax.hun requested changes to this revision.Nov 15 2021, 12:19 PM
xazax.hun added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
1270 ↗(On Diff #386253)

Whooops, I missed this one. Yeah, let's make this more intuitive before merging.

This revision now requires changes to proceed.Nov 15 2021, 12:19 PM
balazske added inline comments.Nov 16 2021, 12:31 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
1270 ↗(On Diff #386253)

In a code like above a call to X::f() (from not in namespace) would be ambiguous, but X::X::f() is OK. So the allowed match should be for {"X", "X", "f"}, {"X", "?", "f"}, {"?", "X", "f"}, but not {"X", "f"} or {"?", "f"}. But if the second namespace is Y instead of X the {"X", "f"} could match.

steakhal abandoned this revision.Jan 17 2022, 6:14 AM

I can't see how one could implement the ? and * placeholders without backtracking. Backtracking is not necessarily bad, but I think it's less than ideal. I haven't done a POC implementation nor any measurements.
So, I'm abandoning this patch.