This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] CastValueChecker: Model casts
ClosedPublic

Authored by Charusso on Jul 8 2019, 3:39 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

Charusso created this revision.Jul 8 2019, 3:39 PM
Charusso marked an inline comment as done.Jul 8 2019, 3:43 PM
Charusso added inline comments.
clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
175 ↗(On Diff #208525)

This CXXMemberCallExpr business is necessary? I am not sure as I have not seen it widely used.

NoQ added inline comments.Jul 8 2019, 4:40 PM
clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
25–27 ↗(On Diff #208525)

Kinda nice, but a simple pointer-to-member would have been sufficient and much more lightweight:

using CastCheck = void (CastValueChecker::*)(const CallExpr *,
                         DefinedOrUnknownSVal, CheckerContext &);

(not sure, i don't fully understand the performance implications of using std::function)

40–43 ↗(On Diff #208525)

Please add an argument count check to your CallDescriptions.

Otherwise accessing argument 0 may result in a crash when we're analyzing code that isn't LLVM but defines a function with the same name which also has no arguments.

That's a fairly common source of crashes.

60–63 ↗(On Diff #208525)

The assume operation is expensive. Let's avoid doing it twice:

State = State->assume(ParamDV, true);
if (!State)
  return;
67–68 ↗(On Diff #208525)

Let's do ...getType()->getPointeeCXXRecordDecl()->getNameAsString(). It's shorter and more on point. (might need to check for null decl though).

75 ↗(On Diff #208525)

This statement needs a predicate.

I suggest: Assuming dynamic cast from 'A' to 'B' succeeds.

78 ↗(On Diff #208525)

true! We don't want a note about every random cast to bring in dozens of nested stack frames.

92 ↗(On Diff #208525)

Just drop Invalidate. Situations where you actually notice the difference are extremely rare. In this case there's nothing to invalidate because there's no existing value for the call-expression.

100 ↗(On Diff #208525)

Assuming dynamic cast from 'A' to 'B' fails.

Later when we add support for dynamic type tracking, we may as well add a "Knowing..." piece :)

175 ↗(On Diff #208525)

It definitely needs different handling because this cannot ever be null.

If you haven't seen it widely used, it's probably not necessary; i too rarely see those.

177 ↗(On Diff #208525)

Call.getArgSVal(0).

clang/test/Analysis/cast-value.cpp
34 ↗(On Diff #208525)

debug.ExprInspection is your friend!

I'd also call for making the tests more isolated. In this test you're hardcoding a fairly arbitrary and random execution path that the analyzer chose through your function. It's generally interesting what paths do we explore first, but that's not what the patch is about.

Here's an example of a test that i would have written:

void foo(Shape *S) {
  Circle *C = dyn_cast_or_null<Circle>(S);
  clang_analyzer_numTimesReached(); // expected-warning{{3}}
  if (S && C)
    clang_analyzer_eval(C == S); // expected-warning{{TRUE}}
  if (!S)
    clang_analyzer_eval(!C); // expected-warning{{TRUE}}
  if (S && !C)
    clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
}

(you might want to test notes separately)

NoQ added a comment.Jul 8 2019, 4:41 PM

The overall code looks great! It's well-structured and easy to understand, thanks!

NoQ added inline comments.Jul 8 2019, 4:43 PM
clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
124 ↗(On Diff #208525)

Assuming null pointer is passed into cast o.o

Charusso updated this revision to Diff 208573.Jul 8 2019, 7:15 PM
Charusso marked 12 inline comments as done.
  • Fix
  • New getNoteTag() which accepts a plain note.

Thanks! My mind was really set to actually model these with classof(), whoops.

clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
25–27 ↗(On Diff #208525)

It took me an hour to realize I have to push the class as the signature of the function to create a member-callback. I think it is not that lightweight, but it is modern. I believe it is made for that purpose when you could write Foo[13] or *(13 + P), where the latter creeps me out.

Thanks for the copy-pasteable code, I really enjoy to see what do you think exactly, but this is the first time when I would pick my idea. (I would had recommended that modern way in your patch just I thought it is not working with methods.)

clang/test/Analysis/cast-value.cpp
34 ↗(On Diff #208525)

Great idea, thanks!

Charusso updated this revision to Diff 208574.Jul 8 2019, 7:22 PM
  • Fix a typo.
NoQ added a subscriber: rnkovacs.Jul 8 2019, 7:54 PM

We should add at least some tests for the other three functions.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
278 ↗(On Diff #208574)

"Model implementation of custom RTTIs."

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
250 ↗(On Diff #208574)

Great! I recognize the need for such function now.

clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
58–59 ↗(On Diff #208574)

I suspect that this may crash if CD is an anonymous structure. So you should use getNameAsString() instead.

61 ↗(On Diff #208574)

Use after free! QualType::getAsString() returns a temporary std::string. You're returning a StringRef that outlives the string it refers to. The solution is usually to return the std::string by value.

*summons @rnkovacs*

Generally, i'd rather bail out on this branch. If we're seeing a dyn_cast of something that isn't a class, we're already doing something wrong.

25–27 ↗(On Diff #208525)

Ok then!

clang/test/Analysis/cast-value.cpp
3 ↗(On Diff #208574)

Nice!

I guess putting them into separate files would have been easier this time.

Charusso updated this revision to Diff 208586.Jul 8 2019, 8:37 PM
Charusso marked 9 inline comments as done.
  • Fix.
  • More tests.
clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
61 ↗(On Diff #208574)

Well, at least it does not crash. Thanks! I like that general return which we could evolve further. It is the same story like Assuming....

Charusso updated this revision to Diff 208588.Jul 8 2019, 8:49 PM
  • Simplify the new getNoteTag().
Szelethus added inline comments.Jul 9 2019, 1:53 AM
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
279 ↗(On Diff #208588)

This checker only does modeling, but isn't hidden. Should we hide it?

NoQ added inline comments.Jul 9 2019, 11:21 AM
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
101–102 ↗(On Diff #208588)

I believe we want to put this checker as well as the return value checker into apiModeling.llvm.

This doesn't prevent the checkers from being generally useful for other packages: if we want to add support for other APIs, we'll simply reserve a different checker name for the same checker object (with internal flags to enable/disable particular API groups).

279 ↗(On Diff #208588)

Dunno :)

I think we should hide the checkers that model core language functionality (including standard library functionality, as defining your own stuff in namespace std has undefined behavior in most cases, at least in c++) but we shouldn't hide checkers for specific user-made APIs because people may want to disable them simply because they have their own conflicting API.

Also, btw, the whole apiModeling package is currently hidden.

clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
1 ↗(On Diff #208588)

Let's mention RTTI here as well. Otherwise it sounds weird, as all actual casts are applied by ExprEngine.

9 ↗(On Diff #208588)

And here.

61 ↗(On Diff #208574)

No, it's not the same story. It's not "some valid situation that we don't support yet". It's "a normally impossible situation that indicates that we should not try to model this function because there's no reason for us to believe we know what it does".

(yay, i understood what you mean without a reference; that was hard^^)

Charusso updated this revision to Diff 208824.Jul 9 2019, 3:12 PM
Charusso marked 6 inline comments as done.
  • Move to apiModeling.llvm.
  • Prevent unknown casts.
  • Refactor.
NoQ added inline comments.Jul 9 2019, 3:16 PM
clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
58 ↗(On Diff #208824)

You should still check if Cast->getType()->getPointeeCXXRecordDecl() returns null, otherwise you can crash.

But when it's null, you should immediately return false from your evalCall.

Charusso marked 2 inline comments as done.Jul 9 2019, 3:19 PM
Charusso added inline comments.
clang/lib/StaticAnalyzer/Checkers/CastValueChecker.cpp
172 ↗(On Diff #208824)

If we cannot obtain any of the CXXRecordDecls then we bail out. You are right about that feature, and I like it.

NoQ accepted this revision.Jul 9 2019, 3:23 PM

Whoops. I underestimated you (:

Ok, anyway, this was my last comment. Great job! Remember to make sure that it works on the actual LLVM, not only on tests; mocked-up test headers are very easy to get wrong.

This revision is now accepted and ready to land.Jul 9 2019, 3:23 PM
In D64374#1577235, @NoQ wrote:

Whoops. I underestimated you (:

Ok, anyway, this was my last comment. Great job! Remember to make sure that it works on the actual LLVM, not only on tests; mocked-up test headers are very easy to get wrong.

The only crash-able code was that:

#0 Calling llvm::unique_dyn_cast

where:

ProgramStateRef ExprEngine::createTemporaryRegionIfNeeded():
Assertion `!InitValWithAdjustments.getAs<Loc>() || Loc::isLocType(Result->getType()) || Result->getType()->isMemberPointerType()'

because of cast<>:

template <class X, class Y>
LLVM_NODISCARD inline auto unique_dyn_cast(std::unique_ptr<Y> &Val)
    -> decltype(cast<X>(Val)) {
  if (!isa<X>(Val))
    return nullptr;
  return cast<X>(std::move(Val));
}

(source: https://llvm.org/doxygen/Casting_8h_source.html)

So we bypass every error with that CXXRecordDecl checking from now. Would we like to model that?

Charusso updated this revision to Diff 208831.Jul 9 2019, 3:42 PM
  • Add the forgotten llvm namespace to the CallDescription.
NoQ added a comment.Jul 9 2019, 3:44 PM
  • Add the forgotten llvm namespace to the CallDescription.

Nice!

The only crash-able code was that:

#0 Calling llvm::unique_dyn_cast

Mmm. What? Can you provide more info, eg. the full backtrace?

Charusso added a comment.EditedJul 9 2019, 3:53 PM
In D64374#1577266, @NoQ wrote:

Can you provide more info, eg. the full backtrace?

Well, unique_dyn_cast<> and unique_dyn_cast_or_null<> is used like 20 times in the LLVM codebase, whoops. We want to model it.

Full info: Edit: removed due it breaks the entire page.

NoQ added a comment.Jul 9 2019, 4:04 PM

Mmm. Ok, am i understanding correctly that this was crashing because you bound a 0 (Loc) to a prvalue expression of type unique_ptr? Yeah, right, don't do that :)

used like 20 times in the LLVM codebase

That doesn't sound like it's high on our list. That's not too many times. Probably very few false positives come out of it. Also modeling smart pointers is much harder than modeling regular pointers. We could add some modeling into SmartPtrChecker, but that'd require actually developing that checker :)

Thanks for the review!

In D64374#1577279, @NoQ wrote:

That doesn't sound like it's high on our list. That's not too many times. Probably very few false positives come out of it. Also modeling smart pointers is much harder than modeling regular pointers.

Hm, because it is smart, probably it is out of our scope. Thanks for pointing it out!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2019, 4:34 PM
Charusso updated this revision to Diff 208847.Jul 9 2019, 5:15 PM
  • Add the llvm namespace to the test file.

I guess getAs and castAs methods could join the party too. I'm evaluating plenty of results on LLVM, and those seem to be in the same problem category.

Do you have plans to extend this checker with the modeling of isa<>? I might take that off your shoulder if not :)

Charusso added a comment.EditedAug 7 2019, 9:05 AM

I guess getAs and castAs methods could join the party too. I'm evaluating plenty of results on LLVM, and those seem to be in the same problem category.

Do you have plans to extend this checker with the modeling of isa<>? I might take that off your shoulder if not :)

Hey! Sorry for the late response, just it is a heavy change. This patch creates about 200 TP, I have checked out every of the new reports.
D65889 creates about 400 TP as an estimation, then isa() is the next goal. Thanks you for the ideas!