This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Track trivial copy/move constructors and initializer lists in the BugReporter
ClosedPublic

Authored by isuckatcs on Aug 5 2022, 8:07 AM.

Details

Summary

1.) Constructors

Prior to this patch, if an object had a trivial copy/move constructor we couldn't track
a value further than the copy constructor invocation. The problem is that in such cases
the analyzer doesn't inline the constructor, but performs a trivial copy instead. A trivial
copy means, we just copy the values from one place to another.

This patch handles this case by matching the regions of the 2 objects involved in the
construction, and tracks the appropriate region.

2.) Initializer lists

Prior to this patch we ignored initializer lists.

This patch handles them up to some extent.

Diff Detail

Event Timeline

isuckatcs created this revision.Aug 5 2022, 8:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 8:07 AM
isuckatcs requested review of this revision.Aug 5 2022, 8:07 AM
NoQ added a comment.Aug 5 2022, 11:35 AM

Great! Sounds like the conspiracy to not work with constructors goes deeper than I thought.

The new behavior looks lovely, I heavily recommend adding these examples as -analyzer-output=text tests.

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2434

Hmmm so you're tracking *all* arguments. I'm worried that this will cause you to track values that aren't necessarily relevant to the report.

These notes are difficult to judge because it's usually important to maintain balance, too little information and the report becomes incomprehensible, too much information and people won't be able to navigate it. So we're only tracking things that are directly relevant (and we're trying our best to identify them).

For example, in this slightly more complicated case

int foo() {
    int *x = nullptr, *y = nullptr;
    S s{x, y};
    auto [a, b] = s;

    return *b;
}

we should ideally track the null pointer back to declaration of y but not x. A good long-term solution for that would probably be to track the individual *field* inside s (until we reach the field's initializer). I'm not sure how good are we at that right now, but I also don't immediately see any shortcuts that we could take here hmmm.

isuckatcs updated this revision to Diff 450635.Aug 7 2022, 9:15 AM
isuckatcs marked an inline comment as done.
  • Addressed a comment
  • Handling initializer lists.

I will add tests later.

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2434

What I can do is finding the field the value belongs to and extract it's initializer expression. Then mark that as the initializer instead of the constructor.

The problem we face here is that the bug reporter checks the statements inside the exploded nodes. If the field initializer is not in the egraph, the bug reporter won't be able to track it.

In the snippet above ExprEngine::performTrivialCopy() is called, so the copy constructor is not inlined, hence the field initializer expressions are not in the CFG.

As a result if the constructor is not inlined, there's no way to tell which argument to track. It's either all or none. Also initializer lists aren't handled either, so in case of S s{x, y} neither x nor y would be tracked.

isuckatcs added inline comments.Aug 7 2022, 9:16 AM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2434

expressions are not in the CFG.

I meant egraph here instead of CFG.

NoQ added inline comments.Aug 8 2022, 12:46 PM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2434

Aha interesting.

Every time BugReporter doesn't have enough information to reverse-engineer what ExprEngine has done, ExprEngine can leave breadcrumbs to help BugReporter figure out what happened. One mechanism we have for that is ProgramPointTag. You can either add a simple tag that BugReporter can pattern-match, or the modern solution is NoteTag that can replace entire bug visitors (but it's probably hard to integrate with an existing visitor).

Then again, performTrivialCopy() isn't the typical case. If we handle the typical case (i.e., add a non-trivial copy constructor to the test and handle that), that'd already be an improvement.

isuckatcs added inline comments.Aug 8 2022, 4:32 PM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2434

If we handle the typical case (i.e., add a non-trivial copy constructor to the test and handle that), that'd already be an improvement.

I'm handling one element copy constructors and initializer list, so for some simple snippets there are improvements.

E.g.:

struct S {
    int *a, *b;

    S(int *_a, int *_b) : a{_a}, b{_b} {}
};

void foo() {
    int *x = nullptr, *y = nullptr;

    S s{x, y};

    int i = *s.a;
}

The output before the patch:

test.cpp:12:7: note: Calling constructor for 'S'
    S s{x, y};
      ^~~~~~~
test.cpp:5:27: note: Null pointer value stored to 's.a'
    S(int *_a, int *_b) : a{_a}, b{_b} {}
                          ^
test.cpp:12:7: note: Returning from constructor for 'S'
    S s{x, y};
      ^~~~~~~
test.cpp:14:13: note: Dereference of null pointer (loaded from field 'a')
    int i = *s.a;
            ^  ~

The output after it:

test.cpp:10:10: note: 'x' initialized to a null pointer value
    int *x = nullptr, *y = nullptr;
         ^
test.cpp:12:9: note: Passing null pointer value via 1st parameter '_a'
    S s{x, y};
        ^
test.cpp:12:7: note: Calling constructor for 'S'
    S s{x, y};
      ^~~~~~~
test.cpp:5:27: note: Null pointer value stored to 's.a'
    S(int *_a, int *_b) : a{_a}, b{_b} {}
                          ^
test.cpp:12:7: note: Returning from constructor for 'S'
    S s{x, y};
      ^~~~~~~
test.cpp:14:13: note: Dereference of null pointer (loaded from field 'a')
    int i = *s.a;
            ^  ~

We can reach x because _a is tracked in the initializer list at a{_a}.

I think one solution would be to store somewhere where we started to track an expression and
where we ended it. Later when we encounter an initializer list or a constructor we can match the
parameters or the field initializers with the already tracked expressions and if we find a match
we know which value to track.

isuckatcs updated this revision to Diff 451476.Aug 10 2022, 8:07 AM

New strategy for implicit copy/move ctor tracking.

Initializer lists are still not handled.

isuckatcs updated this revision to Diff 452823.Aug 15 2022, 3:24 PM
  • Progressed with initializer lists.
  • Added more tests.
xazax.hun added inline comments.Aug 16 2022, 9:03 AM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1419

Nit: we have isCopyOrMoveConstructor.

1515

I think either the comment or the code might need to be updated. Trivial and user defined are not mutually exclusive terms.
Consider:

struct X {
  std::vector<int> v;
};

The copy constructor of X is neither trivial nor user provided.

Rewrote from scratch

isuckatcs added inline comments.Aug 17 2022, 12:26 PM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1443–1444

This is still buggy and can result in false positives, so it's commented out for now. If we don't want to handle this situation, this will be removed.

isuckatcs updated this revision to Diff 453576.Aug 18 2022, 2:21 AM
isuckatcs marked an inline comment as done.
isuckatcs retitled this revision from [analyzer] Track constructors in the BugReporter to [analyzer] Track trivial copy/move constructors and initializer lists in the BugReporter.
isuckatcs edited the summary of this revision. (Show Details)

Updated initializer list handling

NoQ added a comment.Aug 25 2022, 12:19 PM

The current patch looks good to me, it matches my understanding of the problem. Excellent work!

There seem to be more loose ends to cover but I think it's better to commit this patch as-is, as a strict improvement, and follow-up later if you have time. Maybe leave comments for future generations to know where to look for trouble!

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1437

I suspect that another possibility is ElementRegion for initializer lists that define arrays. Maybe leave a FIXME to look into that?

On the other hand, isa<InitListExpr>(E) sounds like it could become the type of variable E, to promote a runtime check to a compile-time check.

1453

Nitpick: The coding convention recommends against using auto when the type isn't obvious from the initializer. "Initializer" doesn't necessarily refer to an expression (could be like CXXCtorInitializer).

1490–1492

This probably needs another recursive descend, eg. if we track a.y.e and encounter S a = b then we need to track b.y.e.

Maybe leave a fixme?

isuckatcs updated this revision to Diff 456324.Aug 29 2022, 6:14 AM
isuckatcs marked 5 inline comments as done.

Addressed comments

xazax.hun added inline comments.Aug 29 2022, 8:44 AM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1451

I am a bit anxious about this matching. It probably works ok for most types, but consider a class like:

struct Common { int x; };

struct Inner {
  Common c;
};

struct S {
  Common c;
  Inner;
};

The type Common can be found at multiple levels. Would we ever confuse S::Common with S::Inner::Common? When we see a FieldRegion for Common:x, we would have a match twice. What guarantees that we always pick the right one? Or do I miss something?

Fortunately, picking the wrong one might not result in a wrong warning text, as subsequent matching might end up failing.

1473

I wonder whether top-level is ambiguous. Some people might think of s as the top level. How about innermost?

1475–1494

I feel like the code here is almost identical to the one at the beginning. Is it possible to extract this to a lambda and call it twice?

isuckatcs marked 3 inline comments as done.

Addressed comments

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1451

Would we ever confuse S::Common with S::Inner::Common?

Consider this snippet that uses the structs you created above with Inner i and int *x.

int *x = nullptr, *y = nullptr;
S s{x, y};

int a = *s.i.c.x;

The region we track in this case is s.i.c.x. The InitListExpr is

InitListExpr 'S':'struct S'
|-InitListExpr 'Common':'struct Common'
| `-ImplicitCastExpr 'int *' <LValueToRValue>
|   `-DeclRefExpr 'int *' lvalue Var  'x' 'int *'
`-InitListExpr 'Inner':'struct Inner'
  `-InitListExpr 'Common':'struct Common'
    `-ImplicitCastExpr 'int *' <LValueToRValue>
      `-DeclRefExpr 'int *' lvalue Var 'y' 'int *'

The type of this InitListExpr is RecordType 'struct S', so we first move up the regions
until we hit one with a matching type, which is s. Then we start moving back down the
regions and now the InitListExpr too, always choosing the sub-expression that matches
the index of the region.

So basically the functions always searches for the initializers top-down based on the field
indices, so I think if it confuses 2 regions, we made a mistake somewhere else.

Also I've reworked this function to be iterative and less confusing.

NoQ accepted this revision.Aug 30 2022, 10:36 AM

LGTM as long as other reviewers are happy!

This revision is now accepted and ready to land.Aug 30 2022, 10:36 AM
martong accepted this revision.Sep 5 2022, 4:39 AM

Thanks, you have nice tests with good coverage! LGTM!

Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2022, 8:06 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript