This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][UninitializedObjectChecker] New flag to turn off dereferencing
ClosedPublic

Authored by Szelethus on Jul 17 2018, 11:15 AM.

Details

Summary

The idea came from both @george.karpenkov (D45532#1145592) and from bugzilla (https://bugs.llvm.org/show_bug.cgi?id=37965).

Assigning an object with uninitialized value to a pointer/reference could be intentional, especially if one writes a class that just initializes values.

struct Initializer {
  int *a;

  void initialize(/* ... */) {
    a = /* ... */;
  }
  
  Initializer(int *a) : a(a) {}
};

void f() {
  int b;
  Initializer init(&b);
  // ...
  init.initialize(/* ... */);
}

I actually have seen some examples for this in LLVM. While I absolutely agree that a flag like this would be neat, I also think that it should be disabled by default, as for example some objects are only created with malloc/new, and because I didn't find this functionality to be too noisy.

Diff Detail

Event Timeline

Szelethus created this revision.Jul 17 2018, 11:15 AM

@Szelethus false positives are a single biggest problem of the analyzer.
By a *huge* margin, most projects would prefer to err on the side of less, more precise, warnings.
Given that currently in my understanding no actual bugs we are sure about were found by the uninitialized object checker,
I think by default we should err on the "less warnings" side.

This solves part of the problem, could we also separate the code somehow?
Let's say I would need to debug UnitializedObjectChecker without understanding the details of how the pointer chasing is done --- it's hard to do that when all the code is intermixed in one file.

I left a comment on this issue already: D49437#1165462. But in short, definitely yes! It's been very painful to work around pointer/reference objects, too bad I didn't come up with this idea sooner :)

george.karpenkov requested changes to this revision.Jul 17 2018, 11:30 AM
george.karpenkov added inline comments.
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
723

registerChecker now passes through all arguments to the constructor.
Could you pass the analyzer options to the constructor,
and populate the fields from there?

728

Could we remove the negation in the option name? CheckPointeeInitialization? Also the value should be false by default if we want the checker to be evaluated on actual projects.

This revision now requires changes to proceed.Jul 17 2018, 11:30 AM

@Szelethus Hi, do you plan to finish this patch soon-ish? I would like to evaluate it on a few codebases, but the pointer chasing is currently way too fragile / generates too many FPs.

Szelethus updated this revision to Diff 158749.Aug 2 2018, 7:19 AM
Szelethus edited the summary of this revision. (Show Details)

Dereferencing is disabled by default. I think there's a great value in this part of the checker, but I do concede to the fact that it still needs to mature even for alpha. I'll probably revisit this once heap regions are better supported.

Szelethus marked an inline comment as done.Aug 2 2018, 7:26 AM
Szelethus added inline comments.
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
723

From an earlier conversation (D48285#inline-423489):

[...]getBooleanOption(StringRef, /*DefaultVal*/ bool, const ento::CheckerBase *) depends on the constructed checker, so I don't think I can pull this off.

There actually is a way to get boolean options without having to pass the checker as an argument, but then it wouldn't be a checker specific option:
-analyzer-config Pedantic=true
instead of
-analyzer-config alpha.cplusplus.UninitializedObject:Pedantic=true

I did see your patch D49050, but since CHECKER::name is assigned a value after the checker's construction, I don't think it solves this particular problem.

@Szelethus Hi, do you plan to finish this patch soon-ish? I would like to evaluate it on a few codebases, but the pointer chasing is currently way too fragile / generates too many FPs.

What do you mean under fragile? It certainly needs some refactoring and clarification, but I haven't seen the current version of the checker crash due to pointer chasing.

This might be a little nit-picking, but I don't think the finds are false positive. From what I've seen after looking through hundreds of reports, I am confident that pointer chasing is working as intended. I would however agree that some reports hold little value to the developer, as in some cases pointees are intentionally left uninitialized, so it's better to hide it behind a flag for now.

Szelethus updated this revision to Diff 158760.Aug 2 2018, 8:08 AM

Forgot git add and -U9999.

Dereferencing is disabled by default

Great! Thank you!

but I haven't seen the current version of the checker crash due to pointer chasing.

@NoQ saw quite a few crashes during evaluation, right?

This might be a little nit-picking, but I don't think the finds are false positive

Well, yeah, it depends on how do you define things. If we define reports as "bugs", then I guess this checker already does not find bugs, but best-practices violations.
Then it can be argued whether "best practice" should be extended to include all the memory reachable through pointers.

george.karpenkov requested changes to this revision.Aug 2 2018, 2:25 PM

Great! The comment should be updated though.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
46

The comment is outdated and "true" should be switched to "false", right?

This revision now requires changes to proceed.Aug 2 2018, 2:25 PM
Szelethus marked 2 inline comments as done.Aug 3 2018, 2:06 AM

It looks like I won't be back in office until Monday, so I can't finish this one until then :(

but I haven't seen the current version of the checker crash due to pointer chasing.

@NoQ saw quite a few crashes during evaluation, right?

That doesn't sound any good. I'd be very interested in fixing those. Any plans to make some bugreports on bugzilla?

This might be a little nit-picking, but I don't think the finds are false positive

Well, yeah, it depends on how do you define things. If we define reports as "bugs", then I guess this checker already does not find bugs, but best-practices violations.
Then it can be argued whether "best practice" should be extended to include all the memory reachable through pointers.

Fair point. I still am very confident that pointer chasing should be a thing, but I strongly agree that it lacks good heuristics to make the reports more meaningful.

Szelethus updated this revision to Diff 159291.Aug 6 2018, 6:29 AM

Fixed the comments.

george.karpenkov accepted this revision.Aug 6 2018, 9:57 AM

Great, thanks a lot!

This revision is now accepted and ready to land.Aug 6 2018, 9:57 AM

I think what pointer chasing should do, is check whether that pointer owns the pointee. In that case, it should be fine to analyze it. Do you mind if I put a TODO around flag's description stating that this should be implemented and pointer chasing should be enabled by default once that's done?

I think what pointer chasing should do, is check whether that pointer owns the pointee

But ownership is a convention, and it's not always deducible from a codebase.
I think of those as two separate checks, and I think we should only talk about enabling the pointer-chasing after we had established that just checking for uninitialized fields finds lots of valid bugs (and we can only do that after it gets enabled for many projects)

I think what pointer chasing should do, is check whether that pointer owns the pointee

But ownership is a convention, and it's not always deducible from a codebase.

How about the following case:

struct A {
  struct B {
    int b;
  };
  std::unique_ptr<B> ptr;
  A() : ptr(new B) {}
};

A a;

Here, a->ptr->b is clearly uninitialized, and I think it's fine to assume in most cases that no other pointer points to it right after a's construction.

I think of those as two separate checks, and I think we should only talk about enabling the pointer-chasing after we had established that just checking for uninitialized fields finds lots of valid bugs (and we can only do that after it gets enabled for many projects)

I think in the earlier case *this->ptr should be regarded as a regular field, and it could be analyzed without fear of spamming too many reports. Currently the biggest problem is that many objects could contain references to the same object:

struct A { int x; };
struct B {
  A &a;
  B(A &a) : a(a) {}
};
struct C {
  A &a;
  C(A &a) : a(a) {}
};

A a;
B b(a);
C c(a); // a.x reported twice
Szelethus added a comment.EditedAug 6 2018, 1:50 PM

If we ignore references, check whether the pointee was constructed within the constructor, and maybe add some other clever heuristics, I'm very much in favor of enabling pointer chasing by default. But I totally support disabling it for now.

@Szelethus in any case, could you commit this change for now? I'm very keen on seeing the results on a few of our internal projects, and it's much easier to do that once the change is committed.

I'll be back in office tomorrow, about ~13 hours later. But feel free to commit it on my behalf if it's urgent.

Szelethus updated this revision to Diff 159486.Aug 7 2018, 5:07 AM

Added the TODO we were discussing.

This revision was automatically updated to reflect the committed changes.