This is an archive of the discontinued LLVM Phabricator instance.

[clangd] IncludeCleaner: handle using namespace
AbandonedPublic

Authored by ArcsinX on Sep 21 2022, 11:48 AM.

Details

Reviewers
sammccall
Summary

IncludeCleaner suggests to remove #include "test.h" in the following example:

test.h

namespace ns {}

test.cpp

include "test.h"
using namespace ns;

This patch fixes this behavior by handling using-directive declarations

Diff Detail

Event Timeline

ArcsinX created this revision.Sep 21 2022, 11:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 11:48 AM
ArcsinX requested review of this revision.Sep 21 2022, 11:48 AM

Is test.h meaningfully used in that example?
Yes, the code is going to fail to compile without it, but it seems like the "spirit" of IWYU would say to delete both the include and the using directive.

My concern about marking it used is that namespaces are typically redeclared in *every* header, and this effectively disables the feature on large swaths of code:

// foo.h
namespace myproj { void foo(); }

// bar.h
namespace myproj { void bar(); }

// main.cc
#include "foo.h"
#include "bar.h" // not meaningfully used

using namespace myproj;
int main() {
 foo();
}

Is test.h meaningfully used in that example?
Yes, the code is going to fail to compile without it, but it seems like the "spirit" of IWYU would say to delete both the include and the using directive.

Yes, in this example both should be removed, but from IDE user point of view, I expect warning about unused using namespace ..., and after it will be removed, warning about unused include can appear.

Also:

  • I think that in unsure cases, we should keep includes and avoid code with errors
  • IWYU tool doesn't report that include can be removed

My concern about marking it used is that namespaces are typically redeclared in *every* header, and this effectively disables the feature on large swaths of code:

// foo.h
namespace myproj { void foo(); }

// bar.h
namespace myproj { void bar(); }

// main.cc
#include "foo.h"
#include "bar.h" // not meaningfully used

using namespace myproj;
int main() {
 foo();
}

In this case I don't see a difference with other declaration types. E.g. function

// foo.h
void foo();

// bar.h
void foo();

// main.cc
#include "foo.h"
#include "bar.h"

int main() {
 foo();
}

both foo.h and bar.h contains prototype of foo(), but bar.h can be removed

Anyway if this is the only concern, we can handle namespace declaration as a special case inside ReferencedLocationCrawler::add(). something like this:

diff
-    for (const Decl *Redecl : D->redecls())
-      Result.User.insert(Redecl->getLocation());
+    if (llvm::isa<NamespaceDecl>(D)) {
+      Result.User.insert(D->getCanonicalDecl()->getLocation());
+    } else {
+      for (const Decl *Redecl : D->redecls())
+        Result.User.insert(Redecl->getLocation());
+    }

And in the above example #include bar.h will be suggested to remove

Could this be a suitable solution?

Anyway if this is the only concern, we can handle namespace declaration as a special case inside ReferencedLocationCrawler::add(). something like this:

diff
-    for (const Decl *Redecl : D->redecls())
-      Result.User.insert(Redecl->getLocation());
+    if (llvm::isa<NamespaceDecl>(D)) {
+      Result.User.insert(D->getCanonicalDecl()->getLocation());
+    } else {
+      for (const Decl *Redecl : D->redecls())
+        Result.User.insert(Redecl->getLocation());
+    }

And in the above example #include bar.h will be suggested to remove

Could this be a suitable solution?

I don't think picking the canonical declaration would be helpful in general case; it'll likely preserve the first header all the time, which might be unused otherwise.

I feel like the best option here is, diagnosing the unused using directive as well. but I am not sure if changing the lookup semantics would always be "right".
The other option would be to consider headers in batches after full analysis is complete, while keeping track of symbols provided by a particular header.
That way when we have a symbol that's only re-declared in a bunch of headers, we can rank them based on other symbols being provided and only keep the headers that are providing some extra "used" symbols in addition to redecls.
In the absence of such we get to pick an option between "all/none/forward declare ourselves". I guess picking "all" in this case is less troublesome, the user will get all the "unused include" warnings as soon as they use a symbol from that namespace.

This is not easy in the current implementation of clangd, as we preserve symbols provided by headers, but the new include-cleaner library we're working on is more accommodating for such logic. Hence I'd rather leave this implementation as-is today, and see if such a behaviour would be desired in general (not only for namespace-decls, but pretty much any declaration for which we can only see "forward declarations" without a clear "canonical declaration"

Anyway if this is the only concern, we can handle namespace declaration as a special case inside ReferencedLocationCrawler::add(). something like this:

diff
-    for (const Decl *Redecl : D->redecls())
-      Result.User.insert(Redecl->getLocation());
+    if (llvm::isa<NamespaceDecl>(D)) {
+      Result.User.insert(D->getCanonicalDecl()->getLocation());
+    } else {
+      for (const Decl *Redecl : D->redecls())
+        Result.User.insert(Redecl->getLocation());
+    }

And in the above example #include bar.h will be suggested to remove

Could this be a suitable solution?

I don't think picking the canonical declaration would be helpful in general case; it'll likely preserve the first header all the time, which might be unused otherwise.

I feel like the best option here is, diagnosing the unused using directive as well. but I am not sure if changing the lookup semantics would always be "right".

Diagnosing the unused using directive is a good option, but it's not related with IncludeCleaner. For now IncludeCleaner suggests to remove a header, but removing it leads to compile errors, which looks wrong.

The other option would be to consider headers in batches after full analysis is complete, while keeping track of symbols provided by a particular header.
That way when we have a symbol that's only re-declared in a bunch of headers, we can rank them based on other symbols being provided and only keep the headers that are providing some extra "used" symbols in addition to redecls.
In the absence of such we get to pick an option between "all/none/forward declare ourselves". I guess picking "all" in this case is less troublesome, the user will get all the "unused include" warnings as soon as they use a symbol from that namespace.

This looks like a general problem, declaration of a used function in every header leads to "every header is used", maybe it's ok as far as it's not a common case.

So, yes, with the proposed solution we will get rid of some "false positive" cases, but we will get some "false negative" cases.
But IWYU handles this https://github.com/include-what-you-use/include-what-you-use/commit/97e236c55e5ebbd95b8bb221fd9497851f67c3cc

nridge added a subscriber: nridge.Oct 3 2022, 3:10 AM
ArcsinX abandoned this revision.Oct 17 2022, 3:17 AM