Page MenuHomePhabricator

[clang] A more robust way to attach comments
Needs ReviewPublic

Authored by zixuw on May 5 2022, 5:20 PM.

Details

Summary

The current implementation in getRawCommentForDeclNoCacheImpl to check
if a leading comment is attached to a given Decl *D is to see if there
is any of ";{}#@" between the comment and the source location of D,
which indicates that there's other declarations or preprocessor
directives that the comment is meant for. However, this does not work
for enum constants, which are separated by commas.
Comma was originally included in the filter characters list, but removed
in b534d3a0ef69 because macros, attributes and other decorators might
also introduce commas before a declaration. And there's no good way to
reliably and efficiently check if all the commas are from other
declarations.
This patch added a forward pointer in DeclBase pointing to the
previous declaration inside the lexical DeclContext, similar to the
existing NextInContextAndBits. This constructs a doubly linked list so
we can easily query the lexically preceding declaration of the current
decl D, to check if any decl falls between the comment and D.

Diff Detail

Event Timeline

zixuw created this revision.May 5 2022, 5:20 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: arphaman. · View Herald Transcript
zixuw requested review of this revision.May 5 2022, 5:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 5:20 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
zixuw added a comment.May 5 2022, 7:12 PM

Well, apparently there are still some corner-case bugs in the logic. ExtractAPI and a few other tests failed. Will look into it tomorrow.

And if folks have concerns or opinions of the performance/space cost of adding the extra pointer to DeclBase, I can try to keep the original logic and just special-case enum constants to find the previous enum constant decl in the parent enum. Not sure if there are existing ways to get preceding enum constants efficiently, otherwise it'd be an O(n^2) operation for an enum.

Since the issue is specific to enumerators I would recommend against increasing the size of DeclBase, which would increase the size of every single Decl in the AST.

There are multiple ways to approach this:

  • Consider whether bringing back the comma check just for enumerators would work.
  • Check the enumerators from the parent EnumDecl. The number of enumerators inside an EnumDecl is usually relatively small, you can do measurements to see how the performance looks with actual code.
  • If there is noticable performance regression:
    • Either add "previous enumerator" pointer in EnumConstantDecl (increases size of every EnumConstantDecl in the AST)
    • Or add "out-of-AST-decl" tracking via adding some DenseMap in ASTContext that is only populated when comments for enumerators of an EnumDecl are requested. There is much precedent for this, for example see these examples:
	  /// A cache mapping from CXXRecordDecls to key functions.
	  llvm::DenseMap<const CXXRecordDecl*, LazyDeclPtr> KeyFunctions;

	  /// Mapping from ObjCContainers to their ObjCImplementations.
	  llvm::DenseMap<ObjCContainerDecl*, ObjCImplDecl*> ObjCImpls;

	  /// Mapping from ObjCMethod to its duplicate declaration in the same
	  /// interface.
	  llvm::DenseMap<const ObjCMethodDecl*,const ObjCMethodDecl*> ObjCMethodRedecls;

	  /// Mapping from __block VarDecls to BlockVarCopyInit.
	  llvm::DenseMap<const VarDecl *, BlockVarCopyInit> BlockVarCopyInits;

The benefit of a separate map is that it only takes additional memory when a particular functionality is requested.