This is an archive of the discontinued LLVM Phabricator instance.

Differential D61103

[clang] Add tryToAttachCommentsToDecls method to ASTContext
Abandoned · Public

Authored by jkorous on Apr 24 2019, 4:33 PM.

Details

Reviewers

gribozavr
arphaman

Summary

Loading external comments and sorting them is expensive - mostly due to getDecomposedLoc() being expensive. For modules with a very large number of comments (~100k) this is prohibitively expensive.
In this particular case we are actually not at all interested in getting comments for declarations - just using a side-effect of the implementation which causes documentation comments to be parsed (doxygen) and attached to relevant declarations.

The FIXME in tests is fixed now.

Event Timeline

jkorous created this revision. Apr 24 2019, 4:33 PM

Herald added a project: Restricted Project. Apr 24 2019, 4:33 PM

Herald added subscribers: cfe-commits, dexonsmith.

jkorous added a parent revision: D61102: [clang][ASTContext][NFCi] Refactor ASTContext::getRawCommentForDeclNoCache. Apr 24 2019, 4:42 PM

> The FIXME in tests is fixed now.

... so instead of deleting the test, could you change it to show the current, better diagnostic?

clang/include/clang/AST/ASTContext.h
818	Please add a period.
818	Please add a period.
clang/lib/AST/ASTContext.cpp
494	Would be great to explain why (because we assume that the decls and their comments were parsed just now). Otherwise the comment could enumerate a lot of other things that we are not calling here either...
564	`getCommentForDecl` checks `D->isInvalidDecl()` first.
584	Scanning all comments for every decl? Isn't that O(n^2)? Also logic duplication below. I was expecting a call to `getRawCommentForDeclNoCache`, with an appropriate flag to disable loading external comments (it is a low-level API, users generally don't call it).

jkorous marked 5 inline comments as done. Apr 25 2019, 3:46 PM

jkorous added inline comments.

clang/lib/AST/ASTContext.cpp
494	Haha, you're absolutely right! Thanks.
584	The important thing is that the expensive operation is source location decomposition. That's why I cache everything - `GetCachedCommentBegin` etc. So while you're right that iteration-wise it's O(iteration*c*d), it's actually O(decompose*c + decompose*d) because of the caching. The current code (which is sorting all the comments) is at least O(decompose*c*ln(c)) once you have more comments than `sqrt(300)` (== `MagicCacheSize` in `SourceManager::getInBeforeInTUCache()`). That being said - you're right that just not loading external comments in `getRawCommentForDeclNoCache` definitely has its appeal. I'm running a test now to get some idea about the performance of both approaches.
BTW in theory we could also do one of these:
- Allow clients to transparently set `MagicCacheSize` in `SourceManager::getInBeforeInTUCache()`, which is used for SourceLocation sorting (`BeforeThanCompare<RawComment>`) and is currently hard-coded to 300 while we are comparing ~100k x ~100k locations.
- Change caching strategies in `SourceManager::getFileID` and `SourceManager::getLineNumber`.
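To make the sorting cost in the comment above concrete, here is a rough sketch that counts how many "decompositions" a sort of c comment locations triggers when every comparison decomposes both operands and nothing is cached. The function name `decompositionsWhileSorting` and the plain-integer locations are hypothetical stand-ins, not Clang APIs; the real comparator and its cache behave differently in detail.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical model: each comparison decomposes both locations and no cache
// sits in between (an assumption made purely for illustration).
std::size_t decompositionsWhileSorting(std::size_t NumComments) {
  std::vector<unsigned> Locs(NumComments);
  std::iota(Locs.begin(), Locs.end(), 0u);
  std::reverse(Locs.begin(), Locs.end()); // start from descending order

  std::size_t Decompositions = 0;
  std::sort(Locs.begin(), Locs.end(), [&](unsigned A, unsigned B) {
    Decompositions += 2; // one simulated decomposition per operand
    return A < B;
  });
  return Decompositions;
}
```

Under this model the count grows with the number of comparisons the sort performs, which is the c*ln(c) factor mentioned in the comment; the exact number depends on the standard library's sort implementation.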

clang-format
comments

Also, IIUC the test case that I deleted wasn't actually supposed to produce any diagnostics and the fact that it did was a bug. We could keep it as a regression test but I think it has a rather low value. WDYT?

jkorous marked an inline comment as done. Apr 25 2019, 5:29 PM

> Also, IIUC the test case that I deleted wasn't actually supposed to produce any diagnostics and the fact that it did was a bug. We could keep it as a regression test but I think it has a rather low value. WDYT?

What do you mean? The issue that the test is trying to show is that a single //-line in a ///-comment breaks the doc comment and produces weird errors. Instead it should tell the user that it looks like there's an unintended //-line.

gribozavr added inline comments. Apr 29 2019, 2:10 AM

clang/lib/AST/ASTContext.cpp
584	> So while you're right that iteration-wise it's O(iteration*c*d), it's actually O(decompose*c + decompose*d) because of the caching.
The cost of decomposing is non-trivial, but the cost of each iteration is still at least a hash table lookup.
> That being said - you're right that just not loading external comments in getRawCommentForDeclNoCache definitely has its appeal. I'm running a test now to get some idea about the performance of both approaches.
Reopening, waiting for the results.
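The trade-off debated in this inline thread can be sketched roughly as follows, assuming c comments and d decls. `FakeSourceManager`, `ScanStats` and `scanWithCache` are hypothetical names invented for this illustration; the fake `decompose` only counts calls and is not `SourceManager::getDecomposedLoc`.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expensive location decomposition.
struct FakeSourceManager {
  std::size_t DecomposeCalls = 0;
  std::pair<unsigned, unsigned> decompose(unsigned Loc) {
    ++DecomposeCalls;
    return {Loc / 1000, Loc % 1000}; // made-up (file, offset) split
  }
};

struct ScanStats {
  std::size_t Iterations = 0;     // inner-loop steps, each a hash lookup
  std::size_t Decompositions = 0; // expensive calls actually made
  std::size_t SameFile = 0;       // pairs that landed in the same fake file
};

// Scans every comment for every decl, memoizing comment decompositions.
ScanStats scanWithCache(const std::vector<unsigned> &DeclLocs,
                        const std::vector<unsigned> &CommentLocs) {
  FakeSourceManager SM;
  std::unordered_map<unsigned, std::pair<unsigned, unsigned>> Cache;
  ScanStats Stats;
  for (unsigned DeclLoc : DeclLocs) {
    std::pair<unsigned, unsigned> DeclPos = SM.decompose(DeclLoc);
    for (unsigned CommentLoc : CommentLocs) {
      ++Stats.Iterations;
      auto It = Cache.find(CommentLoc);
      if (It == Cache.end())
        It = Cache.emplace(CommentLoc, SM.decompose(CommentLoc)).first;
      if (It->second.first == DeclPos.first)
        ++Stats.SameFile;
    }
  }
  Stats.Decompositions = SM.DecomposeCalls;
  return Stats;
}
```

With 3 decls and 4 distinct comment locations this performs 3 + 4 decompositions but still 3 * 4 loop iterations, each paying for a hash table lookup, which is the residual cost the reviewer points at.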

Abandoned in favor of https://reviews.llvm.org/D65301

Revision Contents

Path                                      Size
clang/include/clang/AST/ASTContext.h      8 lines
clang/lib/AST/ASTContext.cpp              134 lines
clang/lib/Sema/SemaDecl.cpp               14 lines
clang/test/Sema/warn-documentation.cpp    10 lines

Diff 196744

clang/include/clang/AST/ASTContext.h

[808 lines not shown; context: public:]
   /// Returns nullptr if no comment is attached.
   ///
   /// \param OriginalDecl if not nullptr, is set to declaration AST node that
   /// had the comment, if the comment we found comes from a redeclaration.
   const RawComment *
   getRawCommentForAnyRedecl(const Decl *D,
                             const Decl **OriginalDecl = nullptr) const;

+  /// For every comment not attached to any decl check if it should be attached
+  /// to any of \param Decls.
gribozavr (Done): Please add a period.
gribozavr (Done): Please add a period.
+  ///
+  /// \param PP the Preprocessor used with this TU. Could be nullptr if
+  /// preprocessor is not available.
+  void tryToAttachCommentsToDecls(ArrayRef<Decl *> Decls,
+                                  const Preprocessor *PP);
+
   /// Return parsed documentation comment attached to a given declaration.
   /// Returns nullptr if no comment is attached.
   ///
   /// \param PP the Preprocessor used with this TU. Could be nullptr if
   /// preprocessor is not available.
   comments::FullComment *getCommentForDecl(const Decl *D,
                                            const Preprocessor *PP) const;

[2,263 lines not shown]

clang/lib/AST/ASTContext.cpp

[483 lines not shown; context: for (auto I : D->redecls()) {]
     RawCommentAndCacheFlags &R = RedeclComments[I];
     if (R.getKind() == RawCommentAndCacheFlags::NoCommentInDecl)
       R = Raw;
   }

   return RC;
 }

+void ASTContext::tryToAttachCommentsToDecls(ArrayRef<Decl *> Decls,
+                                            const Preprocessor *PP) {
+  // Explicitly not calling ExternalSource->ReadComments() as we're interested
gribozavr (Done): Would be great to explain why (because we assume that the decls and their comments were parsed just now). Otherwise the comment could enumerate a lot of other things that we are not calling here either...
jkorous (Done): Haha, you're absolutely right! Thanks.
+  // only in comments and decls that were parsed just now.
+  ArrayRef<RawComment *> RawComments = Comments.getComments();
+  if (RawComments.empty())
+    return;
+
+  auto CacheCommentForDecl = [this, PP](const Decl *D, const RawComment *C) {
+    RawCommentAndCacheFlags CacheEntry;
+    CacheEntry.setKind(RawCommentAndCacheFlags::FromDecl);
+    CacheEntry.setRaw(C);
+    CacheEntry.setOriginalDecl(D);
+    RedeclComments[D] = CacheEntry;
+
+    // Always try to parse in order to eventually produce diagnostics.
+    comments::FullComment *FC = C->parse(*this, PP, D);
+    // But cache only if we don't have a comment yet
+    const Decl *Canonical = D->getCanonicalDecl();
+    auto ParsedComment = ParsedComments.find(Canonical);
+    if (ParsedComment != ParsedComments.end())
+      ParsedComment->second = FC;
+  };
+
+  // explicit comment location caching
+  std::unordered_map<RawComment *, std::pair<FileID, unsigned>>
+      DecomposedCommentBegin;
+  std::unordered_map<RawComment *, std::pair<FileID, unsigned>>
+      DecomposedCommentEnd;
+  std::unordered_map<unsigned, std::unordered_map<unsigned, unsigned>>
+      CommentBeginLine;
+
+  // Don't store the result for long - might go dangling.
+  auto GetCachedCommentBegin =
+      [&DecomposedCommentBegin,
+       this](RawComment *RC) -> const std::pair<FileID, unsigned> & {
+    assert(RC);
+    auto BeginIt = DecomposedCommentBegin.find(RC);
+    if (BeginIt != DecomposedCommentBegin.end()) {
+      return BeginIt->second;
+    }
+    DecomposedCommentBegin[RC] =
+        SourceMgr.getDecomposedLoc(RC->getSourceRange().getBegin());
+    return DecomposedCommentBegin[RC];
+  };
+  // Don't store the result for long - might go dangling.
+  auto GetCachedCommentEnd =
+      [&DecomposedCommentEnd,
+       this](RawComment *RC) -> const std::pair<FileID, unsigned> & {
+    assert(RC);
+    auto EndIt = DecomposedCommentEnd.find(RC);
+    if (EndIt != DecomposedCommentEnd.end()) {
+      return EndIt->second;
+    }
+    DecomposedCommentEnd[RC] =
+        SourceMgr.getDecomposedLoc(RC->getSourceRange().getEnd());
+    return DecomposedCommentEnd[RC];
+  };
+  auto GetCachedCommentBeginLine =
+      [&CommentBeginLine,
+       this](const std::pair<FileID, unsigned> &CommentBeginLoc) -> unsigned {
+    auto BeginFileIt =
+        CommentBeginLine.find(CommentBeginLoc.first.getHashValue());
+    if (BeginFileIt != CommentBeginLine.end()) {
+      auto BeginLineIt = BeginFileIt->second.find(CommentBeginLoc.second);
+      if (BeginLineIt != BeginFileIt->second.end()) {
+        return BeginLineIt->second;
+      }
+    }
+    CommentBeginLine[CommentBeginLoc.first.getHashValue()]
+                    [CommentBeginLoc.second] = SourceMgr.getLineNumber(
+                        CommentBeginLoc.first, CommentBeginLoc.second);
+    return CommentBeginLine[CommentBeginLoc.first.getHashValue()]
+                           [CommentBeginLoc.second];
gribozavr (Not Done): `getCommentForDecl` checks `D->isInvalidDecl()` first.
+  };
+
+  for (const Decl *D : Decls) {
+    D = adjustDeclToTemplate(D);
+    if (!CanDeclHaveDocComment(D))
+      continue;
+
+    {
+      auto CIt = RedeclComments.find(D);
+      if (CIt != RedeclComments.end() && CIt->second.getOriginalDecl() == D) {
+        continue;
+      }
+    }
+
+    llvm::Optional<SourceLocation> OptCandidateCommentLoc =
+        getCandidateCommentLocation(SourceMgr, D);
+    if (!OptCandidateCommentLoc)
+      continue;
+
gribozavr (Not Done): Scanning all comments for every decl? Isn't that O(n^2)? Also logic duplication below. I was expecting a call to `getRawCommentForDeclNoCache`, with an appropriate flag to disable loading external comments (it is a low-level API, users generally don't call it).
jkorous (Done): The important thing is that the expensive operation is source location decomposition. That's why I cache everything - `GetCachedCommentBegin` etc. So while you're right that iteration-wise it's O(iteration*c*d), it's actually O(decompose*c + decompose*d) because of the caching. The current code (which is sorting all the comments) is at least O(decompose*c*ln(c)) once you have more comments than `sqrt(300)` (== `MagicCacheSize` in `SourceManager::getInBeforeInTUCache()`). That being said - you're right that just not loading external comments in `getRawCommentForDeclNoCache` definitely has its appeal. I'm running a test now to get some idea about the performance of both approaches. BTW in theory we could also do one of these: (1) allow clients to transparently set `MagicCacheSize` in `SourceManager::getInBeforeInTUCache()`, which is used for SourceLocation sorting (`BeforeThanCompare<RawComment>`) and is currently hard-coded to 300 while we are comparing ~100k x ~100k locations; (2) change caching strategies in `SourceManager::getFileID` and `SourceManager::getLineNumber`.
gribozavr (Not Done): The cost of decomposing is non-trivial, but the cost of each iteration is still at least a hash table lookup. Reopening, waiting for the results.
+    const std::pair<FileID, unsigned> DeclLocDecomp =
+        SourceMgr.getDecomposedLoc(OptCandidateCommentLoc.getValue());
+
+    // FIXME: We might optimize by keeping count of unattached comments and
+    // terminating early.
+    for (auto CIt = RawComments.begin(); CIt != RawComments.end(); ++CIt) {
+      RawComment *C = *CIt;
+      if (!C->isDocumentation() && !LangOpts.CommentOpts.ParseAllComments)
+        continue;
+
+      if (C->isAttached())
+        continue;
+
+      if (C->isTrailingComment()) {
+        if (isa<FieldDecl>(D) || isa<EnumConstantDecl>(D) || isa<VarDecl>(D) ||
+            isa<ObjCMethodDecl>(D) || isa<ObjCPropertyDecl>(D)) {
+          const std::pair<FileID, unsigned> &CommentBeginDecomp =
+              GetCachedCommentBegin(C);
+          // Check that Doxygen trailing comment comes after the declaration,
+          // starts on the same line and in the same file as the declaration.
+          if (DeclLocDecomp.first == CommentBeginDecomp.first &&
+              SourceMgr.getLineNumber(DeclLocDecomp.first,
+                                      DeclLocDecomp.second) ==
+                  GetCachedCommentBeginLine(CommentBeginDecomp)) {
+            C->setAttached();
+            CacheCommentForDecl(D, C);
+            break;
+          }
+        }
+      } else {
+        if (IsCommentBeforeDecl(SourceMgr, GetCachedCommentEnd(C),
+                                DeclLocDecomp)) {
+          C->setAttached();
+          CacheCommentForDecl(D, C);
+          break;
+        }
+      }
+    }
+  }
+}
+
 static void addRedeclaredMethods(const ObjCMethodDecl *ObjCMethod,
                    SmallVectorImpl<const NamedDecl *> &Redeclared) {
   const DeclContext *DC = ObjCMethod->getDeclContext();
   if (const auto *IMD = dyn_cast<ObjCImplDecl>(DC)) {
     const ObjCInterfaceDecl *ID = IMD->getClassInterface();
     if (!ID)
       return;
     // Add redeclared method here.
[10,115 lines not shown]

clang/lib/Sema/SemaDecl.cpp

[12,374 lines not shown; context: if (Group.size() >= 2) {]
     // FinalizeDeclaratorGroup adds these as separate declarations.
     Decl *MaybeTagDecl = Group[0];
     if (MaybeTagDecl && isa<TagDecl>(MaybeTagDecl)) {
       Group = Group.slice(1);
     }
   }

   // See if there are any new comments that are not attached to a decl.
-  ArrayRef<RawComment *> Comments = Context.getRawCommentList().getComments();
-  if (!Comments.empty() &&
-      !Comments.back()->isAttached()) {
-    // There is at least one comment that not attached to a decl.
-    // Maybe it should be attached to one of these decls?
-    //
-    // Note that this way we pick up not only comments that precede the
-    // declaration, but also comments that *follow* the declaration -- thanks to
-    // the lookahead in the lexer: we've consumed the semicolon and looked
-    // ahead through comments.
-    for (unsigned i = 0, e = Group.size(); i != e; ++i)
-      Context.getCommentForDecl(Group[i], &PP);
-  }
+  Context.tryToAttachCommentsToDecls(Group, &PP);
 }

 /// ActOnParamDeclarator - Called from Parser::ParseFunctionDeclarator()
 /// to introduce parameters into function prototype scope.
 Decl *Sema::ActOnParamDeclarator(Scope *S, Declarator &D) {
   const DeclSpec &DS = D.getDeclSpec();

   // Verify C99 6.7.5.3p2: The only SCS allowed is 'register'.
[4,668 lines not shown]

clang/test/Sema/warn-documentation.cpp

[754 lines not shown]
 // expected-warning@+1 {{'\endverbatim' command does not terminate a verbatim text block}}
 /// \endverbatim
 int test_verbatim_1();

 // expected-warning@+1 {{'\endcode' command does not terminate a verbatim text block}}
 /// \endcode
 int test_verbatim_2();

-// FIXME: we give a bad diagnostic here because we throw away non-documentation
-// comments early.
-//
-// expected-warning@+3 {{'\endcode' command does not terminate a verbatim text block}}
-/// \code
-// foo
-/// \endcode
-int test_verbatim_3();


 // expected-warning@+1 {{empty paragraph passed to '\brief' command}}
 int test1; ///< \brief\author Aaa

 // expected-warning@+2 {{empty paragraph passed to '\brief' command}}
 // expected-warning@+2 {{empty paragraph passed to '\brief' command}}
 int test2, ///< \brief\author Aaa
     test3; ///< \brief\author Aaa

[535 lines not shown]