Page MenuHomePhabricator

[clang] Add tryToAttachCommentsToDecls method to ASTContext
AbandonedPublic

Authored by jkorous on Apr 24 2019, 4:33 PM.

Details

Summary

Loading external comments and sorting them is expensive - mostly due to getDecomposedLoc() begin expensive. For modules with very large number of comments (~100k) this is prohibitively expensive.
In this particular case we are actually not at all interested in getting comments for declarations - just using a side-effect of the implementation which causes documentation comments to be parsed (doxygen) and attached to relevant declarations.

The FIXME in tests is fixed now.

Diff Detail

Event Timeline

jkorous created this revision.Apr 24 2019, 4:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 24 2019, 4:33 PM

The FIXME in tests is fixed now.

... so instead of deleting the test, could you change it to show the current, better diagnostic?

clang/include/clang/AST/ASTContext.h
818

Please add a period.

818

Please add a period.

clang/lib/AST/ASTContext.cpp
494

Would be great to explain why (because we assume that the decls and their comments were parsed just now). Otherwise the comment could enumerate a lot of other things that we are not calling here either...

564

getCommentForDecl checks D->isInvalidDecl() first.

584

Scanning all comments for every decl? Isn't that O(n^2)?

Also logic duplication below. I was expecting a call to getRawCommentForDeclNoCache, with an appropriate flag to disable loading external comments (it is a low-level API, users generally don't call it).

jkorous marked 5 inline comments as done.Apr 25 2019, 3:46 PM
jkorous added inline comments.
clang/lib/AST/ASTContext.cpp
494

Haha, you're absolutely right! Thanks.

584

The important thing is that the expensive operation is source location decomposition. That's why I cache everything - GetCachedCommentBegin etc.

So while you're right that iteration-wise it's O(iteration*c*d) it`s actually O(decompose*c + decompose*d) because of the caching. The current code (which is sorting all the comments) is at least O(decompose*c*ln(c)) once you have more comments than sqrt(300) (==MagicCacheSize in SourceManager::getInBeforeInTUCache()).

That being said - you're right that just not-loading external comments in getRawCommentForDeclNoCache definitely has it's appeal. I'm running a test now get some idea about performance of both approaches.

BTW in theory we could also do one of these:

  1. Allow clients to transparently set MagicCacheSize in SourceManager::getInBeforeInTUCache() which is used for SourceLocation sorting (BeforeThanCompare<RawComment>) is currently hard-coded to 300 while we are comparing ~100k x ~100k locations.
  2. Change caching strategies in SourceManager::getFileID and SourceManager::getLineNumber.
jkorous updated this revision to Diff 196744.Apr 25 2019, 3:50 PM
  • clang-format
  • comments

Also, IIUC the test case that I deleted wasn't actually supposed to produce any diagnostics and the fact that it did was a bug. We could keep it as a regression test but I think it has a rather low value. WDYT?

jkorous marked an inline comment as done.Apr 25 2019, 5:29 PM

Also, IIUC the test case that I deleted wasn't actually supposed to produce any diagnostics and the fact that it did was a bug. We could keep it as a regression test but I think it has a rather low value. WDYT?

What do you mean? The issue that the test is trying to show is that a single -line in a /-comment breaks the doc comment and produces weird errors. Instead it should tell the user that it looks like there's an unintended //-line.

gribozavr added inline comments.Apr 29 2019, 2:10 AM
clang/lib/AST/ASTContext.cpp
584

So while you're right that iteration-wise it's O(iteration*c*d) it`s actually O(decompose*c + decompose*d) because of the caching.

The cost of decomposing is non-trivial, but the cost of each iteration is still at least a hash table lookup.

That being said - you're right that just not-loading external comments in getRawCommentForDeclNoCache definitely has it's appeal. I'm running a test now get some idea about performance of both approaches.

Reopening, waiting for the results.