Page MenuHomePhabricator

[clang] Support parsing comments without ASTContext
Needs ReviewPublic

Authored by tom-anders on Feb 1 2023, 2:33 PM.

Details

Reviewers
kadircet
Summary

This is in preparation for implementing doxygen parsing, see discussion in https://github.com/clangd/clangd/issues/529.

Diff Detail

Event Timeline

tom-anders created this revision.Feb 1 2023, 2:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2023, 2:33 PM
tom-anders requested review of this revision.Feb 1 2023, 2:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2023, 2:33 PM
tom-anders added inline comments.Feb 1 2023, 2:37 PM
clang/include/clang/AST/RawCommentList.h
159 ↗(On Diff #494076)

Not sure if this is the right place for this? Could be moved to FullComment or made a free function instead?

clang/lib/AST/RawCommentList.cpp
227 ↗(On Diff #494076)

I wasn't sure if it's worth making the Diagnostics parameter optional here - Our SourceMgr already has it anyway, does it impact performance significantly to use it, even though we don't need it (yet)?

Thanks, this looks like a good start. I left some comments about the details of implementation, as for placing and APIs in general, i think it's useful to see how things will be built on top inside clangd to make the right call here.

clang/include/clang/AST/RawCommentList.h
159 ↗(On Diff #494076)

i think this would make more sense inside clangd as a free function when we're processing comments. it's nice to get to re-use some Lexer/Sema/Parser creation logic, but i don't think it's big enough to justify a new public interfaces that's solely used by clangd.
if we can figure out more patterns that we can migrate (the unittest in this patch is a good example, but still just a unittest and doesn't benefit much from code reuse) we can consider lifting this logic up to a common place. the other users should hopefully make it easier to decide where this code should live.

clang/lib/AST/RawCommentList.cpp
227 ↗(On Diff #494076)

as things stand, Lexer, Sema and Parser expect a non-null diagnosticsengine so we have to pass something (i guess that's what you meant by making it optional).

the performance implications comes from two places:
1- extra analysis being performed in the face of malformed comments to produce "useful" diagnostics
2- creation cost of diagnostic itself, as all the data about ranges, decls etc. are copied around when constructing a diagnostic.

i don't know how costly the first action is today, it's unfortunately the case that would require a lot of code changes, if it's happening quite often in practice (especially when we're in codebases without proper doxygen comments)
the latter is somewhat easier to address by providing our own "mock" diagnostics engine, that'll just drop all of these extra diagnostic data on the floor instead of copying around.

i think taking a look at the codebase for the first (searching for Diag(...) << patterns in Comment{Lexer,Sema,Parser}.cpp should give some ideas) and assessing whether we're performing any extra complicated analysis would be a good start. i would expect us not to, and if that's the case i think we can leave it at that and don't try to make code changes.
the latter we can address easily and we should.

  • Move to free function in CodeCompletionStrings.h
  • Add back comment markers before parsing (Index stores comments without markers, but the comment lexer expects them)
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2023, 6:23 AM
Herald added a subscriber: arphaman. · View Herald Transcript
tom-anders marked an inline comment as done.Feb 18 2023, 6:29 AM
tom-anders added inline comments.
clang/lib/AST/RawCommentList.cpp
227 ↗(On Diff #494076)

1 - At first glance it indeed seems like there's no significant extra analysis performed, so I think we're good for now.
2 - I think this is already taken care of by SourceManagerForFile which just passes nullptr as the DiagnosticConsumer, right?

nridge added a subscriber: nridge.Feb 18 2023, 9:40 PM