This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Start using SyntaxTrees for folding ranges feature
ClosedPublic

Authored by kbobyrev on Sep 30 2020, 4:10 AM.

Details

Summary

This is an initial attempt to start using Syntax Trees in clangd while improving state of folding ranges feature and experimenting with Syntax Tree capabilities.

Diff Detail

Event Timeline

kbobyrev created this revision.Sep 30 2020, 4:10 AM
kbobyrev requested review of this revision.Sep 30 2020, 4:10 AM
kbobyrev updated this revision to Diff 299281.Oct 20 2020, 12:22 AM

Add support for compound statements.

kbobyrev updated this revision to Diff 299284.Oct 20 2020, 12:27 AM

Add one more test and remove unused include.

kbobyrev retitled this revision from [clangd] Stash a patch piece to [clangd] Start using SyntaxTrees for folding ranges feature.Oct 20 2020, 12:29 AM
kbobyrev edited the summary of this revision. (Show Details)
kbobyrev added a reviewer: sammccall.

@sammccall I've reduced the patch to the bare minimum (compound statements) as I've had some issues with couple of other kinds of folding ranges and I'll be adding support for the node kinds one by one.

Harbormaster completed remote builds in B75661: Diff 299284.
eduucaldas added inline comments.Oct 20 2020, 4:58 AM
clang-tools-extra/clangd/SemanticSelection.cpp
60–62
63–66

Take a look at clang/include/clang/Tooling/Syntax/Nodes.h, syntax constructs usually have nice classes with accessors.

For instance CompoundStatement has the accessors getLbrace and getRbrace that seem to be exactly what you want.

However these might not give exactly the first leaf and last leaf in the case of syntactically incorrect code.

gribozavr2 added inline comments.
clang-tools-extra/clangd/SemanticSelection.cpp
40–48

WDYT about "makeFoldingRange" or even "toFoldingRange" to emphasize it is a factory / conversion function (no actual computation)?

58

"collectFoldingRanges" was a better name I think.

59

Why not return the vector?

clang-tools-extra/clangd/unittests/SemanticSelectionTests.cpp
247

"Will" makes it sound like it is not intentional.

sammccall added inline comments.Oct 20 2020, 5:48 AM
clang-tools-extra/clangd/SemanticSelection.cpp
50–98

Have you considered how you want macro-expanded code to work?

As written, this code can produce ranges inside macro definitions, invalid ranges (if { and } aren't from the same macro expansion) or even crash (FirstToken->endLocation() may be an invalid one-past-the-end location).

My suggestion would be to have this return optional<FoldingRange> and simply bail out if the location are not file locations for now. (Later we can deal with macro args in some cases)

63–66

I think we should treat all bracket-like things generically. Today this is just CompoundStmt, but we want to handle init lists, function calls, parens around for-loop conditions, template parameter and arg lists etc in the same way.

This sort of use is why the OpenParen/CloseParen NodeRoles are generic - we can have one set of logic to handle all of these. (No opinion on whether that should live here or in the syntax trees library, but putting it here for now seems fine).

So in the end I think checking the class name and then grabbing the braces by role (not kind) is the right thing here.
We definitely want to avoid asserting that the code looks the way we expect though.

67

this is worthy of a comment (fold the entire range inside the brackets, including whitespace)

71

until we support FoldingRangeClientCapabilities, should we just have the line check?

clang-tools-extra/clangd/unittests/SemanticSelectionTests.cpp
206–208

This is too many test cases for straightforward compoundstmt, I think. We have only once type of node handled for now, but we can't afford to write and maintain this many tests once we have lots.

I think a single if{}-elseif(nobraces)-else{} along with the existing function examples is probably enough. (Function examples show missing range for empty body, though it could use a comment)

We definitely need cases for macros:

  • entire {} within one macro arg
  • entire {} in the macro body
  • some combination (e.g. { in macro body, } outside the macro)

(we won't need these for every node type, assuming we have some consistent handling)

kbobyrev updated this revision to Diff 299438.Oct 20 2020, 12:22 PM
kbobyrev marked 9 inline comments as done.

Resolve review comments.

clang-tools-extra/clangd/SemanticSelection.cpp
63–66

So in the end I think checking the class name and then grabbing the braces by role (not kind) is the right thing here.
We definitely want to avoid asserting that the code looks the way we expect though.

Can you elaborate a bit on how this would work? Is your proposal to iterate through CompoundStatement first-level children and grab the OpenParen + CloseParen roles?

sammccall added inline comments.Oct 20 2020, 12:35 PM
clang-tools-extra/clangd/SemanticSelection.cpp
63–66

Exactly. And bail out if both don't exist.

And this can be done on Tree, so it's trivial to add support for function calls etc (but feel free to keep the scope small)

clang/include/clang/Tooling/Syntax/Nodes.h
771 ↗(On Diff #299438)

This doesn't look right: now a const method grants mutable access to the child. I think we need both overloads :-(

(Fortunately this is to be tablegen'd one day...)

kbobyrev marked 3 inline comments as done.Oct 20 2020, 1:08 PM
kbobyrev updated this revision to Diff 299449.Oct 20 2020, 1:08 PM

Resolve comments.

eduucaldas added inline comments.Oct 21 2020, 1:18 AM
clang-tools-extra/clangd/SemanticSelection.cpp
63–66

I very much agree on all the points you made Sam.

I think the logic to get ranges from node roles should eventually live in the syntax trees library. We might want to hide some of these lower-level functions in the future, so it would be better to not rely on them. But it's no harm for now :).

eduucaldas added inline comments.Oct 21 2020, 2:57 AM
clang/include/clang/Tooling/Syntax/Tree.h
181–186

I think that makes sense, since all the functions used by findChild are public anyways.

WDYT Dmitri? Now that the API is being used I realize its surface ^^. Perhaps we should pour some thought into that in the future :)

clang/lib/Tooling/Syntax/Tree.cpp
304–308

Similarly to the const version of findFirstLeaf. I think this should work :)

Also you could put the definition in clang/Tooling/Syntax/Tree.h.

kbobyrev updated this revision to Diff 299619.Oct 21 2020, 3:03 AM
kbobyrev marked an inline comment as done.

Prevent code duplication for const-nonconst versions of the same function.

kbobyrev updated this revision to Diff 299620.Oct 21 2020, 3:04 AM

Fix a typo and remove invalid const-ness.

gribozavr2 added inline comments.Oct 21 2020, 7:10 AM
clang/include/clang/Tooling/Syntax/Tree.h
181–186

I think it makes sense. Const-correctness often creates this type of API duplication.

kbobyrev marked 2 inline comments as done.Oct 22 2020, 11:54 AM
sammccall added inline comments.Oct 23 2020, 1:00 AM
clang-tools-extra/clangd/SemanticSelection.cpp
40–48

Here you're looking at positions but ignoring file ID, which is always a reason to be cautious...

We've dropped macro IDs, and we're traversing only the syntax tree for this file, but we could still end up in the wrong file:

void foo() {
  #include "SomeTablegen.inc"
}

We need to verify both start/end are in the main file.

I'd suggest passing in the main FileID as a param here, and decomposing the start/end locations into (FileID, Offset), and bailing out if either FID doesn't match.
This subsumes the macro check (since macro expansions have their own FileIDs) though you might want to keep it to be explicit, as we likely want to support some cases later.

As a bonus, then you get to use SourceManager::get{Line,Column}Number(FileID, Offset) instead of the "spelling" variants that are slightly confusing as we've already established we have a file location already.

63

strictly this should probably be findLastChild both semantically and for performance... but that doesn't exist, because it's a single-linked list for now.

Not a big deal in practice, but we should fix this (and add STL iterators!)

164

this actually does the right thing (traverses the ASTContext's registered roots rather than the TUDecl itself), but... what a misleading API. We should fix that to take ASTContext instead...

(Not in this patch, maybe a followup?)

clang/include/clang/Tooling/Syntax/Tree.h
184

I think it's more conventional to implement the const variant and use const_cast for the non-const variant, so the compiler will verify that the function implementation is const, and the cast is just adding "I know if I'm not const then the returned object isn't either".

By implementing the *non-const* variant, the compiler doesn't help you verify any part of the const contract.

kbobyrev updated this revision to Diff 300479.Oct 24 2020, 6:12 AM
kbobyrev marked 3 inline comments as done.

Decompose locations and add checks for FileID == MainFileID.

Also, rebase on top of master.

kbobyrev added inline comments.Oct 24 2020, 7:43 AM
clang-tools-extra/clangd/SemanticSelection.cpp
63

Added a FIXME.

kbobyrev marked an inline comment as done.Oct 26 2020, 3:12 AM
sammccall accepted this revision.Oct 27 2020, 8:13 AM
sammccall added inline comments.
clang-tools-extra/clangd/SemanticSelection.cpp
42

just use SourceLocations or SourceRange here? We have the SourceManager to decompose them anyway.

This revision is now accepted and ready to land.Oct 27 2020, 8:13 AM
kbobyrev updated this revision to Diff 301008.Oct 27 2020, 8:41 AM

Address post-LGTM comments.

kbobyrev marked an inline comment as done.Oct 27 2020, 8:42 AM
kbobyrev updated this revision to Diff 301009.Oct 27 2020, 8:44 AM

Resolve merge conflict.

This revision was landed with ongoing or failed builds.Oct 27 2020, 8:51 AM
This revision was automatically updated to reflect the committed changes.