This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Support `#pragma mark` in the outline
ClosedPublic

Authored by dgoldman on Jul 13 2021, 9:27 AM.

Details

Summary

Xcode uses #pragma mark - to draw a divider in the outline view
and #pragma mark Note to add Note in the outline view. For more
information, see https://nshipster.com/pragma/.

Since the LSP spec doesn't contain dividers for the symbol outline,
instead we treat #pragma mark - as a group with children - the
decls that come after it.

Work left to do:

  • Support some edge cases where the document symbols may not be nested / sorted properly.
  • Consider supporting #pragma marks before any decls
  • Consider supporting the following:
#pragma mark -
#pragma mark Bar

by treating it like #pragma mark - Bar instead of an unnamed
group followed by a named mark.

Diff Detail

Event Timeline

dgoldman created this revision.Jul 13 2021, 9:27 AM
dgoldman requested review of this revision.Jul 13 2021, 9:27 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 13 2021, 9:27 AM
dgoldman updated this revision to Diff 358307.Jul 13 2021, 9:29 AM

Minor TODO change

dgoldman updated this revision to Diff 358606.Jul 14 2021, 7:51 AM

Fix clang tidy warnings

So first of all, it is not clear to me if this is a useful enough improvement to justify all the new code, but looking at your investment I'll assume this is really common in ObjC land and gonna be useful for developers working with ObjC. That's enough of a justification for me, but I'd be more than happy to see some real data if you have any.

That being said, reading the code it seems like the intent is enabling #pragma mark directives the introduce symbols that are either leaves or containers:

  • Leaf case is obvious and straightforward, it is just like a normal declaration happening at that particular location. It is unclear which kind should be used. It seems you've went with File but I don't think it is suitable in this case. I would suggest using Null.
  • Container case is more sophisticated. One thing is for sure, they are nested under the same symbol a declaration occuring at that point would be. They also seem to contain all the symbols that would nest under current subtree and spelled after the mark and spelled before any other marks. Hence there is no way to explicitly terminate the scope of a mark. They are implicitly terminated by hitting the end of the container containing the mark or hitting a new mark directive. This also implies one cannot have multiple levels of nesting just by pragma directives, but it can be achieved if there are usual containers with more mark directives (see my comment around tests for an example).

Can you verify that I understood the intent correctly and make it more explicit in the comments as well?

(Terminating containers implicitly sounds like an unfortunate limitation to me. I'd rather only enable termination of them explicitly) but I suppose there's a lot of code that's already existing out in the world with the assumption of XCode's implicit termination rules, so this is probably not gonna be helpful in reality. Hence we can't do much.)

Before diving into the details of the implementation it looks like we are doing some post-processing to incorporate mark pragmas into the tree, why can't we just insert them while building the tree as we do with macros? for example:

  • when inserting a symbol we look for a possible mark-pragma container (similar to macro case), a mark pragma is only eligible as a container if it is contained within the range of the parent && spelled before current declaration at hand && is a group.
  • as for insertion of leaves, after traversal of all the children we can just add any mark pragmas that occured within the current symbol's range as a child. we can also drop any mark pragmas occuring inside symbols that doesn't require child traversal this way.

This approach relies on the fact that range calculated by declToSym will be correct enough. I think it should be good in most cases, but might require some adjustments around macros, as usual (can't think of a problematic example right now though).
This brings out the question about ordering of children in DocumentSymbol, lsp doesn't say anything about it today, but during rendering we can just order these based on the selectionRange (probably during build) and hope that clients will preserve it.

I believe implementing it as part of the traversal rather than as post processing both makes code easier to reason about and more efficient. So WDYT about going in that direction?

clang-tools-extra/clangd/unittests/FindSymbolsTests.cpp
1053

probably a little mix of token vs char ranges.

1055

can you also add something like:

#pragma mark - Foo
struct Foo {
  #pragma mark - Bar
  void bar() {
     #pragma mark - NotTopDecl // I don't think we want this to be part of documentsymbol.
  }
};
void bar() {}
1122

yikes, this looks bad, but also a little bit confusing to be added with this change. Can you move it to a separate one? (or just create a bug report in github/clangd/clangd).

So first of all, it is not clear to me if this is a useful enough improvement to justify all the new code, but looking at your investment I'll assume this is really common in ObjC land and gonna be useful for developers working with ObjC. That's enough of a justification for me, but I'd be more than happy to see some real data if you have any.

Yep, I can you send you some searches internally with how prevalent they are.

That being said, reading the code it seems like the intent is enabling #pragma mark directives the introduce symbols that are either leaves or containers:

  • Leaf case is obvious and straightforward, it is just like a normal declaration happening at that particular location. It is unclear which kind should be used. It seems you've went with File but I don't think it is suitable in this case. I would suggest using Null.
  • Container case is more sophisticated. One thing is for sure, they are nested under the same symbol a declaration occuring at that point would be. They also seem to contain all the symbols that would nest under current subtree and spelled after the mark and spelled before any other marks. Hence there is no way to explicitly terminate the scope of a mark. They are implicitly terminated by hitting the end of the container containing the mark or hitting a new mark directive. This also implies one cannot have multiple levels of nesting just by pragma directives, but it can be achieved if there are usual containers with more mark directives (see my comment around tests for an example).

Can you verify that I understood the intent correctly and make it more explicit in the comments as well?

Yep, although there are some TODOs there, we could allow letting leaf nodes to be contained in a group parent as well as potentially merging them in the #pragma mark - followed by #pragma mark Foo case.

There's no good kind at the moment IMO, I don't really have a preference as long as it doesn't render awkwardly in VS Code.

(Terminating containers implicitly sounds like an unfortunate limitation to me. I'd rather only enable termination of them explicitly) but I suppose there's a lot of code that's already existing out in the world with the assumption of XCode's implicit termination rules, so this is probably not gonna be helpful in reality. Hence we can't do much.)

I don't think this an Xcode limitation really - more of an LSP spec limitation. If we create a group inside of another symbol, we are effectively contained in it and cannot escape without violating the spec, right? To be clear, Xcode rules are a bit different - see the link above - the #pragma mark - just creates a divider which we can't mirror.

Before diving into the details of the implementation it looks like we are doing some post-processing to incorporate mark pragmas into the tree, why can't we just insert them while building the tree as we do with macros? for example:

  • when inserting a symbol we look for a possible mark-pragma container (similar to macro case), a mark pragma is only eligible as a container if it is contained within the range of the parent && spelled before current declaration at hand && is a group.

This assumes (like this code today) that the nodes are sorted properly and the ranges of the parent are accurate. I was thinking of adding a proper sort in a post-processing phase but maybe we could do it + range fix up during insertion. Seems complicated to move everything over to insertion, but if you think that's reasonable, I can try that out.

And that case generally works but we need to be sure to handle multiple pragmas next to each other. #pragma mark - Foo\n#pragma mark - Bar turns Foo into an empty group and Bar takes decls after. And then, currently, a #pragma mark Third would end the group as well.

  • as for insertion of leaves, after traversal of all the children we can just add any mark pragmas that occured within the current symbol's range as a child. we can also drop any mark pragmas occuring inside symbols that doesn't require child traversal this way.

I think it's a bit more complicated since the leaf nodes terminate the groups. As long as we do that it should be fine, assuming editors don't require it to be sorted.

This approach relies on the fact that range calculated by declToSym will be correct enough. I think it should be good in most cases, but might require some adjustments around macros, as usual (can't think of a problematic example right now though).
This brings out the question about ordering of children in DocumentSymbol, lsp doesn't say anything about it today, but during rendering we can just order these based on the selectionRange (probably during build) and hope that clients will preserve it.

I believe implementing it as part of the traversal rather than as post processing both makes code easier to reason about and more efficient. So WDYT about going in that direction?

We could - I considered it, but with the existing behavior and potential subtle ordering issues I thought it would be simpler, but less efficient to do it after. I took a look at the Macro Expansion code and didn't understand it + potential edge cases well enough to add it there.

Yep, I can you send you some searches internally with how prevalent they are.

Thanks!

I don't think this an Xcode limitation really - more of an LSP spec limitation. If we create a group inside of another symbol, we are effectively contained in it and cannot escape without violating the spec, right? To be clear, Xcode rules are a bit different - see the link above - the #pragma mark - just creates a divider which we can't mirror.

Sorry I don't follow what you mean here. I was talking about inability to have something like:

#pragma mark -Foo
  #pragma mark -Bar
  #pragma mark SPECIAL_BAR_END
#pragma mark SPECIAL_FOO_END

By making sure groups are terminated explicitly we gain the ability to nest them, whereas otherwise -Bar above will always terminate the previous group implicitly.

This assumes (like this code today) that the nodes are sorted properly and the ranges of the parent are accurate.

Well it is not as strong as that. We'll mostly rely on the fact that symbols are nested properly. Since we just need to figure out the mark container for a symbol, we don't care about the ordering (or am I missing something here?).
OTOH, range of the parent being accurate is indeed required. I think today it is only broken for macro parents, but hopefully it shouldn't cause as any troubles as in all the practical cases these are still contained inside an AST node (I think), e.g:

#define FOO(X) class X
FOO(Bar) {
#pragma mark -Test
void bar();
}

-Test mark is contained in class X so it can be nested properly. I don't think it is ever possible to have it nest directly under a macro name since you can't have other PP directives expanded from macros (even though it is probably possible via weird tricks like __pragma/_Pragma).
Now ranges for the pragma-mark containers will also be broken, but again it is not a problem as we are never going to attach any pragma-mark symbols directly into another pragma-mark (they'll either be siblings or there'll be a regular AST node in between, as explained previously).

I was thinking of adding a proper sort in a post-processing phase but maybe we could do it + range fix up during insertion. Seems complicated to move everything over to insertion, but if you think that's reasonable, I can try that out.

As explained above, I think we can just keep adjusting the ranges after building the tree as we do today. Let me know if I am missing something though.

I think it's a bit more complicated since the leaf nodes terminate the groups. As long as we do that it should be fine, assuming editors don't require it to be sorted.

Well actually handling the leafs that terminate the group they are contained in is easy (since we'll make use of the closest mark-pragma to the declaration), but the other way around is hard, e.g:

#pragma mark -Foo
class X {
  #pragma mark -Bar
};
void foo();

foo above needs to be nested under Foo, even though the closest mark-pragma is Bar. I think these can be handled more naturally by making sure we throw away mark-pragmas contained inside a symbol after processing the containing symbol (e.g. Bar will be dropped from our container after X has been inserted into the tree).

We could - I considered it, but with the existing behavior and potential subtle ordering issues I thought it would be simpler, but less efficient to do it after.

Does my new set of comments help clarify things a little more? I don't think we need any ordering guarantees, we mostly rely on the fact that mark-pragmas will be sorted properly and symbols are nested correctly.

I took a look at the Macro Expansion code and didn't understand it + potential edge cases well enough to add it there.

I don't think this should directly go into the macro parent logic, but rather be orthogonal. Then we can just chain them, by first looking for a macro container and then looking for a mark-pragma one above that container (since we know the other-way around is not possible, i.e. a mark-pragma cannot nest directly under a macro).

So first of all, it is not clear to me if this is a useful enough improvement to justify all the new code, but looking at your investment I'll assume this is really common in ObjC land and gonna be useful for developers working with ObjC. That's enough of a justification for me, but I'd be more than happy to see some real data if you have any.

+1. I think the key for this kind of feature is isolating the complexity so it doesn't need to be part of the core mental model of how the feature works.
For macros I basically failed to do this, but I think we can do better here.

Can you verify that I understood the intent correctly and make it more explicit in the comments as well?

Yes, I think the comments should explain how these marks are used without assuming any prior knowledge.
Simplest reason for this is that the impl language for clangd is not objc, and the technique seems basically unknown elsewhere.

Before diving into the details of the implementation it looks like we are doing some post-processing to incorporate mark pragmas into the tree, why can't we just insert them while building the tree as we do with macros?

OK, this is why I'm jumping in here...
I think the fact that we do macros as-we-go is bad but unavoidable. It forces the core algorithm to think about symbol nesting and macro expansion at the same time, and makes the code much harder to reason about than if they were completely separate.

I didn't manage to separate them because macros are so complicated - we need the detailed SourceLocation info, to account for cases where they don't nest concentrically, etc. But this failure isn't something we should be consistent about - the more features that are part of the central algorithm, the exponentially more interactions there are to worry about.

However those difficulties don't apply here - I believe you can take the current final representation (vector<DocumentSymbol>) + the list of marks and produce a markified vector<DocumentSymbol> via a fairly natural tree walk. And this does a *really* good job of isolating this feature - we literally run the rest of the algorithm without it.
(This is slightly different to what's done in the patch, I would actually take it out of the DocumentOutline class entirely).
I don't think this hurts efficiency much (I think/hope this ends up being a move rather than a copy for most nodes). And I think it actually makes the markifying code easier to reason about in this case, as it just deals with line numbers/positions, without having the distraction of SourceLocation.

@kadircet WDYT of this argument, before we make David go back and forth?

clang-tools-extra/clangd/ParsedAST.cpp
418

You'll need to do this with marks as well (collect + merge), or you'll miss marks from the preamble region.
Make sure there's a test for this (it's not necessary to actually have any includes, I think)

clang-tools-extra/clangd/Protocol.h
198

This doesn't belong in Protocol.h, rather in SourceCode.h or so - these aren't domain objects and adding logic to them can end up causing layering issues in general.
(There are some exceptions for good and bad reasons, but not many. I'm not sure why Range::contains is here)

Maybe union() to be a little less ambiguous?

clang-tools-extra/clangd/TextMarks.h
22

This seems like premature abstraction - you don't handle TODOs here, nor are they appropriate for the single use case we have. Moreover they don't satisfy the definition here: int foo; // TODO: turn into a float the TODO doesn't occupy the whole line.

If we need it, the appropriate generalization seems just as likely to be "pragmas" or "directives" as "textual marks".

Similarly, it's not clear that the logic around interpreting the strings e.g. as groups needs to be shared across features, nor is it terribly complicated.
So the interface here seems like it could be much simpler:

struct PragmaMark {
  unsigned Line;
  string Text;
}
unique_ptr<PPCallbacks> collectPragmaMarksCallback(const SourceManager&, vector<PragmaMark> &Out);

I'd suggest putting this in CollectMacros.h (which would be misnamed but not terribly, or we could always rename) rather than adding a new header.

45

please avoid puttng bodies inline unless trivial or hot

However those difficulties don't apply here - I believe you can take the current final representation (vector<DocumentSymbol>) + the list of marks and produce a markified vector<DocumentSymbol> via a fairly natural tree walk. And this does a *really* good job of isolating this feature - we literally run the rest of the algorithm without it.
(This is slightly different to what's done in the patch, I would actually take it out of the DocumentOutline class entirely).
I don't think this hurts efficiency much (I think/hope this ends up being a move rather than a copy for most nodes). And I think it actually makes the markifying code easier to reason about in this case, as it just deals with line numbers/positions, without having the distraction of SourceLocation.
@kadircet WDYT of this argument, before we make David go back and forth?

As discussed offline, I am sold. In the end both this approach and inserting marks as we build the tree share a lot in common. In both cases when inserting a node into the tree we need to figure out whether it should go under the current parent or there's a mark that should contain that node, which is possibly not present in the tree yet. Instead of looking at ast nodes, it is definitely more reasonable to look at the document symbol nodes. Also as you mentioned doing this afterwards definitely does a better job at isolating this from the rest + enables us to fix up ranges/ordering of the tree produced from ast.

Sorry for the hassle here David :/

Yep, I can you send you some searches internally with how prevalent they are.

Thanks!

I don't think this an Xcode limitation really - more of an LSP spec limitation. If we create a group inside of another symbol, we are effectively contained in it and cannot escape without violating the spec, right? To be clear, Xcode rules are a bit different - see the link above - the #pragma mark - just creates a divider which we can't mirror.

Sorry I don't follow what you mean here. I was talking about inability to have something like:

#pragma mark -Foo
  #pragma mark -Bar
  #pragma mark SPECIAL_BAR_END
#pragma mark SPECIAL_FOO_END

By making sure groups are terminated explicitly we gain the ability to nest them, whereas otherwise -Bar above will always terminate the previous group implicitly.

Ah I see what you mean. Xcode has implicit termination - the #pragma mark - Foo just makes a horizontal line and labels Foo after, so a subsequent #pragma mark - Bar ends up creating a new line, ending the previous. There's too much relying on this behavior for us to change it. My point was we can't avoid an implicit end here of the Foo mark due to how we nest things, so either way we'd have some form of implicit termination:

@interface Foo
#pragma mark - Foo
@end

However those difficulties don't apply here - I believe you can take the current final representation (vector<DocumentSymbol>) + the list of marks and produce a markified vector<DocumentSymbol> via a fairly natural tree walk. And this does a *really* good job of isolating this feature - we literally run the rest of the algorithm without it.
(This is slightly different to what's done in the patch, I would actually take it out of the DocumentOutline class entirely).
I don't think this hurts efficiency much (I think/hope this ends up being a move rather than a copy for most nodes). And I think it actually makes the markifying code easier to reason about in this case, as it just deals with line numbers/positions, without having the distraction of SourceLocation.

That seems reasonable, it seems quite similar to this, we can probably simplify it if we remove the requirement that the children are in order. Note that my current approach doesn't really add much overhead if the absence of the TextMarks, but we should be able to special case that there as well.

clang-tools-extra/clangd/TextMarks.h
22

We don't handle them at the moment, but Xcode does (for both Swift + ObjC): https://medium.com/@cboynton/todo-make-your-notes-on-xcode-stand-out-5f5124ec064c. You're right that they don't necessarily have to occupy the whole line though, it's possible to have multiple per line although I'm not sure how often that is used in practice.

sammccall added inline comments.Jul 15 2021, 10:10 AM
clang-tools-extra/clangd/TextMarks.h
22

This seems like a very different case, and I'm not convinced we should support it.

That #pragma marks are used for grouping seems like the strongest argument for including them in the outline. TODOs are not used for grouping.

The majority of #pragma marks are written with XCode's conventions in mind, the majority of TODO comments are not. So there's a real question of whether authors want this. (And whether it should include all comments, or other kinds of structured comments...)

I'd suggest keeping the scope small and concrete. If you'd like to add abstractions because more cases are imminent, we probably need to get consensus on (some of) these cases first.

dgoldman added inline comments.Jul 15 2021, 12:00 PM
clang-tools-extra/clangd/TextMarks.h
22

Technically Xcode also supports // MARK comments as well, but almost all users internally use #pragma mark. I do think the TODO/FIXME in the outline could be useful (Xcode does it...), but if you're against it I'll remove this.

Well it is not as strong as that. We'll mostly rely on the fact that symbols are nested properly. Since we just need to figure out the mark container for a symbol, we don't care about the ordering (or am I missing something here?).
OTOH, range of the parent being accurate is indeed required. I think today it is only broken for macro parents, but hopefully it shouldn't cause as any troubles as in all the practical cases these are still contained inside an AST node (I think), e.g:

I think the difficulty is when you "figure out the mark container for a symbol" since we implicitly scope the mark containers based on their location, it's possible the closest #pragma isn't the right container. I'm not sure of a simple way to do that without sorting, LMK if you have some ideas.

dgoldman added inline comments.Jul 15 2021, 1:36 PM
clang-tools-extra/clangd/unittests/FindSymbolsTests.cpp
1122

I'm not sure this is worth disregarding just yet, if we're not able to fix this at the clang level we might want to consider a clangd-style fix here (which actually would behave similar to what we have right now with the text marks needing to be merged in).

dgoldman updated this revision to Diff 359878.Jul 19 2021, 12:19 PM

Fetch marks from preamble as well

Move Range helper into SourceCode.h

Thanks. AFAIK this is now waiting on you David with Kadir to review impl.

I think the difficulty is when you "figure out the mark container for a symbol" since we implicitly scope the mark containers based on their location, it's possible the closest #pragma isn't the right container. I'm not sure of a simple way to do that without sorting, LMK if you have some ideas.

I tried to work through this and yes I think you need the containers to be sorted. (If only because otherwise you have to reorder children within a container to group them, which is bizarre). If we do this with postprocessing, then where we fix-up the ranges is a good time to sort too. And mark insertion can preserve range nesting/sorting.
If the nesting is "incorrect" (siblings are not disjoint as in the function-in-@implementation example you gave) there doesn't seem to always be a great answer. For now I think it's OK to do something "wrong" in that case, as long as it's not infloop/crash.

clang-tools-extra/clangd/TextMarks.h
22

I think // MARK is so rarely used to be worth ignoring for now.

I think including TODO/FIXME is *probably* not great (unlike xcode, we can't tailor the UI to it and it doesn't seem to fit LSP).

We can always discuss and work out generalize later though, mostly I'm against adding the abstraction now just in case we have the discussion later.

clang-tools-extra/clangd/unittests/FindSymbolsTests.cpp
1122

Let's not disregard it, but we definitely need to think about it in a comprehensive way that isn't specific to this patch.

e.g. all selection operations (hover, go-to-definition, etc) fail inside such a function, because it violates the almost-rule that the AST corresponds to the lexical structure.

dgoldman updated this revision to Diff 360855.Jul 22 2021, 9:16 AM
dgoldman marked 3 inline comments as done.

Rebase and remove TextMark files

Removed the TextMarks file, just need to swap over how the insertions are done

clang-tools-extra/clangd/TextMarks.h
45

Ah CollectMacros has lots of stuff inline, so I mirrored that, doesn't look like many other file do that though

dgoldman updated this revision to Diff 360972.Jul 22 2021, 2:22 PM

Merge pragmas outside of the DocumentOutline

dgoldman updated this revision to Diff 360980.Jul 22 2021, 2:39 PM

Misc vector changes

PTAL, now merge outside of DocumentOutline although the algorithm I've used is roughly the same

sammccall added inline comments.Jul 23 2021, 5:52 AM
clang-tools-extra/clangd/CollectMacros.cpp
23 ↗(On Diff #360980)

Interpretation of the marks belongs with the rest of the DocumentSymbol code too, I think - until we have some evidence that this is structure we want to share.

67 ↗(On Diff #360980)

This code belongs with the rest of the DocumentSymbol code

clang-tools-extra/clangd/CollectMacros.h
106 ↗(On Diff #360980)

Line number is enough, right?
Column is not interesting, and pragma marks expanded from macros are not possible (and wouldn't be in the main file for our purposes)

115 ↗(On Diff #360980)

this should be tested in CollectMacrosTests.cpp or ParsedASTTests.cpp
(It's fine to test via TestTU though, and this will let you test both the preamble and non-preamble case)

sammccall added inline comments.Jul 23 2021, 5:52 AM
clang-tools-extra/clangd/FindSymbols.cpp
634

FWIW the flow control/how we make progress seem hard to follow here to me.

In particular I think I'm struggling with the statefulness of "is there an open mark group".

Possible simplifications:

  • define a dummy root symbol, which seems clearer than the vector<symbols> + range
  • avoid reverse-sorting the list of pragma symbols, and just consume from the front of an ArrayRef instead
  • make the outer loop over pragmas, rather than symbols. It would first check if the pragma belongs directly here or not, and if so, loop over symbols to work out which should become children. This seems very likely to be efficient enough in practice (few pragmas, or most children are grouped into pragmas)
dgoldman updated this revision to Diff 361208.Jul 23 2021, 7:48 AM
dgoldman marked 3 inline comments as done.

Fixes for review

dgoldman added inline comments.Jul 23 2021, 7:57 AM
clang-tools-extra/clangd/CollectMacros.h
106 ↗(On Diff #360980)

No, I think the column is useful, ideally we would point directly to the text that the user has although currently we point to the mark symbol.

clang-tools-extra/clangd/FindSymbols.cpp
634

define a dummy root symbol, which seems clearer than the vector<symbols> + range

I guess? Then we'd take in a `DocumentSymbol & and a ArrayRef<PragmaMarkSymbol> & (or just by value and then return it as well). The rest would be the same though

In particular I think I'm struggling with the statefulness of "is there an open mark group".

We need to track the current open group if there is one in order to move children to it.

make the outer loop over pragmas, rather than symbols. It would first check if the pragma belongs directly here or not, and if so, loop over symbols to work out which should become children. This seems very likely to be efficient enough in practice (few pragmas, or most children are grouped into pragmas)

The important thing here is knowing where the pragma mark ends - if it doesn't, it actually gets all of the children. So we'd have to peak at the next pragma mark, add all symbols before it to us as children, and then potentially recurse to nest it inside of a symbol. I'll try it out and see if it's simpler.

dgoldman updated this revision to Diff 361246.Jul 23 2021, 8:50 AM

Add in suggested algorithm

dgoldman added inline comments.Jul 23 2021, 8:54 AM
clang-tools-extra/clangd/FindSymbols.cpp
634

I've implemented your suggestion. I don't think it's simpler, but LMK, maybe it can be improved.

dgoldman updated this revision to Diff 361247.Jul 23 2021, 8:57 AM

minor comment fix

dgoldman updated this revision to Diff 361344.Jul 23 2021, 2:02 PM

Test collectPragmaMarksCallback + preamble merging

dgoldman marked an inline comment as done.Jul 23 2021, 2:02 PM
dgoldman added inline comments.Jul 26 2021, 12:48 PM
clang-tools-extra/clangd/TextMarks.h
22

Thought of another use case - for code folding it would be nice to let the #pragma marks fold. Not sure if ya'll are planning to ship clangd's code folding though.

kadircet added inline comments.Jul 29 2021, 7:38 AM
clang-tools-extra/clangd/CollectMacros.cpp
13 ↗(On Diff #360980)

can you nest this inside clang::clangd and drop the qualifiers ?

20 ↗(On Diff #360980)

nit: drop braces

clang-tools-extra/clangd/CollectMacros.h
106 ↗(On Diff #360980)

okay then let's store a Range here instead, and get rid of the dependency on a SourceManager for rendering it maybe?

clang-tools-extra/clangd/FindSymbols.cpp
634
while(Pragmas) {
// We'll figure out where the Pragmas.front() should go.
Pragma P = Pragmas.front();
DocumentSymbol *Cur = Root;
while(Cur->contains(P)) {
  auto *OldCur = Cur;
  for(auto *C : Cur->children) {
     // We assume at most 1 child can contain the pragma (as pragmas are on a single line, and children have disjoint ranges)
     if (C->contains(P)) {
         Cur = C;
         break;
     }
  }
  // Cur is immediate parent of P
  if (OldCur == Cur) {
    // Just insert P into children if it is not a group and we are done.
    // Otherwise we need to figure out when current pragma is terminated:
// if next pragma is not contained in Cur, or is contained in one of the children, It is at the end of Cur, nest all the children that appear after P under the symbol node for P.
// Otherwise nest all the children that appear after P but before next pragma under the symbol node for P.
// Pop Pragmas and break
  }
}
}

Does that make sense, i hope i am not missing something obvious? Complexity-wise in the worst case we'll go all the way down to a leaf once per pragma, since there will only be a handful of pragmas most of the time it shouldn't be too bad.

781

I think this reads easier:

bool IsGroup = Name.consume_front("-");
Name = Name.ltrim();
if (Name.empty())
  Name = IsGroup ? "unnamed group" : ...;
kadircet added inline comments.Jul 29 2021, 7:42 AM
clang-tools-extra/clangd/FindSymbols.cpp
634

oops, i was looking into an older revision and missed mergepragmas2, i think it looks quite similar to this one but we can probably get rid of the recursion as well and simplify a couple more cases

dgoldman added inline comments.Aug 2 2021, 9:30 AM
clang-tools-extra/clangd/FindSymbols.cpp
634

This makes sense, I think that works for the most part besides dropping the recursion, specifically for

// Next pragma is contained in the Sym, it belongs there and doesn't
// affect us at all.
if (Sym.range.contains(NextPragma.DocSym.range)) {
  Sym.children = mergePragmas2(Sym.children, Pragmas, Sym.range);
  continue;
}

I guess we could explicitly forbid 3+ layers of nesting and handle it inline there? But I'm not sure it's worth the effort to rewrite all of this - the recursion shouldn't be deep and we avoid needing to shift vector elements over by recreating a new one.

dgoldman updated this revision to Diff 363562.Aug 2 2021, 1:36 PM
dgoldman marked 4 inline comments as done.

More fixes for review

clang-tools-extra/clangd/CollectMacros.cpp
13 ↗(On Diff #360980)

Done, had to keep clangd qualifier to prevent mixup with the function below, otherwise I get error: call to non-static member function without an object argument

clang-tools-extra/clangd/FindSymbols.cpp
781

That behavior is slightly different, we want to treat #pragma mark -Foo as -Foo non group but #pragma mark - Foo as Foo group.

kadircet added inline comments.Aug 3 2021, 2:13 AM
clang-tools-extra/clangd/FindSymbols.cpp
634

Sorry I don't follow why we can't get rid of the recursion in this case.

Two loop solution I described above literally tries to find the document symbol node, such that the current pragma is contained in that node && current pragma isn't contained in any of that node's children. Afterwards it inserts the pragma into that node and starts traversing the tree from root again for the next pragma.

Again I don't follow where the 3+ layers of nesting constraint came from. But I do feel like the iterative version is somewhat easier to reason about (especially keeping track of what's happening with pragmas.front() and the way it bails out via parentrange check). Shifting of the vector is definitely unfortunate but I think it shouldn't imply big performance hits in practice as we are only shifting the children of a single node.

781

oh I see. I indeed missed the \s+ in the comment assumed it was \s*.
Is it really important to have that difference? If so, it might be useful to spell it out explicitly (e.g. -Foo is not considered as a group)

dgoldman marked 2 inline comments as done.Aug 3 2021, 6:54 AM
dgoldman added inline comments.
clang-tools-extra/clangd/FindSymbols.cpp
634

Yeah, I understand the first part, I think specifically handling the group case after you discover where it needs to be inserted is a bit more complicated, something like the following:

// Pragma is a group, so we need to figure out where it terminates:
// - If the next Pragma is not contained in Cur, P owns all of its
//   parent's children which occur after P.
// - If the next pragma is contained in Cur but actually belongs to one
//   of the parent's children, we temporarily skip over it and look at
//   the next pragma to decide where we end.
// - Otherwise nest all of its parent's children which occur after P but
//   before the next pragma.

And yeah, shifting in the worst case is definitely worst (due to repeat shifting) although it shouldn't be too common in practice (things like a large @implementation block would probably have the most children).

kadircet added inline comments.Aug 4 2021, 3:57 AM
clang-tools-extra/clangd/FindSymbols.cpp
634

we temporarily skip over it and look at the next pragma to decide where we end.

Ah you are right I was missing this part. As you suggested we'll need to loop over all the remaining pragmas, which will make the complexity ((quadratic in terms of # of pragmas) * (depth of document symbol tree)) I think it is still acceptable in practice. I don't think there'll be more than a dozen pragmas in a file. (moreover i think this problem exists with the current implementation you've provided, see my comment below)

Are you fine with reshaping the implementation in that way then? Because I still feel like the code will end up looking a lot more straightforward, even if not the most optimal and we can try to optimize the rest if it shows up in the latency graphs.

763

I suppose this is not correct in cases like.

#pragma mark - Outer
foo {
  #pragma mark - Foo
}
bar {
  # pragma mark - Bar
}
baz {
  # pragma mark - Baz
}

We'll first move foo under Outer, then start processing bar we notice that It comes after pragma Foo, and then also notice that pragma Foo is not contained in bar hence break. but all of foo,bar,baz should've been moved to be under Outer instead.

dgoldman marked 2 inline comments as done.Aug 4 2021, 9:50 AM
dgoldman added inline comments.
clang-tools-extra/clangd/FindSymbols.cpp
634

Implementing this now - there's one other edge case here, in that we don't want to treat #pragma mark Blah as a group. I guess I can maintain state similar to the first solution to solve that.

763

Yeah that's a good point, we'd have to do something like your suggested solution or maintain more state to consider the children that were moved.

dgoldman marked an inline comment as done.Aug 4 2021, 9:51 AM
dgoldman added inline comments.
clang-tools-extra/clangd/FindSymbols.cpp
634

Wait nevermind, I don't think that's a problem since the pragmas should never overlap, just have some other issue in my code I need to figure out

dgoldman updated this revision to Diff 364263.Aug 4 2021, 3:07 PM

the third time is the charm

Okay I think this is what you had in mind. LMK, if it's good I'll go ahead and delete the other ones

Thanks! I think it looks good, I've suggested some more simplifications to termination detection. If you can delete the rest I'd like to take a final look at the rest of the details.

clang-tools-extra/clangd/FindSymbols.cpp
779

nit: std::move

811
bool TerminatedByNextPragma = false;
for (auto &NextPragma : Pragmas) {
  // If we hit a pragma outside of Cur, the rest will be outside as well.
  if (!Cur->contains(NextPragma))
       break;

  // NextPragma cannot terminate P if it is nested inside a children, look for the next one.
  if (any_of(Cur->children, [](...) { return Child->contains(NextPragma); })
      continue;

  // Pragma owns all the children between P and NextPragma
  auto It = std::partition(....);
  P.children.assign(make_move_iterator(It), make_move_iterator(Cur->children.end()));
  Cur->children.erase(It, ...);
  TerminatedByNextPragma = true;
  break;
}
if(!TerminatedByNextPragma) {
  // P is terminated by the end of current symbol, hence it owns all the children after P.
  auto It = std::partition(....);
  P.children.assign(make_move_iterator(It), make_move_iterator(Cur->children.end()));
  Cur->children.erase(It, ...);  
}
// Update the range for P to cover children and append to Cur.
dgoldman updated this revision to Diff 364840.Aug 6 2021, 10:15 AM
dgoldman marked an inline comment as done.

Simplify children moving

dgoldman marked an inline comment as done.Aug 6 2021, 10:17 AM

PTAL, also updated the test to no longer check the children's order

thanks, mostly looks good couple of nits!

clang-tools-extra/clangd/CollectMacros.cpp
42 ↗(On Diff #364840)

are we fine with these annotations spanning #pragma [[mark XX]] rather than the whole line or just XX ?

clang-tools-extra/clangd/FindSymbols.cpp
632

nit: Pragmas.reserve

636

nit: 0,0 should suffice + makes sure the range is valid (in theory these are zero-based)

642

nit: llvm::makeArrayRef(Pragmas)

680

nit: s/std::partition(Cur->children.begin(), Cur->children.end()/llvm::partition(Cur->children/

clang-tools-extra/clangd/FindTarget.cpp
655 ↗(On Diff #364840)

this change does not look relevant, can you please revert?

clang-tools-extra/clangd/ParsedAST.cpp
429

We are not patching marks for stale preambles, which I believe is fine but worth a FIXME. If you are interested please take a look at https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/Preamble.cpp#L427, it should be fairly easy to handle movements of #pragma marks in preamble section.

clang-tools-extra/clangd/unittests/FindSymbolsTests.cpp
1055

can you add these cases as well to make sure details of the merge logic are working as expected?

dgoldman updated this revision to Diff 365196.Aug 9 2021, 8:44 AM
dgoldman marked 9 inline comments as done.

Address nits

clang-tools-extra/clangd/CollectMacros.cpp
42 ↗(On Diff #364840)

I think it's okay for now, I added a FIXME which I'll address in a follow up

clang-tools-extra/clangd/FindTarget.cpp
655 ↗(On Diff #364840)

Reverted, will send out another change to fix the warning

clang-tools-extra/clangd/ParsedAST.cpp
429

Added a FIXME. Does something similar need to be done for MainFileMacros?

kadircet accepted this revision.Aug 9 2021, 12:03 PM

thanks, lgtm!

clang-tools-extra/clangd/ParsedAST.cpp
429

we already handle macros today in https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/Preamble.cpp#L208. we just need to handle pragma mark in the callbacks and generate the relevant patch.

clang-tools-extra/clangd/unittests/FindSymbolsTests.cpp
1100

nit: drop AllOf

1102

nit: drop AllOf

This revision is now accepted and ready to land.Aug 9 2021, 12:03 PM
dgoldman updated this revision to Diff 365273.Aug 9 2021, 12:58 PM
dgoldman marked 2 inline comments as done.

Remove unnecessary AllOf

dgoldman updated this revision to Diff 365277.Aug 9 2021, 1:05 PM

Remove more AllOfs

This revision was landed with ongoing or failed builds.Aug 9 2021, 2:07 PM
This revision was automatically updated to reflect the committed changes.
This revision is now accepted and ready to land.Aug 10 2021, 6:26 AM
This revision was automatically updated to reflect the committed changes.