Page MenuHomePhabricator

[clangd] Support `#pragma mark` in the outline
ClosedPublic

Authored by dgoldman on Jul 13 2021, 9:27 AM.

Details

Summary

Xcode uses #pragma mark - to draw a divider in the outline view
and #pragma mark Note to add Note in the outline view. For more
information, see https://nshipster.com/pragma/.

Since the LSP spec doesn't contain dividers for the symbol outline,
instead we treat #pragma mark - as a group with children - the
decls that come after it.

Work left to do:

  • Support some edge cases where the document symbols may not be nested / sorted properly.
  • Consider supporting #pragma marks before any decls
  • Consider supporting the following:
#pragma mark -
#pragma mark Bar

by treating it like #pragma mark - Bar instead of an unnamed
group followed by a named mark.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sammccall added inline comments.Jul 23 2021, 5:52 AM
clang-tools-extra/clangd/FindSymbols.cpp
535

FWIW the flow control/how we make progress seem hard to follow here to me.

In particular I think I'm struggling with the statefulness of "is there an open mark group".

Possible simplifications:

  • define a dummy root symbol, which seems clearer than the vector<symbols> + range
  • avoid reverse-sorting the list of pragma symbols, and just consume from the front of an ArrayRef instead
  • make the outer loop over pragmas, rather than symbols. It would first check if the pragma belongs directly here or not, and if so, loop over symbols to work out which should become children. This seems very likely to be efficient enough in practice (few pragmas, or most children are grouped into pragmas)
dgoldman updated this revision to Diff 361208.Jul 23 2021, 7:48 AM
dgoldman marked 3 inline comments as done.

Fixes for review

dgoldman added inline comments.Jul 23 2021, 7:57 AM
clang-tools-extra/clangd/CollectMacros.h
106

No, I think the column is useful, ideally we would point directly to the text that the user has although currently we point to the mark symbol.

clang-tools-extra/clangd/FindSymbols.cpp
535

define a dummy root symbol, which seems clearer than the vector<symbols> + range

I guess? Then we'd take in a `DocumentSymbol & and a ArrayRef<PragmaMarkSymbol> & (or just by value and then return it as well). The rest would be the same though

In particular I think I'm struggling with the statefulness of "is there an open mark group".

We need to track the current open group if there is one in order to move children to it.

make the outer loop over pragmas, rather than symbols. It would first check if the pragma belongs directly here or not, and if so, loop over symbols to work out which should become children. This seems very likely to be efficient enough in practice (few pragmas, or most children are grouped into pragmas)

The important thing here is knowing where the pragma mark ends - if it doesn't, it actually gets all of the children. So we'd have to peak at the next pragma mark, add all symbols before it to us as children, and then potentially recurse to nest it inside of a symbol. I'll try it out and see if it's simpler.

dgoldman updated this revision to Diff 361246.Jul 23 2021, 8:50 AM

Add in suggested algorithm

dgoldman added inline comments.Jul 23 2021, 8:54 AM
clang-tools-extra/clangd/FindSymbols.cpp
535

I've implemented your suggestion. I don't think it's simpler, but LMK, maybe it can be improved.

dgoldman updated this revision to Diff 361247.Jul 23 2021, 8:57 AM

minor comment fix

dgoldman updated this revision to Diff 361344.Jul 23 2021, 2:02 PM

Test collectPragmaMarksCallback + preamble merging

dgoldman marked an inline comment as done.Jul 23 2021, 2:02 PM
dgoldman added inline comments.Jul 26 2021, 12:48 PM
clang-tools-extra/clangd/TextMarks.h
22 ↗(On Diff #358606)

Thought of another use case - for code folding it would be nice to let the #pragma marks fold. Not sure if ya'll are planning to ship clangd's code folding though.

kadircet added inline comments.Jul 29 2021, 7:38 AM
clang-tools-extra/clangd/CollectMacros.cpp
13

can you nest this inside clang::clangd and drop the qualifiers ?

20

nit: drop braces

clang-tools-extra/clangd/CollectMacros.h
106

okay then let's store a Range here instead, and get rid of the dependency on a SourceManager for rendering it maybe?

clang-tools-extra/clangd/FindSymbols.cpp
535
while(Pragmas) {
// We'll figure out where the Pragmas.front() should go.
Pragma P = Pragmas.front();
DocumentSymbol *Cur = Root;
while(Cur->contains(P)) {
  auto *OldCur = Cur;
  for(auto *C : Cur->children) {
     // We assume at most 1 child can contain the pragma (as pragmas are on a single line, and children have disjoint ranges)
     if (C->contains(P)) {
         Cur = C;
         break;
     }
  }
  // Cur is immediate parent of P
  if (OldCur == Cur) {
    // Just insert P into children if it is not a group and we are done.
    // Otherwise we need to figure out when current pragma is terminated:
// if next pragma is not contained in Cur, or is contained in one of the children, It is at the end of Cur, nest all the children that appear after P under the symbol node for P.
// Otherwise nest all the children that appear after P but before next pragma under the symbol node for P.
// Pop Pragmas and break
  }
}
}

Does that make sense, i hope i am not missing something obvious? Complexity-wise in the worst case we'll go all the way down to a leaf once per pragma, since there will only be a handful of pragmas most of the time it shouldn't be too bad.

682

I think this reads easier:

bool IsGroup = Name.consume_front("-");
Name = Name.ltrim();
if (Name.empty())
  Name = IsGroup ? "unnamed group" : ...;
kadircet added inline comments.Jul 29 2021, 7:42 AM
clang-tools-extra/clangd/FindSymbols.cpp
535

oops, i was looking into an older revision and missed mergepragmas2, i think it looks quite similar to this one but we can probably get rid of the recursion as well and simplify a couple more cases

dgoldman added inline comments.Aug 2 2021, 9:30 AM
clang-tools-extra/clangd/FindSymbols.cpp
535

This makes sense, I think that works for the most part besides dropping the recursion, specifically for

// Next pragma is contained in the Sym, it belongs there and doesn't
// affect us at all.
if (Sym.range.contains(NextPragma.DocSym.range)) {
  Sym.children = mergePragmas2(Sym.children, Pragmas, Sym.range);
  continue;
}

I guess we could explicitly forbid 3+ layers of nesting and handle it inline there? But I'm not sure it's worth the effort to rewrite all of this - the recursion shouldn't be deep and we avoid needing to shift vector elements over by recreating a new one.

dgoldman updated this revision to Diff 363562.Aug 2 2021, 1:36 PM
dgoldman marked 4 inline comments as done.

More fixes for review

clang-tools-extra/clangd/CollectMacros.cpp
13

Done, had to keep clangd qualifier to prevent mixup with the function below, otherwise I get error: call to non-static member function without an object argument

clang-tools-extra/clangd/FindSymbols.cpp
682

That behavior is slightly different, we want to treat #pragma mark -Foo as -Foo non group but #pragma mark - Foo as Foo group.

kadircet added inline comments.Aug 3 2021, 2:13 AM
clang-tools-extra/clangd/FindSymbols.cpp
535

Sorry I don't follow why we can't get rid of the recursion in this case.

Two loop solution I described above literally tries to find the document symbol node, such that the current pragma is contained in that node && current pragma isn't contained in any of that node's children. Afterwards it inserts the pragma into that node and starts traversing the tree from root again for the next pragma.

Again I don't follow where the 3+ layers of nesting constraint came from. But I do feel like the iterative version is somewhat easier to reason about (especially keeping track of what's happening with pragmas.front() and the way it bails out via parentrange check). Shifting of the vector is definitely unfortunate but I think it shouldn't imply big performance hits in practice as we are only shifting the children of a single node.

682

oh I see. I indeed missed the \s+ in the comment assumed it was \s*.
Is it really important to have that difference? If so, it might be useful to spell it out explicitly (e.g. -Foo is not considered as a group)

dgoldman marked 2 inline comments as done.Aug 3 2021, 6:54 AM
dgoldman added inline comments.
clang-tools-extra/clangd/FindSymbols.cpp
535

Yeah, I understand the first part, I think specifically handling the group case after you discover where it needs to be inserted is a bit more complicated, something like the following:

// Pragma is a group, so we need to figure out where it terminates:
// - If the next Pragma is not contained in Cur, P owns all of its
//   parent's children which occur after P.
// - If the next pragma is contained in Cur but actually belongs to one
//   of the parent's children, we temporarily skip over it and look at
//   the next pragma to decide where we end.
// - Otherwise nest all of its parent's children which occur after P but
//   before the next pragma.

And yeah, shifting in the worst case is definitely worst (due to repeat shifting) although it shouldn't be too common in practice (things like a large @implementation block would probably have the most children).

kadircet added inline comments.Aug 4 2021, 3:57 AM
clang-tools-extra/clangd/FindSymbols.cpp
535

we temporarily skip over it and look at the next pragma to decide where we end.

Ah you are right I was missing this part. As you suggested we'll need to loop over all the remaining pragmas, which will make the complexity ((quadratic in terms of # of pragmas) * (depth of document symbol tree)) I think it is still acceptable in practice. I don't think there'll be more than a dozen pragmas in a file. (moreover i think this problem exists with the current implementation you've provided, see my comment below)

Are you fine with reshaping the implementation in that way then? Because I still feel like the code will end up looking a lot more straightforward, even if not the most optimal and we can try to optimize the rest if it shows up in the latency graphs.

664

I suppose this is not correct in cases like.

#pragma mark - Outer
foo {
  #pragma mark - Foo
}
bar {
  # pragma mark - Bar
}
baz {
  # pragma mark - Baz
}

We'll first move foo under Outer, then start processing bar we notice that It comes after pragma Foo, and then also notice that pragma Foo is not contained in bar hence break. but all of foo,bar,baz should've been moved to be under Outer instead.

dgoldman marked 2 inline comments as done.Aug 4 2021, 9:50 AM
dgoldman added inline comments.
clang-tools-extra/clangd/FindSymbols.cpp
535

Implementing this now - there's one other edge case here, in that we don't want to treat #pragma mark Blah as a group. I guess I can maintain state similar to the first solution to solve that.

664

Yeah that's a good point, we'd have to do something like your suggested solution or maintain more state to consider the children that were moved.

dgoldman marked an inline comment as done.Aug 4 2021, 9:51 AM
dgoldman added inline comments.
clang-tools-extra/clangd/FindSymbols.cpp
535

Wait nevermind, I don't think that's a problem since the pragmas should never overlap, just have some other issue in my code I need to figure out

dgoldman updated this revision to Diff 364263.Aug 4 2021, 3:07 PM

the third time is the charm

Okay I think this is what you had in mind. LMK, if it's good I'll go ahead and delete the other ones

Thanks! I think it looks good, I've suggested some more simplifications to termination detection. If you can delete the rest I'd like to take a final look at the rest of the details.

clang-tools-extra/clangd/FindSymbols.cpp
680

nit: std::move

712
bool TerminatedByNextPragma = false;
for (auto &NextPragma : Pragmas) {
  // If we hit a pragma outside of Cur, the rest will be outside as well.
  if (!Cur->contains(NextPragma))
       break;

  // NextPragma cannot terminate P if it is nested inside a children, look for the next one.
  if (any_of(Cur->children, [](...) { return Child->contains(NextPragma); })
      continue;

  // Pragma owns all the children between P and NextPragma
  auto It = std::partition(....);
  P.children.assign(make_move_iterator(It), make_move_iterator(Cur->children.end()));
  Cur->children.erase(It, ...);
  TerminatedByNextPragma = true;
  break;
}
if(!TerminatedByNextPragma) {
  // P is terminated by the end of current symbol, hence it owns all the children after P.
  auto It = std::partition(....);
  P.children.assign(make_move_iterator(It), make_move_iterator(Cur->children.end()));
  Cur->children.erase(It, ...);  
}
// Update the range for P to cover children and append to Cur.
dgoldman updated this revision to Diff 364840.Aug 6 2021, 10:15 AM
dgoldman marked an inline comment as done.

Simplify children moving

dgoldman marked an inline comment as done.Aug 6 2021, 10:17 AM

PTAL, also updated the test to no longer check the children's order

thanks, mostly looks good couple of nits!

clang-tools-extra/clangd/CollectMacros.cpp
42

are we fine with these annotations spanning #pragma [[mark XX]] rather than the whole line or just XX ?

clang-tools-extra/clangd/FindSymbols.cpp
581

nit: s/std::partition(Cur->children.begin(), Cur->children.end()/llvm::partition(Cur->children/

641

nit: Pragmas.reserve

645

nit: 0,0 should suffice + makes sure the range is valid (in theory these are zero-based)

651

nit: llvm::makeArrayRef(Pragmas)

clang-tools-extra/clangd/FindTarget.cpp
655 ↗(On Diff #364840)

this change does not look relevant, can you please revert?

clang-tools-extra/clangd/ParsedAST.cpp
454

We are not patching marks for stale preambles, which I believe is fine but worth a FIXME. If you are interested please take a look at https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/Preamble.cpp#L427, it should be fairly easy to handle movements of #pragma marks in preamble section.

clang-tools-extra/clangd/unittests/FindSymbolsTests.cpp
1055

can you add these cases as well to make sure details of the merge logic are working as expected?

dgoldman updated this revision to Diff 365196.Aug 9 2021, 8:44 AM
dgoldman marked 9 inline comments as done.

Address nits

clang-tools-extra/clangd/CollectMacros.cpp
42

I think it's okay for now, I added a FIXME which I'll address in a follow up

clang-tools-extra/clangd/FindTarget.cpp
655 ↗(On Diff #364840)

Reverted, will send out another change to fix the warning

clang-tools-extra/clangd/ParsedAST.cpp
454

Added a FIXME. Does something similar need to be done for MainFileMacros?

kadircet accepted this revision.Aug 9 2021, 12:03 PM

thanks, lgtm!

clang-tools-extra/clangd/ParsedAST.cpp
454

we already handle macros today in https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/Preamble.cpp#L208. we just need to handle pragma mark in the callbacks and generate the relevant patch.

clang-tools-extra/clangd/unittests/FindSymbolsTests.cpp
1100

nit: drop AllOf

1102

nit: drop AllOf

This revision is now accepted and ready to land.Aug 9 2021, 12:03 PM
dgoldman updated this revision to Diff 365273.Aug 9 2021, 12:58 PM
dgoldman marked 2 inline comments as done.

Remove unnecessary AllOf

dgoldman updated this revision to Diff 365277.Aug 9 2021, 1:05 PM

Remove more AllOfs

This revision was landed with ongoing or failed builds.Aug 9 2021, 2:07 PM
This revision was automatically updated to reflect the committed changes.
This revision is now accepted and ready to land.Aug 10 2021, 6:26 AM
This revision was automatically updated to reflect the committed changes.