This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Revamp textDocument/onTypeFormatting.
ClosedPublic

Authored by sammccall on Apr 12 2019, 3:46 AM.

Details

Summary

The existing implementation (which triggers on }) is fairly simple and
has flaws:

  • doesn't trigger frequently/regularly enough (particularly in editors that type the } for you)
  • often reformats too much code around the edit
  • has jarring cases that I don't have clear ideas for fixing

This implementation is designed to trigger on newline, which feels to me more
intuitive than } or ;.
It does have allow for reformatting after other characters - it has a
basic behavior and a model for adding specialized behavior for
particular characters. But at least initially I'd stick to advertising
\n in the capabilities.

This also handles comment splitting: when you insert a line break inside
a line comment, it will make the new line into an aligned line comment.

Working on tests, but want people to patch it in and try it - it's hard to
see if "feel" is right purely by looking at a test.

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision.Apr 12 2019, 3:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2019, 3:46 AM
sammccall updated this revision to Diff 194855.Apr 12 2019, 5:45 AM

Unit tests.

sammccall updated this revision to Diff 195338.Apr 16 2019, 1:47 AM

Lots more test cases and better handling of braced blocks.

This turns out to be an interesting case:

if (foo) {
^}

Really the right thing to do here is insert another newline.
Unfortuantely that means inserting text before and after the cursor, which
confuses tooling::Replacements so... lots of churn in this patch :-(

@kadircet if you're interested in the behavior here, you can patch this in and try out with vscode.

Found an issue today.

Input (break a line at ^):

// here is my comment ^// another comment

Expected:

// here is my comment
^// another comment

Actual (an extra comment marker is added):

// here is my comment
// // another comment
sammccall updated this revision to Diff 196538.Apr 24 2019, 3:42 PM

Fix comment wrapping behavior:

  • when splitting a comment before , don't add another
  • if the editor inserts // before the cursor to continue a line comment, indent it and adjust to three slashes if needed.

This doesn't mean enter at the *end* of a comment will continue the comment,
but if editors do that (vim does) then it will be respected.

Another example:

int test() {

  ^
}

Expected: a newline was added.
Actual: newline does not allow to be added.

Input:

int test() {
}^

Expected:

int test() {
}
^

Actual:

int test() {}
^
sammccall updated this revision to Diff 196656.Apr 25 2019, 9:29 AM

Avoid removing lines when formatting after \n.

Another example:

int test() {

  ^
}

Expected: a newline was added.
Actual: newline does not allow to be added.

Fixed.

Input:

int test() {
}^

Actual:

int test() {}
^

I really do think this is the right thing in particular if you typed the brace.
It's visually confusing that hitting \n results in the cursor staying on the same line and the } moving.
And this interacts badly with the previous bug.
But can you see if this is something you could live with, and if not, explain why you don't want the {} formatted?

Input:

int test() {
}^

Actual:

int test() {}
^

I really do think this is the right thing in particular if you typed the brace.
It's visually confusing that hitting \n results in the cursor staying on the same line and the } moving.
And this interacts badly with the previous bug.
But can you see if this is something you could live with, and if not, explain why you don't want the {} formatted?

In general, I have high tolerance for quirks in IDE features. But this one really makes me itchy. I'd say clangd should never remove lines while I'm trying to add them. Are there examples where this behavior is useful?

Input:

// this is my comment. ^
// and it continues on the next line.

Expected:

// this is my comment.
// ^
// and it continues on the next line.

Actual:

// this is my comment.
^
// and it continues on the next line.

Input:

if (something)
  foo; // a comment^

Expected:

if (something)
  foo; // a comment
^

Actual (indented to align with a comment):

if (something)
  foo; // a comment
       ^
sammccall updated this revision to Diff 196864.Apr 26 2019, 9:13 AM

Fix more comment contination cases.

Found today:

if (something)
  return; // some long comment
          // that takes two lines.^

Expected: comment on last line is not re-indented.
Actual (comment re-indented):

if (something)
  return; // some long comment
// that takes two lines.

Found today:

I'm not able to reproduce this, can you give a complete example?

I'm not able to reproduce this, can you give a complete example?

After adding a newline here:

int test(bool x) {
  if (x)
    return 10; // All spelled tokens are accounted for.
               // that takes two lines^
}

I get:

int test(bool x) {
  if (x)
    return 10; // All spelled tokens are accounted for.
  // that takes two lines
  ^
}

Maybe that's due to some extra logic applied in VSCode on top of what we do.
Let me know if this does not reproduce.

sammccall updated this revision to Diff 197928.May 3 2019, 1:34 AM

Fix latest case (trailing comment outdented in vscode).

Maybe that's due to some extra logic applied in VSCode on top of what we do.
Let me know if this does not reproduce.

Aha, the missing piece was that vscode reindented the cursor to align with the comment (but didn't continue the comment).
This caused us to replace the cursor with an *indented* identifier, and clang-format thought the last line of the comment was attached to it.

It almost seems silly to keep patching the code rather than add a "break here" option to clang-format that doesn't have side-effects. However the fix here is pretty simple, so why not.

Do you think this is converging to something we should land? The last few have been pretty small tweaks and I'm happy to keep fixing those as incremental patches.
Conversely, if the basic design isn't right, then further polish probably isn't worthwhile.

Sorry for the delay, I'll make sure to take a close look at the change tomorrow.

As mentioned in offline discussions, doing this through clang-format seems like the right approach. At the same the markers we put into the files to drive formatting seem fragile, giving results we don't want.
The biggest issue that I can see is that it's super-hard to predict what clang-format is going to do in each case and even harder to certify that this should work in general case.

Intuitively, moving the logic to clang-format seems like the right thing to do in the long run (e.g. introducing a special cursor marker into clang-format, similar a code completion marker used by clang or something similar).
OTOH, it's hard for me to asses the amount of work needed to do this inside clang-format itself (not a clang-format expert), and the patch is definitely an improvement to what we had before (I've been using it for awhile now). The problems do get fixed, so I'm totally happy with it landing as is if we commit to fixing those nasty cases until we run out of them.

PS. In the meantime, I've found another case where newline gets eaten:

class SyntaxTreeTest {
public:
  ^ 
};

Expected: newline is added.
Actual: does not let to add a newline (the added newline is removed).

yvvan added a subscriber: yvvan.May 8 2019, 8:55 AM

@ilya-biryukov
What do you think about D53072? It can be polished and combined with this change removing some code from here (which I assume is a good thing).
The idea there is that clang-format knows that it's not allowed to remove new lines and it always marks them with MustBreakBefore.
I'm not sure if anybody needs an executable but I think it's not a big deal to have an extra reformat() function.

It is also safe because it was the only way I found which does not break the code style like, for instance, adding comment block to the end of the previous line.

@ilya-biryukov
What do you think about D53072? It can be polished and combined with this change removing some code from here (which I assume is a good thing).
The idea there is that clang-format knows that it's not allowed to remove new lines and it always marks them with MustBreakBefore.
I'm not sure if anybody needs an executable but I think it's not a big deal to have an extra reformat() function.

It is also safe because it was the only way I found which does not break the code style like, for instance, adding comment block to the end of the previous line.

I also think (both Sam and you seem to agree with this) that we'll definitely need changes to clang-format to support this use-case.
Also sympathetic to the view that this change should probably live in clang-format, but having it in clangd first and moving to clang-format later also LG.
@sammccall, what's your plan there? Experimenting in clangd and moving to clang-format later? Any reason to not start in clang-format in the first place?

Will add a few comments regarding the need for a separate tool in D53072 directly.

ilya-biryukov accepted this revision.May 27 2019, 5:01 AM

LGTM with a few NITs and a few questions about the possibility of moving this to clang-format at some point in the future.
From what I can to simplify the calling, we need to:

  1. teach clang-format about a cursor (by making it a special token or somehow else), this would allow to avoid some book-keeping;
  2. teach clang-format to format in presence of unmatched parentheses. Again, would simplify things. Although not sure it's easy to do in clang-format either;
  3. teach clang-format to respect indentation before the cursor (added by the editor in our case).

After that we don't seem to need the complicated multi-pass replacements.

  • Am I missing something that we also need from clang-format here?
  • How would you estimate the amount of work needed to move this functionality to clang-format?
  • In general, do you think it's worth moving it there or not?
clangd/Format.cpp
83 ↗(On Diff #197928)

NIT: put a comment about filename being required but ignored here to explain why we need this variable.

clangd/Format.h
25 ↗(On Diff #197928)

NIT: maybe be more concrete? near \p Cursor means right before \p Cursor? Or can it have other characters (e.g. whitespace in your example)?

This revision is now accepted and ready to land.May 27 2019, 5:01 AM
sammccall marked 3 inline comments as done.Jun 10 2019, 7:21 AM

OK, this has been stuck for a while and (as discussed a bit offline) I haven't been able to make the alternative approaches work.

@ilya-biryukov
What do you think about D53072? It can be polished and combined with this change removing some code from here (which I assume is a good thing).
The idea there is that clang-format knows that it's not allowed to remove new lines and it always marks them with MustBreakBefore.
I'm not sure if anybody needs an executable but I think it's not a big deal to have an extra reformat() function.

It is also safe because it was the only way I found which does not break the code style like, for instance, adding comment block to the end of the previous line.

I'm not sure D53072 helps in its current form. Preserving newlines with content on the line isn't always the (IMO) right formatting: see e.g. FormatBrace test.
(I have other concerns about the details of that patch, but they don't really relate to this one).

LGTM with a few NITs and a few questions about the possibility of moving this to clang-format at some point in the future.
From what I can to simplify the calling, we need to:

  1. teach clang-format about a cursor (by making it a special token or somehow else), this would allow to avoid some book-keeping;

There's at least one wrinkle with making it a special token: it may be within a token. E.g. after splitting a line comment, the cursor is at offset 3 within // the new line. Not sure how deep this problem goes.

  1. teach clang-format to format in presence of unmatched parentheses. Again, would simplify things. Although not sure it's easy to do in clang-format either;

Note this needs to be a style or other option, as the fact that clang-format bails out on broken code in general is well-established desired behavior.
I attempted this (by having the parser return "true" rather than "false" on reaching the end of a paren/brace list, and got lost in the details. I'm not sure whether it's likely to work, or whether clang-format would just do the same source transformation as we do here. (Obviously sharing it would be nice).

  1. teach clang-format to respect indentation before the cursor (added by the editor in our case).

I'm not sure what you mean by this: I don't think we want that indentation to be respected, but rather reformatted away.

After that we don't seem to need the complicated multi-pass replacements.

We currently have pre-formatting changes (e.g. comment splitting). I'm not sure if you expect these to belong to clang-format, but I wouldn't. (They're more like a "respond to newline" helper for editors than they are like formatting a range). So you'd go from {pre-formatting changes, insert placeholder, clang-format, remove placeholder} to {pre-formatting changes, clang-format}.
So the logic to implement multi-pass replacements is still needed I think, though you'd have fewer actual passes.

  • Am I missing something that we also need from clang-format here?

The comment splitting and {} splitting, I think.

  • How would you estimate the amount of work needed to move this functionality to clang-format?
  • In general, do you think it's worth moving it there or not?

I don't know, I put maybe 8 hours into this approach and didn't make any progress.
If someone finds a clean way to do it, I'd be in favour. But a hacky/brute-force way would do more harm than good in my opinion.

In the meantime, I've found another case where newline gets eaten:

Yes :-(
I've added a test to document this bad behavior. It's very hard to fix with clang-format as a black-box, because it hard-codes erasure of blank lines adjacent to a formatted region.
Probably the simplest approach is to add a clang-format option to always preserve blank lines. Although the motivation isn't an actual style guide, it makes sense to think of this as a style rule, it's very simple, and it's related to existing options.

clangd/Format.h
25 ↗(On Diff #197928)

s/near/before/.
Yes, it can have other characters, these are already mentioned in the comment.

This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2019, 7:24 AM