Page MenuHomePhabricator

clang-format: insert trailing commas into containers.
ClosedPublic

Authored by mprobst on Jan 24 2020, 7:22 AM.

Details

Summary

This change adds an option to insert trailing commas into container
literals. For example, in JavaScript:

const x = [
  a,
  b,
   ^~~~~ inserted if missing.
]

This is implemented as a seperate post-processing pass after formatting
(because formatting might change whether the container literal does or
does not wrap). This keeps the code relatively simple and orthogonal,
though it has the notable drawback that the newly inserted comma is not
taken into account for formatting decisions (e.g. it might exceed the 80
char limit).

Event Timeline

mprobst created this revision.Jan 24 2020, 7:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2020, 7:22 AM

This is required by JavaScript style, but this change as is optimistically runs the tool for any language. I think that's probably not what we want here initially, just throwing it out there. WDYT?

mprobst updated this revision to Diff 240194.Jan 24 2020, 7:33 AM
  • - only run comma insertion for JavaScript.

General comment:

I think this is useful and could also be useful for other languages too (some folks have asked for this for some C++ code).
As this is a non-whitespace only change, it's a bit more risky than whitespace-only changes.

I think we need to establish a policy on such more riskier style options, at least have a convenient way for users to to opt-out of such and be extra careful including them by default in default styles.
This is tricky because some style guides explicitly recommend such changes.

For rolling this out, I think the best path is:

  • check in the option, but don't turn it on with any styles yet
  • test it by taking a large codebase, formatting it (normal options), format it (with comma insertion), look at the diffs (internally google3diff can do this)
  • turn it on for -style=google

This could all be in one commit, but that means coupling review of the code to getting results.

clang/include/clang/Format/Format.h
557

Hadn't occurred to me that this will interact with bin-packing. That seems to increase the chances of going over the column limit :(

clang/lib/Format/Format.cpp
1488

Inlining this in format.cpp seems worse than having passes in their own files, but is the pattern so far. Ugh, up to you.

1493

Worth a comment (somewhere) about the fact that this never rewraps, in particular if the line is long, for simplicity.

Another policy that occurred to me: don't insert the comma if you're exactly at the line limit.

1522

You're checking for the existence of a previous token, and below you use getPreviousNonComment() without a null-check.

I think you want to either assume that the existence of a match means there's a prev token (and remove the check here), or you want to verify in which case I think you also need to guard against it being a comment (by checking Prev below instead)

1537

this is a can't-happen case, right? (comma-insertions can't conflict with each other, and this pass gets its own Replacements).

use cantFail(Result.add(...))

clang/unittests/Format/FormatTestJS.cpp
343

AFAICT one-arg verifyFormat doesn't actually test this change - the new tests with commas should pass both before/after this change.
So there's just one testcase, I think.

1864

Pretty hard to spot the change here, add a comment or change the a/bs to a description?

I'd also place this next to a test that verifies that when you put the same construct on one line, no comma is inserted (even if the test is redundant), and include a case with a dict/map/hash/thing :-)

Bouska added a subscriber: Bouska.Jan 24 2020, 8:18 AM
This comment was removed by Bouska.
Bouska added a comment.EditedJan 24 2020, 8:29 AM

I have an issue with this change. Currently (at least for C++), the presence of a trailing comma is used as a formatting hint to put all the element in one line or one per line as below:

enum test1 = {ONE, TWO, THREE};

enum test2 = {
    ONE,
    TWO,
    THREE,
};

As your change come after the formatting, clang-format won't be idempotent anymore. You are going to have the following if the container is bin packed (I did not actually test this):

First run:  
enum test1 = {
    ONE, TWO,
    THREE,};

Second run:
enum test1 = {
    ONE,
    TWO,
    THREE,
};

So I definitely think this change should be done before formatting and not after.

I have an issue with this change. Currently (at least for C++), the presence of a trailing comma is used as a formatting hint to put all the element in one line or one per line as below:

Good point about the interaction with bin-packing.
Do the style guides that want this actually endorse bin-packing containers? Insertion + bin-packing + comma-as-hint basically can't coexist - we could turn off the hint or make this option mutually exclusive with bin-packing.

...
So I definitely think this change should be done before formatting and not after.

I don't think this is a solution - we might wrap a container and fail to insert the comma :-)

I'm a little uncomfortable about all the tests changing, and I'm unsure if we should be changing the default behaviour.

The rules in the past here was to show that this is part of a public style guide. The assumption here is google style wants this. Could you please point to that documentation so at least there is some comeback when we break the world.

clang/lib/Format/Format.cpp
960

Are you sure the right decision is to make this on by default for something that's going to insert the comma? is this in google's javascript style guide?

I think this could cause clang-format to suddenly start adding lost of commas (as we see with the tests)

1488

Actually I think there is precedent to put TokenAnalyzers in their own class, I don't think it should be in Format.cpp

clang/unittests/Format/FormatTestJS.cpp
1290

is this correct?

1762

is this correct?

An alternative approach (I'm not endorsing this) that would *in theory* circumvent non-idempotency issue (but would be more fiddly~fiddly to implement and spill across different layers of the formatter):
This is just a rough idea and can have multiple issues in itself. In general the trick is that if we can teach the token penalty computer and the format outputter about this, we should be able to use it together with other formatting decisions.

  • add a new synthetic (annotated) token type, something like TT_PossiblyMissingTrailingComma, that would capture the length-0 text range where this could be inserted;
  • use the algorithm here to splay it into the annotated token stream;
  • mark it so that no newline can be inserted before such a token
  • modify the penalty calculation algorithm to penalize for inserting a newline after this token: if we're inserting a newline after this token and if this token was inserted at a position at or past the maximal number of columns, add a PenaltyExcessCharacter for it.
  • during outputting, if this token is followed by a newline, output it as ,; otherwise output it as an empty string or do not output it.
krasimir added inline comments.Jan 27 2020, 8:23 AM
clang/lib/Format/Format.cpp
1493

+1 for "Another policy that occurred to me: don't insert the comma if you're exactly at the line limit." (or over the column limit). Seems like that would be enough to keep it idempotent.

mprobst updated this revision to Diff 240847.Jan 28 2020, 5:50 AM
mprobst marked 16 inline comments as done.
  • - only run comma insertion for JavaScript.
  • review fixes
  • Fix col limit
  • test for comma insertion
mprobst marked 3 inline comments as done.Jan 28 2020, 5:58 AM
mprobst added inline comments.
clang/include/clang/Format/Format.h
557

Uh. Turns out we don't have a separate option for bin packing containers, only for bin packing arguments and parameters.

Should we add that option to make this setting work?
Should we have this setting imply bin packing containers (as opposed to parameters) is disabled?

clang/lib/Format/Format.cpp
960

It matches the style guide: https://google.github.io/styleguide/jsguide.html#features-arrays-trailing-comma

But it's indeed not entirely clear re bin packing, and certainly a good idea to split adding the option vs landing it.

1488

Re inlining vs separate file, idk. On one hand, this file is on the large side, otoh these ~60 LOC don't carry their own weight in a separate file. Maybe in a follow up we should extract all passes into a separate file (maybe all passes except the actual indenter?).

1493

Done re comment, and also done re policy, good idea.

I've also added a comment that this doesn't work together with bin packing.

1522

Good catch. I think the right fix is to check if there is a previous non comment token, as that's what I'm using below.

clang/unittests/Format/FormatTestJS.cpp
343

Ha, I just adjusted the test cases to my change, but forgot uploading the patch with the tests.

I've reverted all these changes (because I'm splitting flipping the default from this), but added an explicit test.

1290

Yes, type definitions should use ;

1762

Yes.

Nit: Could you add an entry for ReleaseNotes.rst and regenerate the ClangFormatStyleOption.rst with the docs/tools/dump_style.py

clang/unittests/Format/FormatTestJS.cpp
1290

if you hadn't added this to the test would it have added a "," ?

mprobst marked 6 inline comments as done.Jan 28 2020, 7:14 AM

PTAL.

clang/include/clang/Format/Format.h
557

OK, so I added a bit of validation logic to reject bin packing arguments together with TSC_Wrapped. We can figure out in a next step whether we want to expose an option for bin packing containers, or whether we can just disable bin packing for those people who want trailing commas.

clang/unittests/Format/FormatTestJS.cpp
1290

More precisely:
you can use either ; or , to separate entries in object literal types. So using ; is correct here, but c-f with this option would insert a ,, which is correct as well. I've added a test.

mprobst updated this revision to Diff 240873.Jan 28 2020, 7:14 AM
mprobst marked an inline comment as done.
  • - only run comma insertion for JavaScript.
  • review fixes
  • Fix col limit
  • test for comma insertion
  • - validate options, reject bin packing + trailing commas
clang/lib/Format/Format.cpp
2531

Ok, this comment is more a discussion point rather than a review comment. Was there a reason you needed to put the TrailingCommaInserter pass after the Formatter pass? (maybe you needed the Tokens annotated?)

From my understanding of some of the other conversations, it seemed the reason for ignoring the Column limit was because the "," insertion came after it had been formatted (is that correct?)

If there was a good reason that's also fine, I was just interested to learn if there was some part of the Formatter that perhaps needs to be pulled out into its own separate pass.

mprobst marked 2 inline comments as done.Jan 29 2020, 3:44 AM
mprobst added inline comments.
clang/lib/Format/Format.cpp
2531

The problem with inserting the comma during formatting is that it'd need to be inserted during the state exploration as part of Dijkstra's implemented in clang-format. I've considered that, but it'd be complex (if we make formatting decision X, we now add a token, which might invalidate the formatting decision). Keeping it as a separate pass has drawbacks, such as potentially not ending up with perfect formatting, thus the backing off to insert over ColumnLimit, but seems overall simpler.

sammccall accepted this revision.Jan 29 2020, 4:18 AM

(Maybe we should have a basic test for C++, but I'm not sure how this usually goes)

BTW it occurs to me that turning this on by default may be a surprising breakage for:

  • people who use clang-format to format JSON, where the comma is illegal.
  • people who care about IE8 or something
clang/include/clang/Format/Format.h
42

Back->Pack?

clang/lib/Format/Format.cpp
2531

Ok, this comment is more a discussion point rather than a review comment. Was there a reason you needed to put the TrailingCommaInserter pass after the Formatter pass? (maybe you needed the Tokens annotated?)

The comma must only be inserted when the items are wrapped one-per-line (i.e. the last item is not on the same line as the closing bracket). This wrapping is a decision taken by the formatter, so we need to run it first to know.
(Apologies if this is obvious, but maybe missed)

This revision is now accepted and ready to land.Jan 29 2020, 4:18 AM
mprobst marked 4 inline comments as done.Jan 29 2020, 4:21 AM
mprobst added inline comments.
clang/include/clang/Format/Format.h
42

That's what you get when you fix my auto-complete: consistent misspelling.

MyDeveloperDay accepted this revision.Jan 30 2020, 12:16 PM

I'm happy if you are, I think given that you've really been one of the main contributors of clang-format [JS] for the last number of years you are the best person to probably assess the impact on Javascript code.

Thanks for the patch, it LGTM and good luck!

Nit: don't forget the ClangFormatStyleOption.rst and ReleaseNote.rst changes

mprobst closed this revision.Feb 3 2020, 1:02 AM
mprobst marked an inline comment as done.

This has landed as https://reviews.llvm.org/rGa324fcf1ae6 (not sure why this hasn't closed automatically).

This has landed as https://reviews.llvm.org/rGa324fcf1ae6 (not sure why this hasn't closed automatically).

Thank you,

as for not autoclosing my guess it because perhaps you dropped the "Differential Revision" line from the commit message. I know there was some discussion about people not keeping all the Phabricator tags (Subscribers etc..), but I think if the commit message doesn't have the Differential Revision you won't get the AutoClose behaviour

MyDeveloperDay added inline comments.Apr 29 2020, 8:01 AM
clang/include/clang/Format/Format.h
563

I think we need to add some text in here to ensure it gets documented automatically in ClangFormatStyleOptions.rst rather than the by hand version done in D73768: clang-format: [JS] document InsertTrailingCommas. otherwise revisions like D78982: [clang-format] Fix Microsoft style for enums will end up removing it..