This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add basic support for formatting JSON
ClosedPublic

Authored by MyDeveloperDay on Dec 18 2020, 2:32 AM.

Details

Summary

I find as I develop I'm moving between many different languages C++,C#,JavaScript all the time. As I move between the file types I like to keep clang-format as my formatting tool of choice. (hence why I initially added C# support in D58404: [clang-format] Add basic support for formatting C# files) I know those other languages have their own tools but I have to learn them all, and I have to work out how to configure them, and they may or may not have integration into my IDE or my source code integration.

I am increasingly finding that I'm editing additional JSON files as part of my daily work and my editor and git commit hooks are just not setup to go and run jq, So I tend to go to JSON Formatter and copy and paste back and forth. To get nicely formatted JSON. This is a painful process and I'd like a new one that causes me much less friction.

This has come up from time to time:

D10543: clang-format: [JS] recognize .ts and .json in git-clang-format.
https://stackoverflow.com/questions/35856565/clang-format-a-json-file
https://bugs.llvm.org/show_bug.cgi?id=18699

I would like to stop having to do that and have formatting JSON as a first class clang-format support Language (even if it has minimal style settings at present).

This revision adds support for formatting JSON using the inbuilt JSON serialization library of LLVM, With limited control at present only over the indentation level

This adds an additional Language into the .clang-format file to separate the settings from your other supported languages.

-- 
Language: Json
IndentWidth: 2

For example of it working, this snipped from the Wikipedia definition of JSON can be formatted as below without the need to trick clang-format into thinking this is javascript:

file.json

{ "firstName": "John", "lastName": "Smith", "isAlive": true, "age": 27, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021-3100" }, "phoneNumbers": [ { "type": "home", "number": "212 555-1234" }, { "type": "office", "number": "646 555-4567" } ], "children": [], "spouse": null }
$ clang-format.exe file.json
{
  "address": {
    "city": "New York",
    "postalCode": "10021-3100",
    "state": "NY",
    "streetAddress": "21 2nd Street"
  },
  "age": 27,
  "children": [],
  "firstName": "John",
  "isAlive": true,
  "lastName": "Smith",
  "phoneNumbers": [
    {
      "number": "212 555-1234",
      "type": "home"
    },
    {
      "number": "646 555-4567",
      "type": "office"
    }
  ],
  "spouse": null
}

Diff Detail

Event Timeline

MyDeveloperDay created this revision.Dec 18 2020, 2:32 AM
MyDeveloperDay requested review of this revision.Dec 18 2020, 2:32 AM
MyDeveloperDay edited the summary of this revision. (Show Details)Dec 18 2020, 3:34 AM
MyDeveloperDay added a reviewer: thakis.

Why not using clang-format's support for JavaScript? It seems more complicated now, but it will probably allow to more easily change/add style options. WDYT?
And what happens when reformatting only a part of a JSON? Should there be some tests for that?

Why not using clang-format's support for JavaScript? It seems more complicated now, but it will probably allow to more easily change/add style options. WDYT?
And what happens when reformatting only a part of a JSON? Should there be some tests for that?

I've attempted to do this via JavaScript before and failed ;-( and had to abandoned the attempt. This is partially because clang-format isn't very good at even handling any braced initialization (let alone nested initialization) and it made the code much more complex. (my guess is that is why it was never attempted before by others before)

So I decided to reset my expectations first, My intent here is to introduce the concept of JSON Language support (to gauge if adding JSON support is something the community might accept), but also to let me build the infrastructure that means we have some support which we know will work out of the box (at least to give us clean JSON full file formatting formatting)

Ultimately the JSON pretty printing in the Support library cannot really handle what realistically might be wanted i.e. ideally I'd like to support different positioning types for the {

[
    "name": {
         "bar": 1 
    },
    "name": {
         "foo": 2
    }
]

This first revision handles some basic concepts, introduction of the JSON support and full file formatting (to a compatible LLVM Style as defined by its own JSON library) + allowing setting different IndentWidths

Ultimately once we have this support, which I think is a necessity, we can switch the underlying formatting to support, partial formatting (as we would see via git clang-format) and the potential other options we'd like to support (which I consider we'd most likely handle inside reformat using tokens as you are suggesting).

So I guess my question is:

  1. is this useful to pursue (either as is, or by switching to us the reformat method)
  2. is this useful standalone as the first pass
  3. is adding support for JSON something people would welcome (as we did with C#)

If the answer is no, then I don't want to waste the effort trying to fathom how we might do this inside reformat.

And what happens when reformatting only a part of a JSON? Should there be some tests for that?

Ultimately yes, but I don't think I can easily do that with the llvm/Support code, so that would need to be integrated into the refomat() (and this is why I didn't include a change into git-clang-format or clang-format-diff.py)

So I guess my question is:

  1. is this useful to pursue (either as is, or by switching to us the reformat method)
  2. is this useful standalone as the first pass
  3. is adding support for JSON something people would welcome (as we did with C#)

If the answer is no, then I don't want to waste the effort trying to fathom how we might do this inside reformat.

For me that's three times yes.

clang/unittests/Format/FormatTestJson.cpp
15

I don't know what's the practice right now. But I would suggest renaming that to "format-test-json", so you can see directly that's for JSON.

So I guess my question is:

  1. is this useful to pursue (either as is, or by switching to us the reformat method)
  2. is this useful standalone as the first pass
  3. is adding support for JSON something people would welcome (as we did with C#)

If the answer is no, then I don't want to waste the effort trying to fathom how we might do this inside reformat.

For me that's three times yes.

For me as well. I believe it's a useful addition, even if it handles only the basic cases for the moment.
I was just wondering whether it would be better to start a different way, but I understand that implementing it the "clang-format's way" can take some effort.

Anyway, I'd like to see a big warning banner (somewhere in the doc) about current limitations, so that nobody switches hastily to clang-format and then gets disappointed by these limitations.
Not sure that having a small note about it in release notes is enough.
So to the point, please document that only full-context/file is supported and that the only formatting option is indentation level (as you partially did in release notes).

clang/docs/ReleaseNotes.rst
284 ↗(On Diff #312728)

Maybe instead of putting "with very limited options", you may add a link to the doc describing limitations?

In theory, this version doesn't seem so useful - a simple JSON pretty-printer that ignores column widths etc is easy to write, doesn't need any of clang-format's infrastructure, and this isn't a particularly great one.

In practice though, not having to configure yet another tool is a big deal:

my editor and git commit hooks are just not setup to go and run jq

+1, this is often the difference between using a mediocre tool and nothing.

And I think for JSON data, a formulaic pretty-printer is maybe a better starting point than clang-format's context-sensitive optimizing approach.


But we'll have to deal with the fact that this isn't going to format the way all people like. It seems reasonable to to expect to be able to:

  • format ["foo"] on one line
  • indent with tab characters rather than spaces
  • format "JSON" that is incomplete/broken/has extensions (see JSON5)

With my maintainer-of-Support/JSON hat on, I don't like the idea of these features being shoehorned into the library to make clang-format an incrementally more featureful JSON formatter. They're likely to lead to a lot of conceptual+implementation+API complexity in a library that solves its primary use cases quite well and simply.


So if the plan is to

  • use this implementation to bootstrap JSON-in-clang-format
  • with a firm intention of replacing it with parsing/formatting logic that lives inside clang-format itself
  • long-term, maybe making use of Support/JSON long-term for some low level bits (normalizing string tokens?) but not where flexibility is needed

then I think this is great. However I'm a bit leery if the plan is "land this a MVP, and then play it by ear".

MyDeveloperDay marked an inline comment as done.Dec 19 2020, 5:48 AM

With my maintainer-of-Support/JSON hat on, I don't like the idea of these features being shoehorned into the library to make clang-format an incrementally more featureful JSON formatter. They're likely to lead to a lot of conceptual+implementation+API complexity in a library that solves its primary use cases quite well and simply.

No plans at all to touch Support/JSON, if this needed a separate reformat() for json then I'd say I'm more likely to do that in lib/Format as we'd only need a "PrettyPrint" type function which supported the options we want to support

So if the plan is to

  • use this implementation to bootstrap JSON-in-clang-format

That was kind of the plan

  • with a firm intention of replacing it with parsing/formatting logic that lives inside clang-format itself

So I tried again to do this yesterday, but once again seems to fail, mainly around the area of clang-formatting trying to reflow and rejoin split lines but also the level of indentation is off.

I'm starting to wonder if perhaps it needs its own separate pass, use more of clang-format but less of the actual "parsing into known structures"...

  • long-term, maybe making use of Support/JSON long-term for some low level bits (normalizing string tokens?) but not where flexibility is needed

I think I'd really just like to use the FormatTokens, but I agree if we ended up writing a specific JSON pretty-printer I could see using the inner parts (but I really have no interesting in touching lib/Support/JSON at all)

then I think this is great. However I'm a bit leery if the plan is "land this a MVP, and then play it by ear".

Sorry I'm not sure I understand what MVP means in this context?

clang/unittests/Format/FormatTestJson.cpp
15

I'm not sure of the convention or even where/when its used (I just know you need it)

lets take that change offline, we should either change them all or none of the them.

FormatTest.cpp:#define DEBUG_TYPE "format-test"
FormatTestCSharp.cpp:#define DEBUG_TYPE "format-test"
FormatTestComments.cpp:#define DEBUG_TYPE "format-test"
FormatTestJS.cpp:#define DEBUG_TYPE "format-test"
FormatTestJava.cpp:#define DEBUG_TYPE "format-test"
FormatTestJson.cpp:#define DEBUG_TYPE "format-test"
FormatTestObjC.cpp:#define DEBUG_TYPE "format-test"
FormatTestProto.cpp:#define DEBUG_TYPE "format-test"
FormatTestRawStrings.cpp:#define DEBUG_TYPE "format-test"
FormatTestSelective.cpp:#define DEBUG_TYPE "format-test"
FormatTestTableGen.cpp:#define DEBUG_TYPE "format-test"
FormatTestTextProto.cpp:#define DEBUG_TYPE "format-test"
NamespaceEndCommentsFixerTest.cpp:#define DEBUG_TYPE "namespace-end-comments-fixer-test"
SortImportsTestJS.cpp:#define DEBUG_TYPE "format-test"
SortImportsTestJava.cpp:#define DEBUG_TYPE "format-test"
SortIncludesTest.cpp:#define DEBUG_TYPE "format-test"
UsingDeclarationsSorterTest.cpp:#define DEBUG_TYPE "using-declarations-sorter-test"
MyDeveloperDay planned changes to this revision.Jan 18 2021, 10:10 AM
MyDeveloperDay marked an inline comment as done.
MyDeveloperDay updated this revision to Diff 352744.EditedJun 17 2021, 8:56 AM

I have reworked this patch to utilise the fact that clang-format can format a JSON object in javascript reasonably well.

This technique adds for JSON files only a replacement to the front of the JSON transforming:

{
    "A": 1,
    "B": [ 2,3 ]
}

into

x = {
    "A": 1,
    "B": [ 2,3 ]
}

This is then parsed by clang-format the a more traditional sense, allowing a similar technique as would be used when formatting the code, (but with some JSON specific rules, to make it follow primarily the style used by https://github.com/kapilratnani/JSON-Viewer (Notepad++ plugin, which at least I've used extensively)

After formatting, a subsequent replacement is merged to remove the x = from the front of the json, rendering it back to its original form.

{
    "A": 1,
    "B": [ 2,3 ]
}

I prefer to use this approach rather than our Support/JSON libraries which aren't really setup for formatting options, this give us greater flexibility later on if users want custom rules from brace wrapping.

Missing new test file

JSON strings can't be split when the line is too long like c++ strings

This revision is now accepted and ready to land.Jun 17 2021, 1:51 PM

Ensure git clang-format can handle json

Add more unit tests and ensure clang-format-diff is setup to check json

something @sammccall said about support foo["name"] made me realise that our javascript support doesn't always support what all the options

x = {
  "firstName" : "John",
  "lastName" : "Smith",
  "isAlive" : true,
  "age" : 27,
  "address" : {
    "streetAddress" : "21 2nd Street",
    "city" : "New York",
    "state" : "NY",
    "postalCode" : "10021-3100"
  },
  "phoneNumbers" : [
    {
      "type" : "home",
      "number" : "212 555-1234",
      "address" : "10 long address line, special road to nowhere",
    },
    {"type" : "office", "number" : "646 555-4567"}
  ],
  "children" : [],
  "children" : [ "foo", "bar" ],
  "children" : [ "foo" ],
  "children" : {},
  "spouse" : null
}

with

Language:        JavaScript
BasedOnStyle:  LLVM
SpaceInEmptyBlock: true
SpacesInSquareBrackets: false

made me think it should have been

"children" : ["foo","bar"],
"children" : ["foo"],
"children" : { },

it seems perhaps that ensuring javascript supports these would likely given JSON some of the style capabilities that people might want.

  • Add release note
  • Add default Column Limit of 0 for default Json Style (add unit test to ensure column limits can still be used)
MyDeveloperDay marked an inline comment as done.Jun 22 2021, 12:31 AM
MyDeveloperDay added inline comments.
clang/docs/ReleaseNotes.rst
284 ↗(On Diff #312728)

This new strategy actually gives us lots of pre-existing options, for example:

using TabWidth and UseTab we can tabify/de-tabify the json

using SpacesInContainerLiterals we can control if the json is ["foo"] or [ "foo" ]

Any option that can be used to format the javascript:

x = { ... }

can be used to format the JSON, I believe this gives us the best of both worlds, and some flexibility for the future if people want to format JSON in a specific BraceWrapping style.

MyDeveloperDay marked an inline comment as done.Jun 22 2021, 1:07 PM

@HazardyKnusperkeks thank you for the accept, if there are no objections from @curdeius , @krasimir and @sammccall I'd like to proceed.

ping, if there are no other opinions I plan to land this sometime this weekend.