Page MenuHomePhabricator

[clang-format] Add basic support for formatting JSON
Changes PlannedPublic

Authored by MyDeveloperDay on Dec 18 2020, 2:32 AM.

Details

Summary

I find as I develop I'm moving between many different languages C++,C#,JavaScript all the time. As I move between the file types I like to keep clang-format as my formatting tool of choice. (hence why I initially added C# support in D58404: [clang-format] Add basic support for formatting C# files) I know those other languages have their own tools but I have to learn them all, and I have to work out how to configure them, and they may or may not have integration into my IDE or my source code integration.

I am increasingly finding that I'm editing additional JSON files as part of my daily work and my editor and git commit hooks are just not setup to go and run jq, So I tend to go to JSON Formatter and copy and paste back and forth. To get nicely formatted JSON. This is a painful process and I'd like a new one that causes me much less friction.

This has come up from time to time:

D10543: clang-format: [JS] recognize .ts and .json in git-clang-format.
https://stackoverflow.com/questions/35856565/clang-format-a-json-file
https://bugs.llvm.org/show_bug.cgi?id=18699

I would like to stop having to do that and have formatting JSON as a first class clang-format support Language (even if it has minimal style settings at present).

This revision adds support for formatting JSON using the inbuilt JSON serialization library of LLVM, With limited control at present only over the indentation level

This adds an additional Language into the .clang-format file to separate the settings from your other supported languages.

-- 
Language: Json
IndentWidth: 2

For example of it working, this snipped from the Wikipedia definition of JSON can be formatted as below without the need to trick clang-format into thinking this is javascript:

file.json

{ "firstName": "John", "lastName": "Smith", "isAlive": true, "age": 27, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021-3100" }, "phoneNumbers": [ { "type": "home", "number": "212 555-1234" }, { "type": "office", "number": "646 555-4567" } ], "children": [], "spouse": null }
$ clang-format.exe file.json
{
  "address": {
    "city": "New York",
    "postalCode": "10021-3100",
    "state": "NY",
    "streetAddress": "21 2nd Street"
  },
  "age": 27,
  "children": [],
  "firstName": "John",
  "isAlive": true,
  "lastName": "Smith",
  "phoneNumbers": [
    {
      "number": "212 555-1234",
      "type": "home"
    },
    {
      "number": "646 555-4567",
      "type": "office"
    }
  ],
  "spouse": null
}

Diff Detail

Unit TestsFailed

TimeTest
20 msx64 windows > Clangd Unit Tests._/ClangdTests_exe::DirectoryBasedGlobalCompilationDatabaseCacheTest.Cacheable
Note: Google Test filter = DirectoryBasedGlobalCompilationDatabaseCacheTest.Cacheable [==========] Running 1 test from 1 test case.

Event Timeline

MyDeveloperDay created this revision.Dec 18 2020, 2:32 AM
MyDeveloperDay requested review of this revision.Dec 18 2020, 2:32 AM
MyDeveloperDay edited the summary of this revision. (Show Details)Dec 18 2020, 3:34 AM
MyDeveloperDay added a reviewer: thakis.

Why not using clang-format's support for JavaScript? It seems more complicated now, but it will probably allow to more easily change/add style options. WDYT?
And what happens when reformatting only a part of a JSON? Should there be some tests for that?

Why not using clang-format's support for JavaScript? It seems more complicated now, but it will probably allow to more easily change/add style options. WDYT?
And what happens when reformatting only a part of a JSON? Should there be some tests for that?

I've attempted to do this via JavaScript before and failed ;-( and had to abandoned the attempt. This is partially because clang-format isn't very good at even handling any braced initialization (let alone nested initialization) and it made the code much more complex. (my guess is that is why it was never attempted before by others before)

So I decided to reset my expectations first, My intent here is to introduce the concept of JSON Language support (to gauge if adding JSON support is something the community might accept), but also to let me build the infrastructure that means we have some support which we know will work out of the box (at least to give us clean JSON full file formatting formatting)

Ultimately the JSON pretty printing in the Support library cannot really handle what realistically might be wanted i.e. ideally I'd like to support different positioning types for the {

[
    "name": {
         "bar": 1 
    },
    "name": {
         "foo": 2
    }
]

This first revision handles some basic concepts, introduction of the JSON support and full file formatting (to a compatible LLVM Style as defined by its own JSON library) + allowing setting different IndentWidths

Ultimately once we have this support, which I think is a necessity, we can switch the underlying formatting to support, partial formatting (as we would see via git clang-format) and the potential other options we'd like to support (which I consider we'd most likely handle inside reformat using tokens as you are suggesting).

So I guess my question is:

  1. is this useful to pursue (either as is, or by switching to us the reformat method)
  2. is this useful standalone as the first pass
  3. is adding support for JSON something people would welcome (as we did with C#)

If the answer is no, then I don't want to waste the effort trying to fathom how we might do this inside reformat.

And what happens when reformatting only a part of a JSON? Should there be some tests for that?

Ultimately yes, but I don't think I can easily do that with the llvm/Support code, so that would need to be integrated into the refomat() (and this is why I didn't include a change into git-clang-format or clang-format-diff.py)

So I guess my question is:

  1. is this useful to pursue (either as is, or by switching to us the reformat method)
  2. is this useful standalone as the first pass
  3. is adding support for JSON something people would welcome (as we did with C#)

If the answer is no, then I don't want to waste the effort trying to fathom how we might do this inside reformat.

For me that's three times yes.

clang/unittests/Format/FormatTestJson.cpp
14

I don't know what's the practice right now. But I would suggest renaming that to "format-test-json", so you can see directly that's for JSON.

So I guess my question is:

  1. is this useful to pursue (either as is, or by switching to us the reformat method)
  2. is this useful standalone as the first pass
  3. is adding support for JSON something people would welcome (as we did with C#)

If the answer is no, then I don't want to waste the effort trying to fathom how we might do this inside reformat.

For me that's three times yes.

For me as well. I believe it's a useful addition, even if it handles only the basic cases for the moment.
I was just wondering whether it would be better to start a different way, but I understand that implementing it the "clang-format's way" can take some effort.

Anyway, I'd like to see a big warning banner (somewhere in the doc) about current limitations, so that nobody switches hastily to clang-format and then gets disappointed by these limitations.
Not sure that having a small note about it in release notes is enough.
So to the point, please document that only full-context/file is supported and that the only formatting option is indentation level (as you partially did in release notes).

clang/docs/ReleaseNotes.rst
284

Maybe instead of putting "with very limited options", you may add a link to the doc describing limitations?

In theory, this version doesn't seem so useful - a simple JSON pretty-printer that ignores column widths etc is easy to write, doesn't need any of clang-format's infrastructure, and this isn't a particularly great one.

In practice though, not having to configure yet another tool is a big deal:

my editor and git commit hooks are just not setup to go and run jq

+1, this is often the difference between using a mediocre tool and nothing.

And I think for JSON data, a formulaic pretty-printer is maybe a better starting point than clang-format's context-sensitive optimizing approach.


But we'll have to deal with the fact that this isn't going to format the way all people like. It seems reasonable to to expect to be able to:

  • format ["foo"] on one line
  • indent with tab characters rather than spaces
  • format "JSON" that is incomplete/broken/has extensions (see JSON5)

With my maintainer-of-Support/JSON hat on, I don't like the idea of these features being shoehorned into the library to make clang-format an incrementally more featureful JSON formatter. They're likely to lead to a lot of conceptual+implementation+API complexity in a library that solves its primary use cases quite well and simply.


So if the plan is to

  • use this implementation to bootstrap JSON-in-clang-format
  • with a firm intention of replacing it with parsing/formatting logic that lives inside clang-format itself
  • long-term, maybe making use of Support/JSON long-term for some low level bits (normalizing string tokens?) but not where flexibility is needed

then I think this is great. However I'm a bit leery if the plan is "land this a MVP, and then play it by ear".

MyDeveloperDay marked an inline comment as done.Dec 19 2020, 5:48 AM

With my maintainer-of-Support/JSON hat on, I don't like the idea of these features being shoehorned into the library to make clang-format an incrementally more featureful JSON formatter. They're likely to lead to a lot of conceptual+implementation+API complexity in a library that solves its primary use cases quite well and simply.

No plans at all to touch Support/JSON, if this needed a separate reformat() for json then I'd say I'm more likely to do that in lib/Format as we'd only need a "PrettyPrint" type function which supported the options we want to support

So if the plan is to

  • use this implementation to bootstrap JSON-in-clang-format

That was kind of the plan

  • with a firm intention of replacing it with parsing/formatting logic that lives inside clang-format itself

So I tried again to do this yesterday, but once again seems to fail, mainly around the area of clang-formatting trying to reflow and rejoin split lines but also the level of indentation is off.

I'm starting to wonder if perhaps it needs its own separate pass, use more of clang-format but less of the actual "parsing into known structures"...

  • long-term, maybe making use of Support/JSON long-term for some low level bits (normalizing string tokens?) but not where flexibility is needed

I think I'd really just like to use the FormatTokens, but I agree if we ended up writing a specific JSON pretty-printer I could see using the inner parts (but I really have no interesting in touching lib/Support/JSON at all)

then I think this is great. However I'm a bit leery if the plan is "land this a MVP, and then play it by ear".

Sorry I'm not sure I understand what MVP means in this context?

clang/unittests/Format/FormatTestJson.cpp
14

I'm not sure of the convention or even where/when its used (I just know you need it)

lets take that change offline, we should either change them all or none of the them.

FormatTest.cpp:#define DEBUG_TYPE "format-test"
FormatTestCSharp.cpp:#define DEBUG_TYPE "format-test"
FormatTestComments.cpp:#define DEBUG_TYPE "format-test"
FormatTestJS.cpp:#define DEBUG_TYPE "format-test"
FormatTestJava.cpp:#define DEBUG_TYPE "format-test"
FormatTestJson.cpp:#define DEBUG_TYPE "format-test"
FormatTestObjC.cpp:#define DEBUG_TYPE "format-test"
FormatTestProto.cpp:#define DEBUG_TYPE "format-test"
FormatTestRawStrings.cpp:#define DEBUG_TYPE "format-test"
FormatTestSelective.cpp:#define DEBUG_TYPE "format-test"
FormatTestTableGen.cpp:#define DEBUG_TYPE "format-test"
FormatTestTextProto.cpp:#define DEBUG_TYPE "format-test"
NamespaceEndCommentsFixerTest.cpp:#define DEBUG_TYPE "namespace-end-comments-fixer-test"
SortImportsTestJS.cpp:#define DEBUG_TYPE "format-test"
SortImportsTestJava.cpp:#define DEBUG_TYPE "format-test"
SortIncludesTest.cpp:#define DEBUG_TYPE "format-test"
UsingDeclarationsSorterTest.cpp:#define DEBUG_TYPE "using-declarations-sorter-test"
MyDeveloperDay planned changes to this revision.Jan 18 2021, 10:10 AM
MyDeveloperDay marked an inline comment as done.