This is an archive of the discontinued LLVM Phabricator instance.

Differential D45753

Lift JSON library from clang-tools-extra/clangd to llvm/Support.
ClosedPublic

Authored by sammccall on Apr 17 2018, 9:56 PM.

Details

Reviewers

bkramer
labath
chandlerc

Commits

rG6be3824721c1: Lift JSON library from clang-tools-extra/clangd to llvm/Support.
rL336534: Lift JSON library from clang-tools-extra/clangd to llvm/Support.

Summary

This consists of four main parts:

a type json::Expr representing JSON values of dynamic kind, which can be composed, inspected, and modified
a JSON parser from string -> json::Expr
a JSON printer from json::Expr -> string, with optional pretty-printing
a convention for mapping json::Expr <=> native types (fromJSON/toJSON). Mapping functions are provided for primitives (e.g. int, vector), and the ObjectMapper helper helps implement fromJSON for struct/object types.

Based on clangd's usage, a couple of places I'd appreciate review attention:

fromJSON returns only bool. A richer error-signaling mechanism may be useful to provide useful messages, or let recursive fromJSONs (containers/structs) do careful error recovery.
should json::obj always be written explicitly (like json::ary)?
there's no streaming parse API. I suspect there are some simple wins like a callback API where the document is a long array, and each element is small. But this can probably be bolted on easily when we see the need.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 17285
Build 17285: arc lint + arc unit

Event Timeline

sammccall created this revision. Apr 17 2018, 9:56 PM

Herald added subscribers: llvm-commits, MaskRay, ioeric and 2 others. Apr 17 2018, 9:56 PM

I'm not sure who should be the main person reviewing this, but I think the implementation looks pretty good, and would be a great replacement for the one we have in lldb. The main thing I noticed is that you seem to be rolling your own utf8 parser -- I would hope we can reuse the existing unicode utilities here.

fromJSON returns only bool. A richer error-signaling mechanism may be useful to provide useful messages, or let recursive fromJSONs (containers/structs) do careful error recovery.

For our (lldb) use case, we don't need a fancy error mechanism. It seems it should be possible to make these return llvm::Error if it turns out to be necessary later.

should json::obj be always explicitly written (like json::ary)

I don't have an opinion on that, though I do think that json::ary should be renamed.

there's no streaming parse API. I suspect there are some simple wins like a callback API where the document is a long array, and each element is small. But this can probably be bolted on easily when we see the need.

All the messages we are parsing are reasonably small (<= 1k), so we don't have a need for a streaming parser right now. The parser is essentially streaming already. If there was something like llvm::raw_istream, it could be trivially converted, but it looks like input streams aren't a thing for llvm in general, so I wouldn't be worried about it here.

include/llvm/Support/JSON.h
59	It would be good to emphasize that the returned object's lifetime is independent of the string being parsed (i.e. the contained strings are not references to the strings in the original text).
450–451	I find it a bit inconsistent when I see a `json::Expr` (uppercase) for the base type and then `json::obj` (lowercase) for the derived ones. The lowercase names don't really follow the naming convention and `ary` is also fairly un-obvious. Could we just call these `Object` and `Array` ?
lib/Support/JSON.cpp
284–365	Is there any reason `Support/ConvertUTF.h` cannot be used here? It sounds like you just need the "lenient" conversion mode here.
395	llvm coding standards say we use static for functions instead of anonymous namespaces. Also, the llvm and json namespaces are opened and closed twice, so this may be a good opportunity to merge them.

In D45753#1070724, @labath wrote:

I'm not sure who should be the main person reviewing this,

I'm also not sure, but I really appreciate your feedback, and wanted to give you as a likely user a chance to shape the API.
Thanks for the great advice, don't feel any pressure to accept or examine everything if you don't feel like the right person.

but I think the implementation looks pretty good, and would be a great replacement for the one we have in lldb. The main thing I noticed is that you seem to be rolling your own utf8 parser -- I would hope we can reuse the existing unicode utilities here.

Thanks for pointing that out, I hadn't seen them, and you're right. (At the same time, ick!)

fromJSON returns only bool. A richer error-signaling mechanism may be useful to provide useful messages, or let recursive fromJSONs (containers/structs) do careful error recovery.

For our (lldb) use case, we don't need a fancy error mechanism. It seems it should be possible to make these return llvm::Error if it turns out to be necessary later.

should json::obj be always explicitly written (like json::ary)

I don't have an opinion on that, though I do think that json::ary should be renamed.

Agree, these were a wart. ary -> Array, obj -> Object, Expr -> Value.

there's no streaming parse API. I suspect there are some simple wins like a callback API where the document is a long array, and each element is small. But this can probably be bolted on easily when we see the need.

All the messages we are parsing are reasonably small (<= 1k), so we don't have a need for a streaming parser right now. The parser is essentially streaming already. If there was something like llvm::raw_istream, it could be trivially converted, but it looks like input streams aren't a thing for llvm in general, so I wouldn't be worried about it here.

Yeah. Rather than the input data, I was more thinking of passing the output objects to a callback without waiting to pass the whole containing array/object into memory. But agree it's a thing for later.

include/llvm/Support/JSON.h
59	Done (in the function doc for `parse()`).
450–451	Done. These are now defined as `json::Object` and `json::Array`. Since Value already had enumerator with these names, I un-nested the classes which required reordering some things and defining functions out-of-line. I put them in the cpp file for readability reasons, they could move to the bottom of the header if we care about inlining. While renaming, also changed `json::Expr` -> `json::Value` which seems more accurate.
lib/Support/JSON.cpp
284–365	Done. That's really slow code with a really inconvenient API, but it shouldn't matter.
395	Switched to static and removed the namespace (enum doesn't need to be in it). The format_provider specialization does need to be outside the namespace and there is some order dependency around there, but I've avoided opening/closing a lot by qualifying definition names.

Herald added a subscriber: jkorous. Apr 20 2018, 10:05 AM

Address review comments.

Harbormaster completed remote builds in B17285: Diff 143338. Apr 20 2018, 10:06 AM

This looks good to me, but I do feel someone else should comment on the appropriateness of including this library in llvm/Support. Chandler is listed as the code owner of Support, and he used to have opinions on json parsers in the past, so maybe he would be a good candidate (?)

include/llvm/Support/JSON.h
93–96	Leftover references to `obj` and `ary`
457	Could we use `ObjectKey` as the property type here?

In D45753#1075026, @labath wrote:

This looks good to me, but I do feel someone else should comment on the appropriateness of including this library in llvm/Support. Chandler is listed as the code owner of Support, and he used to have opinions on json parsers in the past, so maybe he would be a good candidate (?)

Good suggestion, I'll try to get this on his radar. Thanks!

include/llvm/Support/JSON.h
457	Better, StringRef. (ObjectKey is just a maybe-owning stringref, and non-owning is fine here)
lib/Support/JSON.cpp
284–365	Hmm, I guess I only thought I ran the tests :-( It turns out `DecodeUTF16ToUTF8` doesn't do the right thing - at least it doesn't do anything particularly compatible with JSON. The problematic cases are when UTF-16 surrogate code units appear without being properly paired:
a JSON parser has to accept these, as they conform to the grammar
the best behavior per Unicode is to replace them with U+FFFD
ConvertUTF in lenient mode simply drops them in most cases (permitted, but not recommended). When encountering a lone leading surrogate at the end of text, it returns an error even in lenient mode.
References:
http://seriot.ch/parsing_json.php
https://www.rfc-editor.org/errata_search.php?rfc=7159&eid=3984
http://unicode.org/review/pr-121.html
I've restored the previous code and added more comments, including reasons not to use ConvertUTF. This implements Unicode's preferred handling of invalid UTF-16 ("Replace each maximal subpart of the ill-formed subsequence by a single U+FFFD").
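A minimal sketch of the replacement policy described above, not the code in this patch: it decodes a sequence of UTF-16 code units (as produced by `\uXXXX` escapes) into UTF-8 and emits U+FFFD for every surrogate that is not part of a valid pair. The names `appendUTF8` and `decodeUTF16Lenient` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append the UTF-8 encoding of the code point CP to Out.
void appendUTF8(uint32_t CP, std::string &Out) {
  if (CP < 0x80) {
    Out.push_back(static_cast<char>(CP));
  } else if (CP < 0x800) {
    Out.push_back(static_cast<char>(0xC0 | (CP >> 6)));
    Out.push_back(static_cast<char>(0x80 | (CP & 0x3F)));
  } else if (CP < 0x10000) {
    Out.push_back(static_cast<char>(0xE0 | (CP >> 12)));
    Out.push_back(static_cast<char>(0x80 | ((CP >> 6) & 0x3F)));
    Out.push_back(static_cast<char>(0x80 | (CP & 0x3F)));
  } else {
    Out.push_back(static_cast<char>(0xF0 | (CP >> 18)));
    Out.push_back(static_cast<char>(0x80 | ((CP >> 12) & 0x3F)));
    Out.push_back(static_cast<char>(0x80 | ((CP >> 6) & 0x3F)));
    Out.push_back(static_cast<char>(0x80 | (CP & 0x3F)));
  }
}

// Hypothetical helper: transcode UTF-16 code units to UTF-8, replacing each
// unpaired surrogate with U+FFFD instead of dropping it or failing.
std::string decodeUTF16Lenient(const std::vector<uint16_t> &Units) {
  std::string Out;
  for (size_t I = 0; I < Units.size(); ++I) {
    uint32_t U = Units[I];
    bool IsLead = U >= 0xD800 && U <= 0xDBFF;
    bool IsTrail = U >= 0xDC00 && U <= 0xDFFF;
    bool NextIsTrail = I + 1 < Units.size() && Units[I + 1] >= 0xDC00 &&
                       Units[I + 1] <= 0xDFFF;
    if (IsLead && NextIsTrail) {
      // Valid pair: combine into one supplementary-plane code point.
      uint32_t CP = 0x10000 + ((U - 0xD800) << 10) + (Units[I + 1] - 0xDC00);
      appendUTF8(CP, Out);
      ++I; // consume the trailing surrogate as well
    } else if (IsLead || IsTrail) {
      appendUTF8(0xFFFD, Out); // lone surrogate
    } else {
      appendUTF8(U, Out);
    }
  }
  return Out;
}
```

With this policy a lone leading surrogate at the end of the input becomes U+FFFD rather than an error, which is the case where lenient ConvertUTF was reported to fail.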

Address review comments, add a few more docs.
Go back to custom UTF transcoding because of failing test cases :-(

sammccall added a reviewer: chandlerc. Apr 23 2018, 10:34 AM

@chandlerc: any interest in reviewing this JSON library (either at a high level for suitability in llvm/Support, or for detailed review)?

It has been used in clangd for a while, and seems suitable for lldb's purposes too. llvm/Support seems like their common ancestor, but I'm also happy if there's somewhere better.

In D45753#1075648, @sammccall wrote:

@chandlerc: any interest in reviewing this JSON library (either at a high level for suitability in llvm/Support, or for detailed review)?

I'm happy to tackle both of these. I'll want to do a bit of homework to make sure that the high level concerns that came up previously are adequately addressed (or there is some plan to address them).

It has been used in clangd for a while, and seems suitable for lldb's purposes too. llvm/Support seems like their common ancestor, but I'm also happy if there's somewhere better.

This seems at least sufficient for us to invest some time sorting this all out one way or another.

chandlerc added inline comments. Apr 24 2018, 3:52 AM

include/llvm/Support/JSON.h
9	One thing that would help me even as I start to dig into this would be an overview comment at the top of the file... What is the intended / expected usage pattern? Mention alternatives given that it seems unlikely we'll eliminate YAMLIO immediately. Why use one vs. the other? I'd also like to see your initial thoughts on why this library is better as a separate API/library rather than a separate interface that is part and parcel of the YAML library we already have. I don't have any real opinion (yet) about what does or doesn't make sense, and having your perspective on this would help me form an opinion I suspect.
27–28	Feel free to defer this until higher level stuff is addressed, but here and throughout this entire file, you have excellent comments but don't use a doxygen comment prefix. I think all of these should be converted to be actual doxygen comments at whatever point this is moving forward.

JDevlieghere added a subscriber: JDevlieghere. Apr 24 2018, 6:12 AM

Meinersbur added a subscriber: Meinersbur. Apr 24 2018, 7:55 AM

Meinersbur added inline comments. Apr 24 2018, 8:50 AM

include/llvm/Support/JSON.h
314–317	Is this `mutable` here also required for "cheating"?
356	Can you elaborate on why this is needed? AFAIK `std::initializer_list`s are not meant to be moved from.
lib/Support/JSON.cpp
328	The error messages we get seem to be:
If the token starts with an 'e' or 'E', the error we get is "Invalid number".
For the letters 'n', 't', or 'f' we get "Invalid bareword".
For any other first letter we get "Expected JSON value".
I'd hope for more consistent error messages.
615	Did you consider using `llvm_unreachable`?

Thanks for the work, I would like to replace Polly's jsoncpp with this one once it is done.

In D45753#1076478, @chandlerc wrote:

In D45753#1075648, @sammccall wrote:

@chandlerc: any interest in reviewing this JSON library (either at a high level for suitability in llvm/Support, or for detailed review)?

I'm happy to tackle both of these. I'll want to do a bit of homework to make sure that the high level concerns that came up previously are adequately addressed (or there is some plan to address them).

Thank you! Other than "how does this relate to YAML", I don't think I'm familiar with the concerns so any pointers you have would be appreciated.

include/llvm/Support/JSON.h
9	Oops, the big comment on Value was meant to serve this purpose, but I can't find a way to get it to be first in the file. Added a real file overview - right level of detail?
There's a few reasons to have this separate from the YAML library, and I'm not sure which are important or even good.
a YAML based dom/parser will have a poor API for people who want to deal with JSON - it's too complicated and the names are incorrect. The thing I like most about this API is that the usages are simple.
neither the YAMLParser nor YAML I/O design is useful for the use cases I've run into so far (e.g. parsing LSP correctly really requires a DOM, YAMLParser is too low-level and YAML I/O is too restrictive). There'd be a lot of work in filling out the format x API matrix to make it coherent. Maybe it's worth it, so far it could be YAGNI.
I'm trying to avoid becoming a YAML expert myself, it's complicated and IMO a dead-end.
I simply don't have good ideas about how to combine the pieces I want to add into a single coherent API that I'd want to use. How should YAML anchor data be represented in a DOM? What happens if we try to write JSON-incompatible data to a JSON stream? How can we make YAML I/O more flexible without making it yet more complicated? I'm sure these are solvable, but they might need to be solved by someone who's more in tune with the goals of the YAML library.
27–28	Ack, I'll do this conversion shortly.
314–317	Yes. This is documented at `moveFrom`, but added another comment.
356	I was able to mitigate this, eliminating the const-rvalue-reference constructors. (This optimization isn't important for ObjectKey, and for Object I friended the relevant classes instead.)
The reasoning here is roughly:
we need some syntax to support many KV pairs for an object, or elements in an array
function syntax fails because it doesn't format well in long-list cases. Clang-format does a better job if lists are braced lists, in particular it offers you the ability to force one-per-line with a trailing comma in the list.
a variadic constructor fails because this must be templated on the arg type, which means args can't be braced list expressions themselves, as those do not have a deducible type. This would hurt map-like object-literal syntax...
so we're left with the `std::initializer_list` constructor as the way to pass variable numbers of arguments, in a way that formats nicely, and allows them to be coerced to a chosen type.
However `initializer_list` acts like a container of `const T`, which would mean naively json::Value{{{{{1}}}}} would result in a deep copy at every level of nesting. Fortunately the standard spells out enough of how the contents of init-lists are constructed that moving the data out of them seems well-defined.
I'm not sure how much of this stuff belongs in the comments here - it's more design doc than user guide.
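To make the trade-off concrete, here is a small self-contained sketch of the "cheating" idea, under the assumption that the wrapper keeps its payload in a `mutable` member so it can be moved out of a const `initializer_list` element. All names (`Tracked`, `Element`, `List`, `NaiveList`) are hypothetical and are not the classes in JSON.h.

```cpp
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Payload that counts how many times it has been copied.
struct Tracked {
  std::string Text;
  int Copies = 0;
  Tracked(const char *S) : Text(S) {}
  Tracked(const Tracked &O) : Text(O.Text), Copies(O.Copies + 1) {}
  Tracked(Tracked &&O) noexcept
      : Text(std::move(O.Text)), Copies(O.Copies) {}
};

// Naive version: initializer_list elements are const, so they get copied.
struct NaiveList {
  std::vector<Tracked> Items;
  NaiveList(std::initializer_list<Tracked> Init) : Items(Init) {}
};

// Hypothetical wrapper: `mutable` permits moving out of a const element.
struct Element {
  mutable Tracked Data;
  Element(const char *S) : Data(S) {}
};

struct List {
  std::vector<Tracked> Items;
  List(std::initializer_list<Element> Init) {
    Items.reserve(Init.size());
    for (const Element &E : Init)
      Items.push_back(std::move(E.Data)); // E is const, E.Data is not
  }
};
```

In the naive version every nesting level would add one copy per element; in the wrapper version the payload is only ever moved.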

sammccall updated this revision to Diff 143858. Apr 24 2018, 7:57 PM

sammccall marked 2 inline comments as done.

Address some review comments (but no doxygen yet)

labath mentioned this in D46054: [TableGen] Add a general-purpose JSON backend. Apr 25 2018, 8:28 AM

simon_tatham added a subscriber: simon_tatham. Apr 26 2018, 2:00 AM

Just my 5 cents, I feel that including this library would be useful for LLVM and Polly would be a happy user.

Doxygen, comment and error message tweaks.

Harbormaster completed remote builds in B17493: Diff 144305. Apr 27 2018, 2:51 AM

Looks like I forgot to clang-format before sending this :-/

To avoid adding spurious diffs during review, I'll do that before landing (if applicable!).

include/llvm/Support/JSON.h
27–28	Used doxygen comment prefix and wrapped a long example in `\code...\endcode`. Many comments apply to blocks of related functions so are not doxygenated. For the most part no actual doxygen annotations seemed appropriate, and I think rewriting comments to make more use of doxygen often hurts the inline readability of the comments. So I think this is done, but LMK if you disagree.
lib/Support/JSON.cpp
328	There's a tension here between precise errors, consistent/useful errors, and parser complexity. The "invalid number" is a false positive for `elephant` but a true positive for `123,00`. I've renamed these messages to be consistent but with a hint: "Invalid JSON value (number?)", "Invalid JSON value (null?)" etc, and "Invalid JSON value" respectively - WDYT?
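A rough sketch of the hinted-message scheme proposed above, assuming the hint is chosen from the first character of the offending token. The function name and the exact set of characters routed to the "number" hint are assumptions, and the `(true?)` and `(false?)` spellings are extrapolated from the "etc" in the comment.

```cpp
#include <string>

// Hypothetical helper: pick a diagnostic for a token that failed to parse,
// looking only at its first character.
std::string invalidValueMessage(const std::string &Token) {
  if (Token.empty())
    return "Invalid JSON value";
  switch (Token[0]) {
  case 'n':
    return "Invalid JSON value (null?)";
  case 't':
    return "Invalid JSON value (true?)";
  case 'f':
    return "Invalid JSON value (false?)";
  case '-': case 'e': case 'E':
  case '0': case '1': case '2': case '3': case '4':
  case '5': case '6': case '7': case '8': case '9':
    // A guess only: "elephant" lands here as well as "123,00".
    return "Invalid JSON value (number?)";
  default:
    return "Invalid JSON value";
  }
}
```

Every message shares the same prefix, and the parenthesized part is only a hint, so a false positive such as `elephant` still reads sensibly.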

simon_tatham added inline comments. Apr 27 2018, 5:23 AM

include/llvm/Support/JSON.h
316	When I built this locally, I had a strange build failure involving this function with g++ 5.4.0 (i.e. the default compiler on Ubuntu 16.04). It reported, at the definition of this function in `JSON.cpp`, this error: error: ‘llvm::raw_ostream& llvm::json::operator<<(llvm::raw_ostream&, const llvm::json::Value&)’ should have been declared inside ‘llvm::json’ for which, apparently, the fix was to add a repeated declaration of this function without the 'friend' keyword and outside the definition of `class Value`. clang was happy with it, on the other hand. I've no idea :-)
lib/Support/JSON.cpp
88	When I tried to build with this change locally, g++ pointed out a spurious semicolon here.
561	Could this number formatting be changed? The default %g loses precision – you don't even get enough information to exactly reconstruct the same double you started with. Also, over in D46054 I'm working on a JSON back end for TableGen, for which I'd find it useful to be able to pass an arbitrary 64-bit integer through this system and still have the full 64 bits of integer value visible in the JSON output file, for the benefit of JSON consumers (e.g. Python `json.load`) that go above the call of duty in returning it as an integer without rounding it to the nearest representable double. So, would it be possible to have some method of constructing a json::Value that formats as a 64-bit integer literal?
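The precision loss with the default `%g` is easy to reproduce outside the library. The sketch below uses only the C standard library; the helper names `formatWith` and `roundTrip` are hypothetical, and `%.17g` is used here simply as one well-known format that round-trips IEEE doubles, not necessarily what the library ended up using.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Format D with the given printf-style format string.
std::string formatWith(const char *Format, double D) {
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), Format, D);
  return Buf;
}

// Format D, then parse the text back into a double.
double roundTrip(const char *Format, double D) {
  return std::strtod(formatWith(Format, D).c_str(), nullptr);
}
```

For example, `formatWith("%g", 1234567890123.0)` yields `1.23457e+12`, which no longer parses back to the original value, whereas the `%.17g` form does.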

Fix GCC warnings/errors.
Add note about integer representation.
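The GCC error reported above on `operator<<` can be reproduced in isolation. This is a sketch with a hypothetical `demo::Value` class: a friend declaration inside the class does not by itself make the function name visible at namespace scope, so a definition that uses the qualified name needs a separate namespace-scope declaration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace demo {
class Value {
  int N;

public:
  explicit Value(int N) : N(N) {}
  // Grants access, but does not make the name findable by qualified lookup.
  friend std::ostream &operator<<(std::ostream &, const Value &);
};

// Namespace-scope declaration. Removing this line is what made g++ 5.4
// reject the qualified definition below in the review.
std::ostream &operator<<(std::ostream &, const Value &);
} // namespace demo

// Definition outside the namespace, using a qualified name.
std::ostream &demo::operator<<(std::ostream &OS, const Value &V) {
  return OS << "Value(" << V.N << ")";
}

// Small helper so the result can be inspected as a string.
std::string render(const demo::Value &V) {
  std::ostringstream OS;
  OS << V;
  return OS.str();
}
```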

include/llvm/Support/JSON.h
316	Indeed, thanks for catching this. I think GCC is technically correct here, and most other compilers prefer to be helpful instead :-) Added the namespace-scope declaration.
lib/Support/JSON.cpp
561	Could this number formatting be changed? The default %g loses precision – you don't even get enough information to exactly reconstruct the same double you started with.
Yes, though is there an existing round-trip-safe double formatter in llvm that you're aware of? I suspect this only actually matters when the values are integers, so we should consider your second suggestion first :-)
I'd find it useful to be able to pass an arbitrary 64-bit integer through this system and still have the full 64 bits of integer value visible in the JSON output file, for the benefit of JSON consumers (e.g. Python json.load) that go above the call of duty in returning it as an integer without rounding it to the nearest representable double.
What about this design:
Internally, a numeric value can be an integer or a double, i.e. we split the internal `ValueType` `T_Number` into two, `T_Integer` and `T_Double`.
public `Kind` remains unchanged.
when constructing, you get one or the other depending on the static type
when parsing, you get integer unless it has a nonzero decimal part or is out-of-range.
`asDouble()` always succeeds
`asInteger()` succeeds if the underlying value is integer or if it's a double that can be exactly represented as `int64_t` (same as now)
when serializing, you get `%g` for double and the usual representation for integers
Open questions:
is 1.2e3 a double or an integer? I kind of want the former, which complicates our heuristic.
`int64_t` leaves anyone who wants `uint64_t` out in the cold. But adding more options for types is going to lead to madness. Can we live with this limitation?
If this sounds good I can start on the changes, but I'd like to defer adding new features to another patch if that's OK. This one is largely moving mostly-battleworn code from clangd, and new features need closer review of the implementation.
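The accessor rules in the proposed design can be sketched independently of the library. The class below is a hypothetical stand-in named `Number`, not the eventual `json::Value` implementation, and it reports success through a `bool` plus out-parameter purely to stay self-contained.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the numeric part of a JSON value.
class Number {
  enum { T_Integer, T_Double } Type;
  union {
    int64_t I;
    double D;
  };

public:
  Number(int64_t V) : Type(T_Integer), I(V) {}
  Number(double V) : Type(T_Double), D(V) {}

  // Always succeeds, possibly with rounding for large integers.
  double asDouble() const {
    return Type == T_Integer ? static_cast<double>(I) : D;
  }

  // Succeeds for stored integers, and for doubles that are exactly
  // representable as int64_t. NaN and infinities fail the range check.
  bool asInteger(int64_t &Out) const {
    if (Type == T_Integer) {
      Out = I;
      return true;
    }
    // Both bounds are powers of two, so they are exact as doubles.
    if (D >= -9223372036854775808.0 && D < 9223372036854775808.0 &&
        std::trunc(D) == D) {
      Out = static_cast<int64_t>(D);
      return true;
    }
    return false;
  }
};
```

An integer such as 2^53 + 1 survives intact through `asInteger()` here, while the same value routed through a double would not.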

simon_tatham added inline comments. Apr 27 2018, 7:20 AM

lib/Support/JSON.cpp
561	That design certainly sounds as if it would do what I need, and a great deal more besides. Another option that would be fine for me personally would be to have a means of constructing a `ValueType` called, say, `T_Custom`, which internally holds a string value, and serializes as exactly that string, unquoted. I could imagine that being used for other unusual purposes as well, such as controlling which of `\uXXXX` and UTF-8 was used to represent a non-ASCII character in a string literal. (And that possibility is simple enough that I could add it myself as part of my patch.)

Oh yes, nearly forgot – you might be interested to know that when I ported my TableGen JSON back end to use this library in place of my previous serialization code, it reduced the running time to 60% of what it previously was and the client code ended up more legible. So in spite of posting nitpicks, I like this patch :-)

Meinersbur added inline comments. Apr 27 2018, 8:31 AM

include/llvm/Support/JSON.h
356	`std::initializer_list` acting like a container of const elements is probably for a reason. I'd prefer no such hacks, but also see that endless copying of elements might justify cheating. cppreference.com mentions that its elements must be copy-initialized, but only since C++17. Does anyone else have an opinion on this?

sammccall added a child revision: D46209: [Support] Make JSON handle doubles and int64s losslessly. Apr 27 2018, 1:20 PM

sammccall added inline comments. Apr 27 2018, 2:13 PM

include/llvm/Support/JSON.h
356	I would also prefer no such hacks, but I can't see a way to get good syntax without the recursive copy, and without the hacks :( Any ideas? Also interested in hearing more opinions on whether this is too hairy to rely on. That said, I believe this is valid, per all relevant versions of the standard.
cppreference.com mentions that its elements must be copy-initialized, but only since C++17.
I think I'm missing your point here - can you explain why this is good or bad, and what the implication is? It also seems to be a cppreference mistake, the copy-initialized requirement is older.
C++11 says
as if the implementation allocated an array of N elements of type E [...] Each element of that array is copy-initialized
i.e. it creates a `T[N]`. We could probably even `const_cast`!
C++14 says
as if the implementation allocated a temporary array of N elements of type const E [...]. Each element of that array is copy-initialized [...] The implementation is free to allocate the array in read-only memory if an explicit array with the same initializer could be so allocated
(emphasis mine). So now the type changes to `const T[N]` (so `const_cast` would be invalid) but mutating mutable members of const objects is allowed, so read-only memory can't be used.
The C++17 language is more obscure
as if the implementation generated and materialized (7.4) a prvalue of type “array of N const E” [...] Each element of that array is copy-initialized
but seems to be the same for these purposes.
lib/Support/JSON.cpp
561	D46209 adds int64 support, and fixes use of %g to retain full precision. `T_Custom` is a cool idea and I definitely don't want to rule it out, but large integers in particular seems like something common that should "just work" if possible. (It's also unclear how a `T_Custom` could solve the problem on the parse side, which is nice to have)

simon_tatham mentioned this in D46209: [Support] Make JSON handle doubles and int64s losslessly. Apr 28 2018, 6:41 AM

sammccall mentioned this in D46035: [clangd] Fix unicode handling, using UTF-16 where LSP requires it. Apr 30 2018, 1:42 AM

simon_tatham added a child revision: D46054: [TableGen] Add a general-purpose JSON backend. Apr 30 2018, 7:57 AM

sammccall added a child revision: D46274: [Support] Harden JSON against invalid UTF-8. Apr 30 2018, 10:27 AM

@chandlerc This is ready for you again whenever you have cycles. To summarize:

I think everything above was resolved except for the question of whether the mutable/initializer_list tricks are too gross to use.
a few more potential users seem pleased about this idea, and @simon_tatham has verified that it works for tablegen in D46054.
I've had a couple of suggestions which have been moved into D46209 (numeric precision) and D46274 (unicode validation), I don't think you personally need to review those.

Meinersbur added inline comments. May 2 2018, 9:46 AM

include/llvm/Support/JSON.h
356	Thanks for looking into the standard (note that cppreference mentions C++14, not C++17 as I claimed). I understood the standard the following way: If the element is copy-initialized, it should call the copy-constructor. In your implementation, it calls `copyFrom`, i.e. it still makes a recursive copy on each level, meaning the problem is not solved (but you save one by not recursively copying again when constructing from initializer_list). However, I looked into the implementation of initializer_list in libc++ and msvc. It is a list of pointers-to-elements, rather than a flat array of objects. That is, no copy-initialization is happening, it just points to the already existing elements. Not sure whether this is mandated by the standard.

sammccall added inline comments. May 2 2018, 11:52 PM

include/llvm/Support/JSON.h
356	I understood the standard the following way: If the element is copy-initialized, it should call the copy-constructor
Ah, this is just the standard being confusing. Copy-initialization is basically unrelated to copy-constructors. Certain syntaxes trigger copy-initialization, and others trigger direct-initialization, which behaves slightly differently (copy-initialization won't call `explicit` constructors). http://en.cppreference.com/w/cpp/language/copy_initialization gives this example, which is relevant here:
std::string s2 = std::move(s); // this copy-initialization performs a move
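The distinction can be checked with a type that counts its own copies and moves. This is an illustrative sketch with hypothetical names (`Counter`, `copyInitFromRvalue`, `copyInitFromLvalue`); it only demonstrates that the `T x = expr;` syntax selects the move constructor when `expr` is an rvalue.

```cpp
#include <utility>

// Records how many copy and move constructions produced this object.
struct Counter {
  int Copies = 0;
  int Moves = 0;
  Counter() = default;
  Counter(const Counter &O) : Copies(O.Copies + 1), Moves(O.Moves) {}
  Counter(Counter &&O) noexcept : Copies(O.Copies), Moves(O.Moves + 1) {}
};

// Copy-initialization from an rvalue: the move constructor is chosen.
Counter copyInitFromRvalue() {
  Counter A;
  Counter B = std::move(A); // copy-initialization, but performs a move
  return B;
}

// Copy-initialization from an lvalue: the copy constructor is chosen.
Counter copyInitFromLvalue() {
  Counter A;
  Counter B = A;
  return B;
}
```

The exact move count of the returned object depends on whether the compiler elides the return, so only the copy counts are meaningful to compare.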

dberris added a subscriber: dberris. May 3 2018, 12:04 AM

@chandlerc Ping - I think you're the best person to decide whether a JSON-specific library should be added.
It'd be great to make some progress on that, even if actual review would take longer or be left for others.
(I think simon_tatham has work blocked on this, and there are some clangd changes I'm holding off to avoid diverging)
Are the arguments above compelling? Any more info I can provide?

(I'm going to be sporadically available over the next two weeks, but will keep an eye on this)

Sorry for delays circling back to this.

I think the primary concern about putting this into the Support library is that we already have one API that is quite similar there: YAMLIO. However, that API is clearly not serving all of the needs of users given that there are multiple JSON-parsing code paths in the wider project that are already using other libraries (this one, but also Polly). So I think adding this makes lots of sense at this point.

However, I'd like to take this opportunity to try to iterate a bit on the API since it is going into a new, more widely visible home and will almost certainly grow new users as soon as it lands. I'm fairly uncomfortable with the inheritance approach, and I think some of the additional APIs would benefit from (hopefully minor) adjustment. None of this is intended to change the fundamental design in any way though.

include/llvm/Support/JSON.h
63	I understand the pragmatic reason for this, but I am pretty uncomfortable deriving from standard types like this. I'm worried it will hurt portability due to subtle variations in standard libraries, or subtle changes in standard libraries as we switch to C++N+1 or whatever. I would be much more comfortable using something internal and owning the API you export. Unrelatedly (and not blocking anything), I struggle to believe that std::map is the correct tool for the job... Is it just that DenseMap is a pain to use currently? Is it that these are typically small and not worth a DenseMap? I'm sorry to ask this as I suspect you've already explained this in the original review, but I must admit I'm curious.
70–72	This does more than what it says. It hides std::map's operator[] overloads, no matter what they are. Is that what you intended? This is perhaps a good example of why I find inheritance challenging here...
101	Do you expect users to primarily use these typed accessors? Or the underlying vector? Should they take iterators instead of indices? These seem to make the API somewhat hostile to range based for loops. I'm surprised this isn't more a facility provided by the iterator... I find the name somewhat confusing too -- see my comment below for some clarity of why. But what's the advantage here? Why not just `arr[i].asNull()`?
267	A more conventional name would be `getAsFoo` matching `getAs<T>`. Also, is it really worth having all of these? `getAs<bool>` and `getAs<double>` seem just as nice as the non-templated versions to me... But I guess `getAsString` and `getAsInteger` do interesting validation and such.
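The name-hiding concern raised on lines 70–72 above is a general C++ rule and can be shown without the JSON classes. In this sketch `Table` is a hypothetical class deriving from `std::map`, and `HasStringIndex` is a small detection helper written for the example; declaring any `operator[]` in the derived class hides every overload inherited from the base.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <utility>

using Base = std::map<std::string, int>;

// Hypothetical stand-in for a class that derives from a standard container.
struct Table : Base {
  // Read-only lookup. Its mere presence hides all of Base's operator[]
  // overloads, whatever signatures the standard library gives them.
  int operator[](const char *Key) const {
    auto It = find(Key);
    return It == end() ? -1 : It->second;
  }
};

// Detects whether `T[std::string]` is a valid expression.
template <typename T, typename = void>
struct HasStringIndex : std::false_type {};
template <typename T>
struct HasStringIndex<
    T, decltype(void(std::declval<T &>()[std::declval<const std::string &>()]))>
    : std::true_type {};
```

Here `HasStringIndex<Base>::value` is true while `HasStringIndex<Table>::value` is false, because indexing a `Table` with a `std::string` no longer finds the base-class overloads.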

Thanks for the thoughts, I'll take another pass and try to incorporate
them. But I'm out until Monday, so some answers/questions while this is in
your cache...

(Looks like Phabricator swallowed most of my email reply, please excuse some repetition)

In D45753#1094739, @chandlerc wrote:

Sorry for delays circling back to this.

I think the primary concern about putting this into the Support library is that we already have one API that is quite similar there: YAMLIO. However, that API is clearly not serving all of the needs of users given that there are multiple JSON-parsing code paths in the wider project that are already using other libraries (this one, but also Polly). So I think adding this makes lots of sense at this point.

However, I'd like to take this opportunity to try to iterate a bit on the API since it is going into a new, more widely visible home and will almost certainly grow new users as soon as it lands. I'm fairly uncomfortable with the inheritance approach, and I think some of the additional APIs would benefit from (hopefully minor) adjustment. None of this is intended to change the fundamental design in any way though.

That sounds good. Any pushback below is to clarify the reasons for decisions, not to resist changes - happy to change whatever you don't find convincing.

include/llvm/Support/JSON.h
63	Are we worried about inheriting the interface of llvm types, or just `std` types? I've switched to inheriting from `DenseMap` here, but I can wrap it if you prefer. (`std::map`'s ordered-ness made `operator==` and `print()` simple, so there's a few more lines now). `Array` is more complicated: even `SmallVector<Value, 0>` needs `Value`, which sets up a cycle that's hard to break. So I've resorted to wrapping std::vector and exposing a hopefully-sane subset of the API. (Aside: I suspect we do want to track e.g. the C++17 changes to `vector`, but that's a burden that shouldn't fall on whoever does the upgrade)
70–72	There are already no such overloads available, as Value is not default-constructible. I reworded the comment a bit. Agree that inheritance makes this a little subtle to reason about (though obviously if wrapping, this particular operator would look just the same).
101	I expect the most common access patterns to be:
	1. mapping to a `vector<Something>` using `fromJSON`
	2. iterating over `Value`s using ranged-for (followed by `asFoo()` on each element)
	3. these `getFoo(I)` accessors
	They're not nearly as useful/important as the typed accessors on Object. The benefit is:
	- symmetry with Object makes the API easier to follow (thus `size_t` rather than `iterator`)
	- convenient when particular indices have semantics (like `executeCommand` parameter list in LSP), rather than being a homogeneous collection
	- more readable than `operator[]` when calling on a pointer, which is common due to the `if (Array* A = V.asArray())` idiom - `(*A)[i].asNull()` vs `A->getNull(i)`.
	Quoted: "These seem to make the API somewhat hostile to range based for loops. I'm surprised this isn't more a facility provided by the iterator..."
	Can you elaborate on what kind of loop you'd like to write? I tend to find clever iterator APIs fairly undiscoverable/obscure, but I'm not sure what I'm missing.
267	Quoted: "A more conventional name would be getAsFoo matching getAs<T>."
	Sure, in Java ;-) Adding "get" because a function must have a verb seems like cargo culting to me - we're trying to signal the effects, but there aren't any! Maybe it's my exposure, but I don't see any ambiguity as to what `asBoolean()` might do. I'd like to propose a style guide change to be more like https://swift.org/documentation/api-design-guidelines/#strive-for-fluent-usage here. Meanwhile, I can change this to match the style guide as it stands if you think this is important. I do think it hurts the signal/noise.
	Quoted: "Also, is it really worth having all of these? getAs<bool> and getAs<double> seem just as nice as the non-templated versions to me..."
	I do prefer the non-templated version:
	- `getAs<bool>` etc isn't fewer things to understand, and it's not shorter, it's just fewer distinct tokens. I'm not sure what we're trying to conserve here.
	- the non-templated functions are more discoverable: easier to read in the header, and work better with code completion, and easier to search for
	- `getAsNumber` is a better name than `getAs<double>`, as the relevant concept here is JSON's "Number" not the types used to represent them in C++. (this is "interesting validation and such" I think)
	- I don't want people to think they can write `getAs<int>()`, and if they do I want the error message to be easy to understand. Templates make both of these harder.

Array wraps vector instead of inheriting it.
Object inherits DenseMap instead of std::map
Change some Object accessors to use StringRef instead of ObjectKey.
clang-format, because it was getting out of control.
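The first item in this update (wrapping `std::vector` rather than inheriting from it) can be sketched roughly as follows. This is only an illustration of the pattern, not the code in the patch: `Element` is a hypothetical stand-in for `json::Value`, and the particular set of forwarded members is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for json::Value; the real type is a tagged union.
struct Element {
  std::string Text;
};

// Wraps std::vector and re-exports a chosen subset of its interface, so the
// public API stays the same even if the standard container's API changes.
class Array {
public:
  using value_type = Element;
  using iterator = std::vector<Element>::iterator;
  using const_iterator = std::vector<Element>::const_iterator;

  Array() = default;
  Array(std::initializer_list<Element> Elements) : V(Elements) {}

  Element &operator[](std::size_t I) {
    assert(I < V.size() && "index out of range");
    return V[I];
  }
  const Element &operator[](std::size_t I) const {
    assert(I < V.size() && "index out of range");
    return V[I];
  }

  bool empty() const { return V.empty(); }
  std::size_t size() const { return V.size(); }

  // begin()/end() are enough to keep range-based for loops working.
  iterator begin() { return V.begin(); }
  const_iterator begin() const { return V.begin(); }
  iterator end() { return V.end(); }
  const_iterator end() const { return V.end(); }

  void push_back(Element E) { V.push_back(std::move(E)); }

private:
  std::vector<Element> V;
};
```

Anything not forwarded (here, for example, `reserve` or `insert`) is simply unavailable to callers, which is the trade-off discussed in the inline comments on line 63.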

Herald added a subscriber: mgrang. · View Herald TranscriptMay 15 2018, 9:27 AM

sammccall added inline comments.May 15 2018, 9:32 AM

include/llvm/Support/JSON.h
63	(And I forgot to mention a new dirty trick here: the explicit use of DenseMapInfo<StringRef> works since ObjectKey can already be implicitly converted back and forth to StringRef. This saves a bunch of unreadable boilerplate which has to go at the top of the file, but is admittedly a bit fragile - WDYT?)

@chandlerc: Back-from-vacation ping.
(A short response like "yes, do change X" is fine, keeps me busy :-)

Thanks for the ping!

include/llvm/Support/JSON.h
63	I'm somewhat concerned with inheritance generally. I feel like wrapping is just a better pattern and more easily understood and debugged. The additional code doesn't worry me much.
101	Definitely not advocating for more clever iterator API. Mostly pointing out that index-oriented APIs are particularly hard to use with range based for loops. Whereas, using `foo[i].asNull()` is fine because the indexing is orthogonal to the query and can be replaced with something more range-based or iterator based if useful. If the issue is just that operator[] is annoying with pointers, how about just a method for indexing so that you can use `->` to call it? `A->at(i).asNull()` or `A->get(i).asNull()` seem fine?
267	I find the arguments against templates compelling FWIW. That said, while I actually support not using `get` superfluously in many cases, I don't actually like it here. There is an interesting and potentially complex conversion happening, and I personally much prefer `getAsFoo()` for that. The only time I'm really quite happy omitting the `get` is when it is truly just an accessor and not providing any "interesting" logic (a higher bar than the Swift convention you cite in the llvm-dev thread). Anyways, for now, I suggest the `getAsNumber` pattern.

sammccall marked an inline comment as done.May 25 2018, 10:32 AM

sammccall added inline comments.

include/llvm/Support/JSON.h
63	Object now wraps DenseMap instead of inheriting it. I've removed some less-frequently used methods (resize, reserve, etc). I'm reluctant to simplify commonly used interfaces in ways that might be surprising (like fewer insert overloads where it may add more copies).
101	The `Object` case is different and more important, and worth resolving first. `Object::getNumber(StringRef)` returns a number if the property exists and is a number. Dropping this accessor and writing `if (auto N = O->get("foo").getAsNumber()) ...` isn't viable as it crashes if the property is missing. Instead you'd have to write `if (auto MN = O->get("foo")) if (auto N = MN->getAsNumber()) ...`. This is the overwhelmingly common pattern for parsing objects, and I do think we should support this in one expression and preferably one call. If you disagree with that, let's talk about that first :-)
	Quoted: "If the issue is just that operator[] is annoying with pointers, how about just a method for indexing so that you can use `->` to call it? `A->at(i).asNull()` or `A->get(i).asNull()` seem fine?"
	So assuming we're going to have these methods for object, I think the biggest issue is consistency with object. I do also think `A->getNumber(I)` is a lot nicer than `A->get(I).getAsNumber()`. But neither of these are hard objections and I'm happy to just drop all of the `Array::get*`.
267	OK. I'd really like to avoid three-word names here so I'll have a think about this over the weekend. I do think `get` is always superfluous - if you want to signal that something tricky is happening, `get` doesn't do that. This is logically just accessing a member of a discriminated union, which I don't think is anything tricky at all, so I'm not sure what the verb should be. But I'll find something.
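To make the trade-off in the line 101 thread concrete, here is a self-contained sketch of the two lookup styles. It does not use the real library: `Value` and `Object` below are simplified stand-ins built on `std::variant` and `std::map`, only numbers and strings are modeled, and `widthTwoStep`/`widthTyped` are hypothetical call sites.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Simplified stand-in for json::Value: holds either a number or a string.
class Value {
public:
  Value(double D) : Data(D) {}
  Value(std::string S) : Data(std::move(S)) {}

  // Returns the number, or nullopt if this value is of another kind.
  std::optional<double> getAsNumber() const {
    if (const double *D = std::get_if<double>(&Data))
      return *D;
    return std::nullopt;
  }

private:
  std::variant<double, std::string> Data;
};

// Simplified stand-in for json::Object.
class Object {
public:
  void set(const std::string &K, Value V) {
    Props.insert_or_assign(K, std::move(V));
  }

  // Untyped lookup: nullptr if the property is missing.
  const Value *get(const std::string &K) const {
    auto It = Props.find(K);
    return It == Props.end() ? nullptr : &It->second;
  }

  // Typed lookup: nullopt if the property is missing or is not a number.
  std::optional<double> getNumber(const std::string &K) const {
    if (const Value *V = get(K))
      return V->getAsNumber();
    return std::nullopt;
  }

private:
  std::map<std::string, Value> Props;
};

// Two-step style: every call site repeats the null check before the kind check.
inline double widthTwoStep(const Object &O) {
  if (const Value *V = O.get("width"))
    if (std::optional<double> N = V->getAsNumber())
      return *N;
  return 0;
}

// Typed-accessor style: one call covers both "missing" and "wrong kind".
inline double widthTyped(const Object &O) {
  return O.getNumber("width").value_or(0);
}
```

Both helpers return the same result; the difference is only in how much checking the caller has to spell out, which is the point being argued for keeping the typed accessors on `Object`.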

Wrap DenseMap instead of inheriting, drop some container member functions.

Harbormaster completed remote builds in B18603: Diff 148627.May 25 2018, 10:33 AM

Rename Value accessors asNumber() -> getAsNumber().
Remove typed element accessors on Array.

Harbormaster completed remote builds in B18680: Diff 148916.May 29 2018, 8:17 AM

OK, I'm done being recalcitrant, I think everything is addressed now :-)

include/llvm/Support/JSON.h
101	I've dropped the typed `Array::get*`, but kept `Object::get*` as explained above. We now have this glorious code in the test: `(*(*A)[4].getAsArray())[1].getAsInteger()` but that's not terribly representative of likely real-world use.
267	I tried a few out with a small survey (N=4). From most to least preferred:
	- `asNumber()` - this was most people's favorite
	- `getAsNumber()` - everyone could live with this; one person liked it as much as `asNumber`
	- `getNumber()` - an awkward compromise
	- `toNumber()` - probably sounds too much like a conversion
	- `number()` - I really like this, but nobody else does
	I've changed them to `getAsNumber`, but I'm still a bit conflicted here.

@chandlerc Ping. Hope you had fun in committee :-)

It feels a bit strange to be pinging a code review that's not even my own :-), but is this still being reviewed, or is it now entirely stalled?

If the latter, I can rework my JSON tablegen patch in D46054 to reintroduce the (much more trivial) JSON output code from its first draft. But I'd prefer to use this library if it still looks like landing.

@chandlerc Ping again :-)
I'm going on a month's leave fairly soon (a week-ish exact date TBD), it'd be great to get this wrapped up before then. Or a realistic estimate - a few things depending on this now.

@simon_tatham The plan is still for this to land. Chandler's schedule has been busy and also disrupted, so I'm hoping to catch him at a weak moment :-)

Ok, I generally think this is looking good to go in... As mentioned previously, I think there is a good rationale around supporting this despite the existing other/similar libraries.

The code quality is crazy high. Awesome job there.

I have a bunch of pretty minor questions or suggestions below. But I think they're mostly optional stuff. Even if you decide to make the changes, I'm even fine with those being follow-up patches and such as they seem unlikely to wildly change the API surface here, and none of them seem to represent serious bugs or anything.

So don't block on any of this to land the patch and move to a more iterative model. The patch as a whole is LGTM, and I'd just suggest folding whichever of the suggestions below make sense to you into the initial patch you land, and following up on anything else.

include/llvm/Support/JSON.h
293	Does some compiler reject this as just `std::enable_if<!std::is_same<T, bool>::value>::type`? I'd guess MSVC might complain about the use of ! but that would make me sad.
354	I would write this as `std::modf(D, &D) == 0.0`. The return is a double, and this way it looks less like you're checking for the absence of an error code and more like actually checking the fractional component is zero.
355–356	I would explicitly convert the RHS of these two to a double to make it clear that this comparison takes place in double precision floating point, not in an integral type. I think that would help the reader out, because otherwise it could look a bit like a tautology.
397	Sadly, I think this is invalid by the pedantic wording of the spec. Because `as<T>()` binds a reference to the pointer, I think that the pointer has to point to a valid object. And it doesn't yet... I would just inline the `reinterpret_cast` here despite that repetition because I think all other users of `as<T>()` are correct. However, since `as<T>()` is a private implementation detail, you could alternatively just return the pointer. Really up to you.
459–461	Sink this to be a non-friend operator like operator `==` and `!=`? Or hoist those to be friend operators? I don't have a strong opinion about one pattern or the other, mostly just advocating for consistency whichever way you want to go.
464–465	Are there likely to be a lot of these? Might make sense to leave a note for the future that this could be optimized a lot by having a custom `StringRef` like implementation that encodes whether the data is owned w/o spending an extra pointer on it. Clearly not needed in this patch, just a thought for the future if it comes up.
475–478	Consider inlining this above? I actually think it makes the `std::initializer_list<KV>` used in a public API much easier to read if this trivial wrapper is immediately visible. Oh, I guess this is kept out-of-line in order to define `ObjectKey` after `Object`, `Array`, and `Value`? Not sure this ordering is buying that much in terms of readability. Given that you can define `ObjectKey` first, I might just do that and avoid the need to make this out-of-line... Anyways, this is just an optional suggestion, I'm fine with whichever way you end up.
491–492	Given that these have `JSON` in the name, does it make sense to move them out of the `json` namespace?
555–566	This is really nice btw. =D
597–600	Would it be a readability improvement to have this be `llvm::parseJSON` rather than `llvm::json::parse`?
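The suggestions for lines 354 to 356 can be illustrated in isolation. The sketch below is not the patch's code: `toInteger` is a hypothetical free function that yields an `int64_t` only when the double has no fractional part and is in range. It compares the result of `std::modf` against `0.0` and does the range check on doubles. One deliberate difference from a plain `<= double(INT64_MAX)` check: `double(INT64_MAX)` rounds up to 2^63, which does not fit in `int64_t`, so the upper bound here is exclusive.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: converts D to int64_t if that loses no information.
inline std::optional<int64_t> toInteger(double D) {
  double IntPart;
  // modf returns the fractional part, which is 0.0 exactly when D is
  // integral. For NaN it returns NaN, so the comparison rejects it.
  if (std::modf(D, &IntPart) != 0.0)
    return std::nullopt;

  // Both bounds are doubles, so the comparison is done in floating point.
  // INT64_MIN is exactly -2^63 as a double; negating it gives 2^63, which
  // is one past INT64_MAX and therefore excluded.
  const double Lo = static_cast<double>(std::numeric_limits<int64_t>::min());
  const double Hi = -Lo;
  if (IntPart < Lo || IntPart >= Hi)
    return std::nullopt;

  return static_cast<int64_t>(IntPart);
}
```

Infinite inputs pass the `modf` test (their fractional part is zero) and are then rejected by the range check.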

This revision is now accepted and ready to land.Jul 6 2018, 2:21 PM

Herald added a subscriber: omtcyfz. · View Herald TranscriptJul 6 2018, 2:21 PM

Thanks a lot for the review!
Made most of the changes. The one I'm least certain about: I didn't move json::parse --> parseJSON.
As you suggest, going to land this patch and happy to make any followup changes, I don't expect much API churn (even parse doesn't tend to have a lot of callsites).

include/llvm/Support/JSON.h
293	That works fine. I just didn't look critically enough at this once it compiled :-)
475–478	I wish this was just readability :-( The problem is a circular dependency: defining `KV` needs `Value` to be complete, `Value` requires `Object` to be complete (because of `Union`). So the `Object` -> `Value` -> `KV` ordering is necessary, I think. `ObjectKey` could indeed be hoisted higher, but that alone doesn't buy anything, I think.
491–492	I'm not sure. The intent is these are part of a family of functions found via ADL on the second argument. When used in that way, they are always available (because of ADL on the first argument), and they don't particularly "belong" in namespace llvm (they overload for builtin and `std` types). Most of these particular functions are for generic code (like ObjectMapper, or `fromJSON(Value, std::vector)` itself), e.g. calling `Value.getAsNumber()` is generally nicer than `fromJSON(Value, double)`. So moving these particular overloads into `namespace llvm` seems to give them more prominence compared to the rest of the API than seems warranted. c.f. `llvm::json::parse` vs `llvm::parseJSON`, which is a main entrypoint.
555–566	:-) We had plenty of this code in clangd to test the API on. The one thing that's pretty awful is that when parsing fails there's no detailed representation of the error. But worse seems to be better for us so far.
597–600	Hmm, this seems like a wash to me. `parseJSON` is slightly shorter than `json::parse`, while the latter hints (accurately) at a library boundary. Given the other entrypoints (`Value`, `ObjectMapper` etc) are in `namespace json`, I lean towards keeping this there too - among other things, it makes the spelling/capitalization of `json::parse`, `json::Value` etc easier to remember I think. Certainly happy to change this if you have a strong opinion here though.
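The ADL argument in the 491–492 reply can be seen in a small self-contained sketch. Nothing here is the real library: `json::Value` is reduced to a list of numbers, `app::Celsius` is a hypothetical user type, and only three `fromJSON` overloads exist.

```cpp
#include <cassert>
#include <vector>

namespace json {
// Drastically simplified stand-in for a JSON value: just a list of numbers.
struct Value {
  std::vector<double> Numbers;
};

// Overload for a builtin type. `double` has no associated namespace, so
// unqualified callers find this through ADL on the first argument (Value).
inline bool fromJSON(const Value &V, double &Out) {
  if (V.Numbers.size() != 1)
    return false;
  Out = V.Numbers[0];
  return true;
}

// Generic container overload. The inner call is unqualified, so at
// instantiation time ADL also searches the namespace of T.
template <typename T> bool fromJSON(const Value &V, std::vector<T> &Out) {
  Out.clear();
  for (double N : V.Numbers) {
    T Elem;
    if (!fromJSON(Value{std::vector<double>{N}}, Elem))
      return false;
    Out.push_back(Elem);
  }
  return true;
}
} // namespace json

namespace app {
// Hypothetical user type living outside namespace json.
struct Celsius {
  double Degrees = 0;
};

// Found by generic code via ADL on the second argument (Celsius).
inline bool fromJSON(const json::Value &V, Celsius &Out) {
  // Resolves to json::fromJSON(const Value &, double &) via ADL on V.
  return fromJSON(V, Out.Degrees);
}
} // namespace app
```

A caller in any namespace can then write `fromJSON(V, Temps)` for a `std::vector<app::Celsius>` without qualifying the name, which is why the overloads for builtin and `std` types can stay in `namespace json` next to `Value`.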

Closed by commit rL336534: Lift JSON library from clang-tools-extra/clangd to llvm/Support. (authored by sammccall). · Explain WhyJul 9 2018, 3:10 AM

This revision was automatically updated to reflect the committed changes.

sammccall marked 3 inline comments as done.

Revision Contents

Path                              Size
include/llvm/Support/JSON.h       507 lines
lib/Support/CMakeLists.txt        1 line
lib/Support/JSON.cpp              586 lines
unittests/Support/CMakeLists.txt  1 line
unittests/Support/JSONTest.cpp    291 lines

Diff 143338

include/llvm/Support/JSON.h

This file was added.

				//===--- JSON.h - JSON values, parsing and serialization - C++ ---------*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===---------------------------------------------------------------------===//

				chandlerc: One thing that would help me even as I start to dig into this would be an overview comment at the top of the file... What is the intended / expected usage pattern? Mention alternatives given that it seems unlikely we'll eliminate YAMLIO immediately. Why use one vs. the other? I'd also like to see your initial thoughts on why this library is better as a separate API/library rather than a separate interface that is part and parcel of the YAML library we already have. I don't have any real opinion (yet) about what does or doesn't make sense, and having your perspective on this would help me form an opinion I suspect.
				sammccall: Oops, the big comment on Value was meant to serve this purpose, but I can't find a way to get it to be first in the file. Added a real file overview - right level of detail? There's a few reasons to have this separate from the YAML library, and I'm not sure which are important or even good. a YAML based dom/parser will have a poor API for people who want to deal with JSON - it's too complicated and the names are incorrect. The thing I like most about this API is that the usages are simple. neither the YAMLParser nor YAML I/O design is useful for the use cases I've run into so far (e.g. parsing LSP correctly really requires a DOM, YAMLParser is too low-level and YAML I/O is too restrictive). There'd be a lot of work in filling out the format x API matrix to make it coherent. Maybe it's worth it, so far it could be YAGNI. I'm trying to avoid becoming a YAML expert myself, it's complicated and IMO a dead-end. I simply don't have good ideas about how to combine the pieces I want to add into a single coherent API that I'd want to use. How should YAML anchor data be represented in a DOM? What happens if we try to write JSON-incompatible data to a JSON stream? How can we make YAML I/O more flexible without making it yet more complicated? I'm sure these are solvable, but they might need to be solved by someone who's more in tune with the goals of the YAML library.
				#ifndef LLVM_SUPPORT_JSON_H
				#define LLVM_SUPPORT_JSON_H

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/Error.h"
				#include "llvm/Support/FormatVariadic.h"
				#include "llvm/Support/raw_ostream.h"
				#include <map>

				namespace llvm {
				namespace json {
				class Array;
				class ObjectKey;
				class Value;

				// An Object is a JSON object, which maps strings to heterogeneous JSON values.
				// The string keys may be owned or references.
				class Object : public std::map<ObjectKey, Value> {
				chandlerc: Feel free to defer this until higher level stuff is addressed, but here and throughout this entire file, you have excellent comments but don't use a doxygen comment prefix. I think all of these should be converted to be actual doxygen comments at whatever point this is moving forward.
				sammccall: Ack, I'll do this conversion shortly.
				sammccall: Used doxygen comment prefix and wrapped a long example in `\code...\endcode`. Many comments apply to blocks of related functions so are not doxygenated. For the most part no actual doxygen annotations seemed appropriate, and I think rewriting comments to make more use of doxygen often hurts the inline readability of the comments. So I think this is done, but LMK if you disagree.
				public:
				explicit Object() {}
				// Use a custom struct for list-init, because pair forces extra copies.
				struct KV;
				explicit Object(std::initializer_list<KV> Properties);

				// Allow [] as if Value was default-constructible as null.
				Value &operator[](const ObjectKey &K);
				Value &operator[](ObjectKey &&K);

				// Look up a property, returning nullptr if it doesn't exist.
				Value *get(const ObjectKey &K);
				const Value *get(const ObjectKey &K) const;
				// Typed accessors return None/nullptr if
				// - the property doesn't exist
				// - or it has the wrong type
				llvm::Optional<std::nullptr_t> getNull(const ObjectKey &K) const;
				llvm::Optional<bool> getBoolean(const ObjectKey &K) const;
				llvm::Optional<double> getNumber(const ObjectKey &K) const;
				llvm::Optional<int64_t> getInteger(const ObjectKey &K) const;
				llvm::Optional<llvm::StringRef> getString(const ObjectKey &K) const;
				const json::Object *getObject(const ObjectKey &K) const;
				json::Object *getObject(const ObjectKey &K);
				const json::Array *getArray(const ObjectKey &K) const;
				json::Array *getArray(const ObjectKey &K);
				};

				// An Array is a JSON array, which contains heterogeneous JSON values.
				class Array : public std::vector<Value> {
				public:
				explicit Array() {}
				labath: It would be good to emphasize that the returned object's lifetime is independent of the string being parsed (i.e. the contained strings are not references to the strings in the original text).
				sammccall: Done (on the function doc for `parse()`).
				explicit Array(std::initializer_list<Value> Elements);
				template <typename Collection> explicit Array(const Collection &C) {
				for (const auto &V : C)
				emplace_back(V);
				chandlerc: I understand the pragmatic reason for this, but I am pretty uncomfortable deriving from standard types like this. I'm worried it will hurt portability due to subtle variations in standard libraries, or subtle changes in standard libraries as we switch to C++N+1 or whatever. I would be much more comfortable using something internal and owning the API you export. Unrelatedly (and not blocking anything), I struggle to believe that std::map is the correct tool for the job... Is it just that DenseMap is a pain to use currently? Is it that these are typically small and not worth a DenseMap? I'm sorry to ask this as I suspect you've already explained this in the original review, but I must admit I'm curious.
				sammccall: Are we worried about inheriting the interface of llvm types, or just `std` types? I've switched to inheriting from `DenseMap` here, but I can wrap it if you prefer. (`std::map`'s ordered-ness made `operator==` and `print()` simple, so there's a few more lines now). `Array` is more complicated: even `SmallVector<Value, 0>` needs `Value`, which sets up a cycle that's hard to break. So I've resorted to wrapping std::vector and exposing a hopefully-sane subset of the API. (Aside: I suspect we do want to track e.g. the C++17 changes to `vector`, but that's a burden that shouldn't fall on whoever does the upgrade)
				sammccall: (And I forgot to mention a new dirty trick here: the explicit use of DenseMapInfo<StringRef> works since ObjectKey can already be implicitly converted back and forth to StringRef. This saves a bunch of unreadable boilerplate which has to go at the top of the file, but is admittedly a bit fragile - WDYT?)
				chandlerc: I'm somewhat concerned with inheritance generally. I feel like wrapping is just a better pattern and more easily understood and debugged. The additional code doesn't worry me much.
				sammccall: Object now wraps DenseMap instead of inheriting it. I've removed some less-frequently used methods (resize, reserve, etc). I'm reluctant to simplify commonly used interfaces in ways that might be surprising (like fewer insert overloads where it may add more copies).
				}

				// Typed accessors return None/nullptr if the element has the wrong type.
				llvm::Optional<std::nullptr_t> getNull(size_t I) const;
				llvm::Optional<bool> getBoolean(size_t I) const;
				llvm::Optional<double> getNumber(size_t I) const;
				llvm::Optional<int64_t> getInteger(size_t I) const;
				llvm::Optional<llvm::StringRef> getString(size_t I) const;
				const Object *getObject(size_t I) const;
				chandlerc: This does more than what it says. It hides std::map's operator[] overloads, no matter what they are. Is that what you intended? This is perhaps a good example of why I find inheritance challenging here...
				sammccall: There are already no such overloads available, as Value is not default-constructible. I reworded the comment a bit. Agree that inheritance makes this a little subtle to reason about (though obviously if wrapping, this particular operator would look just the same).
				Object *getObject(size_t I);
				const Array *getArray(size_t I) const;
				Array *getArray(size_t I);
				};

				// A Value is a JSON value of unknown type.
				// They can be copied, but should generally be moved.
				//
				// === Composing values ===
				//
				// You can implicitly construct Values from:
				// - strings: std::string, SmallString, formatv, StringRef, char*
				// (char*, and StringRef are references, not copies!)
				// - numbers
				// - booleans
				// - null: nullptr
				// - arrays: {"foo", 42.0, false}
				// - serializable things: types with toJSON(const T&)->Value, found by ADL
				//
				// They can also be constructed from object/array helpers:
				// - json::obj is a type like map<ObjectKey, Value>
				// - json::ary is a type like vector<Value>
				// These can be list-initialized, or used to build up collections in a loop.
				// json::ary(Collection) converts all items in a collection to Values.
				labath: Leftover references to `obj` and `ary`
				//
				// === Inspecting values ===
				//
				// Each Value is one of the JSON kinds:
				// null (nullptr_t)
				chandlerc: Do you expect users to primarily use these typed accessors? Or the underlying vector? Should they take iterators instead of indices? These seem to make the API somewhat hostile to range based for loops. I'm surprised this isn't more a facility provided by the iterator... I find the name somewhat confusing too -- see my comment below for some clarity of why. But what's the advantage here? Why not just `arr[i].asNull()`?
				sammccall: I expect the most common access patterns to be:
				1. mapping to a `vector<Something>` using `fromJSON`
				2. iterating over `Value`s using ranged-for (followed by `asFoo()` on each element)
				3. these `getFoo(I)` accessors
				They're not nearly as useful/important as the typed accessors on Object. The benefit is:
				- symmetry with Object makes the API easier to follow (thus `size_t` rather than `iterator`)
				- convenient when particular indices have semantics (like `executeCommand` parameter list in LSP), rather than being a homogeneous collection
				- more readable than `operator[]` when calling on a pointer, which is common due to the `if (Array* A = V.asArray())` idiom - `(*A)[i].asNull()` vs `A->getNull(i)`.
				Quoted: "These seem to make the API somewhat hostile to range based for loops. I'm surprised this isn't more a facility provided by the iterator..."
				Can you elaborate on what kind of loop you'd like to write? I tend to find clever iterator APIs fairly undiscoverable/obscure, but I'm not sure what I'm missing.
				chandlerc: Definitely not advocating for more clever iterator API. Mostly pointing out that index-oriented APIs are particularly hard to use with range based for loops. Whereas, using `foo[i].asNull()` is fine because the indexing is orthogonal to the query and can be replaced with something more range-based or iterator based if useful. If the issue is just that operator[] is annoying with pointers, how about just a method for indexing so that you can use `->` to call it? `A->at(i).asNull()` or `A->get(i).asNull()` seem fine?
				sammccall: The `Object` case is different and more important, and worth resolving first. `Object::getNumber(StringRef)` returns a number if the property exists and is a number. Dropping this accessor and writing `if (auto N = O->get("foo").getAsNumber()) ...` isn't viable as it crashes if the property is missing. Instead you'd have to write `if (auto MN = O->get("foo")) if (auto N = MN->getAsNumber()) ...`. This is the overwhelmingly common pattern for parsing objects, and I do think we should support this in one expression and preferably one call. If you disagree with that, let's talk about that first :-)
				Quoted: "If the issue is just that operator[] is annoying with pointers, how about just a method for indexing so that you can use `->` to call it? `A->at(i).asNull()` or `A->get(i).asNull()` seem fine?"
				So assuming we're going to have these methods for object, I think the biggest issue is consistency with object. I do also think `A->getNumber(I)` is a lot nicer than `A->get(I).getAsNumber()`. But neither of these are hard objections and I'm happy to just drop all of the `Array::get*`.
				sammccall: I've dropped the typed `Array::get*`, but kept `Object::get*` as explained above. We now have this glorious code in the test: `(*(*A)[4].getAsArray())[1].getAsInteger()` but that's not terribly representative of likely real-world use.
				// boolean (bool)
				// number (double)
				// string (StringRef)
				// array (json::Array)
				// object (json::Object)
				//
				// The kind can be queried directly, or implicitly via the typed accessors:
				// if (Optional<StringRef> S = E.asString())
				// assert(E.kind() == Value::String);
				//
				// Array and Object also have typed indexing accessors for easy traversal:
				// Expected<Value> E = parse(R"( {"options": {"font": "sans-serif"}} )");
				// if (json::obj* O = E->asObject())
				// if (json::obj* Opts = O->getObject("options"))
				// if (Optional<StringRef> Font = Opts->getString("font"))
				// assert(Opts->at("font").kind() == Value::String);
				//
				// === Converting values to C++ types ===
				//
				// The convention is to have a deserializer function findable via ADL:
				// fromJSON(const json::Value&, T&)->bool
				// Deserializers are provided for:
				// - bool
				// - int
				// - double
				// - std::string
				// - vector<T>, where T is deserializable
				// - map<string, T>, where T is deserializable
				// - Optional<T>, where T is deserializable
				//
				// ObjectMapper can help writing fromJSON() functions for object types:
				// bool fromJSON(const Value &E, MyStruct &R) {
				// ObjectMapper O(E);
				// if (!O || !O.map("mandatory_field", R.MandatoryField))
				// return false;
				// O.map("optional_field", R.OptionalField);
				// return true;
				// }
				//
				// === Serialization ===
				//
				// Values can be serialized to JSON:
				// 1) raw_ostream << Value // Basic formatting.
				// 2) raw_ostream << formatv("{0}", Value) // Basic formatting.
				// 3) raw_ostream << formatv("{0:2}", Value) // Pretty-print with indent 2.
				//
				// And parsed:
				// Expected<Value> E = json::parse("[1, 2, null]");
				// assert(E && E->kind() == Value::Array);
				class Value {
				public:
				enum Kind {
				Null,
				Boolean,
				Number,
				String,
				Array,
				Object,
				};

				// It would be nice to have Value() be null. But that would make {} null too.
				Value(const Value &M) { copyFrom(M); }
				Value(Value &&M) { moveFrom(std::move(M)); }
				// "cheating" move-constructor for moving from initializer_list.
				Value(const Value &&M) { moveFrom(std::move(M)); }
				Value(std::initializer_list<Value> Elements);
				Value(json::Array &&Elements) : Type(T_Array) {
				create<json::Array>(std::move(Elements));
				}
				Value(json::Object &&Properties) : Type(T_Object) {
				create<json::Object>(std::move(Properties));
				}
				// Strings: types with value semantics.
				Value(std::string &&V) : Type(T_String) { create<std::string>(std::move(V)); }
				Value(const std::string &V) : Type(T_String) { create<std::string>(V); }
				Value(const llvm::SmallVectorImpl<char> &V) : Type(T_String) {
				create<std::string>(V.begin(), V.end());
				}
				Value(const llvm::formatv_object_base &V) : Value(V.str()){};
				// Strings: types with reference semantics.
				Value(llvm::StringRef V) : Type(T_StringRef) { create<llvm::StringRef>(V); }
				Value(const char *V) : Type(T_StringRef) { create<llvm::StringRef>(V); }
				Value(std::nullptr_t) : Type(T_Null) {}
				// Prevent implicit conversions to boolean.
				template <typename T, typename = typename std::enable_if<
				std::is_same<T, bool>::value>::type>
				Value(T B) : Type(T_Boolean) {
				create<bool>(B);
				}
				// Numbers: arithmetic types that are not boolean.
				template <
				typename T,
				typename = typename std::enable_if<std::is_arithmetic<T>::value>::type,
				typename = typename std::enable_if<std::integral_constant<
				bool, !std::is_same<T, bool>::value>::value>::type>
				Value(T D) : Type(T_Number) {
				create<double>(D);
				}
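
A minimal sketch of the constraint pattern used by the Boolean and Number constructors above, written against a hypothetical `MiniValue` type rather than `json::Value`. It uses the shorter `!std::is_same<T, bool>::value` condition in place of the `std::integral_constant` wrapper. `MiniValue`, `K` and `Num` are names assumed for illustration only.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for json::Value: it only records which
// constructor overload was selected.
struct MiniValue {
  enum Kind { Boolean, Number };
  Kind K;
  double Num;

  // Selected only when the argument type is exactly bool, so pointers and
  // integers cannot silently become JSON booleans.
  template <typename T, typename = typename std::enable_if<
                            std::is_same<T, bool>::value>::type>
  MiniValue(T B) : K(Boolean), Num(B ? 1.0 : 0.0) {}

  // Selected for arithmetic types other than bool. The second condition is
  // a plain negated is_same, with no integral_constant wrapper.
  template <
      typename T,
      typename = typename std::enable_if<std::is_arithmetic<T>::value>::type,
      typename = typename std::enable_if<!std::is_same<T, bool>::value>::type>
  MiniValue(T D) : K(Number), Num(static_cast<double>(D)) {}
};
```

With this sketch, `MiniValue(true)` selects the Boolean overload, `MiniValue(3)` and `MiniValue(2.5)` select the Number overload, and a `const char *` argument matches neither.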
				// Types with a toJSON(const T&)->Value function, found by ADL.
				template <typename T,
				typename = typename std::enable_if<std::is_same<
				Value, decltype(toJSON(*(const T *)nullptr))>::value>>
				Value(const T &V) : Value(toJSON(V)) {}

				Value &operator=(const Value &M) {
				destroy();
				copyFrom(M);
				return *this;
				}
				Value &operator=(Value &&M) {
				destroy();
				moveFrom(std::move(M));
				return *this;
				}
				~Value() { destroy(); }

				Kind kind() const {
				switch (Type) {
				case T_Null:
				return Null;
				case T_Boolean:
				return Boolean;
				case T_Number:
				return Number;
				case T_String:
				case T_StringRef:
				return String;
				case T_Object:
				return Object;
				case T_Array:
				return Array;
				}
				llvm_unreachable("Unknown kind");
				}

				// Typed accessors return None/nullptr if the Value is not of this type.
				llvm::Optional<std::nullptr_t> asNull() const {
				if (LLVM_LIKELY(Type == T_Null))
				return nullptr;
				return llvm::None;
				}
				llvm::Optional<bool> asBoolean() const {
				if (LLVM_LIKELY(Type == T_Boolean))
				return as<bool>();
				return llvm::None;
				}
				llvm::Optional<double> asNumber() const {
				if (LLVM_LIKELY(Type == T_Number))
				return as<double>();
				return llvm::None;
				}
				llvm::Optional<int64_t> asInteger() const {
				if (LLVM_LIKELY(Type == T_Number)) {
				double D = as<double>();
				if (LLVM_LIKELY(std::modf(D, &D) == 0 &&
				D >= std::numeric_limits<int64_t>::min() &&
				D <= std::numeric_limits<int64_t>::max()))
				return D;
				}
				return llvm::None;
				}
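
The integrality and range check in `asInteger()` can be exercised in isolation. The sketch below is a hypothetical free function (`toInt64` is not part of the patch) that compares the fractional part against `0.0` and keeps both bounds in `double`, as suggested in the review. It also departs from the diff in one place that is my own assumption, not something raised by the reviewers: it uses a strict upper bound of 2^63, because `INT64_MAX` is not exactly representable as a `double` and typically rounds up to 2^63 when converted.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: stores D in Out and returns true only if D is a
// whole number that fits in int64_t.
inline bool toInt64(double D, std::int64_t &Out) {
  double Whole = 0.0;
  // modf returns the fractional part as a double; NaN fails this test too.
  if (std::modf(D, &Whole) != 0.0)
    return false;
  // Both bounds are exact powers of two, so they are exact as doubles.
  const double Lo = -9223372036854775808.0; // -2^63
  const double Hi = 9223372036854775808.0;  //  2^63 (excluded)
  if (!(Whole >= Lo && Whole < Hi))
    return false;
  Out = static_cast<std::int64_t>(Whole);
  return true;
}
```

Infinity is rejected by the range test, since `modf` reports a zero fractional part for it.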
				llvm::Optional<llvm::StringRef> asString() const {
				if (Type == T_String)
				return llvm::StringRef(as<std::string>());
				if (LLVM_LIKELY(Type == T_StringRef))
				return as<llvm::StringRef>();
				chandlerc: A more conventional name would be `getAsFoo` matching `getAs<T>`. Also, is it really worth having all of these? `getAs<bool>` and `getAs<double>` seem just as nice as the non-templated versions to me... But I guess `getAsString` and `getAsInteger` do interesting validation and such.
				sammccall (author):
				> A more conventional name would be getAsFoo matching getAs<T>.
				Sure, in Java ;-) Adding "get" because a function must have a verb seems like cargo culting to me - we're trying to signal the effects, but there aren't any! Maybe it's my exposure, but I don't see any ambiguity as to what `asBoolean()` might do. I'd like to propose a style guide change to be more like https://swift.org/documentation/api-design-guidelines/#strive-for-fluent-usage here. Meanwhile, I can change this to match the style guide as it stands if you think this is important. I do think it hurts the signal/noise.
				> Also, is it really worth having all of these? getAs<bool> and getAs<double> seem just as nice as the non-templated versions to me...
				I do prefer the non-templated version:
				- `getAs<bool>` etc isn't fewer things to understand, and it's not shorter, it's just fewer distinct tokens. I'm not sure what we're trying to conserve here.
				- the non-templated functions are more discoverable: easier to read in the header, and work better with code completion, and easier to search for
				- `getAsNumber` is a better name than `getAs<double>`, as the relevant concept here is JSON's "Number" not the types used to represent them in C++. (this is "interesting validation and such" I think)
				- I don't want people to think they can write `getAs<int>()`, and if they do I want the error message to be easy to understand. Templates make both of these harder.
				chandlerc: I find the arguments against templates compelling FWIW. That said, while I actually support not using `get` superfluously in many cases, I don't actually like it here. There is an interesting and potentially complex conversion happening, and I personally much prefer `getAsFoo()` for that. The only time I'm really quite happy omitting the `get` is when it is truly just an accessor and not providing any "interesting" logic (a higher bar than the Swift convention you cite in the llvm-dev thread). Anyways, for now, I suggest the `getAsNumber` pattern.
				sammccall (author): OK. I'd really like to avoid three-word names here so I'll have a think about this over the weekend. I do think `get` is always superfluous - if you want to signal that something tricky is happening, `get` doesn't do that. This is logically just accessing a member of a discriminated union, which I don't think is anything tricky at all, so I'm not sure what the verb should be. But I'll find something.
				sammccall (author): I tried a few out with a small survey (N=4). From most to least preferred:
				- `asNumber()` - this was most people's favorite
				- `getAsNumber()` - everyone could live with this; one person liked it as much as `asNumber`
				- `getNumber()` - an awkward compromise
				- `toNumber()` - probably sounds too much like a conversion
				- `number()` - I really like this, but nobody else does
				I've changed them to `getAsNumber`, but I'm still a bit conflicted here.
				return llvm::None;
				}
				const json::Object *asObject() const {
				return LLVM_LIKELY(Type == T_Object) ? &as<json::Object>() : nullptr;
				}
				json::Object *asObject() {
				return LLVM_LIKELY(Type == T_Object) ? &as<json::Object>() : nullptr;
				}
				const json::Array *asArray() const {
				return LLVM_LIKELY(Type == T_Array) ? &as<json::Array>() : nullptr;
				}
				json::Array *asArray() {
				return LLVM_LIKELY(Type == T_Array) ? &as<json::Array>() : nullptr;
				}

				friend llvm::raw_ostream &operator<<(llvm::raw_ostream &, const Value &);

				private:
				void destroy();
				void copyFrom(const Value &M);
				// We allow moving from const Values, by marking all members as mutable!
				// This hack is needed to support initializer-list syntax efficiently.
				// (std::initializer_list<T> is a container of const T).
				void moveFrom(const Value &&M);

				template <typename T, typename... U> void create(U &&... V) {
				chandlerc (done): Does some compiler reject this as just `std::enable_if<!std::is_same<T, bool>::value>::type`? I'd guess MSVC might complain about the use of ! but that would make me sad.
				sammccall (author): That works fine. I just didn't look critically enough at this once it compiled :-)
				new (&as<T>()) T(std::forward<U>(V)...);
				}
				template <typename T> T &as() const {
				return *reinterpret_cast<T *>(Union.buffer);
				}

				template <typename Indenter>
				void print(llvm::raw_ostream &, const Indenter &) const;
				friend struct llvm::format_provider<llvm::json::Value>;

				enum ValueType : char {
				T_Null,
				T_Boolean,
				T_Number,
				T_StringRef,
				T_String,
				T_Object,
				T_Array,
				};
				mutable ValueType Type;
				mutable llvm::AlignedCharArrayUnion<bool, double, llvm::StringRef,
				std::string, json::Array, json::Object>
				Union;
				simon_tatham (done): When I built this locally, I had a strange build failure involving this function with g++ 5.4.0 (i.e. the default compiler on Ubuntu 16.04). It reported, at the definition of this function in `JSON.cpp`, this error:
				error: ‘llvm::raw_ostream& llvm::json::operator<<(llvm::raw_ostream&, const llvm::json::Value&)’ should have been declared inside ‘llvm::json’
				for which, apparently, the fix was to add a repeated declaration of this function without the 'friend' keyword and outside the definition of `class Value`. clang was happy with it, on the other hand. I've no idea :-)
				sammccall (author): Indeed, thanks for catching this. I think GCC is technically correct here, and most other compilers prefer to be helpful instead :-) Added the namespace-scope declaration.
				};
				Meinersbur (done): Is this `mutable` here also required for "cheating"?
				sammccall (author): Yes. This is documented at `moveFrom`, but added another comment.
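
To make the role of `mutable` concrete, here is a small standalone sketch of the "cheating" move described at `moveFrom`. `Item` and `Bag` are hypothetical types, not part of the patch. A constructor taking `const Item &&` can bind to the const elements of a `std::initializer_list`, and it can only steal their contents because the owning member is declared `mutable`.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often it is deep-copied.
class Item {
public:
  Item(std::string S) : Text(new std::string(std::move(S))) {}
  Item(const Item &O) : Text(new std::string(O.text())) { ++copies(); }
  // "Cheating" move: binds to const rvalues and steals the mutable member.
  Item(const Item &&O) noexcept : Text(std::move(O.Text)) {}

  const std::string &text() const {
    static const std::string Empty;
    return Text ? *Text : Empty;
  }
  static int &copies() {
    static int N = 0;
    return N;
  }

private:
  mutable std::unique_ptr<std::string> Text; // mutable for cheating.
};

// Hypothetical container built from a braced list, like json::Array.
class Bag {
public:
  Bag(std::initializer_list<Item> Elements) {
    Items.reserve(Elements.size());
    for (const Item &I : Elements)
      Items.emplace_back(std::move(I)); // const Item&& -> cheating constructor
  }
  std::vector<Item> Items;
};
```

Without the `const Item &&` overload, `std::move(I)` on a const element would fall back to the copy constructor and the counter would increase once per element.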

				bool operator==(const Value &, const Value &);
				inline bool operator!=(const Value &L, const Value &R) { return !(L == R); }
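
On the g++ error reported for `operator<<`: as far as I understand the rule, a `friend` declaration inside a class does not make the function name visible to qualified lookup, so a definition written with a qualified name needs a separate declaration at namespace scope. The sketch below reproduces the fixed arrangement with a hypothetical `demo::Token` and `std::ostream`. It assumes the definition in `JSON.cpp` used a qualified name, which the error message suggests but the diff shown here does not confirm.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

namespace demo {
class Token {
public:
  explicit Token(std::string S) : Text(std::move(S)) {}
  // On its own, this friend declaration makes the operator findable only
  // through argument-dependent lookup.
  friend std::ostream &operator<<(std::ostream &, const Token &);

private:
  std::string Text;
};

// Namespace-scope redeclaration: required before a qualified definition.
std::ostream &operator<<(std::ostream &, const Token &);
} // namespace demo

// Qualified definition outside the namespace body.
std::ostream &demo::operator<<(std::ostream &OS, const demo::Token &T) {
  return OS << '<' << T.Text << '>';
}
```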

				// ObjectKey is used to capture keys in Object. Like Value but:
				// - only strings are allowed
				// - it's optimized for the string literal case (Owned == nullptr)
				class ObjectKey {
				public:
				ObjectKey(const char *S) : Data(S) {}
				ObjectKey(llvm::StringRef S) : Data(S) {}
				ObjectKey(std::string &&V)
				: Owned(new std::string(std::move(V))), Data(*Owned) {}
				ObjectKey(const std::string &V) : Owned(new std::string(V)), Data(*Owned) {}
				ObjectKey(const llvm::SmallVectorImpl<char> &V)
				: ObjectKey(std::string(V.begin(), V.end())) {}
				ObjectKey(const llvm::formatv_object_base &V) : ObjectKey(V.str()) {}

				ObjectKey(const ObjectKey &C) { *this = C; }
				ObjectKey(ObjectKey &&C) : ObjectKey(static_cast<const ObjectKey &&>(C)) {}
				ObjectKey &operator=(const ObjectKey &C) {
				if (C.Owned) {
				Owned.reset(new std::string(*C.Owned));
				Data = *Owned;
				} else {
				Data = C.Data;
				}
				return *this;
				}
				ObjectKey &operator=(ObjectKey &&) = default;

				operator llvm::StringRef() const { return Data; }

				friend bool operator<(const ObjectKey &L, const ObjectKey &R) {
				return L.Data < R.Data;
				}

				chandlerc (done): I would write this as `std::modf(D, &D) == 0.0`. The return is a double, and this way it looks less like you're checking for the absence of an error code and more like actually checking the fractional component is zero.
				// "cheating" move-constructor for moving from initializer_list.
				ObjectKey(const ObjectKey &&V) {
				Meinersbur (done): Can you elaborate on why this is needed? AFAIK `std::initializer_list`s are not meant to be moved from.
				sammccall (author): I was able to mitigate this, eliminating the const-rvalue-reference constructors. (This optimization isn't important for ObjectKey, and for Object I friended the relevant classes instead.)
				The reasoning here is roughly:
				- we need some syntax to support many KV pairs for an object, or elements in an array
				- function syntax fails because it doesn't format well in long-list cases. Clang-format does a better job if lists are braced lists, in particular it offers you the ability to force one-per-line with a trailing comma in the list.
				- a variadic constructor fails because this must be templated on the arg type, which means args can't be braced list expressions themselves, as those do not have a deducible type. This would hurt map-like object-literal syntax...
				- so we're left with the `std::initializer_list` constructor as the way to pass variable numbers of arguments, in a way that formats nicely, and allows them to be coerced to a chosen type.
				- However `initializer_list` acts like a container of `const T`, which would mean naively json::Value{{{{{1}}}}} would result in a deep copy at every level of nesting.
				- Fortunately the standard spells out enough of how the contents of init-lists are constructed that moving the data out of them seems well-defined.
				I'm not sure how much of this stuff belongs in the comments here - it's more design doc than user guide.
				Meinersbur: `std::initializer_list` acting like a container of const elements is probably for a reason. I'd prefer no such hacks, but also see that endless copying of elements might justify cheating. cppreference.com mentions that its elements must be copy-initialized, but only since C++17. Does anyone else have an opinion on this?
				sammccall (author): I would also prefer no such hacks, but I can't see a way to get good syntax without the recursive copy, and without the hacks :( Any ideas? Also interested in hearing more opinions on whether this is too hairy to rely on. That said, I believe this is valid, per all relevant versions of the standard.
				> cppreference.com mentions that its elements must be copy-initialized, but only since C++17.
				I think I'm missing your point here - can you explain why this is good or bad, and what the implication is? It also seems to be a cppreference mistake, the copy-initialized requirement is older.
				C++11 says
				> as if the implementation allocated an array of N elements of type E [...] Each element of that array is copy-initialized
				i.e. it creates a `T[N]`. We could probably even `const_cast`!
				C++14 says
				> as if the implementation allocated a temporary array of N elements of type const E [...]. Each element of that array is copy-initialized [...] The implementation is free to allocate the array in read-only memory if an explicit array with the same initializer could be so allocated
				(emphasis mine). So now the type changes to `const T[N]` (so `const_cast` would be invalid) but mutating mutable members of const objects is allowed, so read-only memory can't be used.
				The C++17 language is more obscure
				> as if the implementation generated and materialized (7.4) a prvalue of type “array of N const E” [...] Each element of that array is copy-initialized
				but seems to be the same for these purposes.
				Meinersbur: Thanks for looking into the standard (note that cppreference mentions C++14, not C++17 as I claimed). I understood the standard the following way: If the element is copy-initialized, it should call the copy-constructor. In your implementation, it calls `copyFrom`, i.e. it still makes a recursive copy on each level, meaning the problem is not solved (But you save one by not recursively copying again when constructing from initializer_list). However, I looked into the implementation of initializer_list in libc++ and msvc. It is a list of pointers-to-elements, rather than a flat array of objects. That is, no copy-initialization is happening, it just points to the already existing elements. Not sure whether this is mandated by the standard.
				sammccall (author):
				> I understood the standard the following way: If the element is copy-initialized, it should call the copy-constructor
				Ah, this is just the standard being confusing. copy-initialization is basically unrelated to copy-constructors. Certain syntaxes trigger copy-initialization, and others trigger direct-initialization, which behaves slightly differently (copy-initialization won't call `explicit` constructors). http://en.cppreference.com/w/cpp/language/copy_initialization gives this example, which is relevant here:
				std::string s2 = std::move(s); // this copy-initialization performs a move
				chandlerc (done): I would explicitly convert the RHS of these two to a double to make it clear that this comparison takes place in double precision floating point, not in an integral type. I think that would help the reader out, because otherwise it could look a bit like a tautology.
				Owned = std::move(V.Owned);
				Data = V.Data;
				}
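
The distinction drawn in the thread above between copy-initialization and the copy constructor can be checked with two small hypothetical types (`Tracker` and `Tag` are illustrative names only). `Tracker` records which constructor ran. `Tag` shows that copy-initialization skips `explicit` constructors while direct-initialization considers them.

```cpp
#include <cassert>
#include <utility>

// Records how an instance was constructed.
struct Tracker {
  enum How { Direct, Copied, Moved };
  How Via;
  Tracker() : Via(Direct) {}
  Tracker(const Tracker &) : Via(Copied) {}
  Tracker(Tracker &&) noexcept : Via(Moved) {}
};

// Has one explicit and one implicit converting constructor.
struct Tag {
  bool FromLong;
  explicit Tag(int) : FromLong(false) {}
  Tag(long) : FromLong(true) {}
};
```

For example, `Tracker B = A;` and `Tracker C = std::move(A);` are both copy-initialization, yet the second one runs the move constructor. `Tag T = 1;` cannot use the explicit `Tag(int)` and converts through `Tag(long)`, whereas `Tag U(1);` picks `Tag(int)`.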

				private:
				mutable std::unique_ptr<std::string> Owned; // mutable for cheating.
				llvm::StringRef Data;
				};

				inline bool operator==(const ObjectKey &L, const ObjectKey &R) {
				return llvm::StringRef(L) == llvm::StringRef(R);
				}
				inline bool operator!=(const ObjectKey &L, const ObjectKey &R) {
				return !(L == R);
				}

				struct Object::KV {
				ObjectKey K;
				Value V;
				};

				inline Object::Object(std::initializer_list<KV> Properties) {
				for (const auto &P : Properties)
				emplace(std::move(P.K), std::move(P.V));
				}

				// Standard deserializers.
				inline bool fromJSON(const Value &E, std::string &Out) {
				if (auto S = E.asString()) {
				Out = *S;
				return true;
				}
				return false;
				}
				inline bool fromJSON(const Value &E, int &Out) {
				if (auto S = E.asInteger()) {
				Out = *S;
				return true;
				}
				return false;
				}
				chandlerc (done): Sadly, I think this is invalid by the pedantic wording of the spec. Because `as<T>()` binds a reference to the pointer, I think that the pointer has to point to a valid object. And it doesn't yet... I would just inline the `reinterpret_cast` here despite that repetition because I think all other users of `as<T>()` are correct. However, since `as<T>()` is a private implementation detail, you could alternatively just return the pointer. Really up to you.
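
A minimal sketch of the alternative raised in the comment above, using a hypothetical single-type `Slot` instead of the patch's union: `create()` hands the raw buffer address to placement new directly, so no `std::string &` is formed before an object exists, and the reference-returning accessor is only used once the slot is live. The `Live` flag and all names here are assumptions for illustration.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical storage for at most one std::string, built in a raw buffer.
class Slot {
public:
  Slot() : Live(false) {}
  ~Slot() { reset(); }
  Slot(const Slot &) = delete;
  Slot &operator=(const Slot &) = delete;

  template <typename... U> void create(U &&... V) {
    reset();
    // Construct into the buffer through a pointer; no reference to a
    // not-yet-constructed object is created on the way.
    new (static_cast<void *>(Buffer)) std::string(std::forward<U>(V)...);
    Live = true;
  }

  // Only valid after create(): the object exists, so a reference is fine.
  std::string &get() {
    assert(Live);
    return *reinterpret_cast<std::string *>(Buffer);
  }

  void reset() {
    if (Live) {
      typedef std::string Str;
      get().~Str();
      Live = false;
    }
  }

private:
  alignas(std::string) unsigned char Buffer[sizeof(std::string)];
  bool Live;
};
```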
				inline bool fromJSON(const Value &E, double &Out) {
				if (auto S = E.asNumber()) {
				Out = *S;
				return true;
				}
				return false;
				}
				inline bool fromJSON(const Value &E, bool &Out) {
				if (auto S = E.asBoolean()) {
				Out = *S;
				return true;
				}
				return false;
				}
				template <typename T> bool fromJSON(const Value &E, llvm::Optional<T> &Out) {
				if (E.asNull()) {
				Out = llvm::None;
				return true;
				}
				T Result;
				if (!fromJSON(E, Result))
				return false;
				Out = std::move(Result);
				return true;
				}
				template <typename T> bool fromJSON(const Value &E, std::vector<T> &Out) {
				if (auto *A = E.asArray()) {
				Out.clear();
				Out.resize(A->size());
				for (size_t I = 0; I < A->size(); ++I)
				if (!fromJSON((*A)[I], Out[I]))
				return false;
				return true;
				}
				return false;
				}
				template <typename T>
				bool fromJSON(const Value &E, std::map<std::string, T> &Out) {
				if (auto *O = E.asObject()) {
				Out.clear();
				for (const auto &KV : *O)
				if (!fromJSON(KV.second, Out[llvm::StringRef(KV.first)]))
				return false;
				return true;
				}
				return false;
				}
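
The deserializers above rely on unqualified calls to `fromJSON` being resolved by argument-dependent lookup. The following standalone sketch shows that mechanism with a drastically simplified, hypothetical `mini::Value` (a number or an array) and a user type in another namespace. None of these names come from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace mini {
// Hypothetical value type: either a number or an array of values.
struct Value {
  bool IsArray = false;
  double Num = 0;
  std::vector<Value> Elems;
};
inline Value num(double D) {
  Value V;
  V.Num = D;
  return V;
}
inline Value arr(std::vector<Value> E) {
  Value V;
  V.IsArray = true;
  V.Elems = std::move(E);
  return V;
}

inline bool fromJSON(const Value &E, double &Out) {
  if (E.IsArray)
    return false;
  Out = E.Num;
  return true;
}
// Generic container deserializer. The unqualified call in the loop is
// resolved at instantiation time, including via ADL on the element type.
template <typename T> bool fromJSON(const Value &E, std::vector<T> &Out) {
  if (!E.IsArray)
    return false;
  Out.clear();
  Out.resize(E.Elems.size());
  for (std::size_t I = 0; I < E.Elems.size(); ++I)
    if (!fromJSON(E.Elems[I], Out[I]))
      return false;
  return true;
}
} // namespace mini

namespace app {
struct Celsius {
  double Degrees = 0;
};
// Lives next to the type it deserializes; found through ADL on Celsius.
inline bool fromJSON(const mini::Value &E, Celsius &Out) {
  return mini::fromJSON(E, Out.Degrees);
}
} // namespace app
```

Calling `fromJSON(A, Out)` with a `std::vector<app::Celsius>` works without qualifying either namespace: the outer call is found through `mini::Value`, the inner one through `app::Celsius`.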

				// Helper for mapping JSON objects onto protocol structs.
				// See file header for example.
				class ObjectMapper {
				public:
				ObjectMapper(const Value &E) : O(E.asObject()) {}

				labath (done): I find it a bit inconsistent when I see a `json::Expr` (uppercase) for the base type and then `json::obj` (lowercase) for the derived ones. The lowercase names don't really follow the naming convention and `ary` is also fairly un-obvious. Could we just call these `Object` and `Array` ?
				sammccall (author): Done. These are now defined as `json::Object` and `json::Array`. Since Value already had enumerator with these names, I un-nested the classes which required reordering some things and defining functions out-of-line. I put them in the cpp file for readability reasons, they could move to the bottom of the header if we care about inlining. While renaming, also changed `json::Expr` -> `json::Value` which seems more accurate.
				// True if the expression is an object.
				// Must be checked before calling map().
				operator bool() { return O; }

				// Maps a property to a field, if it exists.
				template <typename T> bool map(const char *Prop, T &Out) {
				labath (done): Could we use `ObjectKey` as the property type here?
				sammccall (author): Better, StringRef. (ObjectKey is just a maybe-owning stringref, and non-owning is fine here)
				assert(*this && "Must check this is an object before calling map()");
				if (const Value *E = O->get(Prop))
				return fromJSON(*E, Out);
				return false;
				chandlerc (done): Sink this to be a non-friend operator like operator `==` and `!=`? Or hoist those to be friend operators? I don't have a strong opinion about one pattern or the other, mostly just advocating for consistency whichever way you want to go.
				}

				// Optional requires special handling, because missing keys are OK.
				template <typename T> bool map(const char *Prop, llvm::Optional<T> &Out) {
				chandlerc (done): Are there likely to be a lot of these? Might make sense to leave a note for the future that this could be optimized a lot by having a custom `StringRef` like implementation that encodes whether the data is owned w/o spending an extra pointer on it. Clearly not needed in this patch, just a thought for the future if it comes up.
				assert(*this && "Must check this is an object before calling map()");
				if (const Value *E = O->get(Prop))
				return fromJSON(*E, Out);
				Out = llvm::None;
				return true;
				}

				private:
				const Object *O;
				};
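
The mandatory/optional distinction in `ObjectMapper::map` can be illustrated without LLVM. In this sketch the JSON object is modelled as a `std::map<std::string, int>` and `llvm::Optional<int>` as a small `OptInt` struct. Both are stand-ins I am assuming for brevity, as are `MiniMapper`, `Config` and `parseConfig`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for json::Object and llvm::Optional<int>.
using IntObject = std::map<std::string, int>;
struct OptInt {
  bool Set = false;
  int V = 0;
};

class MiniMapper {
public:
  explicit MiniMapper(const IntObject *O) : O(O) {}
  // Must be checked before map(): false if there is no object to read.
  explicit operator bool() const { return O != nullptr; }

  // Mandatory field: a missing key is a failure.
  bool map(const std::string &Prop, int &Out) const {
    auto It = O->find(Prop);
    if (It == O->end())
      return false;
    Out = It->second;
    return true;
  }
  // Optional field: a missing key succeeds and leaves the field unset.
  bool map(const std::string &Prop, OptInt &Out) const {
    auto It = O->find(Prop);
    Out.Set = It != O->end();
    Out.V = Out.Set ? It->second : 0;
    return true;
  }

private:
  const IntObject *O;
};

struct Config {
  int Port = 0;
  OptInt Retries;
};

// Mirrors the fromJSON(MyStruct) pattern from the header comment.
inline bool parseConfig(const IntObject *Obj, Config &R) {
  MiniMapper M(Obj);
  if (!M || !M.map("port", R.Port))
    return false;
  M.map("retries", R.Retries);
  return true;
}
```

In the real `ObjectMapper`, the optional overload can still fail when the key is present but its value has the wrong type; this simplified model has only integers, so that case does not arise here.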

				// Parses the provided JSON source, or returns a ParseError.
				// The returned Value is self-contained and owns its strings (they do not refer
				chandlerc: Consider inlining this above? I actually think it makes the `std::initializer_list<KV>` used in a public API much easier to read if this trivial wrapper is immediately visible. Oh, I guess this is kept out-of-line in order to define `ObjectKey` after `Object`, `Array`, and `Value`? Not sure this ordering is buying that much in terms of readability. Given that you can define `ObjectKey` first, I might just do that and avoid the need to make this out-of-line... Anyways, this is just an optional suggestion, I'm fine with whichever way you end up.
				sammccall (author): I wish this was just readability :-( The problem is a circular dependency: defining `KV` needs `Value` to be complete, `Value` requires `Object` to be complete (because of `Union`). So the `Object` -> `Value` -> `KV` ordering is necessary, I think. `ObjectKey` could indeed be hoisted higher, but that alone doesn't buy anything, I think.
				// to the original source).
				llvm::Expected<Value> parse(llvm::StringRef JSON);

				class ParseError : public llvm::ErrorInfo<ParseError> {
				const char *Msg;
				unsigned Line, Column, Offset;

				public:
				static char ID;
				ParseError(const char *Msg, unsigned Line, unsigned Column, unsigned Offset)
				: Msg(Msg), Line(Line), Column(Column), Offset(Offset) {}
				void log(llvm::raw_ostream &OS) const override {
				OS << llvm::formatv("[{0}:{1}, byte={2}]: {3}", Line, Column, Offset, Msg);
				}
				chandlerc: Given that these have `JSON` in the name, does it make sense to move them out of the `json` namespace?
				sammccall (author): I'm not sure. The intent is these are part of a family of functions found via ADL on the second argument. When used in that way, they are always available (because of ADL on the first argument), and they don't particularly "belong" in namespace llvm (they overload for builtin and `std` types). Most of these particular functions are for generic code (like ObjectMapper, or `fromJSON(Value, std::vector)` itself), e.g. calling `Value.getAsNumber()` is generally nicer than `fromJSON(Value, double)`. So moving these particular overloads into `namespace llvm` seems to give them more prominence compared to the rest of the API than seems warranted. c.f. `llvm::json::parse` vs `llvm::parseJSON`, which is a main entrypoint.
				std::error_code convertToErrorCode() const override {
				return llvm::inconvertibleErrorCode();
				}
				};
				} // namespace json

				// Allow printing json::Value with formatv().
				// The default style is basic/compact formatting, like operator<<.
				// A format string like formatv("{0:2}", Value) pretty-prints with indent 2.
				template <> struct format_provider<llvm::json::Value> {
				static void format(const llvm::json::Value &, raw_ostream &, StringRef);
				};
				} // namespace llvm

				#endif
				chandlerc (done): This is really nice btw. =D
				sammccall (author): :-) We had plenty of this code in clangd to test the API on. The one thing that's pretty awful is that when parsing fails there's no detailed representation of the error. But worse seems to be better for us so far.
				chandlerc (done): Would it be a readability improvement to have this be `llvm::parseJSON` rather than `llvm::json::parse`?
				sammccall (author): Hmm, this seems like a wash to me. `parseJSON` is slightly shorter than `json::parse`, while the latter hints (accurately) at a library boundary. Given the other entrypoints (`Value`, `ObjectMapper` etc) are in `namespace json`, I lean towards keeping this there too - among other things, it makes the spelling/capitalization of `json::parse`, `json::Value` etc easier to remember I think. Certainly happy to change this if you have a strong opinion here though.

lib/Support/CMakeLists.txt

add_llvm_library(LLVMSupport
  [...]
  FormatVariadic.cpp
  GlobPattern.cpp
  GraphWriter.cpp
  Hashing.cpp
  InitLLVM.cpp
  IntEqClasses.cpp
  IntervalMap.cpp
  JamCRC.cpp
+ JSON.cpp
  KnownBits.cpp
  LEB128.cpp
  LineIterator.cpp
  Locale.cpp
  LockFileManager.cpp
  LowLevelType.cpp
  ManagedStatic.cpp
  MathExtras.cpp
  [...]

lib/Support/JSON.cpp

This file was added.

				//=== JSON.cpp - JSON value, parsing and serialization - C++ -----------*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===---------------------------------------------------------------------===//

				#include "llvm/Support/JSON.h"
				#include "llvm/Support/ConvertUTF.h"
				#include "llvm/Support/Format.h"
				#include <cctype>

				namespace llvm {
				namespace json {

				Value &Object::operator[](const ObjectKey &K) {
				return emplace(K, Value(nullptr)).first->second;
				}
				Value &Object::operator[](ObjectKey &&K) {
				return emplace(std::move(K), Value(nullptr)).first->second;
				}
				Value *Object::get(const ObjectKey &K) {
				auto I = find(K);
				if (I == end())
				return nullptr;
				return &I->second;
				}
				const Value *Object::get(const ObjectKey &K) const {
				auto I = find(K);
				if (I == end())
				return nullptr;
				return &I->second;
				}
				llvm::Optional<std::nullptr_t> Object::getNull(const ObjectKey &K) const {
				if (auto *V = get(K))
				return V->asNull();
				return llvm::None;
				}
				llvm::Optional<bool> Object::getBoolean(const ObjectKey &K) const {
				if (auto *V = get(K))
				return V->asBoolean();
				return llvm::None;
				}
				llvm::Optional<double> Object::getNumber(const ObjectKey &K) const {
				if (auto *V = get(K))
				return V->asNumber();
				return llvm::None;
				}
				llvm::Optional<int64_t> Object::getInteger(const ObjectKey &K) const {
				if (auto *V = get(K))
				return V->asInteger();
				return llvm::None;
				}
				llvm::Optional<llvm::StringRef> Object::getString(const ObjectKey &K) const {
				if (auto *V = get(K))
				return V->asString();
				return llvm::None;
				}
				const json::Object *Object::getObject(const ObjectKey &K) const {
				if (auto *V = get(K))
				return V->asObject();
				return nullptr;
				}
				json::Object *Object::getObject(const ObjectKey &K) {
				if (auto *V = get(K))
				return V->asObject();
				return nullptr;
				}
				const json::Array *Object::getArray(const ObjectKey &K) const {
				if (auto *V = get(K))
				return V->asArray();
				return nullptr;
				}
				json::Array *Object::getArray(const ObjectKey &K) {
				if (auto *V = get(K))
				return V->asArray();
				return nullptr;
				}

				Array::Array(std::initializer_list<Value> Elements) {
				reserve(Elements.size());
				for (const Value &V : Elements)
				emplace_back(std::move(V));
				};
				simon_tatham: When I tried to build with this change locally, g++ pointed out a spurious semicolon here.
				llvm::Optional<std::nullptr_t> Array::getNull(size_t I) const {
				return (*this)[I].asNull();
				}
				llvm::Optional<bool> Array::getBoolean(size_t I) const {
				return (*this)[I].asBoolean();
				}
				llvm::Optional<double> Array::getNumber(size_t I) const {
				return (*this)[I].asNumber();
				}
				llvm::Optional<int64_t> Array::getInteger(size_t I) const {
				return (*this)[I].asInteger();
				}
				llvm::Optional<llvm::StringRef> Array::getString(size_t I) const {
				return (*this)[I].asString();
				}
				const Object *Array::getObject(size_t I) const { return (*this)[I].asObject(); }
				Object *Array::getObject(size_t I) { return (*this)[I].asObject(); }
				const Array *Array::getArray(size_t I) const { return (*this)[I].asArray(); }
				Array *Array::getArray(size_t I) { return (*this)[I].asArray(); }

				Value::Value(std::initializer_list<Value> Elements)
				: Value(json::Array(Elements)) {}

				void Value::copyFrom(const Value &M) {
				Type = M.Type;
				switch (Type) {
				case T_Null:
				case T_Boolean:
				case T_Number:
				memcpy(Union.buffer, M.Union.buffer, sizeof(Union.buffer));
				break;
				case T_StringRef:
				create<StringRef>(M.as<StringRef>());
				break;
				case T_String:
				create<std::string>(M.as<std::string>());
				break;
				case T_Object:
				create<json::Object>(M.as<json::Object>());
				break;
				case T_Array:
				create<json::Array>(M.as<json::Array>());
				break;
				}
				}

				void Value::moveFrom(const Value &&M) {
				Type = M.Type;
				switch (Type) {
				case T_Null:
				case T_Boolean:
				case T_Number:
				memcpy(Union.buffer, M.Union.buffer, sizeof(Union.buffer));
				break;
				case T_StringRef:
				create<StringRef>(M.as<StringRef>());
				break;
				case T_String:
				create<std::string>(std::move(M.as<std::string>()));
				M.Type = T_Null;
				break;
				case T_Object:
				create<json::Object>(std::move(M.as<json::Object>()));
				M.Type = T_Null;
				break;
				case T_Array:
				create<json::Array>(std::move(M.as<json::Array>()));
				M.Type = T_Null;
				break;
				}
				}

				void Value::destroy() {
				switch (Type) {
				case T_Null:
				case T_Boolean:
				case T_Number:
				break;
				case T_StringRef:
				as<StringRef>().~StringRef();
				break;
				case T_String:
				as<std::string>().~basic_string();
				break;
				case T_Object:
				as<json::Object>().~Object();
				break;
				case T_Array:
				as<json::Array>().~Array();
				break;
				}
				}

				bool operator==(const Value &L, const Value &R) {
				if (L.kind() != R.kind())
				return false;
				switch (L.kind()) {
				case Value::Null:
				return L.asNull() == R.asNull();
				case Value::Boolean:
				return L.asBoolean() == R.asBoolean();
				case Value::Number:
				return L.asNumber() == R.asNumber();
				case Value::String:
				return L.asString() == R.asString();
				case Value::Array:
				return L.asArray() == R.asArray();
				case Value::Object:
				return L.asObject() == R.asObject();
				}
				llvm_unreachable("Unknown value kind");
				}

				namespace {
				// Simple recursive-descent JSON parser.
				class Parser {
				public:
				Parser(StringRef JSON)
				: Start(JSON.begin()), P(JSON.begin()), End(JSON.end()) {}

				bool parseValue(Value &Out);

				bool assertEnd() {
				eatWhitespace();
				if (P == End)
				return true;
				return parseError("Text after end of document");
				}

				Error takeError() {
				assert(Err);
				return std::move(*Err);
				}

				private:
				void eatWhitespace() {
				while (P != End && (*P == ' ' || *P == '\r' || *P == '\n' || *P == '\t'))
				++P;
				}

				// On invalid syntax, parseX() functions return false and set Err.
				bool parseNumber(char First, double &Out);
				bool parseString(std::string &Out);
				bool parseUnicode(std::string &Out);
				bool parseError(const char *Msg); // always returns false

				char next() { return P == End ? 0 : *P++; }
				char peek() { return P == End ? 0 : *P; }
				static bool isNumber(char C) {
				return C == '0' || C == '1' || C == '2' || C == '3' || C == '4' ||
				C == '5' || C == '6' || C == '7' || C == '8' || C == '9' ||
				C == 'e' || C == 'E' || C == '+' || C == '-' || C == '.';
				}

				Optional<Error> Err;
				const char *Start, *P, *End;
				};

				bool Parser::parseValue(Value &Out) {
				eatWhitespace();
				if (P == End)
				return parseError("Unexpected EOF");
				switch (char C = next()) {
				// Bare null/true/false are easy - first char identifies them.
				case 'n':
				Out = nullptr;
				return (next() == 'u' && next() == 'l' && next() == 'l') ||
				parseError("Invalid bareword");
				case 't':
				Out = true;
				return (next() == 'r' && next() == 'u' && next() == 'e') ||
				parseError("Invalid bareword");
				case 'f':
				Out = false;
				return (next() == 'a' && next() == 'l' && next() == 's' && next() == 'e') ||
				parseError("Invalid bareword");
				case '"': {
				std::string S;
				if (parseString(S)) {
				Out = std::move(S);
				return true;
				}
				return false;
				}
				case '[': {
				Out = Array{};
				Array &A = *Out.asArray();
				eatWhitespace();
				if (peek() == ']') {
				++P;
				return true;
				}
				for (;;) {
				A.emplace_back(nullptr);
				if (!parseValue(A.back()))
				return false;
				eatWhitespace();
				switch (next()) {
				case ',':
				eatWhitespace();
				continue;
				case ']':
				return true;
				default:
				return parseError("Expected , or ] after array element");
				}
				}
				}
				case '{': {
				Out = Object{};
				Object &O = *Out.asObject();
				eatWhitespace();
				if (peek() == '}') {
				++P;
				return true;
				}
				for (;;) {
				if (next() != '"')
				return parseError("Expected object key");
				std::string K;
				if (!parseString(K))
				return false;
				eatWhitespace();
				if (next() != ':')
				return parseError("Expected : after object key");
				eatWhitespace();
				if (!parseValue(O[std::move(K)]))
				return false;
				eatWhitespace();
				switch (next()) {
				case ',':
				eatWhitespace();
				continue;
				case '}':
				return true;
				default:
				return parseError("Expected , or } after object property");
				}
				}
				}
				default:
				if (isNumber(C)) {
				Meinersbur: The error messages we get seem to be:
				If the token starts with an 'e' or 'E', the error we get is "Invalid number".
				For the letters 'n', 't', or 'f' we get "Invalid bareword".
				For any other first letter we get "Expected JSON value".
				I'd hope for more consistent error messages.
				sammccall (author): There's a tension here between precise errors, consistent/useful errors, and parser complexity. The "invalid number" is a false positive for `elephant` but a true positive for `123,00`. I've renamed these messages to be consistent but with a hint: "Invalid JSON value (number?)", "Invalid JSON value (null?)" etc, and "Invalid JSON value" respectively - WDYT?
				double Num;
				if (parseNumber(C, Num)) {
				Out = Num;
				return true;
				} else {
				return false;
				}
				}
				return parseError("Expected JSON value");
				}
				}
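The hinted messages proposed in the reply above can be sketched as a small standalone helper. This is only an illustration of the naming scheme, not code from the patch: `describeInvalidValue` is a hypothetical name, and the "(true?)" and "(false?)" spellings are an assumption filled in for the "etc" in the reply.

```cpp
#include <string>

// Hypothetical helper (not part of the patch): pick a hinted error message
// from the first character of a token that failed to parse.
std::string describeInvalidValue(char First) {
  switch (First) {
  case 'n':
    return "Invalid JSON value (null?)";
  case 't':
    return "Invalid JSON value (true?)"; // assumed spelling
  case 'f':
    return "Invalid JSON value (false?)"; // assumed spelling
  default:
    break;
  }
  // Same character class as Parser::isNumber(): digits, sign, '.', 'e', 'E'.
  if ((First >= '0' && First <= '9') || First == 'e' || First == 'E' ||
      First == '+' || First == '-' || First == '.')
    return "Invalid JSON value (number?)";
  return "Invalid JSON value";
}
```

With this scheme `elephant` still gets the "(number?)" hint, which is the false positive mentioned above, but all three messages now share one prefix.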

				bool Parser::parseNumber(char First, double &Out) {
				SmallString<24> S;
				S.push_back(First);
				while (isNumber(peek()))
				S.push_back(next());
				char *End;
				Out = std::strtod(S.c_str(), &End);
				return End == S.end() || parseError("Invalid number");
				}

				bool Parser::parseString(std::string &Out) {
				// leading quote was already consumed.
				for (char C = next(); C != '"'; C = next()) {
				if (LLVM_UNLIKELY(P == End))
				return parseError("Unterminated string");
				if (LLVM_UNLIKELY((C & 0x1f) == C))
				return parseError("Control character in string");
				if (LLVM_LIKELY(C != '\\')) {
				Out.push_back(C);
				continue;
				}
				// Handle escape sequence.
				switch (C = next()) {
				case '"':
				case '\\':
				labath: Is there any reason `Support/ConvertUTF.h` cannot be used here? It sounds like you just need the "lenient" conversion mode here.
				sammccall (author): Done. That's really slow code with a really inconvenient API, but it shouldn't matter.
				sammccall (author): Hmm, I guess I only thought I ran the tests :-( It turns out `DecodeUTF16ToUTF8` doesn't do the right thing - at least it doesn't do anything particularly compatible with JSON.
				The problematic cases are when UTF-16 surrogate code units appear without being properly paired:
				a JSON parser has to accept these, as they conform to the grammar
				the best behavior per Unicode is to replace them with U+FFFD
				ConvertUTF in lenient mode simply drops them in most cases (permitted, but not recommended). When encountering a lone leading surrogate at the end of text, it returns an error even in lenient mode.
				References:
				http://seriot.ch/parsing_json.php
				https://www.rfc-editor.org/errata_search.php?rfc=7159&eid=3984
				http://unicode.org/review/pr-121.html
				I've restored the previous code and added more comments, including reasons not to use ConvertUTF. This implements Unicode's preferred handling of invalid UTF-16 ("Replace each maximal subpart of the ill-formed subsequence by a single U+FFFD").
				case '/':
				Out.push_back(C);
				break;
				case 'b':
				Out.push_back('\b');
				break;
				case 'f':
				Out.push_back('\f');
				break;
				case 'n':
				Out.push_back('\n');
				break;
				case 'r':
				Out.push_back('\r');
				break;
				case 't':
				Out.push_back('\t');
				break;
				case 'u':
				if (!parseUnicode(Out))
				return false;
				break;
				default:
				return parseError("Invalid escape sequence");
				}
				}
				return true;
				}

				// Parses a \uNNNN escape sequence, the \u have already been consumed.
				labath: LLVM coding standards say we use static for functions instead of anonymous namespaces. Also, the llvm and json namespaces are opened and closed twice, so this may be a good opportunity to merge them.
				sammccall (author): Switched to static and removed the namespace (enum doesn't need to be in it). The format_provider specialization does need to be outside the namespace and there is some order dependency around there, but I've avoided opening/closing a lot by qualifying definition names.
				// Continues parsing sequential escapes, to ensure proper UTF-16 handling.
				bool Parser::parseUnicode(std::string &Out) {
				// We need all the UTF-16 code units so decoding recovers correctly.
				llvm::SmallVector<UTF16, 2> U16;
				while (true) {
				U16.push_back(0);
				char Bytes[] = {next(), next(), next(), next()};
				for (unsigned char C : Bytes) {
				if (!std::isxdigit(C))
				return parseError("Invalid \\u escape sequence");
				U16.back() <<= 4;
				U16.back() |= (C > '9') ? (C & ~0x20) - 'A' + 10 : (C - '0');
				}
				// If another \u escape follows, consume the escape and continue.
				if (P + 2 > End || *P != '\\' || *(P + 1) != 'u')
				break;
				P += 2;
				}
				// Now decode our UTF-16 buffer onto the end of our output string.
				// We must fight the APIs, which want preallocated buffers and unsigned char.
				// We can't use convertUTF16ToUTF8String() because we need lenient mode.
				unsigned OldSize = Out.size();
				unsigned MaxLen = UNI_MAX_UTF8_BYTES_PER_CODE_POINT * U16.size();
				Out.resize(OldSize + MaxLen);
				UTF8 *U8Begin = reinterpret_cast<unsigned char *>(&Out[OldSize]);
				UTF8 *U8End = U8Begin;
				const UTF16 *U16In = U16.data();
				if (conversionOK != ConvertUTF16toUTF8(&U16In, U16In + U16.size(), &U8End,
				U8End + MaxLen, lenientConversion))
				llvm_unreachable("UTF-16 to UTF-8 conversion failed");
				Out.resize(OldSize + U8End - U8Begin);
				return true;
				}
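The listing above still routes the collected code units through `ConvertUTF16toUTF8` in lenient mode, whereas the inline discussion concludes that each unpaired surrogate should become U+FFFD. A minimal standalone sketch of that behaviour, without LLVM dependencies, might look like the following. It is not the code that was restored in the later revision; `appendUTF8` and `decodeUTF16Lenient` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: append one Unicode code point to Out as UTF-8.
static void appendUTF8(uint32_t Rune, std::string &Out) {
  if (Rune < 0x80) {
    Out.push_back(static_cast<char>(Rune));
  } else if (Rune < 0x800) {
    Out.push_back(static_cast<char>(0xC0 | (Rune >> 6)));
    Out.push_back(static_cast<char>(0x80 | (Rune & 0x3F)));
  } else if (Rune < 0x10000) {
    Out.push_back(static_cast<char>(0xE0 | (Rune >> 12)));
    Out.push_back(static_cast<char>(0x80 | ((Rune >> 6) & 0x3F)));
    Out.push_back(static_cast<char>(0x80 | (Rune & 0x3F)));
  } else {
    Out.push_back(static_cast<char>(0xF0 | (Rune >> 18)));
    Out.push_back(static_cast<char>(0x80 | ((Rune >> 12) & 0x3F)));
    Out.push_back(static_cast<char>(0x80 | ((Rune >> 6) & 0x3F)));
    Out.push_back(static_cast<char>(0x80 | (Rune & 0x3F)));
  }
}

// Hypothetical helper: decode UTF-16 code units to UTF-8, replacing every
// unpaired surrogate with U+FFFD instead of dropping it or failing.
std::string decodeUTF16Lenient(const std::vector<uint16_t> &Units) {
  std::string Out;
  for (std::size_t I = 0; I < Units.size(); ++I) {
    uint32_t U = Units[I];
    bool IsLead = U >= 0xD800 && U <= 0xDBFF;
    bool IsTrail = U >= 0xDC00 && U <= 0xDFFF;
    if (IsLead && I + 1 < Units.size() && Units[I + 1] >= 0xDC00 &&
        Units[I + 1] <= 0xDFFF) {
      // Valid pair: combine into a supplementary-plane code point.
      uint32_t Trail = Units[++I];
      appendUTF8(0x10000 + ((U - 0xD800) << 10) + (Trail - 0xDC00), Out);
    } else if (IsLead || IsTrail) {
      appendUTF8(0xFFFD, Out); // Lone surrogate.
    } else {
      appendUTF8(U, Out);
    }
  }
  return Out;
}
```

For the code units `D801 D801 DC37` this yields U+FFFD followed by U+10437, which matches the "LeadingLeadingTrailing" expectation in the unit test.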

				bool Parser::parseError(const char *Msg) {
				int Line = 1;
				const char *StartOfLine = Start;
				for (const char *X = Start; X < P; ++X) {
				if (*X == 0x0A) {
				++Line;
				StartOfLine = X + 1;
				}
				}
				Err.emplace(
				llvm::make_unique<ParseError>(Msg, Line, P - StartOfLine, P - Start));
				return false;
				}
				} // namespace

				Expected<Value> parse(StringRef JSON) {
				Parser P(JSON);
				Value E = nullptr;
				if (P.parseValue(E))
				if (P.assertEnd())
				return std::move(E);
				return P.takeError();
				}
				char ParseError::ID = 0;

				} // namespace json
				} // namespace llvm

				static void quote(llvm::raw_ostream &OS, llvm::StringRef S) {
				OS << '\"';
				for (unsigned char C : S) {
				if (C == 0x22 || C == 0x5C)
				OS << '\\';
				if (C >= 0x20) {
				OS << C;
				continue;
				}
				OS << '\\';
				switch (C) {
				// A few characters are common enough to make short escapes worthwhile.
				case '\t':
				OS << 't';
				break;
				case '\n':
				OS << 'n';
				break;
				case '\r':
				OS << 'r';
				break;
				default:
				OS << 'u';
				llvm::write_hex(OS, C, llvm::HexPrintStyle::Lower, 4);
				break;
				}
				}
				OS << '\"';
				}

				enum IndenterAction {
				Indent,
				Outdent,
				Newline,
				Space,
				};

				// Prints JSON. The indenter can be used to control formatting.
				template <typename Indenter>
				void llvm::json::Value::print(raw_ostream &OS, const Indenter &I) const {
				switch (Type) {
				case T_Null:
				OS << "null";
				break;
				case T_Boolean:
				OS << (as<bool>() ? "true" : "false");
				break;
				case T_Number:
				OS << format("%g", as<double>());
				break;
				case T_StringRef:
				quote(OS, as<StringRef>());
				break;
				case T_String:
				quote(OS, as<std::string>());
				break;
				case T_Object: {
				bool Comma = false;
				OS << '{';
				I(Indent);
				for (const auto &P : as<json::Object>()) {
				if (Comma)
				OS << ',';
				Comma = true;
				I(Newline);
				quote(OS, P.first);
				OS << ':';
				I(Space);
				P.second.print(OS, I);
				}
				I(Outdent);
				if (Comma)
				I(Newline);
				OS << '}';
				break;
				}
				case T_Array: {
				bool Comma = false;
				OS << '[';
				I(Indent);
				for (const auto &E : as<json::Array>()) {
				if (Comma)
				OS << ',';
				Comma = true;
				I(Newline);
				E.print(OS, I);
				}
				I(Outdent);
				if (Comma)
				I(Newline);
				OS << ']';
				break;
				}
				}
				}

				void llvm::format_provider<llvm::json::Value>::format(
				const llvm::json::Value &E, raw_ostream &OS, StringRef Options) {
				if (Options.empty()) {
				OS << E;
				return;
				}
				unsigned IndentAmount = 0;
				if (Options.getAsInteger(/*Radix=*/10, IndentAmount))
				assert(false && "json::Value format options should be an integer");
				simon_tatham: Could this number formatting be changed? The default %g loses precision – you don't even get enough information to exactly reconstruct the same double you started with.
				Also, over in D46054 I'm working on a JSON back end for TableGen, for which I'd find it useful to be able to pass an arbitrary 64-bit integer through this system and still have the full 64 bits of integer value visible in the JSON output file, for the benefit of JSON consumers (e.g. Python `json.load`) that go above the call of duty in returning it as an integer without rounding it to the nearest representable double. So, would it be possible to have some method of constructing a json::Value that formats as a 64-bit integer literal?
				sammccall (author):
				> Could this number formatting be changed? The default %g loses precision – you don't even get enough information to exactly reconstruct the same double you started with.
				Yes, though are you aware of an existing round-trip safe double formatter in llvm? I suspect this only actually matters when the values are integers, so we should consider your second suggestion first :-)
				> I'd find it useful to be able to pass an arbitrary 64-bit integer through this system and still have the full 64 bits of integer value visible in the JSON output file, for the benefit of JSON consumers (e.g. Python json.load) that go above the call of duty in returning it as an integer without rounding it to the nearest representable double.
				What about this design:
				Internally, a numeric value can be an integer or a double, i.e. we split the internal `ValueType` `T_Number` into two, `T_Integer` and `T_Double`. The public `Kind` remains unchanged.
				when constructing, you get one or the other depending on the static type
				when parsing, you get integer unless it has a nonzero decimal part or is out-of-range.
				`asDouble()` always succeeds
				`asInteger()` succeeds if the underlying value is integer or if it's a double that can be exactly represented as `int64_t` (same as now)
				when serializing, you get `%g` for double and the usual representation for integers
				Open questions:
				is 1.2e3 a double or an integer? I kind of want the former, which complicates our heuristic.
				`int64_t` leaves anyone who wants `uint64_t` out in the cold. But adding more options for types is going to lead to madness. Can we live with this limitation?
				If this sounds good I can start on the changes, but I'd like to defer adding new features to another patch if that's OK. This one is largely moving mostly-battleworn code from clangd, and new features need closer review of the implementation.
				simon_tatham: That design certainly sounds as if it would do what I need, and a great deal more besides.
				Another option that would be fine for me personally would be to have a means of constructing a `ValueType` called, say, `T_Custom`, which internally holds a string value, and serializes as exactly that string, unquoted. I could imagine that being used for other unusual purposes as well, such as controlling which of `\uXXXX` and UTF-8 was used to represent a non-ASCII character in a string literal. (And that possibility is simple enough that I could add it myself as part of my patch.)
				sammccall (author): D46209 adds int64 support, and fixes use of %g to retain full precision.
				`T_Custom` is a cool idea and I definitely don't want to rule it out, but large integers in particular seems like something common that should "just work" if possible. (It's also unclear how a `T_Custom` could solve the problem on the parse side, which is nice to have)
				unsigned IndentLevel = 0;
				E.print(OS, [&](IndenterAction A) {
				switch (A) {
				case Newline:
				OS << '\n';
				OS.indent(IndentLevel);
				break;
				case Space:
				OS << ' ';
				break;
				case Indent:
				IndentLevel += IndentAmount;
				break;
				case Outdent:
				IndentLevel -= IndentAmount;
				break;
				};
				});
				}
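The precision concern raised in the thread above can be reproduced with the C standard library alone: `%g` keeps six significant digits by default, while `%.17g` carries enough digits to round-trip an IEEE-754 double. The sketch below only demonstrates the problem and makes no claim about how D46209 addresses it; `formatNumber` and `roundTrips` are hypothetical names.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: format a double with a printf-style format string.
std::string formatNumber(const char *Fmt, double D) {
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), Fmt, D);
  return Buf;
}

// Hypothetical helper: true if printing D with Fmt and parsing the text back
// with strtod yields exactly D again.
bool roundTrips(const char *Fmt, double D) {
  return std::strtod(formatNumber(Fmt, D).c_str(), nullptr) == D;
}
```

A value such as 1234567.0 already fails `roundTrips("%g", ...)` because the seventh digit is rounded away, which is why integer-valued numbers are the most visible casualty of the default format.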

				llvm::raw_ostream &llvm::json::operator<<(raw_ostream &OS, const Value &E) {
				E.print(OS, [](IndenterAction A) { /*ignore*/ });
				return OS;
				}
				Meinersbur: Did you consider using `llvm_unreachable`?

unittests/Support/CMakeLists.txt

add_llvm_unittest(SupportTests
EndianTest.cpp		EndianTest.cpp
ErrnoTest.cpp		ErrnoTest.cpp
ErrorOrTest.cpp		ErrorOrTest.cpp
ErrorTest.cpp		ErrorTest.cpp
FileOutputBufferTest.cpp		FileOutputBufferTest.cpp
FormatVariadicTest.cpp		FormatVariadicTest.cpp
GlobPatternTest.cpp		GlobPatternTest.cpp
Host.cpp		Host.cpp
		JSONTest.cpp
LEB128Test.cpp		LEB128Test.cpp
LineIteratorTest.cpp		LineIteratorTest.cpp
LockFileManagerTest.cpp		LockFileManagerTest.cpp
MD5Test.cpp		MD5Test.cpp
ManagedStatic.cpp		ManagedStatic.cpp
MathExtrasTest.cpp		MathExtrasTest.cpp
MemoryBufferTest.cpp		MemoryBufferTest.cpp
MemoryTest.cpp		MemoryTest.cpp

unittests/Support/JSONTest.cpp

This file was added.

				//===-- JSONTest.cpp - JSON unit tests ---------------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Support/JSON.h"

				#include "gmock/gmock.h"
				#include "gtest/gtest.h"

				namespace llvm {
				namespace json {

				namespace {

				std::string s(const Value &E) { return llvm::formatv("{0}", E).str(); }
				std::string sp(const Value &E) { return llvm::formatv("{0:2}", E).str(); }

				TEST(JSONTest, Types) {
				EXPECT_EQ("true", s(true));
				EXPECT_EQ("null", s(nullptr));
				EXPECT_EQ("2.5", s(2.5));
				EXPECT_EQ(R"("foo")", s("foo"));
				EXPECT_EQ("[1,2,3]", s({1, 2, 3}));
				EXPECT_EQ(R"({"x":10,"y":20})", s(Object{{"x", 10}, {"y", 20}}));
				}

				TEST(JSONTest, Constructors) {
				// Lots of edge cases around empty and singleton init lists.
				EXPECT_EQ("[[[3]]]", s({{{3}}}));
				EXPECT_EQ("[[[]]]", s({{{}}}));
				EXPECT_EQ("[[{}]]", s({{Object{}}}));
				EXPECT_EQ(R"({"A":{"B":{}}})", s(Object{{"A", Object{{"B", Object{}}}}}));
				EXPECT_EQ(R"({"A":{"B":{"X":"Y"}}})",
				s(Object{{"A", Object{{"B", Object{{"X", "Y"}}}}}}));
				}

				TEST(JSONTest, StringOwnership) {
				char X[] = "Hello";
				Value Alias = static_cast<const char *>(X);
				X[1] = 'a';
				EXPECT_EQ(R"("Hallo")", s(Alias));

				std::string Y = "Hello";
				Value Copy = Y;
				Y[1] = 'a';
				EXPECT_EQ(R"("Hello")", s(Copy));
				}

				TEST(JSONTest, CanonicalOutput) {
				// Objects are sorted (but arrays aren't)!
				EXPECT_EQ(R"({"a":1,"b":2,"c":3})", s(Object{{"a", 1}, {"c", 3}, {"b", 2}}));
				EXPECT_EQ(R"(["a","c","b"])", s({"a", "c", "b"}));
				EXPECT_EQ("3", s(3.0));
				}

				TEST(JSONTest, Escaping) {
				std::string test = {
				0, // Strings may contain nulls.
				'\b', '\f', // Have mnemonics, but we escape numerically.
				'\r', '\n', '\t', // Escaped with mnemonics.
				'S', '\"', '\\', // Printable ASCII characters.
				'\x7f', // Delete is not escaped.
				'\xce', '\x94', // Non-ASCII UTF-8 is not escaped.
				};

				std::string teststring = R"("\u0000\u0008\u000c\r\n\tS\"\\)"
				"\x7f\xCE\x94\"";

				EXPECT_EQ(teststring, s(test));

				EXPECT_EQ(R"({"object keys are\nescaped":true})",
				s(Object{{"object keys are\nescaped", true}}));
				}

				TEST(JSONTest, PrettyPrinting) {
				const char str[] = R"({
				"empty_array": [],
				"empty_object": {},
				"full_array": [
				1,
				null
				],
				"full_object": {
				"nested_array": [
				{
				"property": "value"
				}
				]
				}
				})";

				EXPECT_EQ(str, sp(Object{
				{"empty_object", Object{}},
				{"empty_array", {}},
				{"full_array", {1, nullptr}},
				{"full_object",
				Object{
				{"nested_array",
				{Object{
				{"property", "value"},
				}}},
				}},
				}));
				}

				TEST(JSONTest, Parse) {
				auto Compare = [](llvm::StringRef S, Value Expected) {
				if (auto E = parse(S)) {
				// Compare both string forms and with operator==, in case we have bugs.
				EXPECT_EQ(*E, Expected);
				EXPECT_EQ(sp(*E), sp(Expected));
				} else {
				handleAllErrors(E.takeError(), [S](const llvm::ErrorInfoBase &E) {
				FAIL() << "Failed to parse JSON >>> " << S << " <<<: " << E.message();
				});
				}
				};

				Compare(R"(true)", true);
				Compare(R"(false)", false);
				Compare(R"(null)", nullptr);

				Compare(R"(42)", 42);
				Compare(R"(2.5)", 2.5);
				Compare(R"(2e50)", 2e50);
				Compare(R"(1.2e3456789)", std::numeric_limits<double>::infinity());

				Compare(R"("foo")", "foo");
				Compare(R"("\"\\\b\f\n\r\t")", "\"\\\b\f\n\r\t");
				Compare(R"("\u0000")", llvm::StringRef("\0", 1));
				Compare("\"\x7f\"", "\x7f");
				Compare(R"("\ud801\udc37")", u8"\U00010437"); // UTF16 surrogate pair escape.
				Compare("\"\xE2\x82\xAC\xF0\x9D\x84\x9E\"", u8"\u20ac\U0001d11e"); // UTF8
				Compare(
				R"("LoneLeading=\ud801, LoneTrailing=\udc01, LeadingLeadingTrailing=\ud801\ud801\udc37")",
				u8"LoneLeading=\ufffd, LoneTrailing=\ufffd, "
				u8"LeadingLeadingTrailing=\ufffd\U00010437"); // Invalid unicode.

				Compare(R"({"":0,"":0})", Object{{"", 0}});
				Compare(R"({"obj":{},"arr":[]})", Object{{"obj", Object{}}, {"arr", {}}});
				Compare(R"({"\n":{"\u0000":[[[[]]]]}})",
				Object{{"\n", Object{
				{llvm::StringRef("\0", 1), {{{{}}}}},
				}}});
				Compare("\r[\n\t] ", {});
				}

				TEST(JSONTest, ParseErrors) {
				auto ExpectErr = [](llvm::StringRef Msg, llvm::StringRef S) {
				if (auto E = parse(S)) {
				// Compare both string forms and with operator==, in case we have bugs.
				FAIL() << "Parsed JSON >>> " << S << " <<< but wanted error: " << Msg;
				} else {
				handleAllErrors(E.takeError(), [S, Msg](const llvm::ErrorInfoBase &E) {
				EXPECT_THAT(E.message(), testing::HasSubstr(Msg)) << S;
				});
				}
				};
				ExpectErr("Unexpected EOF", "");
				ExpectErr("Unexpected EOF", "[");
				ExpectErr("Text after end of document", "[][]");
				ExpectErr("Invalid bareword", "fuzzy");
				ExpectErr("Expected , or ]", "[2?]");
				ExpectErr("Expected object key", "{a:2}");
				ExpectErr("Expected : after object key", R"({"a",2})");
				ExpectErr("Expected , or } after object property", R"({"a":2 "b":3})");
				ExpectErr("Expected JSON value", R"([&%!])");
				ExpectErr("Invalid number", "1e1.0");
				ExpectErr("Unterminated string", R"("abc\"def)");
				ExpectErr("Control character in string", "\"abc\ndef\"");
				ExpectErr("Invalid escape sequence", R"("\030")");
				ExpectErr("Invalid \\u escape sequence", R"("\usuck")");
				ExpectErr("[3:3, byte=19]", R"({
				"valid": 1,
				invalid: 2
				})");
				}

				TEST(JSONTest, Inspection) {
				llvm::Expected<Value> Doc = parse(R"(
				{
				"null": null,
				"boolean": false,
				"number": 2.78,
				"string": "json",
				"array": [null, true, 3.14, "hello", [1,2,3], {"time": "arrow"}],
				"object": {"fruit": "banana"}
				}
				)");
				EXPECT_TRUE(!!Doc);

				Object *O = Doc->asObject();
				ASSERT_TRUE(O);

				EXPECT_FALSE(O->getNull("missing"));
				EXPECT_FALSE(O->getNull("boolean"));
				EXPECT_TRUE(O->getNull("null"));

				EXPECT_EQ(O->getNumber("number"), llvm::Optional<double>(2.78));
				EXPECT_FALSE(O->getInteger("number"));
				EXPECT_EQ(O->getString("string"), llvm::Optional<llvm::StringRef>("json"));
				ASSERT_FALSE(O->getObject("missing"));
				ASSERT_FALSE(O->getObject("array"));
				ASSERT_TRUE(O->getObject("object"));
				EXPECT_EQ(*O->getObject("object"), (Object{{"fruit", "banana"}}));

				Array *A = O->getArray("array");
				ASSERT_TRUE(A);
				EXPECT_EQ(A->getBoolean(1), llvm::Optional<bool>(true));
				ASSERT_TRUE(A->getArray(4));
				EXPECT_EQ(*A->getArray(4), (Array{1, 2, 3}));
				EXPECT_EQ(A->getArray(4)->getInteger(1), llvm::Optional<int64_t>(2));
				int I = 0;
				for (Value &E : *A) {
				if (I++ == 5) {
				ASSERT_TRUE(E.asObject());
				EXPECT_EQ(E.asObject()->getString("time"),
				llvm::Optional<llvm::StringRef>("arrow"));
				} else
				EXPECT_FALSE(E.asObject());
				}
				}

				// Sample struct with typical JSON-mapping rules.
				struct CustomStruct {
				CustomStruct() : B(false) {}
				CustomStruct(std::string S, llvm::Optional<int> I, bool B)
				: S(S), I(I), B(B) {}
				std::string S;
				llvm::Optional<int> I;
				bool B;
				};
				inline bool operator==(const CustomStruct &L, const CustomStruct &R) {
				return L.S == R.S && L.I == R.I && L.B == R.B;
				}
				inline llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
				const CustomStruct &S) {
				return OS << "(" << S.S << ", " << (S.I ? std::to_string(*S.I) : "None")
				<< ", " << S.B << ")";
				}
				bool fromJSON(const Value &E, CustomStruct &R) {
				ObjectMapper O(E);
				if (!O || !O.map("str", R.S) || !O.map("int", R.I))
				return false;
				O.map("bool", R.B);
				return true;
				}

				TEST(JSONTest, Deserialize) {
				std::map<std::string, std::vector<CustomStruct>> R;
				CustomStruct ExpectedStruct = {"foo", 42, true};
				std::map<std::string, std::vector<CustomStruct>> Expected;
				Value J = Object{
				{"foo",
				Array{
				Object{
				{"str", "foo"},
				{"int", 42},
				{"bool", true},
				{"unknown", "ignored"},
				},
				Object{{"str", "bar"}},
				Object{
				{"str", "baz"}, {"bool", "string"}, // OK, deserialize ignores.
				},
				}}};
				Expected["foo"] = {
				CustomStruct("foo", 42, true),
				CustomStruct("bar", llvm::None, false),
				CustomStruct("baz", llvm::None, false),
				};
				ASSERT_TRUE(fromJSON(J, R));
				EXPECT_EQ(R, Expected);

				CustomStruct V;
				EXPECT_FALSE(fromJSON(nullptr, V)) << "Not an object " << V;
				EXPECT_FALSE(fromJSON(Object{}, V)) << "Missing required field " << V;
				EXPECT_FALSE(fromJSON(Object{{"str", 1}}, V)) << "Wrong type " << V;
				// Optional<T> must parse as the correct type if present.
				EXPECT_FALSE(fromJSON(Object{{"str", 1}, {"int", "string"}}, V))
				<< "Wrong type for Optional<T> " << V;
				}

				} // namespace
				} // namespace json
				} // namespace llvm


Diff 143338
