In D85705 I found the need for such accessors in llvm::json and I
created them in that diff. I'm moving those changes to this new diff.
The idea of these accessors is that they either return the requested JSON value
or a error string with a descriptive message useful for the user to fix
it.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This is a sensible thing to want and very nice code.
I'm not sure about putting it in JSON.h - there are a few unfortunate aspects that I just don't know how to fix, but limit the benefit which has to be measured against the cost of adding more complexity to the API.
- the error messages aren't *quite* good enough to be directly user-visible in the general case. e.g. "JSON object missing the 'id' field" - some hint to which (sub)object is often needed. U
- because the getAsX() method names are taken, the new accessors are stuck with awkward names
- Expected<pointer> (for object/array) is an awkward type, because -> doesn't really work and the meaning/impossibility of nullptr isn't obvious.
- this is useful when mapping objects with many fields, but it's not clear how to extend it to ObjectMapper or fromJSON (which maybe themselves don't have the best design, but...)
Argh, I hit enter too soon. One thing I'd planned to add was I'm sorry about the delay in reviewing this, too.
I think the first concern is the most important - providing an error-bearing API whose error messages aren't actually good enough for a lot of purposes is a bit of a hazard.
To give good errors you really need to pass around some context, but this cuts down on the simplicity of the API.
Happy to discuss this further if we want to try to come up with a general solution (I know clangd currently reports no errors other than "this is invalid", so I have some interest!)
Otherwise it might be best to carry these as non-member functions in the local project that can make the tradeoffs around APIs. Since they're implemented entirely on top of the public API this seems manageable, just a little ugly :-\
Thanks for the review. I'll implement this as helper functions in my patch, however it'd be interesting to figure out a way to create this API so that callers don't recreate this functionality. My general understanding is that showing some error message, even if incomplete, is better than showing almost nothing. But I can understand your point.
Right now all users of this code end up creating little extra functions that must work around this. lldb-vscode has it's own methods, the new IntelPT will do its own thing.
I think these APIs make sense for the API itself as a user of the API from my lldb-vscode work. Can we iterate on this and come up with a solution?
I believe they are good enough, they say 'id' is missing, but then print the full JSON object that they are missing from right?
- because the getAsX() method names are taken, the new accessors are stuck with awkward names
See my "expect" suggestion in inline comments?
- Expected<pointer> (for object/array) is an awkward type, because -> doesn't really work and the meaning/impossibility of nullptr isn't obvious.
Since this we are using Expected can we change the value to be "Object&" or "Array &" instead of a pointer?
- this is useful when mapping objects with many fields, but it's not clear how to extend it to ObjectMapper or fromJSON (which maybe themselves don't have the best design, but...)
I think the first concern is the most important - providing an error-bearing API whose error messages aren't actually good enough for a lot of purposes is a bit of a hazard.
To give good errors you really need to pass around some context, but this cuts down on the simplicity of the API.
Most of these errors are when we are trying to extract a key/value pair from an Object. So I don't really think we need to pass around more context? to make things make sense.
Happy to discuss this further if we want to try to come up with a general solution (I know clangd currently reports no errors other than "this is invalid", so I have some interest!)
Otherwise it might be best to carry these as non-member functions in the local project that can make the tradeoffs around APIs. Since they're implemented entirely on top of the public API this seems manageable, just a little ugly :-\
I would like to see more consistent use of this API with common errors instead of everyone make up their own error messages.
llvm/include/llvm/Support/JSON.h | ||
---|---|---|
152–158 | Maybe instead of these start with "get" and ending with "OrError", we could just name these "expect...". The getOptionalStringOrError doesn't map well to this though | |
459–462 | switch over to "expect" as mentioned above? |
TL;DR:
- I agree we should solve this, thanks for pushing back
- I think the path-to-error (e.g. processes[0].triple) is critical in some cases
- this really seems like an marshaling concern rather than something for Value/Object
- happy to take a stab at a design but I think it'll need some prototyping
I agree there's value here, let's try to build something really nice :-) One advantage of the current situation is it's clear that error-reporting is on the caller, so they have to deal with the question of whether it's good enough. If we provide an opinionated option then I think it's important it's good enough for most use cases.
One disadvantage of the current situation is you have to avoid fromJSON/ObjectMapper in favor of the caller inventing something much more verbose in order to get error messages - this seems like a comparable-sized problem to me.
I believe they are good enough, they say 'id' is missing, but then print the full JSON object that they are missing from right?
I have two concerns about this approach:
- the full JSON object may be huge (e.g. missing key in an object where sibling keys may be giant objects)
- the object where the error is found is not unique enough to be clear ("missing key 'id' in value '{}'" in a large object). Wrapping errors could help, but has its own problems.
I think the most important information we should be aiming to capture:
- what the root object is, in the context of the application (caller must supply this of course)
- the path from the root to the local error
- some description of the local error
The second is not present here, and requires a different API to achieve. llvm::Expected is good for *exposing* errors, but both awkward and inefficient for assembling complex errors (especially in cases where failure to parse isn't a real error). So I'd expect conversion of some error description to an llvm::Error to happen at the end of parsing.
- because the getAsX() method names are taken, the new accessors are stuck with awkward names
See my "expect" suggestion in inline comments?
Nice! I like expectString() etc a lot better. I think the answer for getOptionalStringOrError() is to drop that function - it seems fairly nonorthogonal and niche.
While IMO this solves the name problem, it would still be nice if we don't need these on the core data structure classes but rather treated this as a concern of marshaling functions, where it tends to come up. TraceSettingsParser seems to follow this pattern - obviously lack of error handling makes ObjectMapper unsuitable, but do you think it would otherwise work?
- Expected<pointer> (for object/array) is an awkward type, because -> doesn't really work and the meaning/impossibility of nullptr isn't obvious.
Since this we are using Expected can we change the value to be "Object&" or "Array &" instead of a pointer?
Indeed! I thought Expected<T&> wasn't allowed, but looks like I was hallucinating. (and I have some project code to go clean up!)
Most of these errors are when we are trying to extract a key/value pair from an Object. So I don't really think we need to pass around more context? to make things make sense.
(I think I covered this above and don't want to bore with repetition, but can go into more examples and stuff if that's interesting)
I have some ideas about this, but this comment is long enough. I'll dump some in the next one and then try to make time to prototype.
A rough design sketch:
When parsing an object at the top level, we create an error context object. This can store e.g. the name of the schema for the top-level, as well as details of the created error. This is fixed-size with no allocations unless an actual error occurs.
While parsing a subobject, the JSON path is stored as a reverse linked list of stack objects, again with no other allocations until an error happens. (Actual string data is either constant or in the JSON object itself, I think)
When a parse-object-function wants to parse a subobject field "foo", it passes Path.derive("foo") to the corresponding parse function.
When an error is reported, the parsing function walks up the chain of path elements to find the root error context:
- storing path elements in the root (which has a vector<Elt> so) to preserve them once destroyed
- storing error semantics in the root as well (could include enum like "wrong type", custom error messages, pointer to relevant JSON data, etc)
Then the error context provides an API to check success, assemble a llvm::Error or Expected etc.
I've fleshed this out in D88103.
As noted there it could be split into several patches, but I'm pretty happy with the ability there to show an error in context without producing an overwhelming amount of information.
(And without paying for rendering everything upfront when it may be unused or even a nonfatal error at the top level).
Interested whether this is something you could use?
Maybe instead of these start with "get" and ending with "OrError", we could just name these "expect...". The getOptionalStringOrError doesn't map well to this though