This is an archive of the discontinued LLVM Phabricator instance.

Add JSONScopedPrinter class

Authored by Jaysonyan on Nov 18 2021, 10:21 PM.



This change adds a JSONScopedPrinter as a subclass to ScopedPrinter.

This change is split off from D114052 and builds off of D114223.

Depends on D114684

Diff Detail

Event Timeline

Jaysonyan created this revision.Nov 18 2021, 10:21 PM
Jaysonyan requested review of this revision.Nov 18 2021, 10:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2021, 10:21 PM
Jaysonyan retitled this revision from Add `JSONScopedPrinter` to Add JSONScopedPrinter class.Nov 18 2021, 10:21 PM

I think JSONScopedPrinter (and possibly ScopedPrinter too, if it doesn't already have them) could do with gtest unit tests. It should be fairly straightforward to write them, and would help give confidence in the methods, especially as we may not use all of them up-front. The tests would also help with documenting what the code actually does, which will aid in future reviewing.

Will do! Do we want to test each method in isolation? or possibly in conjunction with each other to test things like the history stack? or possibly both?

Cross-posting this comment, since it's now applicable to this patch specifically: I think we probably need to do a bit of both: each method should be tested in isolation. I think we should ensure we have tests for the history stack, where it's applicable only. Basically, test the existing possible code paths, plus maybe one or two more interesting cases that test interaction between functions, if it seems sensible.



  1. I think we are normally explicit about public/private, so I'd add the explicit private: here, even though it's not strictly required.
  2. I don't think we normally have blank lines at the start of class definitions.

I'd probably add some blank lines between these things.


I believe the explicit llvm is unnecessary.


How come we need to stringify the value here? That seems surprising to me, and may lead to undesirable behaviour in my mind (e.g. a field for an ELF struct might be a number for the 32-bit ELF format, but a string in the 64-bit ELF format (and not all numbers will be strings in the latter-case, where they are not 64-bits)).

I have this vague memory that the LLVM JSON format doesn't support 64-bit numbers. If so, I think this is a mistake and should be fixed.


Similar point to the above, but slightly different: I think we should compare this to std::numeric_limits<uint64_t>::max() and print as a number, if possible, rather than as a string.


Similar to my comments in D114223, I'd recomend adding helper functions to avoid some of the logic duplication in the following methods.


I was thinking about this, and I think we shouldn't actually print as hex in JSON, and should fallback to the standard number printing. Here's the reasoning: most of the time, we use printHex to print some number that represents an offset, or other value in a fixed-sized field, that is easier to read as hex. However, JSON format is not intended for human consumption, at least not in our use-case for this code. As such, readability is of lesser concern than parseability. If we were to store hex numbers as strings, rather than converting them to their decimal counterparts, we'd end up with some numeric values as strings, as others as integers, in a seemingly arbitrary manner to the end-user. I don't think that this is desirable.


I would actually print this as two separate attributes: a symbol name, and a numeric offset. That'll be easier for consumers to parse. Perhaps these should be in a separate object, to avoid surprise name clashes. Rather than:






You should check through the existing usages, but at least in llvm-readelf, I think most labels use UpperCameCase style, so printing as MyValueRaw would look a little cleaner. Also, see my above comments re. symbol offset about a nested object/naming clash risk, i.e. I'd recommend:


Why's this being left as a todo?

Add printBinaryImpl implementation.
Remove duplicated logic for json scope logic.
Update printHex to print numbers.
Clean up formatting.

Jaysonyan marked 9 inline comments as done.Nov 21 2021, 11:02 PM

Cross-posting this comment, since it's now applicable to this patch specifically: I think we probably need to do a bit of both: each method should be tested in isolation. I think we should ensure we have tests for the history stack, where it's applicable only. Basically, test the existing possible code paths, plus maybe one or two more interesting cases that test interaction between functions, if it seems sensible.

That seems reasonable to me! Will update with these tests soon.


I had believed LLVM JSON originally didn't support uint64_t types but since then I see that support was added here: so I think I can just remove the to_string(...)


I updated this to just print the full APSInt unconditionally. Would it be preferable to cap printing numbers at std::numeric_limits<uint64_t>::max() and then fallback on strings otherwise? Since json doesn't have a max numerical value but arbitrarily long values may not always be supported.


That seems reasonable to me! Updated to output numeric values over hex strings.


My mistake, this was meant to be implemented before being put up for review.

jhenderson added inline comments.Nov 22 2021, 1:26 AM

Let's leave it without the cap, if we have a way of sensibly handling it. I think changing the type based on the value could get very confusing, and since as you say JSON has no firm limitations, printing out arbitrarily large numbers seems sensible.

An alternative (and probably better) approach would be to add support for this type to the JSON library directly (in a precursor patch).


Here and in the other loop below, I'd avoid the use of auto: it doesn't improve readability, and we tend to avoid the "always use auto" approach in LLVM, when the type isn't obvious from that line.


Perhaps worth a comment here elaborating on the reasoning, so that future users understand why.

Jaysonyan marked 3 inline comments as done.

Add JSONScopedPrinter tests.
Update some ScopedPrinter methods.
Fix some formatting.

Jaysonyan marked 4 inline comments as done.Nov 23 2021, 11:51 PM
Jaysonyan added inline comments.

I've added a comment at the top of the first method which outputs Hex. I wasn't sure if it was desirable to add a comment for every location that outputs Hex or not. Let me know if there's a preference of one over the other.

I've only skimmed the new tests, but as there are some structural issues there, I suggest you address those and then I'll confirm the test coverage and logic looks good.


Seems like the change in this and the printObject function below should be part of the ScopedPrinter virtualisation patch?


This check seems to be a behaviour change? Seems like we should keep that out of this patch, to keep this patch purely NFC for the regular ScopedPrinter


If I'm not mistaken, this is going to lead to an array of mixed types (some strings and some integers). I'm not convinced that's desirable, since it will mean needing to switch on type to determine how to handle the field. Perhaps we should do what we do elsewhere and have an array of objects, where the object has name (possibly optional) and value fields?

Alternatively, maybe the actual name isn't even needed, and the interesting thing is just the individual flag values. I guess that depends on how human readable we want this JSON output to be.


Same below.


I'm confused by this additional logic. It seems to me you're passing in a list of strings, but those strings might actually be numbers? That doesn't seem right to me with the calling code and/or our interface, rather than this function. For example, imagine our list happens to be a list of strings that are purely made up of digits, but the fact that they are digits is purely chance, and they should remain as strings. This function will convert to a list of integers. Further, a future new element in the same list without an all-digit input would cause the type of ALL the members of the list to change back to strings.


I'm inclined to say we should have the comment by each function. Perhaps better though, at the cost of a little more code than would otherwise be necessary, would be to have a small helper function hexNumberToInt or similar, which is then called in all places you currently do HexNumber.Value. You could then put the comment with that function, in one place.


Slightly surprised to see the new if here and a little further below. Could you explain them, please? Are you sure it's better to have attributes missing rather than as empty strings (I'm not entirely sure either way, and it probably depends on the context to some extent, but I'd focus on ease of parsing)?

Aside: reminder that for single line if and loop statements, remove the braces, i.e. here and in the for loop below.


I wouldn't bother with the anonymous namespace. There'd only be a problem if someone else started writing ScopedPrinterTests in another file, which I think we'd want to spot and possibly stop.


For new unit tests that are not to do with the JSONScopedPrinter class, I'd put them in their own patch, as they are independently useful.


This string isn't really scoped itself. It's more the buffer used by the stream, so I'd just call it Buffer or StreamBuffer.

jhenderson added inline comments.Nov 25 2021, 1:02 AM

What's the point of the lambda? It seems like it's adding unnecessary complexity to the test, when you could just do:

std::string ScopedString;
llvm::raw_string_ostream OS(ScopedString);
ScopedPrinter Writer(OS);

// <current body of the lambda>

const char *Out = /*...*/;
EXPECT_EQ(Out, ScopedString);

This setup logic might benefit from being pulled into a test fixture, to reduce the duplication between tests.

The JSON stuff could go in the same fixture.


Looking at this test seesm to make it clear to me that this is not hte optimal format of this output. I think you probably want it to be an array of numbers, rather than an object, with individual bytes labelled.

Jaysonyan updated this revision to Diff 390248.Nov 28 2021, 9:12 PM
Jaysonyan marked 9 inline comments as done.

Add JSONScopedPrinter specific tests.
Update printBinaryImpl structure to include index.
Add hexNumberToInt method.
Revert JSONScopedPrinter::printListImpl to only print strings.
Add printRawFlagsImpl method to JSONScopedPrinter.


Moved to virtualization patch!


This was meant to accommodate the change I made to the making the extra printFlags method virtual. I was hoping both public printFlags(...) methods could utilize the same printFlagsImpl method but you're right this could be a change in behaviour. I've updated the virtualisation patch to keep this implementation the same and added an explicit printRawFlagsImpl method.


This was an attempt similar to the change in the ScopedPrinter implementation of printFlagsImpl to handle 2 types of public printFlags methods. I've added a printRawFlagsImpl to handle printing Flag.Value and updated this method to printing both Flag.Name and Flag.Value. This seems to best match the information that the ScopedPrinter implementation provides.


Yea this makes sense. The problem I was trying to solve was that the public method printList(...) takes a list of template type. But to pass to the printListImpl(...) method currently we are using the to_string method. So passing a list of numbers to printList(...) would result in a list of strings for the JSONScopedPrinter.

I've reverted this change to just unconditionally print as strings for now but maybe it would be preferable to replace the template param printList(...) with multiple overloaded printList(...) methods that can delegate to printStringListImpl(...)/printNumberListImpl(...)/printBooleanListImpl(...) methods. Interested if you have any thoughts on this.


This was to match closer with the public methods available. Some public methods of printBinary(...) don't provide a Str param and so it's just passed into printBinaryImpl as StringRef() and the existing ScopedPrinter::printBinaryImpl does a similar check for Str.empty(). Similarly, there are public printBinary(...) and printBinaryBlock(...) methods where the latter prints the accompanying characters and the former does not. I'm not opposed to unconditionally printing these values but since these checks are used to differentiate different public methods inside ScopedPrinter::printBinaryImpl I thought it made sense to do the same for JSONScopedPrinter::printBinaryImpl.

My mistake about the if statments, I'll try to be more careful about it in the future.


Moved ScopedPrinter-related tests to D114684 and rebased off that patch.


I haven't removed the lambda but it's now using a test fixture which takes a lambda. It doesn't need to use the test fixture since we're not testing the JSONScopedPrinter but I'd imagine it might be nice for the sake of consistency with other tests. Let me know if you feel otherwise.


I've updated the output to have the following format:

"Bytes": [
    "Index": 0,
    "Value": 70,
    "Character": "F"
    "Index": 2,
    "Value": 111,
    "Character": "o"

(Character attribute will be omitted for non printBinaryBlock(...) methods for now)

Could you rebase this patch on top of the other commits in the series, please, so that it's easier to see the current state?


I'd use the type name explicitly here. The concept of a "hex" number is purely a representation thing, so doesn't really make sense to refer to a number as a "hex number" until it's been written out somewhere. Numbers are just numbers.

I also suggest referring to the base explicitly in this comment for how they are outputted.


I think multiple functions is the way forward, although I'd just call them all printListImpl and let the ArrayRef type determine the version to call. Any particular reason we'd need multiple printList functions too? Seems like it's something that could be determined without it.

Some rough pseudo code:

template <typename T>
void printList(StringRef Label, const ArrayRef<T> List) {
  printListImpl(Label, List);

virtual void printListImpl(StringRef, ArrayRef<int> List) { /*...*/ }
virtual void printListImpl(StringRef, ArrayRef<bool> List) { /*...*/ }
virtual void printListImpl(StringRef Label, ArrayRef<std::string> List) { /*...*/ }

Actually, now that I type that out, I wonder if you could just get rid of printListImpl entirely, and make printList a set of virtual non-templated functions with multiple overloads for the different supported types? (Potentially you might still want a non-virtual printListImpl (that could be templated) to avoid code duplication of course).


Let's leave it as-is for now. We can always add attributes should we desire to at a future point.

That being said, I'm wondering if the Character block is actually useful for JSON output, since the character is just a human-readable version of the byte. If we're assuming people won't be directly reading JSON output and instead will be processing it themselves, the Character output wouldn't be useful at all, I think (they could reproduce it from the Bytes data, if needed). Thoughts?

Jaysonyan updated this revision to Diff 391220.Dec 1 2021, 11:09 PM
Jaysonyan marked an inline comment as done.

Add JSON implementation for new printList(...) overloads.
Update comment wording.

Jaysonyan updated this revision to Diff 391221.Dec 1 2021, 11:16 PM

Remove Character block from JSONScopedPrinter::printBinaryImpl(...)

Jaysonyan added inline comments.Dec 1 2021, 11:21 PM

I've added overloaded methods for printList(...) inside the virtualization diff and updated the ScopedPrinter test diff and this diff accordingly.

The implementation roughly follows the ideas you've laid out but the major difference is that the overloaded methods are public printList(...) methods rather than private printListImpl(...). This was needed because we still need to maintain a template printList(...) method to fall-back on for lists not comprised of strings, ints, or booleans.


You're right, adding the characters doesn't provide much value if we're analyzing the output through automated scripts. I think it makes sense to remove this block.

I've not yet looked at the JSONScopedPrinter tests again, but the body of the code looks pretty good.


This comment applies to all HexNumber instances, not just the one passed in, so it should be plural.


I'm now looking at this Index field and wondering if it's useful at this position? We're in an array already, so the index should match the array position. It doesn't (so should be called Offset), but then why have an entry per byte, when one per array would be less verbose, and equally as informative? It would also allow the values to be written directly in the array rather than in a nested object. I.e.

  "Value" : <Str>,
  "Offset" : <StartOffset>,
  "Bytes" : [Value[0], Value[1], ...]
Jaysonyan updated this revision to Diff 391435.Dec 2 2021, 12:49 PM

Update JSONScopedPrinter::printBinaryImpl to print initial offset rather than including a byte index in each byte entry.
Small comment change.

Jaysonyan marked 10 inline comments as done.Dec 2 2021, 12:52 PM
Jaysonyan added inline comments.

This makes a lot of sense and seems like a better format to me. Updated to match this format.

You should add tests (for both classes) for classof and getKind functions, since they're part of the public API.

One thought that's occurred to me: should the default JSON output really be formatted so verbosely (i.e. with all the new lines and spacing)? As the prime motivation of this is for machine readability, would it not make more sense for it to be compact, at least by default (with a possible option for pretty printing)? The amount of whitespace that's in the output could make the output data size be significantly larger than it needs to be, slowing down the parsing of said output.


All the new enums probably want to be enum class.


I'd avoid the use of the name Standard, as it could be confused with the C++ standard. I'd use Base, Plain or Basic for the enum value.


Similar comment to the above: don't use the name "Standard" for an enum value. Actually, in this case, without looking at the code, I have no clue what "Standard" even means. It should be renamed to reflect what it is used for.


At one point, there was some effort to set good defaults for SmallVector, allowing you to omit the second argument. It's less useful here, because we have a good guess about the amount of nesting. I think it's more likely to be a max of about 4 or 5, so 16 is probably way too high. Perhaps worth doing an audit of the usage within the LLVM code and add a few more points above the current maximum scoping (yes, I know JSONScopedPrinter isn't used really yet, but you can imagine how it would look, based on how ScopedPrinter is used already).


I think we need a JSONScopedPrinter version of this test, since we override startLine in that class.



Jaysonyan updated this revision to Diff 391758.Dec 3 2021, 3:13 PM
Jaysonyan marked 5 inline comments as done.

Update enum naming.
Add APSInt implementation and test.
Add classof and kindOf test.

Jaysonyan edited the summary of this revision. (Show Details)Dec 3 2021, 3:14 PM
Jaysonyan updated this revision to Diff 391760.Dec 3 2021, 3:24 PM

Update ScopeHistory SmallVector size to 8


Updated to use Base.


Updated to NoAttribute.


Looking through the usage of ScopedPrinter the most amount of nested scopes I came across was 6. This was a tie between a few methods in llmv-readobj (LLVMElfDumper::printVersionDependencySection, LLVMElfDumper::printBBAddrMaps, and COFFDumper::printCodeViweSymbolSection). Inside each of these methods there are 4 nested scopes and paired with the 2 scopes that would be added by JSONScopedPrinter (highest level [] and {} for each file) so I think the most nesting is 6. I'll update the SmallVector size to be 8.


My mistake, I actually mean to remove the overridden implementations of both startLine and getOStream. For the JSONScopedPrinter to provide these methods, it relies on json::OStream::rawValueBegin() which can only be used where values are used (elements of arrays or values to attributes). So if startLine or getOStream are called in any place which aren't these contexts (which is most of the time) then assertions inside json::OStream fail. So I think it might be more desirable to just rely on the ScopedPrinter implementation of both these methods.

jhenderson added inline comments.Dec 6 2021, 12:39 AM

This is marked as done but I don't see it here?


I'd pull this into a little helper function, shared by printNumber. It just helps avoid a (small) amount of duplication, but also helps label what the code does. (Ideally, I'd actually suggest enhancing the JSON output stream interface, but up to you whether you want to go that far)


This and the classof test don't rely on the fixture, so I'd a) change them to not use the fixture, and b) move them above it. Alternatively, see my comment below.


Either "nothing is" or "nothing's". Same below.

You might want to consider enhancing the current fixture, as an alternative to making these two tests not use it. In this case, I'd add the std::string, raw_string_ostream, ScopedPrinter and JSONScopedPrinter local variables used here and in the verify* functions into the base class, and then use the TearDown method to ensure JSONScopedPrinter has that empty string written.

Aside: it seems to me that this assertion is bogus - it's not that unreasonable to create a printer, but write nothing to it, to get an empty output.


Okay. I haven't looked into this, so I'll trust your judgement.

Jaysonyan updated this revision to Diff 392284.Dec 6 2021, 10:34 PM
Jaysonyan marked 4 inline comments as done.

Update missing enum to enum class
Update test fixture to handle printing outer {} and handle nothing printed at top level.


My mistake, looks like I updated some but not all of the enums. Updated them all now.


I've added a helper function for now. I don't imagine it would take too much to enhance the JSON class to handle APSInt but I don't have the capacity to investigate right now.


Updated the test fixture to handle the teardown. I needed to add a check to ensure we're only printing an empty string if there hasn't been a call to verifyJSONScopedPrinter since printString can only be called under specific contexts. Alternatively I could call something like DictScope(W, "Label") which can be called under all contexts (since it uses the history stack) but I held off because calling printString("") with the empty string felt more appropriate. Let me know if you have any opinions on this.

jhenderson added inline comments.Dec 7 2021, 3:44 AM

I'm not a fan of having to have an enum for the scoping kind, having looked at this. I'd much rather a class that does RAII-style management of the printer be passed in and ownership taken by this printer. I was going to suggest the following interface, but I see that the DelimitedScope takes a ScopedPrinter, leading to a chicken and egg problem.

JSONScopedPrinter(raw_ostream &OS, bool PrettyPrint = false, std::unique_ptr<DelimitedScope> &&Scope = std::unique_ptr<DelimitedScope>());

The first argument is unchanged. The second argument means control of the details of indentation is left to the class, rather than needing to know what "2" or "0" means (especially as "0" is special, as it means no new lines either). The third argument allows a user to provide the appropriate scoping mechanism for their use-case, but leaves the class to take and manage ownership; if it's defaulted, then no scoping will be used.

NB: the functionality of the second and third arguments would need testing (both default and explicit values), if they aren't already.

I wonder if the way to resolve this circular issue, is to have the DelimitedScope subclasses have a default constructor, which delays assigning the ScopedPrinter to a separate function. The scope opening would then be performed when the class is assigned its printer (and closing would only happen if a printer has been assigned). What do you think of this?


The only other idea I had was to make JSONScopedPrinter an Optional in the fixture, initialised in the corresponding verify method (and optionally in other tests, if needed), but I don't think that works.

I'm happy going with your suggestion.

Jaysonyan updated this revision to Diff 392677.Dec 8 2021, 1:25 AM
Jaysonyan marked 4 inline comments as done.

Add verifyAll method to test fixture.
Allow JSONScopedPrinter to be constructed with DelimitedScope.
Add DelimitedScope and pretty-print unit tests.

Jaysonyan added inline comments.Dec 8 2021, 1:32 AM

I agree that constructing the JSONScopedPrinter with an enum is less preferable than constructing with a DelimitedScope. I've implemented your suggestion to allow for DelimitedScope subclasses to be default constructed and later add the ScopedPrinter. This allowed us to construct JSONScopedPrinter with a DelimitedScope rather than using an enum.

Making this change also pointed out a weird piece of code which accesses the ScopedPrinter through the DelimitedScope member but it had access to the original ScopedPrinter. So I've updated that piece of code to directly access the ScopedPrinter.

I've also added tests for pretty printing and this delimitedScope ctor param. Since they require specific ctor calls they don't use the test fixture.


I'll leave it as is for now, possibly in the future that assertion will be removed and this extra work can just be deleted.

Jaysonyan updated this revision to Diff 392679.Dec 8 2021, 1:33 AM

Fix DelimitedScopeCtor test.

jhenderson added inline comments.Dec 9 2021, 6:31 AM

Consider adding a comment, as suggested inline, to "name" the pretty print/indentation parameter. The name should match the parameter's name.


I don't think you need the .get()?


For symmetry, I might be inclined to print a string in each of the three cases, not just the "no scope" case.


I'd add a comment as suggested inline, to explain the boolean.

Jaysonyan marked 4 inline comments as done.

Add param comments.
Update pretty-print tests.

Jaysonyan marked an inline comment as done.Dec 10 2021, 12:37 AM
jhenderson accepted this revision.Dec 10 2021, 2:28 AM

Looks good to me!

This revision is now accepted and ready to land.Dec 10 2021, 2:28 AM
This revision was landed with ongoing or failed builds.Dec 10 2021, 10:58 AM
This revision was automatically updated to reflect the committed changes.