Page MenuHomePhabricator

Add JSONScopedPrinter class

Authored by Jaysonyan on Nov 18 2021, 10:21 PM.



This change adds a JSONScopedPrinter as a subclass to ScopedPrinter.

This change is split off from D114052 and builds off of D114223.

Depends on D114684

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Nov 25 2021, 1:02 AM

What's the point of the lambda? It seems like it's adding unnecessary complexity to the test, when you could just do:

std::string ScopedString;
llvm::raw_string_ostream OS(ScopedString);
ScopedPrinter Writer(OS);

// <current body of the lambda>

const char *Out = /*...*/;
EXPECT_EQ(Out, ScopedString);

This setup logic might benefit from being pulled into a test fixture, to reduce the duplication between tests.

The JSON stuff could go in the same fixture.


Looking at this test seesm to make it clear to me that this is not hte optimal format of this output. I think you probably want it to be an array of numbers, rather than an object, with individual bytes labelled.

Jaysonyan updated this revision to Diff 390248.Nov 28 2021, 9:12 PM
Jaysonyan marked 9 inline comments as done.

Add JSONScopedPrinter specific tests.
Update printBinaryImpl structure to include index.
Add hexNumberToInt method.
Revert JSONScopedPrinter::printListImpl to only print strings.
Add printRawFlagsImpl method to JSONScopedPrinter.


Moved to virtualization patch!


This was meant to accommodate the change I made to the making the extra printFlags method virtual. I was hoping both public printFlags(...) methods could utilize the same printFlagsImpl method but you're right this could be a change in behaviour. I've updated the virtualisation patch to keep this implementation the same and added an explicit printRawFlagsImpl method.


This was an attempt similar to the change in the ScopedPrinter implementation of printFlagsImpl to handle 2 types of public printFlags methods. I've added a printRawFlagsImpl to handle printing Flag.Value and updated this method to printing both Flag.Name and Flag.Value. This seems to best match the information that the ScopedPrinter implementation provides.


Yea this makes sense. The problem I was trying to solve was that the public method printList(...) takes a list of template type. But to pass to the printListImpl(...) method currently we are using the to_string method. So passing a list of numbers to printList(...) would result in a list of strings for the JSONScopedPrinter.

I've reverted this change to just unconditionally print as strings for now but maybe it would be preferable to replace the template param printList(...) with multiple overloaded printList(...) methods that can delegate to printStringListImpl(...)/printNumberListImpl(...)/printBooleanListImpl(...) methods. Interested if you have any thoughts on this.


This was to match closer with the public methods available. Some public methods of printBinary(...) don't provide a Str param and so it's just passed into printBinaryImpl as StringRef() and the existing ScopedPrinter::printBinaryImpl does a similar check for Str.empty(). Similarly, there are public printBinary(...) and printBinaryBlock(...) methods where the latter prints the accompanying characters and the former does not. I'm not opposed to unconditionally printing these values but since these checks are used to differentiate different public methods inside ScopedPrinter::printBinaryImpl I thought it made sense to do the same for JSONScopedPrinter::printBinaryImpl.

My mistake about the if statments, I'll try to be more careful about it in the future.


Moved ScopedPrinter-related tests to D114684 and rebased off that patch.


I haven't removed the lambda but it's now using a test fixture which takes a lambda. It doesn't need to use the test fixture since we're not testing the JSONScopedPrinter but I'd imagine it might be nice for the sake of consistency with other tests. Let me know if you feel otherwise.


I've updated the output to have the following format:

"Bytes": [
    "Index": 0,
    "Value": 70,
    "Character": "F"
    "Index": 2,
    "Value": 111,
    "Character": "o"

(Character attribute will be omitted for non printBinaryBlock(...) methods for now)

Could you rebase this patch on top of the other commits in the series, please, so that it's easier to see the current state?


I'd use the type name explicitly here. The concept of a "hex" number is purely a representation thing, so doesn't really make sense to refer to a number as a "hex number" until it's been written out somewhere. Numbers are just numbers.

I also suggest referring to the base explicitly in this comment for how they are outputted.


I think multiple functions is the way forward, although I'd just call them all printListImpl and let the ArrayRef type determine the version to call. Any particular reason we'd need multiple printList functions too? Seems like it's something that could be determined without it.

Some rough pseudo code:

template <typename T>
void printList(StringRef Label, const ArrayRef<T> List) {
  printListImpl(Label, List);

virtual void printListImpl(StringRef, ArrayRef<int> List) { /*...*/ }
virtual void printListImpl(StringRef, ArrayRef<bool> List) { /*...*/ }
virtual void printListImpl(StringRef Label, ArrayRef<std::string> List) { /*...*/ }

Actually, now that I type that out, I wonder if you could just get rid of printListImpl entirely, and make printList a set of virtual non-templated functions with multiple overloads for the different supported types? (Potentially you might still want a non-virtual printListImpl (that could be templated) to avoid code duplication of course).


Let's leave it as-is for now. We can always add attributes should we desire to at a future point.

That being said, I'm wondering if the Character block is actually useful for JSON output, since the character is just a human-readable version of the byte. If we're assuming people won't be directly reading JSON output and instead will be processing it themselves, the Character output wouldn't be useful at all, I think (they could reproduce it from the Bytes data, if needed). Thoughts?

Jaysonyan updated this revision to Diff 391220.Dec 1 2021, 11:09 PM
Jaysonyan marked an inline comment as done.

Add JSON implementation for new printList(...) overloads.
Update comment wording.

Jaysonyan updated this revision to Diff 391221.Dec 1 2021, 11:16 PM

Remove Character block from JSONScopedPrinter::printBinaryImpl(...)

Jaysonyan added inline comments.Dec 1 2021, 11:21 PM

I've added overloaded methods for printList(...) inside the virtualization diff and updated the ScopedPrinter test diff and this diff accordingly.

The implementation roughly follows the ideas you've laid out but the major difference is that the overloaded methods are public printList(...) methods rather than private printListImpl(...). This was needed because we still need to maintain a template printList(...) method to fall-back on for lists not comprised of strings, ints, or booleans.


You're right, adding the characters doesn't provide much value if we're analyzing the output through automated scripts. I think it makes sense to remove this block.

I've not yet looked at the JSONScopedPrinter tests again, but the body of the code looks pretty good.


This comment applies to all HexNumber instances, not just the one passed in, so it should be plural.


I'm now looking at this Index field and wondering if it's useful at this position? We're in an array already, so the index should match the array position. It doesn't (so should be called Offset), but then why have an entry per byte, when one per array would be less verbose, and equally as informative? It would also allow the values to be written directly in the array rather than in a nested object. I.e.

  "Value" : <Str>,
  "Offset" : <StartOffset>,
  "Bytes" : [Value[0], Value[1], ...]
Jaysonyan updated this revision to Diff 391435.Dec 2 2021, 12:49 PM

Update JSONScopedPrinter::printBinaryImpl to print initial offset rather than including a byte index in each byte entry.
Small comment change.

Jaysonyan marked 10 inline comments as done.Dec 2 2021, 12:52 PM
Jaysonyan added inline comments.

This makes a lot of sense and seems like a better format to me. Updated to match this format.

You should add tests (for both classes) for classof and getKind functions, since they're part of the public API.

One thought that's occurred to me: should the default JSON output really be formatted so verbosely (i.e. with all the new lines and spacing)? As the prime motivation of this is for machine readability, would it not make more sense for it to be compact, at least by default (with a possible option for pretty printing)? The amount of whitespace that's in the output could make the output data size be significantly larger than it needs to be, slowing down the parsing of said output.


All the new enums probably want to be enum class.


I'd avoid the use of the name Standard, as it could be confused with the C++ standard. I'd use Base, Plain or Basic for the enum value.


Similar comment to the above: don't use the name "Standard" for an enum value. Actually, in this case, without looking at the code, I have no clue what "Standard" even means. It should be renamed to reflect what it is used for.


At one point, there was some effort to set good defaults for SmallVector, allowing you to omit the second argument. It's less useful here, because we have a good guess about the amount of nesting. I think it's more likely to be a max of about 4 or 5, so 16 is probably way too high. Perhaps worth doing an audit of the usage within the LLVM code and add a few more points above the current maximum scoping (yes, I know JSONScopedPrinter isn't used really yet, but you can imagine how it would look, based on how ScopedPrinter is used already).


I think we need a JSONScopedPrinter version of this test, since we override startLine in that class.



Jaysonyan updated this revision to Diff 391758.Dec 3 2021, 3:13 PM
Jaysonyan marked 5 inline comments as done.

Update enum naming.
Add APSInt implementation and test.
Add classof and kindOf test.

Jaysonyan edited the summary of this revision. (Show Details)Dec 3 2021, 3:14 PM
Jaysonyan updated this revision to Diff 391760.Dec 3 2021, 3:24 PM

Update ScopeHistory SmallVector size to 8


Updated to use Base.


Updated to NoAttribute.


Looking through the usage of ScopedPrinter the most amount of nested scopes I came across was 6. This was a tie between a few methods in llmv-readobj (LLVMElfDumper::printVersionDependencySection, LLVMElfDumper::printBBAddrMaps, and COFFDumper::printCodeViweSymbolSection). Inside each of these methods there are 4 nested scopes and paired with the 2 scopes that would be added by JSONScopedPrinter (highest level [] and {} for each file) so I think the most nesting is 6. I'll update the SmallVector size to be 8.


My mistake, I actually mean to remove the overridden implementations of both startLine and getOStream. For the JSONScopedPrinter to provide these methods, it relies on json::OStream::rawValueBegin() which can only be used where values are used (elements of arrays or values to attributes). So if startLine or getOStream are called in any place which aren't these contexts (which is most of the time) then assertions inside json::OStream fail. So I think it might be more desirable to just rely on the ScopedPrinter implementation of both these methods.

jhenderson added inline comments.Dec 6 2021, 12:39 AM

This is marked as done but I don't see it here?


I'd pull this into a little helper function, shared by printNumber. It just helps avoid a (small) amount of duplication, but also helps label what the code does. (Ideally, I'd actually suggest enhancing the JSON output stream interface, but up to you whether you want to go that far)


This and the classof test don't rely on the fixture, so I'd a) change them to not use the fixture, and b) move them above it. Alternatively, see my comment below.


Either "nothing is" or "nothing's". Same below.

You might want to consider enhancing the current fixture, as an alternative to making these two tests not use it. In this case, I'd add the std::string, raw_string_ostream, ScopedPrinter and JSONScopedPrinter local variables used here and in the verify* functions into the base class, and then use the TearDown method to ensure JSONScopedPrinter has that empty string written.

Aside: it seems to me that this assertion is bogus - it's not that unreasonable to create a printer, but write nothing to it, to get an empty output.


Okay. I haven't looked into this, so I'll trust your judgement.

Jaysonyan updated this revision to Diff 392284.Dec 6 2021, 10:34 PM
Jaysonyan marked 4 inline comments as done.

Update missing enum to enum class
Update test fixture to handle printing outer {} and handle nothing printed at top level.


My mistake, looks like I updated some but not all of the enums. Updated them all now.


I've added a helper function for now. I don't imagine it would take too much to enhance the JSON class to handle APSInt but I don't have the capacity to investigate right now.


Updated the test fixture to handle the teardown. I needed to add a check to ensure we're only printing an empty string if there hasn't been a call to verifyJSONScopedPrinter since printString can only be called under specific contexts. Alternatively I could call something like DictScope(W, "Label") which can be called under all contexts (since it uses the history stack) but I held off because calling printString("") with the empty string felt more appropriate. Let me know if you have any opinions on this.

jhenderson added inline comments.Dec 7 2021, 3:44 AM

I'm not a fan of having to have an enum for the scoping kind, having looked at this. I'd much rather a class that does RAII-style management of the printer be passed in and ownership taken by this printer. I was going to suggest the following interface, but I see that the DelimitedScope takes a ScopedPrinter, leading to a chicken and egg problem.

JSONScopedPrinter(raw_ostream &OS, bool PrettyPrint = false, std::unique_ptr<DelimitedScope> &&Scope = std::unique_ptr<DelimitedScope>());

The first argument is unchanged. The second argument means control of the details of indentation is left to the class, rather than needing to know what "2" or "0" means (especially as "0" is special, as it means no new lines either). The third argument allows a user to provide the appropriate scoping mechanism for their use-case, but leaves the class to take and manage ownership; if it's defaulted, then no scoping will be used.

NB: the functionality of the second and third arguments would need testing (both default and explicit values), if they aren't already.

I wonder if the way to resolve this circular issue, is to have the DelimitedScope subclasses have a default constructor, which delays assigning the ScopedPrinter to a separate function. The scope opening would then be performed when the class is assigned its printer (and closing would only happen if a printer has been assigned). What do you think of this?


The only other idea I had was to make JSONScopedPrinter an Optional in the fixture, initialised in the corresponding verify method (and optionally in other tests, if needed), but I don't think that works.

I'm happy going with your suggestion.

Jaysonyan updated this revision to Diff 392677.Dec 8 2021, 1:25 AM
Jaysonyan marked 4 inline comments as done.

Add verifyAll method to test fixture.
Allow JSONScopedPrinter to be constructed with DelimitedScope.
Add DelimitedScope and pretty-print unit tests.

Jaysonyan added inline comments.Dec 8 2021, 1:32 AM

I agree that constructing the JSONScopedPrinter with an enum is less preferable than constructing with a DelimitedScope. I've implemented your suggestion to allow for DelimitedScope subclasses to be default constructed and later add the ScopedPrinter. This allowed us to construct JSONScopedPrinter with a DelimitedScope rather than using an enum.

Making this change also pointed out a weird piece of code which accesses the ScopedPrinter through the DelimitedScope member but it had access to the original ScopedPrinter. So I've updated that piece of code to directly access the ScopedPrinter.

I've also added tests for pretty printing and this delimitedScope ctor param. Since they require specific ctor calls they don't use the test fixture.


I'll leave it as is for now, possibly in the future that assertion will be removed and this extra work can just be deleted.

Jaysonyan updated this revision to Diff 392679.Dec 8 2021, 1:33 AM

Fix DelimitedScopeCtor test.

jhenderson added inline comments.Dec 9 2021, 6:31 AM

Consider adding a comment, as suggested inline, to "name" the pretty print/indentation parameter. The name should match the parameter's name.


I don't think you need the .get()?


For symmetry, I might be inclined to print a string in each of the three cases, not just the "no scope" case.


I'd add a comment as suggested inline, to explain the boolean.

Jaysonyan marked 4 inline comments as done.

Add param comments.
Update pretty-print tests.

Jaysonyan marked an inline comment as done.Dec 10 2021, 12:37 AM
jhenderson accepted this revision.Dec 10 2021, 2:28 AM

Looks good to me!

This revision is now accepted and ready to land.Dec 10 2021, 2:28 AM
This revision was landed with ongoing or failed builds.Dec 10 2021, 10:58 AM
This revision was automatically updated to reflect the committed changes.