This is an archive of the discontinued LLVM Phabricator instance.

Add JSONScopedPrinter to llvm-readelf
ClosedPublic

Authored by Jaysonyan on Nov 18 2021, 11:06 PM.

Details

Summary

This change adds JSONScopedPrinter to llvm-readelf. It includes an empty JSONELFDumper class which will be used to override any LLVMELFDumper methods which utilize startLine() which JSONScopedPrinter cannot provide.

This change is split from D114052 and builds off of D114224

Depends on D114224

Diff Detail

Event Timeline

Jaysonyan created this revision.Nov 18 2021, 11:06 PM
Jaysonyan requested review of this revision.Nov 18 2021, 11:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2021, 11:06 PM
jhenderson added inline comments.Nov 19 2021, 1:16 AM
llvm/tools/llvm-readobj/ELFDumper.cpp
716

It's not clear to me why we inherit from LLVMELFDumper in this patch. This probably needs a comment at the very least.

llvm/tools/llvm-readobj/llvm-readobj.cpp
233–234

Probably time to turn this into a StringSwitch.

234–245

It would be great if this message included what the actual value specified was, and now's a good time to change it whilst you're here. Providing more context is useful, because it may not always be obvious what the problem is. For example, a command-line of "llvm-readobj --elf-output-syle="GNU my/input/file.o", resulting from a typo, would not always be immediately obvious where the issue is (especially if this was in a script and the input file was in a variable etc).

342–349

These two sets of if statements are actually in the wrong place, in my opinion: I think they should be behind interfaces in the ObjDumper and sub-classes, as appropriate, something like printFileSummary (which could take care of both).

I think it may be a slightly separate issue though. Here's my thinking:

  1. LLVM style is controlled via the --elf-output-style option, which has different defaults for llvm-readelf and llvm-readobj.
  2. The name of that option implies it has no impact for non-ELF formats, but this area of code is format-agnostic.
  3. JSON output probably actually wants to print Format/Arch/AddressSize.
  4. Everything else that is impacted by the --elf-output-style option is driven from the inheritance hierarchy. It seems odd that we have something that isn't.

At the very least, we could move the logic into virtual functions in ObjDumper, and then have JSON format provide its own behaviour. This would be a step in the right direction at least. What do you think?

549–551
Jaysonyan added inline comments.Nov 21 2021, 11:31 PM
llvm/tools/llvm-readobj/llvm-readobj.cpp
342–349

I agree that --elf-output-style shouldn't impact non-ELF formats. Although the work of adding a JSONScopedPrinter was not ELF-specific and it may be desirable to add json support for other formats in the future. This makes me wonder if --elf-output-style is an appropriate flag to enable json through or if it'd be preferable to add a format-agnostic flag (ex: --output-format). This could lend itself well if we choose to support json output of other obj formats in the future but could be confusing how --elf-output-style and --output-format would interact with each other. I'm not sure if this is something worth considering since there are currently no plans to support other obj formats but don't want to lock us in to an inconvenient flag choice for the future.

Otherwise, conceptually it makes sense to move this work into ObjDumper methods. One issue I see is that any usage of DictScope or ListScope relies on the ctor and dtor to output the opening/closing braces. So it may be difficult to just move this logic to other methods, especially if we have to initialize multiple DictScopes/ListScopes and just returning a unique_ptr would be insufficient.

jhenderson added inline comments.Nov 22 2021, 1:29 AM
llvm/tools/llvm-readobj/llvm-readobj.cpp
342–349

Yes, changing --elf-output-style to just --output-style was something I've considered a good idea for quite some time. This issue becomes more important as we add JSON support, but I think we can defer it to later on (it shouldn't block this work).

One issue I see is that any usage of DictScope or ListScope relies on the ctor and dtor to output the opening/closing braces.

I'm a little lost which constructor we're talking about here? The ObjDumper constructor or the DictScope/ListScope constructors?

Jaysonyan added inline comments.Nov 22 2021, 7:07 PM
llvm/tools/llvm-readobj/llvm-readobj.cpp
342–349

Referring to the DictScope/ListScope ctor/dtor. Since the opening braces are printed in the constructor and the closing braces are printed in the destructor, moving any instantiations of ListScope/DictScope inside a printFileSummary method becomes difficult since we would still need DictScope/ListScope to get destructed at the end of the dumpObject method rather than at the end of the printFileSummary method.

jhenderson added inline comments.Nov 23 2021, 1:31 AM
llvm/tools/llvm-readobj/llvm-readobj.cpp
342–349

Should the top-level DictScope be a member of the JSONELFDumper? Would that help solve this issue?

Related aside: perhaps in JSON mode, the file summary could become a separate object within the object for the file?

Update --elf-output-style error messaging.
Add printFileSummary method to ObjDumper.
Misc formatting.

Update formatting.

Jaysonyan marked 7 inline comments as done.Nov 26 2021, 12:26 PM
Jaysonyan added inline comments.
llvm/tools/llvm-readobj/ELFDumper.cpp
716

Added a comment! Let me know if you think it could provide more detail.

llvm/tools/llvm-readobj/llvm-readobj.cpp
233–234

I needed to provide a default option to know when to send an error message so I added an UNKOWN entry to the OutputStyelTy enum. If this seems inappropriate I can also change to just using a switch statement.

342–349

Adding the top level DictScope solves the problem! I also added the file summary info to its own object with attribute FileSummary.

Tip: add "Depends on DXXXXXX" to the patch description (but no need for it to be in the final commit message) to get Phabricator to auto-link this patch to the predecessor patch. I believe this helps the pre-merge checks, and also provides a nice little "Stack" view in the interface.

llvm/tools/llvm-readobj/ELFDumper.cpp
716

Thanks. Missing a trailing . though.

The limited nature of this new dumper class shows that we're not far off the ideal of having just one common class, which takes a ScopedPrinter (or JSONScopedPrinter), but I don't think we'll get there with this patch.

3263–3264

Old code used "\n" rather than '\n'. I think the former may be more common elsewhere in LLVM in general, although I'm not aware of any concrete coding standard.

Also, can't we use the existing W in the base class here, to keep the code identical to what it was before?

7342

Isn't this a behaviour change? Is it necesssary? If so, I'd defer it to a separate patch, if at all possible (or put it in a prerequisite patch), so that this patch is a pure NFC if a user doesn't specify --elf-output-style=json. I'm also slightly surprised I didn't see any test failures?

I'd suggest in this patch to call out to the base class version to do most of the work, although there's now a slight impedence to doing that in the form of the additional new line printed in one case but not the other, which I'd query.

The base class version was doing the JSON style version before, so I wonder if there's a good reason for this and the base class to diverge?

llvm/tools/llvm-readobj/llvm-readobj.cpp
233–234

UNKNOWN is a reasonable addition, in my opinion.

234–244
234–244

Perhaps pull A->getValue() into a helper variable, since you use it in the error message below.

234–245

I think the text as it was was a little better, and I'd just add "but was \"foo\"" to the end.

Aside: general LLVM style for error messages is no leading capital letter.

Jaysonyan updated this revision to Diff 391244.EditedDec 2 2021, 12:52 AM
Jaysonyan marked 9 inline comments as done.

Update error messaging.
Misc formatting.

llvm/tools/llvm-readobj/ELFDumper.cpp
7342

Sorry, I'm not sure if I fully understand what constitutes as a behaviour change. This method would only be run when the user has specified --elf-output-style=JSON (which is introduced in this patch) so it shouldn't impact any existing behaviour.

The base class version calls W.startLine() which isn't well supported with JSONScopedPrinter and my expectation was to override any methods in the base class which use startLine(). Beyond that, the FileScope is needed in this implementation based on our previous discussion (D114225#inline-1092086) to achieve the desired JSON output discussed in the initial patch (D111658#3072213). If we relied solely on the base class, we wouldn't get each file in their own {}.

In my opinion, it makes sense to not break this change out into its own patch because it relies on the existence of a JSONELFDumper which is introduced in this patch and this patch relies on this change so that the output is in a reasonable format (each file has it's own object).

Jaysonyan edited the summary of this revision. (Show Details)Dec 2 2021, 12:53 AM
jhenderson added inline comments.Dec 3 2021, 1:04 AM
llvm/tools/llvm-readobj/ELFDumper.cpp
7342

Sorry, I'm not sure if I fully understand what constitutes as a behaviour change. This method would only be run when the user has specified --elf-output-style=JSON (which is introduced in this patch) so it shouldn't impact any existing behaviour.

Apologies, I think I was looking at the "changes since previous action" view, so didn't notice that this was in the JSONELFDumper function.

llvm/tools/llvm-readobj/llvm-readobj.cpp
242

Missed that you need a comma here.

342

I guess there's a subtle behaviour change as a result of this patch, which wouldn't hurt to be called out in the commit description, but I otherwise think is fine: if a user had exactly one non-ELF object/archive (e.g. a COFF object), and they had previously specified --elf-output-style=GNU (or run using llvm-readelf, which has that as the default), they would previously have had no file summary, but now they'd get the default summary, provided by the ObjDumper class. Similarly, if they'd had more than one input, or the objects were in an archive, they'd before have had just the File part of the summary, but now they have the full file summary.

616–618

To recap: we only need this scope for JSON, so that there is a top-level object, ensuring we always have valid JSON in the output. What's stopping us making the ListScope a member of the JSONScopedPrinter class, so that the scoping is added on construction/destruction of the Writer?

Taking it from another point of view, if you had another piece of client code using JSONScopedPrinter, would they expect the output to be a valid JSON object, without additional changes, or expect to need to provide the outer object wrapping themselves? If the former, the scoping belongs to the JSCONScopedPrinter.

Oh, you should also add one test in this patch to show that the file summary output is correct for JSON output, with a single file, multiple files, and a single archive file (possibly with only one object).

Jaysonyan updated this revision to Diff 391765.Dec 3 2021, 3:51 PM
Jaysonyan marked an inline comment as done.

Add missing comma in error message.

llvm/tools/llvm-readobj/llvm-readobj.cpp
342

I placed the same opts::InputFilenames.size() > 1 || A check inside GNUELFDumper::printFileSummary method so I think the behaviour is the same.

Before, if the opts::Output is LLVM both if(...) statements would trigger. Now the ObjDumper::printFileSummary performs all the work in both if statements and would be called for any Dumper that is not GNUELFDumper or JSONELFDumper which should be any case where opts::Output is LLVM.

And the GNUELFDumper::printFileSummary only prints the File part and still does the multiple inputs / archive check.

616–618

I think providing an optional ListScope/DictScope constructor could be a nice quality-of-life improvement, but I think there should still be a way to construct a JSONScopedPrinter without {}/[] already added. Since technically, "string" is still valid json so I wouldn't want to remove the option to output json that don't have an outermost []/{}.

Jaysonyan marked 2 inline comments as done.Dec 3 2021, 3:52 PM
Jaysonyan updated this revision to Diff 391944.Dec 5 2021, 2:50 PM

Add JSON FileSummary tests.
Add pretty-print option to llvm-readelf.

Jaysonyan updated this revision to Diff 391945.Dec 5 2021, 3:13 PM

Add pretty-print test.

Please remember to update llvm/docs/CommandGuide/llvm-readelf.rst and llvm/docs/CommandGuide/llvm-readobj.rst as appropriate for the new options.

llvm/test/tools/llvm-readobj/ELF/json-file-summary.test
1 ↗(On Diff #391945)

I'd rename this test file-summary-json.test.

I'd also test the entire output, so that you catch the opening and closing braces (if any) as well as any formatting points. To do this, I'd add --match-full-lines, --strict-whitespace and --implicit-check-not={{.}} to the FileCheck executions. These ensure that a) there is no other output on the same line (by default, FileCheck matches substrings); b) there is no additional whitespace on the line (by default, FileCheck ignores leading and trailing whitespace, and compresses all other whitespace into single ' ' characters); c) ensures the regex pattern . doesn't match anything that isn't covered by an existing check.

4 ↗(On Diff #391945)

I'd avoid reusing the same output filename, as it makes things a little easier to debug sometimes when there's a test problem. Probably name the files %t.single, %t.multiple and %t.archive.

5 ↗(On Diff #391945)

These RUN lines are getting quite long. I'd break them up as in the suggested edit. You can use the \ to continue a RUN statement onto the next line, so this might be useful too to allow the FileCheck command to run across multiple lines, when you add the extra options.

llvm/test/tools/llvm-readobj/ELF/pretty-print.test
4
5–6

Same comments as above re. FileCheck additional options, output file name, and breaking up over multiple lines. Also applies below.

llvm/tools/llvm-readobj/Opts.td
31
llvm/tools/llvm-readobj/llvm-readobj.cpp
342

No, I don't think the behaviour is the same. Non-ELF objects (e.g. COFF/XCOFF etc) won't use the *ELFDumper classes, regardless of the --elf-output-style option's value. This is unchanged from before. The difference is that the opts::Output value now doesn't have a direct impact on the output (which is good), and only picks the dumper class to use for ELF objects. This means non-ELF objects will always have the default file summary behaviour.

I think this is a good behaviour change (GNU output style should have no effect for non-ELF objects), but it needs highlighting, as said already.

616–618

Yes, I see your point, and agree with you. Are you suggesting deferring to a separate patch, or as part of the original JSONScopedPrinter patch? I suggest the latter, because it will directly improve the code here, in my opinion.

Jaysonyan updated this revision to Diff 392290.Dec 6 2021, 11:20 PM
Jaysonyan marked an inline comment as done.

Use JSONScopedPrinter ctor to initialize outer-most [].

Jaysonyan updated this revision to Diff 392298.Dec 7 2021, 12:33 AM
Jaysonyan marked 6 inline comments as done.

Update docs.
Clean up tests.

Jaysonyan marked an inline comment as done.Dec 7 2021, 12:34 AM
Jaysonyan added inline comments.
llvm/tools/llvm-readobj/llvm-readobj.cpp
342

Sorry, don't mean to drag this on but I wanted to ensure we're on the same page.

opts::Output by default is initialized to LLVM so if --elf-output-style is not specified, then opts::Output will be LLVM. So I believe that even beforehand non-ELF objects would still have file summary information. Looking at a readobj tests with COFF here it looks like it's printing file summary info despite not being an ELF object.

616–618

Updated the JSONScopedPrinter patch to add this and use that change here.

jhenderson added inline comments.Dec 7 2021, 3:31 AM
llvm/docs/CommandGuide/llvm-readelf.rst
77

This line might need reflowing, to keep to a standard width (80 presumably).

78

JSON is an acronym, so should always be all-caps.

133

I'd suggest ", ... JSON output will be formatted in a more readable format." instead of the "will ..." part of the sentence.

llvm/docs/CommandGuide/llvm-readobj.rst
189
213

Use the same description as the other doc.

llvm/test/tools/llvm-readobj/ELF/file-summary-json.test
4

I'm being dumb: you only need one yaml2obj invocation for all test inputs. Delete the others, and rename the file %t or %t.o or similar.

6–7

Add some indentation to make it clear these are continuation lines. Same elsewhere.

56

llvm-ar r doesn't delete the existing archive or its contents, which can lead to unexpected results, if the test were aborted/updated in the future. Delete the archive file before recreating it here. Same below.

llvm/test/tools/llvm-readobj/ELF/pretty-print.test
6

-D is usually a single-dash option; --check-prefix usually has two dashes.

llvm/tools/llvm-readobj/llvm-readobj.cpp
342

Default and --elf-output-style=LLVM behaviours are unchanged. I'm not talking about those. I'm talking about the following command-line:

$ llvm-readobj coff.obj --elf-output-style=GNU

Before this had no file summary. Now it does. I think it is okay if it does, but I think this should be called out in the commit description.

Jaysonyan updated this revision to Diff 392686.Dec 8 2021, 2:14 AM
Jaysonyan marked 12 inline comments as done.

Update docs formatting.
Update file-summary and pretty print tests.

llvm/tools/llvm-readobj/llvm-readobj.cpp
342

Ok I see now, sorry for taking so long to understand. Will definitely call out this change in the commit description.

jhenderson added inline comments.Dec 9 2021, 6:35 AM
llvm/test/tools/llvm-readobj/ELF/file-summary-json.test
4–5

No need to rm an object file. That's only necessary for archive files (yaml2obj overwrites the existing file, whereas llvm-ar appends to it).

65–67

You should be able to be more precise, rather than skipping a chunk of the name. I think the relevant lit macro is %basename_t or something to that effect. You could pass that into an additional argument e.g. -DMEMBER.

Jaysonyan updated this revision to Diff 393402.Dec 10 2021, 1:25 AM
Jaysonyan marked 2 inline comments as done.

Fix up file-summary test.

Just as a heads up, Dec 10th is the last day of my internship so unfortunately after that I won't be able to work on this change as actively as before. If there is still feedback to address a member of my team may be able to pick this up from where I left off. And thank you for taking the time to review all of my code changes these past months!

llvm/test/tools/llvm-readobj/ELF/file-summary-json.test
65–67

I was able to remove the {{.*}} by matching the structure of llvm-readobj/archive.test.

jhenderson accepted this revision.Dec 10 2021, 2:31 AM

Looks good!

Just as a heads up, Dec 10th is the last day of my internship so unfortunately after that I won't be able to work on this change as actively as before. If there is still feedback to address a member of my team may be able to pick this up from where I left off. And thank you for taking the time to review all of my code changes these past months!

No problem. I think this is now in a good state to land the first sequence of patches. Hopefully you can get that done today.

This revision is now accepted and ready to land.Dec 10 2021, 2:31 AM
thakis added a subscriber: thakis.Dec 10 2021, 12:18 PM

The test fails on Windows: http://45.33.8.238/win/50629/step_11.txt

Please take a look and revert for now if it takes a while to fix.

The test fails on Windows: http://45.33.8.238/win/50629/step_11.txt

Please take a look and revert for now if it takes a while to fix.

Thank you for the heads up, I'm taking a look now, hopefully I'll have a fix soon and if not I'll revert these changes.

This revision was landed with ongoing or failed builds.Dec 10 2021, 3:35 PM
This revision was automatically updated to reflect the committed changes.

The test fails on Windows: http://45.33.8.238/win/50629/step_11.txt

Please take a look and revert for now if it takes a while to fix.

Thank you for the heads up, I'm taking a look now, hopefully I'll have a fix soon and if not I'll revert these changes.

An alternative test fix (and arguably a better one), would be to cd into the %t.dir directory after creating it, and then running all operations relative to the cwd, rather than the absolute paths %t translates into. You might want to consider making that change.