This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readobj] Standardize JSON output for `Other` field
ClosedPublic

Authored by paulkirth on Oct 31 2022, 10:26 AM.

Details

Summary

Today, the LLVM output uses special handling when the Other field is 0.
This output makes sense for a command line utility that a human will
read, but JSON is a machine readable format, so being consistent is more
important. Prior to this change, any consumer of the JSON output would
need to handle the Other field specially, since the structure of the
JSON would no longer be consistent.

Changes to JSON output when Other flag == 0:

"Other": 0,   ->   "Other": {
                      "RawFlags": 0,
                       "Flags": []
                    },

There are no changes to when Other flag != 0:

"Other": {        ->   "Other": {
  "RawFlags": 1,          "RawFlags": 1,
  "Flags": [              "Flags": [
      ...                     ...
  ]                       ]
},                     },

This patch adds a overload for the JSONELFDumper's printSymbol() method,
that uses consistent output formatting, regardless of the value of the
Other field.

Depends on D137092

Diff Detail

Event Timeline

paulkirth created this revision.Oct 31 2022, 10:26 AM
Herald added a project: Restricted Project. · View Herald Transcript
paulkirth requested review of this revision.Oct 31 2022, 10:26 AM

FYI @jhenderson This whole stack is in need of tests, and I plan to address that. I think that if we find this stack preferable to D135419, then we can just abandon that approach.

jhenderson requested changes to this revision.Nov 1 2022, 1:36 AM

Rather than duplicating, then refactoring, do the refactor first and then use it, so that the code is reasonably clean in every landed commit.

Also, please provide before and after examples, and test cases for the fix. Also, please fix the title to use "llvm-readobj" in the commit message/patch description, not simply "readobj".

FYI @jhenderson This whole stack is in need of tests, and I plan to address that. I think that if we find this stack preferable to D135419, then we can just abandon that approach.

Sorry, only just saw this comment. Either way, the before and after examples are needed for me to visualise the problems. One of the aims with the JSON output being built on top of the LLVM output was that they should have largely been the same structurally, just with some formatting details changing.

This revision now requires changes to proceed.Nov 1 2022, 1:36 AM

Sorry, I typed this up back in early November, when we were initially looking at this and forgot to hit the submit button. I've had a few other items take up my attention, but hope to spend a bit more time on this before the end of the year.

The point of this change is to make Other a consistent field, and avoid having to special case how its handled when Other is 0.

Other: 0,   ->   "Other": {
                   "RawFlags": 0,
                    "Flags": []
                 },

With more context the JSON looks like this:

  • Before
... 
 "Symbols": [
    {
      "Symbol": {
        "Name": {
          "Value": "foo",
          "RawValue": 1
        },
        "Value": 0,
        "Size": 0,
        "Binding": {
          "Value": "Local",
          "RawValue": 0
        },
        "Type": {
          "Value": "None",
          "RawValue": 0
        },
        "Other": 0,
        "Section": {
          "Value": ".text",
          "RawValue": 1
        }
      }
    }
  ]
  ...
  • After
...
"Symbols": [
   {
     "Symbol": {
       "Name": {
         "Value": "foo",
         "RawValue": 1
       },
       "Value": 0,
       "Size": 0,
       "Binding": {
         "Value": "Local",
         "RawValue": 0
       },
       "Type": {
         "Value": "None",
         "RawValue": 0
       },
       "Other": {
         "RawFlags": 0,
         "Flags": []
       },
       "Section": {
         "Value": ".text",
         "RawValue": 1
       }
     }
   }
 ]
 ...

It isn't significantly different than before, but now it's consistent, regardless of the flag's value. For tools that focuses on a human reader, the specializing the output to be concise makes sense. But JSON is a machine readable format, so it's far more important that the output always have the same structure than it is to be concise. That's really the point of this change.

Also, part of the reason I set these patches aside for a while was that there is little testing for the JSON output, so I was trying to find a middle ground where we could only test that the formatting is correct without duplicating many of the LLVM formating tests. I didn't really find a nice way to do that. Do you have any recommendations as to how we could achieve that?

paulkirth updated this revision to Diff 481110.Dec 7 2022, 4:48 PM
paulkirth retitled this revision from [readobj] Give JSON output a custom printSymbol to [llvm-readobj] Give JSON output a custom printSymbol.
paulkirth edited the summary of this revision. (Show Details)

Rebase patch on top of D137092.

  • Add some tests for new JSON output of the Other field.
  • Update the summary with an example of the change to the output.

As a heads-up, I am off work after today for 6 weeks, so won't be able to review further after today. Sorry about that.

llvm/tools/llvm-readobj/ELFDumper.cpp
7584

Could we avoid the mass code-duplication by just introducing another function "printSymbolOtherField" (or something like that) that is implemented differently for the two dumper types?

Maybe even "printZeroSymbolOtherField" so that only the very niche bit that needs to differ is distinguished. This function could then just delegate to the regular print symbol st_other code, to avoid needing to duplicate that in any way.

With this change, I think you could minimise the additional testing to just demonstrate basic behaviour difference for zero/non-zero st_other values, without needing to have additional cases for the various other aspects of this field printing.

Also, part of the reason I set these patches aside for a while was that there is little testing for the JSON output, so I was trying to find a middle ground where we could only test that the formatting is correct without duplicating many of the LLVM formating tests. I didn't really find a nice way to do that. Do you have any recommendations as to how we could achieve that?

I didn't respond to this directly, but just noting that as far as I'm concerned, if both LLVM and JSON styles use the same code, you only need to test one or the other. You only need to add additional testing where behaviour follows a different code path.

Also, although I'm marked as Requesting Changes, as long as my most recent comments are addressed, I'm happy for this patch to be landed if someone else involved with llvm-readobj approves it (e.g. @MaskRay).

@jhenderson Thanks for the feedback, and the notification re: time of work. I'm looking into your suggested approach now. The duplication also bothered me, but I also saw several other places where the output should diverge. I'll take a closer look at this, now that I've sketched out how to fix the JSON formatting for most of the problems I've found, and see if there is a nicer way to modify the output only when they diverge.

I didn't respond to this directly, but just noting that as far as I'm concerned, if both LLVM and JSON styles use the same code, you only need to test one or the other. You only need to add additional testing where behaviour follows a different code path.

That is good to know. Thank you for clarifying. I'll try to keep that in mind as this stack evolves.

paulkirth updated this revision to Diff 482519.Dec 13 2022, 9:17 AM

Test only the differences in output and implementation between the LLVM and JSON formats

paulkirth updated this revision to Diff 484093.Dec 19 2022, 2:52 PM

Fix Formatting

paulkirth updated this revision to Diff 484100.Dec 19 2022, 3:05 PM

Rebase + format

paulkirth updated this revision to Diff 484123.Dec 19 2022, 3:49 PM

Remove unnecesary test checks.

paulkirth updated this revision to Diff 484389.Dec 20 2022, 2:51 PM

Fix error in test

jhenderson added inline comments.Feb 24 2023, 12:59 AM
llvm/test/tools/llvm-readobj/ELF/llvm-vs-json-format.test
1 ↗(On Diff #496200)

Having looked at this again, rather than a separate test showing the differences, I think I'd prefer the test cases being added to existing tests that test LLVM and GNU output styles. For the st_other case, that would be https://github.com/llvm/llvm-project/blob/main/llvm/test/tools/llvm-readobj/ELF/symbol-visibility.test (which is essentially the "main" st_other test) and potentially both https://github.com/llvm/llvm-project/blob/main/llvm/test/tools/llvm-readobj/ELF/aarch64-symbols-stother.test and https://github.com/llvm/llvm-project/blob/main/llvm/test/tools/llvm-readobj/ELF/mips-symbols-stother.test.

I was hoping we'd be able to share CHECK patterns, but of course, for JSON output, the keys are quoted, so that doesn't really work.

paulkirth added inline comments.Feb 24 2023, 10:38 AM
llvm/test/tools/llvm-readobj/ELF/llvm-vs-json-format.test
1 ↗(On Diff #496200)

I'm a bit concerned that the number of tests and test cases will start to rapidly grow as we progress through this stack, since there is basically zero JSON testing as it stands, but let's see how this goes.

I'd also like to keep this test since it documents how the two formats differ in a single place, and subsequent patches build on top of it.

paulkirth updated this revision to Diff 500332.Feb 24 2023, 4:47 PM
paulkirth retitled this revision from [llvm-readobj] Give JSON output a custom printSymbol to [llvm-readobj] Standardize JSON output for `Other` field.

Rebase and address comments

  • Update existing tests
  • Update summary
paulkirth marked an inline comment as done.Feb 24 2023, 4:48 PM
jhenderson added inline comments.Mar 1 2023, 12:43 AM
llvm/test/tools/llvm-readobj/ELF/llvm-vs-json-format.test
1 ↗(On Diff #496200)

It's just showing what the other test cases show though, for example, the aarch test you've modified clearly shows the behaviour difference between the two for st_other fields.

If you think highlighting the difference in behaviour between the two is worthwhile (I'm not convinced that it is), it probably should be in the documentation, rather than duplicating test coverage.

Regarding the rapid test case growth, that's rather inevitable, I'd have thought: you've got a new(ish) file format, that hasn't been rigorously tested before. It clearly needs to be better tested, so it's rather unsurprising. Keeping this file around will just exacerbate the test growth, since it will become one giant soup of test cases, be hard to maintain, and be somewhat hard to read, I suspect going forwards.

llvm/test/tools/llvm-readobj/ELF/mips-symbols-stother.test
27

Nit: no need for extra blank line.

paulkirth updated this revision to Diff 502187.Mar 3 2023, 10:56 AM
paulkirth marked 3 inline comments as done.

Remove redunant test code.

jhenderson accepted this revision.Mar 17 2023, 1:27 AM

Looks good, but one suggestion.

llvm/tools/llvm-readobj/ELFDumper.cpp
7584

It might be worth a comment explaining why for JSON it's desirable to do things differently here.

This revision is now accepted and ready to land.Mar 17 2023, 1:27 AM
paulkirth updated this revision to Diff 506223.Mar 17 2023, 4:36 PM
paulkirth marked an inline comment as done.

Add comment about the rationale for how we handle JSON output.

This revision was landed with ongoing or failed builds.Mar 17 2023, 4:38 PM
This revision was automatically updated to reflect the committed changes.