Page MenuHomePhabricator

[llvm-readobj] Give JSON output a custom printSymbol
Needs ReviewPublic

Authored by paulkirth on Oct 31 2022, 10:26 AM.

Details

Reviewers
jhenderson
Summary

Today, the LLVM output uses special handling when the Other field is 0.
This output makes sense for a command line utility that a human will
read, but JSON is a machine readable format, so being consistent is more
important. Prior to this change, any consumer of the JSON output would
need to handle the Other field specially, since the structure of the
JSON would no longer be consistent.

Changes to JSON output when Other flag == 0:

"Other": 0,   ->   "Other": {
                      "RawFlags": 0,
                       "Flags": []
                    },

There are no changes to when Other flag != 0:

"Other": {        ->   "Other": {
  "RawFlags": 1,          "RawFlags": 1,
  "Flags": [              "Flags": [
      ...                     ...
  ]                       ]
},                     },

This patch adds a overload for the JSONELFDumper's printSymbol() method,
that uses consistent output formatting, regardless of the value of the
Other field.

Depends on D137092

Diff Detail

Event Timeline

paulkirth created this revision.Oct 31 2022, 10:26 AM
Herald added a project: Restricted Project. · View Herald Transcript
paulkirth requested review of this revision.Oct 31 2022, 10:26 AM

FYI @jhenderson This whole stack is in need of tests, and I plan to address that. I think that if we find this stack preferable to D135419, then we can just abandon that approach.

jhenderson requested changes to this revision.Nov 1 2022, 1:36 AM

Rather than duplicating, then refactoring, do the refactor first and then use it, so that the code is reasonably clean in every landed commit.

Also, please provide before and after examples, and test cases for the fix. Also, please fix the title to use "llvm-readobj" in the commit message/patch description, not simply "readobj".

FYI @jhenderson This whole stack is in need of tests, and I plan to address that. I think that if we find this stack preferable to D135419, then we can just abandon that approach.

Sorry, only just saw this comment. Either way, the before and after examples are needed for me to visualise the problems. One of the aims with the JSON output being built on top of the LLVM output was that they should have largely been the same structurally, just with some formatting details changing.

This revision now requires changes to proceed.Nov 1 2022, 1:36 AM

Sorry, I typed this up back in early November, when we were initially looking at this and forgot to hit the submit button. I've had a few other items take up my attention, but hope to spend a bit more time on this before the end of the year.

The point of this change is to make Other a consistent field, and avoid having to special case how its handled when Other is 0.

Other: 0,   ->   "Other": {
                   "RawFlags": 0,
                    "Flags": []
                 },

With more context the JSON looks like this:

  • Before
... 
 "Symbols": [
    {
      "Symbol": {
        "Name": {
          "Value": "foo",
          "RawValue": 1
        },
        "Value": 0,
        "Size": 0,
        "Binding": {
          "Value": "Local",
          "RawValue": 0
        },
        "Type": {
          "Value": "None",
          "RawValue": 0
        },
        "Other": 0,
        "Section": {
          "Value": ".text",
          "RawValue": 1
        }
      }
    }
  ]
  ...
  • After
...
"Symbols": [
   {
     "Symbol": {
       "Name": {
         "Value": "foo",
         "RawValue": 1
       },
       "Value": 0,
       "Size": 0,
       "Binding": {
         "Value": "Local",
         "RawValue": 0
       },
       "Type": {
         "Value": "None",
         "RawValue": 0
       },
       "Other": {
         "RawFlags": 0,
         "Flags": []
       },
       "Section": {
         "Value": ".text",
         "RawValue": 1
       }
     }
   }
 ]
 ...

It isn't significantly different than before, but now it's consistent, regardless of the flag's value. For tools that focuses on a human reader, the specializing the output to be concise makes sense. But JSON is a machine readable format, so it's far more important that the output always have the same structure than it is to be concise. That's really the point of this change.

Also, part of the reason I set these patches aside for a while was that there is little testing for the JSON output, so I was trying to find a middle ground where we could only test that the formatting is correct without duplicating many of the LLVM formating tests. I didn't really find a nice way to do that. Do you have any recommendations as to how we could achieve that?

paulkirth updated this revision to Diff 481110.Dec 7 2022, 4:48 PM
paulkirth retitled this revision from [readobj] Give JSON output a custom printSymbol to [llvm-readobj] Give JSON output a custom printSymbol.
paulkirth edited the summary of this revision. (Show Details)

Rebase patch on top of D137092.

  • Add some tests for new JSON output of the Other field.
  • Update the summary with an example of the change to the output.

As a heads-up, I am off work after today for 6 weeks, so won't be able to review further after today. Sorry about that.

llvm/tools/llvm-readobj/ELFDumper.cpp
7584

Could we avoid the mass code-duplication by just introducing another function "printSymbolOtherField" (or something like that) that is implemented differently for the two dumper types?

Maybe even "printZeroSymbolOtherField" so that only the very niche bit that needs to differ is distinguished. This function could then just delegate to the regular print symbol st_other code, to avoid needing to duplicate that in any way.

With this change, I think you could minimise the additional testing to just demonstrate basic behaviour difference for zero/non-zero st_other values, without needing to have additional cases for the various other aspects of this field printing.

Also, part of the reason I set these patches aside for a while was that there is little testing for the JSON output, so I was trying to find a middle ground where we could only test that the formatting is correct without duplicating many of the LLVM formating tests. I didn't really find a nice way to do that. Do you have any recommendations as to how we could achieve that?

I didn't respond to this directly, but just noting that as far as I'm concerned, if both LLVM and JSON styles use the same code, you only need to test one or the other. You only need to add additional testing where behaviour follows a different code path.

Also, although I'm marked as Requesting Changes, as long as my most recent comments are addressed, I'm happy for this patch to be landed if someone else involved with llvm-readobj approves it (e.g. @MaskRay).

@jhenderson Thanks for the feedback, and the notification re: time of work. I'm looking into your suggested approach now. The duplication also bothered me, but I also saw several other places where the output should diverge. I'll take a closer look at this, now that I've sketched out how to fix the JSON formatting for most of the problems I've found, and see if there is a nicer way to modify the output only when they diverge.

I didn't respond to this directly, but just noting that as far as I'm concerned, if both LLVM and JSON styles use the same code, you only need to test one or the other. You only need to add additional testing where behaviour follows a different code path.

That is good to know. Thank you for clarifying. I'll try to keep that in mind as this stack evolves.

paulkirth updated this revision to Diff 482519.Dec 13 2022, 9:17 AM

Test only the differences in output and implementation between the LLVM and JSON formats

paulkirth updated this revision to Diff 484093.Dec 19 2022, 2:52 PM

Fix Formatting

paulkirth updated this revision to Diff 484100.Dec 19 2022, 3:05 PM

Rebase + format

paulkirth updated this revision to Diff 484123.Dec 19 2022, 3:49 PM

Remove unnecesary test checks.

paulkirth updated this revision to Diff 484389.Dec 20 2022, 2:51 PM

Fix error in test