This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readobj] Include section information in summary
AbandonedPublic

Authored by paulkirth on Mar 23 2023, 1:11 PM.

Details

Reviewers
jhenderson
phosek
Summary

Since we include a summary in JSON output, it is useful to surface
aggregate information about the object, such as the total number of
sections, their size, and info on allocated/debug sections. This
has been useful in quickly understanding size issues without post
processing the output. There may be other types of information that
could be useful to summarize.

Diff Detail

Event Timeline

paulkirth created this revision.Mar 23 2023, 1:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2023, 1:11 PM
Herald added a subscriber: emaste. · View Herald Transcript
paulkirth requested review of this revision.Mar 23 2023, 1:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2023, 1:11 PM

The information about the total size of the sections might be misleading because it does not take into account possible gaps between them.

llvm/tools/llvm-readobj/ELFDumper.cpp
7771–7790

It looks like you forget to include the actual patch in the review.

jhenderson added inline comments.Mar 24 2023, 12:53 AM
llvm/test/tools/llvm-readobj/ELF/file-summary-json.test
20

TotalSections will be available in the file header already, so isn't this a bit superfluous?

paulkirth updated this revision to Diff 515926.Apr 21 2023, 2:49 PM
paulkirth marked 2 inline comments as done.

Rebase and include base commit

  • rename Total to Remaining
llvm/tools/llvm-readobj/ELFDumper.cpp
7771–7790

Indeed. I had forgotten to squash my commit when uploading. This should be fixed now.

I'm wary of extending the information specifically for JSON dumping, as it doesn't feel right to me that this output format should provide information that isn't provided in at least the LLVM format too (in practice, the two should be identical, barring how things are formatted, in my opinion).

You talk about how you are trying to understand size issues in this patch. Why does this mean JSON output specifically needs this info? llvm-size already provides similar sorts of information, so is this really the right tool for the job? Also, the "debug" versus "remaining" (which should probably be "other") line is not particularly well-defined. For example, this patch treats some sections that are stripped with --strip-debug in llvm-objcopy as "remaining".

I'm wary of extending the information specifically for JSON dumping, as it doesn't feel right to me that this output format should provide information that isn't provided in at least the LLVM format too (in practice, the two should be identical, barring how things are formatted, in my opinion).

AFAIK the LLVM format doesn't have a summary, while JSON does. This seemed like a good place to put some summary info.

You talk about how you are trying to understand size issues in this patch. Why does this mean JSON output specifically needs this info? llvm-size already provides similar sorts of information, so is this really the right tool for the job?

So if we're already processing a binary w/ llvm-objdump and consuming its JSON output it seems a lot nicer to get these statistics as part of the output rather than to invoke another tool. We can certainly reconstruct those values by post-processing the data, but it seemed trivial to add to the summary, since it was looping over the sections anyway.

Also, the "debug" versus "remaining" (which should probably be "other") line is not particularly well-defined. For example, this patch treats some sections that are stripped with --strip-debug in llvm-objcopy as "remaining".

"Other" was my first thought, but I went w/ "Remaining", since I thought it was less confusing. Looking at it, I'd have to agree that "Other" is probably a better name.

I wasn't all that confident on the way the current implementation classifies non-allocated sections. Do you have any suggestions as to what would be a better approach? I didn't think any of the other section flags were a strong enough indicator, but I'll be the first to admit that object file flags aren't something I'm particularly well versed in.

I'm wary of extending the information specifically for JSON dumping, as it doesn't feel right to me that this output format should provide information that isn't provided in at least the LLVM format too (in practice, the two should be identical, barring how things are formatted, in my opinion).

AFAIK the LLVM format doesn't have a summary, while JSON does. This seemed like a good place to put some summary info.

See ObjDumper::printFileSummary, which contains the default implementation for printFileSummary that LLVM output uses. Admittedly, "summary" is perhaps a poor name for it, given it doesn't actually summarise much.

You talk about how you are trying to understand size issues in this patch. Why does this mean JSON output specifically needs this info? llvm-size already provides similar sorts of information, so is this really the right tool for the job?

So if we're already processing a binary w/ llvm-objdump and consuming its JSON output it seems a lot nicer to get these statistics as part of the output rather than to invoke another tool. We can certainly reconstruct those values by post-processing the data, but it seemed trivial to add to the summary, since it was looping over the sections anyway.

(I assume you meant "llvm-readobj" here). You're not using the same section loop as any other section loop though, so you're making llvm-readobj unconditionally do more work for JSON output? I'm not opposed to the functionality per se, but I'd consider adding it under a new option. You'd probably want to get buy-in from the GNU community for the option name too, so that equivalent functionality can be added to GNU output (I don't see anything here that is inherent to JSON output).

Also, the "debug" versus "remaining" (which should probably be "other") line is not particularly well-defined. For example, this patch treats some sections that are stripped with --strip-debug in llvm-objcopy as "remaining".

"Other" was my first thought, but I went w/ "Remaining", since I thought it was less confusing. Looking at it, I'd have to agree that "Other" is probably a better name.

I wasn't all that confident on the way the current implementation classifies non-allocated sections. Do you have any suggestions as to what would be a better approach? I didn't think any of the other section flags were a strong enough indicator, but I'll be the first to admit that object file flags aren't something I'm particularly well versed in.

llvm-objcopy has a function for identifying debug sections (see https://github.com/llvm/llvm-project/blob/43436993f48b1d75d9d80796cb0889ce7d191888/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp#L54). We should mirror that, or better yet, try to pull that function into somewhere both tools can access it. The other thing to note is that strictly speaking, there's no requirement that debug sections are non-alloc. I could imagine a system where a debugger reads the in-memory file, rather than reading additional stuff off disk, and as such it would need SHF_ALLOC debug sections (I don't know of any practical examples of this, but I think we need to be careful about baking in assumptions around this area).

paulkirth abandoned this revision.May 25 2023, 2:22 PM

After the feedback, and some further thinking, I don't think this is the right approach. As @jhenderson pointed out there are some real mismatches with what I was trying to achieve and what this patch actually does. thanks everyone for their input, and dealing with the very slow progress/responses on my part.