Since we include a summary in JSON output, it is useful to surface
aggregate information about the object, such as the total number of
sections, their size, and info on allocated/debug sections. This
has been useful in quickly understanding size issues without post
processing the output. There may be other types of information that
could be useful to summarize.
Details
- Reviewers
jhenderson phosek
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The information about the total size of the sections might be misleading because it does not take into account possible gaps between them.
llvm/tools/llvm-readobj/ELFDumper.cpp | ||
---|---|---|
7885–7914 | It looks like you forget to include the actual patch in the review. |
llvm/test/tools/llvm-readobj/ELF/file-summary-json.test | ||
---|---|---|
20 | TotalSections will be available in the file header already, so isn't this a bit superfluous? |
Rebase and include base commit
- rename Total to Remaining
llvm/tools/llvm-readobj/ELFDumper.cpp | ||
---|---|---|
7885–7914 | Indeed. I had forgotten to squash my commit when uploading. This should be fixed now. |
I'm wary of extending the information specifically for JSON dumping, as it doesn't feel right to me that this output format should provide information that isn't provided in at least the LLVM format too (in practice, the two should be identical, barring how things are formatted, in my opinion).
You talk about how you are trying to understand size issues in this patch. Why does this mean JSON output specifically needs this info? llvm-size already provides similar sorts of information, so is this really the right tool for the job? Also, the "debug" versus "remaining" (which should probably be "other") line is not particularly well-defined. For example, this patch treats some sections that are stripped with --strip-debug in llvm-objcopy as "remaining".
AFAIK the LLVM format doesn't have a summary, while JSON does. This seemed like a good place to put some summary info.
You talk about how you are trying to understand size issues in this patch. Why does this mean JSON output specifically needs this info? llvm-size already provides similar sorts of information, so is this really the right tool for the job?
So if we're already processing a binary w/ llvm-objdump and consuming its JSON output it seems a lot nicer to get these statistics as part of the output rather than to invoke another tool. We can certainly reconstruct those values by post-processing the data, but it seemed trivial to add to the summary, since it was looping over the sections anyway.
Also, the "debug" versus "remaining" (which should probably be "other") line is not particularly well-defined. For example, this patch treats some sections that are stripped with --strip-debug in llvm-objcopy as "remaining".
"Other" was my first thought, but I went w/ "Remaining", since I thought it was less confusing. Looking at it, I'd have to agree that "Other" is probably a better name.
I wasn't all that confident on the way the current implementation classifies non-allocated sections. Do you have any suggestions as to what would be a better approach? I didn't think any of the other section flags were a strong enough indicator, but I'll be the first to admit that object file flags aren't something I'm particularly well versed in.
See ObjDumper::printFileSummary, which contains the default implementation for printFileSummary that LLVM output uses. Admittedly, "summary" is perhaps a poor name for it, given it doesn't actually summarise much.
You talk about how you are trying to understand size issues in this patch. Why does this mean JSON output specifically needs this info? llvm-size already provides similar sorts of information, so is this really the right tool for the job?
So if we're already processing a binary w/ llvm-objdump and consuming its JSON output it seems a lot nicer to get these statistics as part of the output rather than to invoke another tool. We can certainly reconstruct those values by post-processing the data, but it seemed trivial to add to the summary, since it was looping over the sections anyway.
(I assume you meant "llvm-readobj" here). You're not using the same section loop as any other section loop though, so you're making llvm-readobj unconditionally do more work for JSON output? I'm not opposed to the functionality per se, but I'd consider adding it under a new option. You'd probably want to get buy-in from the GNU community for the option name too, so that equivalent functionality can be added to GNU output (I don't see anything here that is inherent to JSON output).
Also, the "debug" versus "remaining" (which should probably be "other") line is not particularly well-defined. For example, this patch treats some sections that are stripped with --strip-debug in llvm-objcopy as "remaining".
"Other" was my first thought, but I went w/ "Remaining", since I thought it was less confusing. Looking at it, I'd have to agree that "Other" is probably a better name.
I wasn't all that confident on the way the current implementation classifies non-allocated sections. Do you have any suggestions as to what would be a better approach? I didn't think any of the other section flags were a strong enough indicator, but I'll be the first to admit that object file flags aren't something I'm particularly well versed in.
llvm-objcopy has a function for identifying debug sections (see https://github.com/llvm/llvm-project/blob/43436993f48b1d75d9d80796cb0889ce7d191888/llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp#L54). We should mirror that, or better yet, try to pull that function into somewhere both tools can access it. The other thing to note is that strictly speaking, there's no requirement that debug sections are non-alloc. I could imagine a system where a debugger reads the in-memory file, rather than reading additional stuff off disk, and as such it would need SHF_ALLOC debug sections (I don't know of any practical examples of this, but I think we need to be careful about baking in assumptions around this area).
After the feedback, and some further thinking, I don't think this is the right approach. As @jhenderson pointed out there are some real mismatches with what I was trying to achieve and what this patch actually does. thanks everyone for their input, and dealing with the very slow progress/responses on my part.
TotalSections will be available in the file header already, so isn't this a bit superfluous?