This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] - Print LMAs when dumping section headers.
ClosedPublic

Authored by grimar on Jan 24 2019, 3:02 AM.

Details

Summary

When --section-headers is used, GNU objdump prints both LMA and VMA for sections.
llvm-objdump does not do that what makes it's output be slightly inconsistent.

Patch teaches llvm-objdump to print LMA/VMA for ELF file formats.
The behavior for other formats remains unchanged.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.Jan 24 2019, 3:02 AM

The code change looks basically fine, but I'm wondering if we really want this? For many (most?) use cases, the LMA is the same as the VMA, so the extra column is just useless noise. I know we want to broadly be consistent with GNU tools, but I wonder if that's the case when it doesn't make that much sense?

What do you think about the following approach:

  1. If there are any segments with different p_vaddr and p_paddr, go with your suggestion.
  2. If a user specifies a switch, e.g. --show-lma, also go with your suggestion.
  3. Otherwise, only print the VMA.

The issue with this is that it might break parsers which are written against GNU objdump's behaviour. I'm not sure if this is an issue or not? The switch is designed to mitigate that problem.

It's probably worth getting some others to give their opinion on this.

test/tools/llvm-objdump/X86/phdrs-lma.test
23 ↗(On Diff #183280)

Does yaml2obj provide the ability to specify a different LMA to VMA for segments? That would be better than a pre-built binary, if we can use it.

tools/llvm-objdump/ELFDump.cpp
142–143 ↗(On Diff #183280)

I don't think we need to phrase this comment in the future tense. I think the following would be a better re-wording:

"Search for a PT_LOAD segment containing the requested section. Use this segment's p_addr to calculate the section's LMA".

149 ↗(On Diff #183280)

Rather than saying we didn't find its LMA, let's be more precise by saying, "if it isn't in a PT_LOAD segment."

The code change looks basically fine, but I'm wondering if we really want this? For many (most?) use cases, the LMA is the same as the VMA, so the extra column is just useless noise. I know we want to broadly be consistent with GNU tools, but I wonder if that's the case when it doesn't make that much sense?

We had the same concern for-Map option in LLD. It did not print LMA initially but there were requests from embedded developers about adding it.
(https://reviews.llvm.org/D44899#1049497)

I also think it might be useful for writing some test cases. For example, I see that LLD's at.s test does not explicitly
shows/tests the sections LMAs: https://github.com/llvm-mirror/lld/blob/master/test/ELF/linkerscript/at.s
We probably might parse the -Map option output to improve it, but seems that having llvm-objdump output would be a more natural way to go probably.

I'll add Peter to this thread to see if he has any opinion on that too.

What do you think about the following approach:

  1. If there are any segments with different p_vaddr and p_paddr, go with your suggestion.
  2. If a user specifies a switch, e.g. --show-lma, also go with your suggestion.
  3. Otherwise, only print the VMA.

The issue with this is that it might break parsers which are written against GNU objdump's behaviour. I'm not sure if this is an issue or not? The switch is designed to mitigate that problem.

It's probably worth getting some others to give their opinion on this.

Generally, your suggestion sounds OK to me. I'll also be happy to hear other opinions.

I've not got a strong opinion here, for what it is worth:

  • The code changes look fine.
  • I can see LMA being useful to embedded developers, or tools vendors, given a binary without access to or ability to generate a map file. Would they intuitively look for objdump for that information though? I don't know.
  • gnu objdump also gives file offset and flag information which probably have more justification as they can't be derived from other options. If we are looking for output format compatibility then we ought to add those too. I don't think that there is any expectation of that at the moment though.
  • If there is a command line option to print the LMA, then there is a risk that a good portion of people won't read far into the help to discover it exists. For that reason I quite like the idea of printing the column only when there are program headers that have different VMA and LMA.
tools/llvm-objdump/ELFDump.cpp
138 ↗(On Diff #183280)

Perhaps PhdrsOrErr or PhdrRangeOrErr? My first thought was what will that call return if it is called on a relocatable object file with no program headers, is it going to give me an error? With PhdrsOrErr it more strongly implies a range.

The code change looks basically fine, but I'm wondering if we really want this? For many (most?) use cases, the LMA is the same as the VMA, so the extra column is just useless noise. I know we want to broadly be consistent with GNU tools, but I wonder if that's the case when it doesn't make that much sense?

What do you think about the following approach:

  1. If there are any segments with different p_vaddr and p_paddr, go with your suggestion.
  2. If a user specifies a switch, e.g. --show-lma, also go with your suggestion.
  3. Otherwise, only print the VMA.

The issue with this is that it might break parsers which are written against GNU objdump's behaviour. I'm not sure if this is an issue or not? The switch is designed to mitigate that problem.

It's probably worth getting some others to give their opinion on this.

+1 for this proposal. I find the GNU format extremely dense and hard to mentally parse as is.

  • If there is a command line option to print the LMA, then there is a risk that a good portion of people won't read far into the help to discover it exists. For that reason I quite like the idea of printing the column only when there are program headers that have different VMA and LMA.

I think James idea is to have both: (1) print the column when there are program headers that have different VMA and LMA and (2) have an option to force always printing the LMA. Having such an option which is an addition to (1) makes sense if you want to have a way to produce consistent output for parsers.

  • If there is a command line option to print the LMA, then there is a risk that a good portion of people won't read far into the help to discover it exists. For that reason I quite like the idea of printing the column only when there are program headers that have different VMA and LMA.

I think James idea is to have both: (1) print the column when there are program headers that have different VMA and LMA and (2) have an option to force always printing the LMA. Having such an option which is an addition to (1) makes sense if you want to have a way to produce consistent output for parsers.

Yes, exactly. The switch means that those reliant on the output format can specify it, or to print LMA where for some specific instance the LMA and VMA happen to be the same, but that doesn't always mean it will be. Finally, having a single switch whose default can be toggled in downstream ports is easier to maintain than a private patch in the middle of the formatting code.

One further suggestion for 1) might be to change the header to VMA/LMA for the address column, to make it clearer. It might be less confusing to those used to GNU output?

One further suggestion for 1) might be to change the header to VMA/LMA for the address column, to make it clearer. It might be less confusing to those used to GNU output?

I thought about that too. Decided to not suggest it because if we hide a LMA column completely then probably no need to make people who are unfamiliar with it even to think about it?
I.e. This patch prints "VMA ... LMA" now and "Address" in another case. I supposed that just "Address" would look simpler then.

I am not sure what would be really better though honestly. Maybe mixing "VMA" and "Address" for different cases is also not a very good idea.

One further suggestion for 1) might be to change the header to VMA/LMA for the address column, to make it clearer. It might be less confusing to those used to GNU output?

I thought about that too. Decided to not suggest it because if we hide a LMA column completely then probably no need to make people who are unfamiliar with it even to think about it?
I.e. This patch prints "VMA ... LMA" now and "Address" in another case. I supposed that just "Address" would look simpler then.

I am not sure what would be really better though honestly. Maybe mixing "VMA" and "Address" for different cases is also not a very good idea.

Okay. I'd just stick with VMA when printing only that.

  • If there is a command line option to print the LMA, then there is a risk that a good portion of people won't read far into the help to discover it exists. For that reason I quite like the idea of printing the column only when there are program headers that have different VMA and LMA.

I think James idea is to have both: (1) print the column when there are program headers that have different VMA and LMA and (2) have an option to force always printing the LMA. Having such an option which is an addition to (1) makes sense if you want to have a way to produce consistent output for parsers.

Yes, exactly. The switch means that those reliant on the output format can specify it, or to print LMA where for some specific instance the LMA and VMA happen to be the same, but that doesn't always mean it will be. Finally, having a single switch whose default can be toggled in downstream ports is easier to maintain than a private patch in the middle of the formatting code.

Thanks for the clarification. Makes sense.

grimar updated this revision to Diff 183538.Jan 25 2019, 7:19 AM
grimar marked 5 inline comments as done.
  • Addressed review comments.
  • Updated implementation in according to the recent discussion.
grimar added inline comments.Jan 25 2019, 7:19 AM
test/tools/llvm-objdump/X86/phdrs-lma.test
23 ↗(On Diff #183280)

Oh cool, I did not know yaml2obj is able to do that. Done.

jhenderson added inline comments.Jan 28 2019, 2:22 AM
test/tools/llvm-objdump/X86/phdrs-lma2.test
3–4 ↗(On Diff #183538)

which has LMA different from VMA -> with different LMA to VMA

tools/llvm-objdump/llvm-objdump.cpp
183 ↗(On Diff #183538)

Can this be static?

185 ↗(On Diff #183538)

This should probably be in the documentation.

1495 ↗(On Diff #183538)

It's more common to say "and/or" rather than "or/and".

grimar marked an inline comment as done.Jan 28 2019, 2:32 AM
grimar added inline comments.
tools/llvm-objdump/llvm-objdump.cpp
183 ↗(On Diff #183538)

I think so. Probably all of cl::opt<bool> should be static in this file, though they are not now. I followed for consistency but seems that was not a good idea.

grimar updated this revision to Diff 183818.Jan 28 2019, 3:30 AM
grimar marked 3 inline comments as done.
  • Addressed review comments.
tools/llvm-objdump/llvm-objdump.cpp
185 ↗(On Diff #183538)

I do not see any documentation files for llvm-objdump around or in the recent commits. Did you have something in mind? Seems we just do not have them =\

grimar planned changes to this revision.Jan 28 2019, 3:32 AM

Will reupload.

grimar updated this revision to Diff 183819.Jan 28 2019, 3:33 AM
  • Uploaded the right diff.
jhenderson accepted this revision.Jan 28 2019, 3:48 AM

LGTM.

tools/llvm-objdump/llvm-objdump.cpp
185 ↗(On Diff #183538)

Oh well. No, I didn't have anything in mind. I just assumed it would be there alongside e.g. llvm-symbolizer's command guide.

This revision is now accepted and ready to land.Jan 28 2019, 3:48 AM
This revision was automatically updated to reflect the committed changes.