[llvm-readelf] Make llvm-readelf more compatible with GNU readelf.
ClosedPublic

Authored by rupprecht on Mon, Nov 5, 2:42 PM.

Details

Summary

This change adds a bunch of options that GNU readelf supports. There is one breaking change when invoked as llvm-readobj, and three breaking changes when invoked as llvm-readelf:

  • Add --all (implies --file-header, --program-headers, etc.)
  • [Breaking] -a is --all instead of --arm-attributes
  • Add --file-header as an alias for --file-headers
  • Replace --sections with --sections-headers, keeping --sections as an alias for it
  • Add --relocs as an alias for --relocations
  • Add --dynamic as an alias for --dynamic-table
  • Add --segments as an alias for --program-headers
  • Add --section-groups as an alias for --elf-section-groups
  • Add --dyn-syms as an alias for --dyn-symbols
  • Add --syms as an alias for --symbols
  • Add --histogram as an alias for --elf-hash-histogram
  • [Breaking] When invoked as llvm-readelf, -s is --symbols instead of --sections
  • [Breaking] When invoked as llvm-readelf, -t is no longer an alias for --symbols

Diff Detail

Repository
rL LLVM
rupprecht created this revision.Mon, Nov 5, 2:42 PM

Add two subscribers:

@compnerd who added the -a alias to ARMAttributesShort in rL200450
@echristo who added --relocations in rL178679 and could share some background of the two tools llvm-readobj llvm-readelf and the naming of these options...

I guess there was once only llvm-readobj and it had better option names (but incompatible with GNU readelf, e.g. --dynamic-table instead of GNU --dynamic) and later llvm-readelf was born. Now for compatibility reasons we have to add these GNU aliases and the option parsing looks very messy now.

phosek added a subscriber: silvas.Mon, Nov 5, 11:15 PM

In the "Migrating llvm-objdump and a few other common binutils replacements away from llvm::c" thread, @silvas said:

The way I think about libOption vs cl:::opt is:

  • cl::opt: use it for convenience and simplicity
  • libOption: use it for when you need to be compatible with a particular command line flag format.

My understanding is that cl::opt just doesn't have the fine-grained control needed for a truly production-quality command line interface compatible with some existing tool. I think this is a matter of correctness: is it a requirement to parse the command line options in the same way as some existing program? If so, then cl::opt probably isn't flexible enough. It may get you 90% of the way there, maybe 95%, but it will probably never truly feel native to users of the tool that it is intended to be compatible with.

I agree with him and after looking at this change, I'm convinced that llvm-readobj/llvm-readelf fits firmly into the latter category. We should consider migrating them to libOption in the future, although that's independent from this change.

jhenderson added a subscriber: edd.

Wow, that's some spooky timing. I didn't realise somebody else was thinking the exact same thing as me at the time! I brought this up on the mailing list earlier today, not realising that this review existed!

Some thoughts:

  1. Why not break -s for both llvm-readobj and llvm-readelf? Yes, it will cause some test churn, but I think we should seriously consider avoiding weird lingering issues like this if we can.
  2. What do you think about removing -t as an alias for --symbols in llvm-readelf (and possibly llvm-readobj)? -t means --section-details in GNU readelf, so has nothing to do with symbols. Of course, llvm-readelf doesn't (currently?) support --section-details, but maybe if we're breaking the meaning of switches we should get it completely right?
  3. Are we comfortable breaking the behaviour for end-users with no prior warning? In my email, I proposed a few options, including "deprecating" old switch behaviour for a release, or simply providing a new switch that can be enabled by default in downstream ports to change the meaning of incompatible switches like -s.

I agree that changing to libOption seems sensible, but out of scope for this change. I also think we need to give careful consideration to the behaviour of single-dash options, since that's a difference in behaviour between GNU and LLVM tools (see for example . It's a common issue beyond llvm-readobj though (see also https://bugs.llvm.org/show_bug.cgi?id=31679).

I haven't looked at the actual change yet, but I'll try to do that tomorrow.

Wow, that's some spooky timing.

Happy belated Halloween! :)

I didn't realise somebody else was thinking the exact same thing as me at the time! I brought this up on the mailing list earlier today, not realising that this review existed!

I'd like to keep that thread going, as there are more people reading llvm-dev than reading this patch.

Some thoughts:

  1. Why not break -s for both llvm-readobj and llvm-readelf? Yes, it will cause some test churn, but I think we should seriously consider avoiding weird lingering issues like this if we can.

I'm unsure if that's allowed given how long the tool has existed. But I'm happy to do that if others agree it's a good idea.
As a measurement, I ran ninja check-all with -s removed, and ~450 tests broke. (Keep in mind I can't run arm/windows/etc. tests on my machine, so the real number is higher).

  1. What do you think about removing -t as an alias for --symbols in llvm-readelf (and possibly llvm-readobj)? -t means --section-details in GNU readelf, so has nothing to do with symbols. Of course, llvm-readelf doesn't (currently?) support --section-details, but maybe if we're breaking the meaning of switches we should get it completely right?

Done -- looks like lld tests were the only user of this; changed those in rL346260.

  1. Are we comfortable breaking the behaviour for end-users with no prior warning? In my email, I proposed a few options, including "deprecating" old switch behaviour for a release, or simply providing a new switch that can be enabled by default in downstream ports to change the meaning of incompatible switches like -s.

For llvm-readelf, I am comfortable, since it seems like this is the right behavior. Maybe it would be worth putting in release notes though.
I'm also fine doing it for llvm-readobj with some support.

rupprecht updated this revision to Diff 172818.Tue, Nov 6, 11:58 AM
  • Hide -t as an alias for --symbols in readelf mode
rupprecht updated this revision to Diff 172848.Tue, Nov 6, 1:54 PM
  • Only expose compatibility args (e.g. --relocs for --relocations) in readelf mode
MaskRay added inline comments.Tue, Nov 6, 3:02 PM
tools/llvm-readobj/llvm-readobj.cpp
597 ↗(On Diff #172848)

How about just meaning --symbols in readelf.

len(--symbols) < len(something else) :)

649 ↗(On Diff #172848)

if (sys::path::stem(argv[0]).contains("readelf")) {

MaskRay added inline comments.Tue, Nov 6, 7:12 PM
tools/llvm-readobj/llvm-readobj.cpp
614 ↗(On Diff #172848)

It seems NotHidden can be made inline:

static cl::alias SymbolsShort("s", cl::desc("Alias for --symbols"), cl::aliasopt(opts::Symbols), cl::NotHidden);
jhenderson added inline comments.Wed, Nov 7, 2:43 AM
test/tools/llvm-readobj/readelf-s-alias.test
4–5 ↗(On Diff #172848)

Since -s should be an alias for --sections, maybe it's worth piping the -s output to a file and then comparing the output against --sections output.

Similar comment below for --symbols.

I assume we have sufficient separate testing for --symbols and --sections, so we can rely on those to show that we dump sections and symbols output correctly with the long-form switches. If we don't, we should add them.

tools/llvm-readobj/llvm-readobj.cpp
616 ↗(On Diff #172848)

I don't see any harm in these aliases being applicable for both llvm-readobj and llvm-readelf, if they're hidden.

Please also add the following aliases:

  • --syms is also a GNU readelf option as an alias for --symbols.
  • --dyn-syms is the GNU readelf version of --dyn-symbols.
  • --histogram is the GNU readelf long-form of -I/--elf-hash-histogram.

I think those are the various missing aliases for supported features, on top of the ones you're proposing adding.

tools/llvm-readobj/llvm-readobj.cpp
57 ↗(On Diff #172848)

Missing comma between --version-info and --unwind. Also --histogram isn't a thing in llvm-readelf, as far as I can tell (but you should probably add it in this patch).

617–619 ↗(On Diff #172848)

I actually think --section-headers is clearer than --sections...

rupprecht updated this revision to Diff 173024.Wed, Nov 7, 1:59 PM
rupprecht marked 6 inline comments as done.
  • Clean up aliases/tests
rupprecht added inline comments.Wed, Nov 7, 1:59 PM
test/tools/llvm-readobj/readelf-s-alias.test
4–5 ↗(On Diff #172848)

Since -s should be an alias for --sections, maybe it's worth piping the -s output to a file and then comparing the output against --sections output.

Similar comment below for --symbols.

Sorry, I don't understand this comment. This test is already running through FileCheck to make sure that -s and --sections both look like section output (for readobj) and similar for --symbols. Piping to a file and then comparing both files seems like it should pass, but is not really a targeted test -- the error would be "files differ" instead of "'Sections [' not found".

I assume we have sufficient separate testing for --symbols and --sections, so we can rely on those to show that we dump sections and symbols output correctly with the long-form switches. If we don't, we should add them.

The tests use short symbols by default. I flipped it so by default we use --symbols and --sections, with just a quick test that the aliases work.

jhenderson added inline comments.Thu, Nov 8, 2:13 AM
test/tools/llvm-readobj/readelf-s-alias.test
4–5 ↗(On Diff #172848)

I would say that we want -s and --sections/--symbols to be identical output. Technically, all this test does is show that -s/--sections/--symbols produces sections/symbols output. The test would actually pass if -s did both, which clearly isn't the intent of the test.

We don't need to show that --sections/--symbols produces sections/symbols output in this test. There's already a dedicated test for that. If you want to show that -s is truly an alias, then the whole output needs to be identical, and therefore "files differ" is actually the appropriate failure.

The main --sections/--symbols tests can test that -s prints sections/symbols output as well, if desired.

test/tools/llvm-readobj/symbols.test
13 ↗(On Diff #173024)

Maybe worth doing "llvm-readelf -s" here too

tools/llvm-readobj/llvm-readobj.cpp
68 ↗(On Diff #173024)

These comments should probably mention all of the public aliases, not just one long and one short.

82 ↗(On Diff #173024)

This should be the "main" option, in my opinion, with the other two as aliases. Also, the help text for it should be "Display all section headers." since it doesn't display section contents.

126 ↗(On Diff #173024)

Maybe SymbolsGNU would be clearer as to why this exists? Ditto dyn-syms below.

618–619 ↗(On Diff #173024)

and -> as
Missing trailing full stop.

57 ↗(On Diff #172848)

--sections -> --section-headers
--histogram -> --elf-hash-histogram

--sections to me implies section headers and section contents, which isn't the case.

rupprecht updated this revision to Diff 173206.Thu, Nov 8, 11:47 AM
rupprecht marked 6 inline comments as done.
  • Renaming/removing unnecessary externs/more test cases/other comments
rupprecht marked 3 inline comments as done.Thu, Nov 8, 11:48 AM
jhenderson accepted this revision.Fri, Nov 9, 2:02 AM

Okay, from my point of view, LGTM (with a nit). However, we should probably get some more buy-in on the RFC before breaking people's builds etc, so you might want to hold off on the commit for now.

Any thoughts on how to drum up some more interest?

test/tools/llvm-readobj/symbols.test
14–15 ↗(On Diff #173206)

Nit: for consistency with the other filenames in this set of cases, maybe use %t.s, if you don't think it's too confusing with assembly?

This revision is now accepted and ready to land.Fri, Nov 9, 2:02 AM

In the description.

Add --sections-headers as an alias for --sections: --section-headers is now the canonical option name (I still have trouble understanding the plural form though, see my comment above) but --sections is an alias.

tools/llvm-readobj/llvm-readobj.cpp
82 ↗(On Diff #173024)

What's your opinion on the plural form -file-headers? I think I'm lost why in readelf, -file-header is singular while others are plural. Usually there are not more than one program headers, section headers, etc.

rupprecht edited the summary of this revision. (Show Details)Fri, Nov 9, 10:24 AM

Updated the description, thanks for pointing that out.

tools/llvm-readobj/llvm-readobj.cpp
82 ↗(On Diff #173024)

My guess: most file formats only have one file header (e.g. ELF just has the ELF header), so singular makes sense, e.g. vs section headers where there is one section header per section.

Looking at COFFDumper::printFileHeaders() though, there can be multiple file headers (ImageFileHeader, ImageOptionalHeader, DOSHeader), so actually I'm inclined to leave the plural version as the main flag: the tool will print any number of file headers, even if you only work in ELF and so it only ever prints one.

MaskRay accepted this revision.Fri, Nov 9, 10:32 AM

Okay, from my point of view, LGTM (with a nit). However, we should probably get some more buy-in on the RFC before breaking people's builds etc, so you might want to hold off on the commit for now.

Any thoughts on how to drum up some more interest?

I'll ping the thread you started, and I'd like to commit on Monday if there are no objections. I'm hoping this change is not so controversial.

After this I'd like to switch to tablegen/liboption and support some merged args (e.g. "readelf -SW" seems to be a common usage)... that will be slightly more dangerous :)

test/tools/llvm-readobj/symbols.test
14–15 ↗(On Diff #173206)

I'd prefer .lowers (consistent with the sections test) instead of .s just to highlight that it's a lowercase s given the general confusion of -s and -S for both (sometimes) meaning --sections, even though the same doesn't apply to --symbols.

Also just to note -- I chose "lowers" and "uppers" instead of "s" and "S" just to make sure this didn't mess with any build bots that are case insensitive with filenames.

Okay, from my point of view, LGTM (with a nit). However, we should probably get some more buy-in on the RFC before breaking people's builds etc, so you might want to hold off on the commit for now.

Any thoughts on how to drum up some more interest?

I'll ping the thread you started, and I'd like to commit on Monday if there are no objections. I'm hoping this change is not so controversial.

As a not entirely unbiased person, but who has worked on this, I'm in favor of this at the moment. Long term plans at the bottom...

After this I'd like to switch to tablegen/liboption and support some merged args (e.g. "readelf -SW" seems to be a common usage)... that will be slightly more dangerous :)

Awesome.

I'd really like to see a better "multiple tools calling into the same library with different options and output decorators allowing compatibility with random tools/output" set of support in an ideal world here.

-eric

I'm in favor of breaking changes for preserving GNU binutils compatibility. I think if downstream projects require "nonstandard" aliases, it is easy enough to tweak while in upstream I think compatibility is more important and brings LLVM's replacements for binutils one step closer to being drop-ins, which in turn encourages wider adoption of said tools.

This revision was automatically updated to reflect the committed changes.