This is an archive of the discontinued LLVM Phabricator instance.

llvm-readobj: add experimental support for SHT_RELR sections.
ClosedPublic

Authored by rahulchaudhry on Jun 7 2018, 4:48 PM.

Details

Summary

This change adds experimental support for SHT_RELR sections, proposed
here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg

Definitions for the new ELF section type and dynamic array tags, as well
as the encoding used in the new section are all under discussion and are
subject to change. Use with caution!

Diff Detail

Repository
rL LLVM

Event Timeline

rahulchaudhry created this revision.Jun 7 2018, 4:48 PM

I'm still working on adding some unit tests, but I thought I'd send this in for now in case there is any high level feedback on the overall approach.

Added tests, ready for review.

rahulchaudhry added inline comments.Jun 8 2018, 3:33 PM
tools/llvm-readobj/ELFDumper.cpp
2637 ↗(On Diff #150580)

Fields[1-4] are empty for SHT_RELR section, but they were already assigned a width above.
Doing it this way results in trailing spaces on every line. Does it matter?

3304 ↗(On Diff #150580)

Same as above, this results in trailing spaces on every line.

I'll have more comments tomorrow. I'm still grokking my way though llvm-readobj as I've never looked at that code before.

I have a general question about GNUStyle though. Why should GNUStyle with Relr be so different from GNUStyle for Rel(a)? Shouldn't (ideally) relr be transparently usable without a difference in how it is printed out to the user? e.g. relr should print out something indistinguishable from if the linker had used rela/rel relocation instead.

If r_info can't be sensibly set then I think the only real option is to have it be output differently for both IMO. We shouldn't be outputting a relocation type that's incorrect. I'm not sure it makes sense to decode things quite this way when you don't have an operating system or architecture specific solution (as was the case with android). I think LLD defines some common relocation types across architectures but that isn't super useful here. I wouldn't mind that being duplicated here (maybe pcc disagrees though).

include/llvm/BinaryFormat/ELF.h
1077 ↗(On Diff #150580)

I think in the spec you propose Elf32_Relr and Elf64_Relr as typedefs instead of structs. I think this file (and the higher level ELFType file as well) should match the spec in that regard.

lib/Object/ELF.cpp
229 ↗(On Diff #150580)

Is there a more sensible info we can set this to? It would have to be processor specific which might be super annoying. If there isn't a reasonable way to set the relocation type then I think it should be dropped llvm-readobj output in the llvm case.

232 ↗(On Diff #150580)

can this be made constexpr? or at least static const?

234 ↗(On Diff #150580)

nit: I think I'd prefer this use Elf_Addr since it represents an offset for a dynamic relocation (e.g. and address)

247 ↗(On Diff #150580)

nit I think I'd prefer this use Elf_Addr for the same reason.

test/tools/llvm-readobj/elf-relr-relocs.test
6 ↗(On Diff #150580)

R_X86_64_NONE is not a valid relocation type. See my r_info comment above about this issue.

In response to Jake's comments about GNUStyle:

First and foremost the mandate for GNUStyle is to closely approximate the output format of binutils readelf for the same data. There is not yet a binutils implementation of SHT_RELR support submitted upstream, so you'll need to coordinate with the folks implementing that support and harmonize the GNUStyle output format with whatever they and the binutils maintainers settle on.

I don't think it's at all desireable to make SHT_RELR output transparently resemble SHT_RELA/SHT_REL output, at least not as the only output format. The first purpose of readelf is to show the contents of the ELF file as they are, so it can be used to debug low-level encoding/decoding. So I definitely want at least the option of an output format that just shows the values in a nice presentation rather than pretending they're something else.

I think there's an argument to be made that readelf -r should always use the existing format since there are scripts that decode and such. So having readelf -r display an SHT_RELR section as if it were an SHT_REL(A) section where every reloc has type R_*_RELATIVE and all other fields (except r_offset) zero might make sense. i.e., that would produce up to 64 lines for each SHT_RELR entry. But there should certainly be a format (preferably other than just -x.dyn.relr for generic hex dump) that shows the exact contents without expansion, even if the -r switch doesn't elicit that output.

include/llvm/BinaryFormat/ELF.h
1077 ↗(On Diff #150580)

This header is C++, not C. So there's no difference there.

jakehehrlich added inline comments.Jun 11 2018, 7:39 PM
include/llvm/BinaryFormat/ELF.h
1077 ↗(On Diff #150580)

That's not my point. The spec proposal I'm working off of reads

typedef Elf32_Word   Elf32_Relr; 
typedef Elf64_Xword  Elf64_Relr;

This file should match (not precisely, it can use using instead of typedef if it wants) in that it should not have to go though an r_data field. Alternatively the spec can be updated to make this a struct.

Apologies for the delay in updating this patch.

There are several changes since the previous version:

  • By default, '.relr.dyn' contents are printed as-is.
  • Pass '-decode-relr' flag to decode the contents and print individual relocations.
  • Relative relocation type is printed correctly.
  • Elf32_Relr and Elf64_Relr are defined using typedef, matching the proposal.
rahulchaudhry marked 6 inline comments as done.Jun 15 2018, 4:12 PM
rahulchaudhry added inline comments.
lib/Object/ELF.cpp
229 ↗(On Diff #150580)

The type is set appropriately now (2 lines below).

234 ↗(On Diff #150580)

Defined a 'Word' type locally, and using it for Base/Entry/Offset now.
Elf_Addr is a packed version of this uint type for endianness conversion.
At least Entry can't be Elf_Addr, since we need to do bit-shifts on it (operator >>= is undefined for packed types).
And if Entry is Word, then it makes sense to keep Base/Offset as Word as well, otherwise we end up with several unnecessary conversions below when doing arithmetic and assignments.

test/tools/llvm-readobj/elf-relr-relocs.test
6 ↗(On Diff #150580)

When raw contents of '.relr.dyn' section are dumped, I'm printing a dummy 'R_RELR_RELATIVE' relocation type.
With -decode-relr, the correct relative relocation type is printed.

I think I'm more or less happy with the llvm-readobj part of this.

For yaml2obj I think it would be ideal to support a new relocation type that reads in a list of addresses, sets SHF_ALLOC, and computes the section contents from the addresses. Alternatively I'd be ok with allowing for individual entries to manually added. I'm just not a huge fan of forcing yaml2obj users to put a big hex blob in the middle of their test.

include/llvm/Object/ELFTypes.h
66 ↗(On Diff #151579)

This should also be a typedef the way Addr is below.

455 ↗(On Diff #151579)

You shouldn't use r_data in ELFTypes either. It should also be a typedef, just for 'packed' instead.

lib/Object/ELF.cpp
234 ↗(On Diff #150580)

Ah gotcha...hmm. I think that's an acceptable reason. If you run into anything like this in LLD and Rui or someone gives a better recommendation, please come back and change this. Otherwise consider this done.

test/tools/llvm-readobj/elf-relr-relocs.test
12 ↗(On Diff #151579)

I think you can drop the R_RELR_RELATIVE and the addend in this case.

tools/llvm-readobj/ELFDumper.cpp
3418 ↗(On Diff #151579)

As mentioned above, I think in this case you don't have to print out the relocation type and addend. You might just inline something here that prints out the relr entry instead.

4106 ↗(On Diff #151579)

Same thing here.

4326 ↗(On Diff #151579)

same thing here.

rahulchaudhry marked 2 inline comments as done.

Relr is now defined directly as packed<uint>.
struct Elf_Relr_Impl is no longer necessary.

I really feel that non-decoded entries shouldn't have an addend or faked relocation type. It's really confusing to look at those and see relocations. I think non-decoded entries should just print the relr entry.

rahulchaudhry marked 8 inline comments as done.Jun 26 2018, 12:39 PM
rahulchaudhry added inline comments.
test/tools/llvm-readobj/elf-relr-relocs.test
12 ↗(On Diff #151579)

See below.

tools/llvm-readobj/ELFDumper.cpp
3418 ↗(On Diff #151579)

The addend gets skipped in GNUStyle::printDynamicRelocation (it's only printed for RELA).
This GNUStyle dump is purely speculative, as there is no implementation for this in GNU binutils yet.
We'll need to come back and adjust the format here anyway once gnu-readelf supports printing SHT_RELR sections.

Also, see below.

4106 ↗(On Diff #151579)

See below.

4326 ↗(On Diff #151579)

I'd argue that both type and addend should be printed here for consistency.

  • The case for addend is easy. LLVMStyle prints it for all relocations, even SHT_REL, which don't have addends. The case for SHT_RELR is similar. A 0x0 is printed as addend in both cases.
  • For printing a type (even a dummy one, like R_RELR_RELATIVE), the rationale is wrt. two readobj flags: '-dyn-relocations' and '-expand-relocs'.

    -dyn-relocations prints all dynamic relocations in a single Dynamic Relocations { } block. It does not separate them by sections as -relocations does. It would be disorienting to have some relocations from .rela.dyn in the block (with full type/symbol/addend info), followed by a bunch of hex numbers (with no type or description), followed by some more relocations from .rela.plt section (with full type/symbol/addend info).

    LLVMStyle::printDynamicRelocation also handles -expand-relocs flag and prints each entry on multiple lines. This is another reason to use this function to print relocations here instead of inlining a special case.

So in considering the case you pointed out about -dyn-relocations I discovered that the current behavior for -relocations is to not print out dynamic relocations at all. This is hidden by the fact that dynamic relocations tend to be duplicated with non-dynamic relocations by default. So we should only print these out on -dyn-relocations unless they're duplicated. As part of my proposal to a solution to this I'm proposing an additional flag so that the raw form that Roland requested can be printed out. This way debugging of the underlying format is possible but for most purposes the decoded form can be used.

tools/llvm-readobj/ELFDumper.cpp
3418 ↗(On Diff #151579)

Ok sounds good. I'm aware we'll need to come back. I just want to make a best faith effort first.

4326 ↗(On Diff #151579)

My claim is that these are not actually relocation entries. Each entire corresponds to potentially many relocations and some correspond to none! More over the data seems like an address even though it isn't. I fully agree that in the decode case we should print the full information with the addend. My point is just that the information you're adding dosn't make sense as a relocation. I hadn't considered the behavior of -dyn-relcoations though. You're 100% right that we shouldn't print out raw hex addresses with no other information in that case. That clearly presents an issue with the specific change I requested here.

What do you think about this proposal:

  1. When dynamic relocations are printed out, you decode the relocations so that they make sense.
  2. Add an extra switch -dyn-relr for printing out the list of hex entries in all SHT_RELR sections.
  3. If -dyn-relr and -decode-relr are both set then decode those relocations and print them out as you've been doing.

This should resolve the dynamic relocation issue and resolve my concern. The matra here is then "you should cal print*Relocation when you can but you should only call it if you're actually using a real relocation.

rahulchaudhry added inline comments.Jun 27 2018, 11:34 AM
tools/llvm-readobj/ELFDumper.cpp
4326 ↗(On Diff #151579)

I'm not sure I fully understand the proposal. Here's what I think you're asking for:

  1. When printing -dyn-relocations, always decode the relr relocations. SGTM.
  1. With -relocations, print only the SHT_RELR hex entries by default. They're in a separate ".relr.dyn {}" block (for LLVMStyle) or under separate "Relocation section '.relr.dyn'" table (for GNUStyle), so there's no confusion. SGTM.
  1. With -relocations -decode-relr, decode the SHT_RELR section and print relocation entries in the corresponding .relr.dyn block or table. SGTM.

Please let me know if I'm missing something.

a. I don't see why an extra flag -dyn-relr is needed. What would it be useful for? Simple -relocations without -decode-relr works fine for testing lld output etc.
b. What happens with -relocations -expand-relocs (without -decode-relr)? What is the expanded form for SHT_RELR hex entries? Is it ok to ignore -expand-relocs for the .relr.dyn block?

jakehehrlich added inline comments.Jun 27 2018, 1:51 PM
tools/llvm-readobj/ELFDumper.cpp
4326 ↗(On Diff #151579)

Answer to a: I don't think -relocations is the flag that should trigger printing these things but both formats are needed.

Answer to b: Not important because there should be no interaction with -relocations.

Stated in the more clear format you used:

  1. Perfect. Exactly what I think should happen.
  1. -relocations shouldn't cause SHT_RELR to print at all. It's orthogonal. This is to be consistent with the fact that -relocations doesn't print out dynamic relocations.
  1. if -dyn-relr is set, print out the SHT_RELR sections using just hex entries.
  1. if -dyn-relr and -decode-relr are both set, print out the ".relr.dyn {}" format but with decoded relocation entries.
rahulchaudhry added inline comments.Jun 27 2018, 2:18 PM
tools/llvm-readobj/ELFDumper.cpp
4326 ↗(On Diff #151579)

Regarding 2: -relocations does print SHT_RELA sections for executables and shared objects. That's dynamic relocations, in much the same way as those in SHT_RELR.

Let's say a PIE binary has 10,000 relocations in SHT_RELA, 9,900 of which are relative relocations.
llvm-readobj -relocations will print all 10,000 relocations. So far so good.

Now let's say we re-link that binary with -pack-dyn-relocs=relr.
llvm-readobj -relocations will print only the remaining 100 relocations in SHT_RELA, and completely omit the rest, that are now encoded in SHT_RELR.
This is confusing, to say the least. It makes -relocations output pretty incomplete and useless for binaries containing SHT_RELR sections.

I think -relocations should print the entries in SHT_RELR as well. Whether it prints the encoded hex strings or the decoded relocations by default is up for discussion, and I'm fine either way, but completely skipping SHT_RELR in -relocations output does not sound like a good idea to me. When someone passes -relocations, they want to list the relocations. Printing only 1% of the relocations and requiring another flag to see the remaining 99% does not look like a reasonable interface.

jakehehrlich added inline comments.Jun 27 2018, 3:06 PM
tools/llvm-readobj/ELFDumper.cpp
4326 ↗(On Diff #151579)

Ah sorry, I assumed the wrong cause of some behavior I was seeing. Try this:

  1. link your executable
  2. full strip your executable using llvm-objcopy --strip-sections (the binaries I had on hand to play with this happened to be full stripped)
  3. run llvm-readobj -relocations. You'll see that no relocations are printed out
  4. run llvm-readobj -dyn-relocations. You'll see that all expected relocations are output.

I assumed that -relocations checked weather a section was allocated or not before printing. Instead -relocations prints relocations for sections (allocated or not) where as -dyn-relocations looks for PT_DYNAMIC (not for allocated sections) and that explains the difference. (we should probably add a test case for -dyn-relocation).

So you're right. We should print out on -relocations. I believe it should always be decoded in that case but I'm ok with it printing out raw hex entries with no fake relocation type or addend. There should also be a way of getting the raw hex entries regardless. An additional -dyn-relr flag (you can pick a different name if you'd like) should accomplish that. Also it's not clear a specific -decode-relr flag is needed after that as 90% of the time you'll just want to see what the relocations are and when debugging SHT_RELR related issues you might want to look at the raw entries.

I'd accept the functionality you have now with the additional constraint that relocations are always decoded for -dyn-relocations. I'd prefer if the same were true for -relocations and a separate flag allowed printing out the raw entries.

Decoding relr relocations is the default now for both -relocations and -dyn-relocations.
A new flag -raw-relr affects -relocations, turns off decoding and display raw SHT_RELR contents instead.

rahulchaudhry marked an inline comment as done.Jun 27 2018, 5:15 PM
rahulchaudhry added inline comments.
tools/llvm-readobj/ELFDumper.cpp
4326 ↗(On Diff #151579)

I've changed the default for both -relocations and -dyn-relocations to decode and print relative relocations in SHT_RELR section.
Added a new flag '-raw-relr' to disable this decoding (only for -relocations).
With '-raw-relr', just the raw hex entries are printed. There is no fake relocation type or addend.
Removed the '-decode-relr' flag, since that behavior is default now.
I think this addresses all of your concerns.
ptal.

jakehehrlich accepted this revision.Jun 28 2018, 1:21 PM

LGTM, sorry for the back and forth!

This revision is now accepted and ready to land.Jun 28 2018, 1:21 PM
rahulchaudhry marked an inline comment as done.Jun 28 2018, 1:29 PM

LGTM, sorry for the back and forth!

Thanks for the review.
Specifically, thanks for insisting that printRelocation() should only be called for real relocations, and not for printing RELR entries with a fake relocation name. I think we ended up with both cleaner code and cleaner output as a result.

If everything looks good, can you land this as well. I don't have commit access.

This revision was automatically updated to reflect the committed changes.