This is an archive of the discontinued LLVM Phabricator instance.

[ELF] -z combreloc: sort dynamic relocations by (symbol_index,r_offset)
ClosedPublic

Authored by MaskRay on May 2 2019, 10:23 PM.

Details

Summary

Fixes PR41692.

We currently sort dynamic relocations by (!is_relative,symbol_index).
Change it to (symbol_index,r_offset). We still place relative
relocations first because we only emit R_*_RELATIVE as dynamic
relocations with 0 symbol index.

This makes readelf -r debugging easier (relocations to the same symbol
are ordered by r_offset).

Refactor the test combreloc.s (renamed from combrelocs.s) to check
R_X86_64_RELATIVE, and delete --expand-relocs.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

MaskRay created this revision.May 2 2019, 10:23 PM

ld.bfd's sorting logic is unnecessarily complex.
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;hb=41f61c65a2e1cfdb4aca3bccf6e11025495ba02e;f=bfd/elflink.c#l9305
It uses elf_link_sort_cmp1 to sort relative relocations, then elf_link_sort_cmp2 to sort the non-relative relocations. The two functions can be combined to one: sorting by the triple (!IsRelative, SymIndex, r_offset)

gold's rule is at https://sourceware.org/git/?p=binutils-gdb.git;a=blob;hb=41f61c65a2e1cfdb4aca3bccf6e11025495ba02e;f=gold/output.cc#l1180
As @mcgrathr mentioned in PR41692, it has the fourth key SymType, which probably just serves the purpose of std::stable_sort.

By using std::make_tuple(), this added a small cost as we unconditionally call A.getOffset() and B.getOffset() - this cost doesn't matter.

  • Non-PIE executables/shared objects don't have many dynamic relocations.
  • Most dynamic relocations of an PIE are relative relocations. The comparison function has to call getOffset() anyway.
grimar added a comment.May 3 2019, 8:06 AM

But why this is an issue? Both PR and this patch description does not answer that it seems.
Is it just a cosmetic change for the output?

The sorting behavior is entirely an optimization for locality at dynamic reloc time. Both locality of symtab indices and locality of reloc offsets are useful to that goal.

grimar added a comment.May 6 2019, 3:35 AM

The sorting behavior is entirely an optimization for locality at dynamic reloc time. Both locality of symtab indices and locality of reloc offsets are useful to that goal.

When we initially implemented sorting of relocations, we sorted by symbol index only, because testing showed that it was the only
thing that made a difference: https://reviews.llvm.org/D19528#423828

After that at some point we started sorting by RelativeRel to set the DT_RELACOUNT tag.
(Setting this tag was needed for FreeBSD I think, but I do not remember honestly).

So what I want to say that probably sorting by offset is not really usefull. At least would be nice to see the numbers before doing that.

For the PR41692 reproduce tarball, ld.lld @response.txt --pack-dyn-relocs=none (without the patch) shows the relative relocations aren't sorted by r_offset.

# I deleted some other flags to compare the result with gold and ld.bfd
Relocation section '.rela.dyn' at offset 0x14008 contains 65020 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000405000  0000000000000008 R_X86_64_RELATIVE                         1954a5
0000000000405010  0000000000000008 R_X86_64_RELATIVE                         19eb00
000000000049d0f0  0000000000000008 R_X86_64_RELATIVE                         1ee500
000000000049d100  0000000000000008 R_X86_64_RELATIVE                         1ee54f
000000000049d110  0000000000000008 R_X86_64_RELATIVE                         1ee525
0000000000403000  0000000000000008 R_X86_64_RELATIVE                         1ee580
0000000000403008  0000000000000008 R_X86_64_RELATIVE                         1ee5a0
0000000000403010  0000000000000008 R_X86_64_RELATIVE                         0

Relocations sorted by r_offset would make a poor person's life easier when debugging with readelf -r :) (I use readelf -r much while debugging numerous internal issues)

grimar added a comment.May 6 2019, 7:15 AM

For the PR41692 reproduce tarball, ld.lld @response.txt --pack-dyn-relocs=none (without the patch) shows the relative relocations aren't sorted by r_offset.

# I deleted some other flags to compare the result with gold and ld.bfd
Relocation section '.rela.dyn' at offset 0x14008 contains 65020 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000405000  0000000000000008 R_X86_64_RELATIVE                         1954a5
0000000000405010  0000000000000008 R_X86_64_RELATIVE                         19eb00
000000000049d0f0  0000000000000008 R_X86_64_RELATIVE                         1ee500
000000000049d100  0000000000000008 R_X86_64_RELATIVE                         1ee54f
000000000049d110  0000000000000008 R_X86_64_RELATIVE                         1ee525
0000000000403000  0000000000000008 R_X86_64_RELATIVE                         1ee580
0000000000403008  0000000000000008 R_X86_64_RELATIVE                         1ee5a0
0000000000403010  0000000000000008 R_X86_64_RELATIVE                         0

Relocations sorted by r_offset would make a poor person's life easier when debugging with readelf -r :) (I use readelf -r much while debugging numerous internal issues)

Then this is a cosmetic change, as I supposed initially. I'll leave it up to Rui I think.

Sorting R_*_RELATIVE relocs first and setting REL(A)COUNT is a format requirement, period. It's not a debatable question of optimization.

The multi-level sorting behavior here makes lld consistent with the GNU linkers, which it is not today.

ruiu added a comment.May 6 2019, 11:17 PM

Can you give me a pointer to the formal spec and also include that to the patch description?

I can't find a spec of DT_RELACOUNT in gABI or psABI. I find its definition in Linux Standard Base but can't find its description. elfutils has a lint tool, it checks some properties of DT_RELACOUNT but doesn't require its entries to be sorted.

In glibc, there is a commit with changelog "If DT_GNU_PRELINKED, DT_RELACOUNT relocations can be skipped."
In Oracle Solaris Linkers and Libraries Guide, it is describe but nothing is said about the sorted r_offset: "Combining relocation records in this manner enables all RELATIVE relocations to be grouped together. All symbolic relocations are sorted by symbol name. The grouping of RELATIVE relocations permits optimized runtime processing using the DT_RELACOUNT/DT_RELCOUNT .dynamic entries. Sorted symbolic entries help reduce runtime symbol lookup"
FreeBSD copied the definition to their sys/sys/elf_common.h in 2016 from somewhere, but its rtld doesn't use DT_RELACOUNT. musl doesn't use DT_RELACOUNT.

I don't know if DT_RELACOUNT was a glibc invention or a Sun invention. To be honest, if it is not that we've added it in the first place, I wouldn't suggest adding it :D

To me this is a cosmetic change. You make the decision if it is worth doing.

Sorting R_*_RELATIVE relocs first and setting REL(A)COUNT is a format requirement, period. It's not a debatable question of optimization.

I never said we should stop doing that. Sorting by an offset implemented in this patch is completely different story.

The multi-level sorting behavior here makes lld consistent with the GNU linkers, which it is not today.

Above you wrote "The sorting behavior is entirely an optimization for locality at dynamic reloc time. Both locality of symtab indices and locality of reloc offsets are useful to that goal."
I pointed to Rafael's comment saying that sorting by offsets did not show any improvements during the testing and was not useful that time.

Now you're talking about consistency reasons. We are inconsistent with the GNU linkers in a many things and not always we want to fix it.
"consistency" by itself usually is not a reason for doing a change.

One reasonable argument to land this I saw in this thread was "Relocations sorted by r_offset would make a poor person's life easier when debugging with readelf -r :)"
It sounds useful.

Can you give me a pointer to the formal spec and also include that to the patch description?

FWIW here is the most detailed explanation of the feature I found were:
https://www.airs.com/blog/archives/186
http://people.redhat.com/jakub/prelink.pdf (p.2)

I says it worth sorting by r_offset though. But out tests showed that sorting did not improve anything (https://reviews.llvm.org/D19528#423828).
So this patch seems to be for a nice output only (i.e. not about the optimization).
I am not against doing this, but I think we should explicitly mention this in comments and/or patch description.

I don't know if DT_RELACOUNT was a glibc invention or a Sun invention. To be honest, if it is not that we've added it in the first place, I wouldn't suggest adding it :D

I am not sure why we are discussing DT_RELACOUNT :), but there was a solid reason to add it I believe:
https://reviews.llvm.org/D23661

MaskRay updated this revision to Diff 198402.May 7 2019, 12:21 AM

Add comment

I don't know if DT_RELACOUNT was a glibc invention or a Sun invention. To be honest, if it is not that we've added it in the first place, I wouldn't suggest adding it :D

I am not sure why we are discussing DT_RELACOUNT :), but there was a solid reason to add it I believe:
https://reviews.llvm.org/D23661

It's fine. Not too complex anyway :) (off-topic: musl could leverage the same optimization http://git.musl-libc.org/cgit/musl/tree/ldso/dynlink.c#n342 but it doesn't bother doing that probably because the improvement is insignificant)

MaskRay added a comment.EditedMay 19 2019, 7:41 PM

This is useful to compare the result with ld.bfd to check if we emit more or fewer relocations. (This helped D62107)

How about

return std::make_tuple(A.getSymIndex(), A.getOffset()) < std::make_tuple(B.getSymIndex(), B.getOffset());

? i.e. don't check if this is a relative relocation. In the dynamic loader, if SymIndex is 0, IMHO the dynamic relocation can only be relative (R_*_NONE doesn't make sense).

MaskRay updated this revision to Diff 200199.May 19 2019, 8:13 PM
MaskRay retitled this revision from [ELF] -z combreloc: sort dynamic relocations by (!is_relative,symbol_index,r_offset) to [ELF] -z combreloc: sort dynamic relocations by (symbol_index,r_offset).
MaskRay edited the summary of this revision. (Show Details)

Sort by (symbol_index,r_offset)

MaskRay updated this revision to Diff 200201.May 19 2019, 8:27 PM

Inline the comparator

ruiu accepted this revision.May 19 2019, 11:27 PM

LGTM

Recently I started thinking of adding a "greasing" mechanism to the linker output. Modern network protocols such as QUIC adds some randomness (such as adding an unsupported extension chunk to a packet to make sure that an unsupported chunk is just skipped by the peer), so that the specification doesn't stick to a specific implementation. That technique is called greasing. It might be interesting to use the same technique for the linker output, to make sure that the consumer side works as required by the spec.

This revision is now accepted and ready to land.May 19 2019, 11:27 PM
This revision was automatically updated to reflect the committed changes.
test/ELF/combreloc.s