Page MenuHomePhabricator

[ELF] Try placing newly created ThunkSection at the end of the InputSection first
AbandonedPublic

Authored by MaskRay on May 7 2019, 1:06 AM.

Details

Summary

Inspired by PR40740.

It increases the probability that the thunk section can be shared by a
subsequent InputSection, thus potentially decreases the total number of
thunk sections.

Event Timeline

MaskRay created this revision.May 7 2019, 1:06 AM
Herald added a project: Restricted Project. · View Herald Transcript

It is inspired by PR40740. It doesn't fix it.

The nice property would be extremely difficult to test directly after both PPC64/MIPS move towards getThunkSectionSpacing (I know little about MIPS to craft a MIPS test..). Just wanted to know if you think this is good to do before I start updating the following tests.

lld :: ELF/aarch64-cortex-a53-843419-large.s
lld :: ELF/arm-long-thunk-converge.s
lld :: ELF/arm-thumb-condbranch-thunk.s
lld :: ELF/arm-thumb-thunk-empty-pass.s
lld :: ELF/arm-thunk-re-add.s
lld :: ELF/ppc64-bsymbolic-toc-restore.s
lld :: ELF/ppc64-call-reach.s
lld :: ELF/ppc64-ifunc.s
lld :: ELF/ppc64-local-dynamic.s
lld :: ELF/ppc64-plt-stub.s
lld :: ELF/ppc64-toc-restore-recursive-call.s
lld :: ELF/ppc64-toc-restore.s
ruiu accepted this revision.May 7 2019, 1:13 AM

Makes sense. LGTM

This revision is now accepted and ready to land.May 7 2019, 1:13 AM

I don't think it will do any harm, but it is probably not going to make a lot of difference. The thunk code makes the assumption that there is one branch range for the thunk section spacing, the vast majority will get placed in these sections. The no suitable ThunkSection case is for the small number of edge cases that don't fit the model, for example in Thumb there is one very rare branch that has a much shorter range so it may need to use this, but there aren't enough of them to make reuse opportunities that likely.

grimar added a subscriber: grimar.May 7 2019, 2:10 AM

I don't think it will do any harm, but it is probably not going to make a lot of difference. The thunk code makes the assumption that there is one branch range for the thunk section spacing, the vast majority will get placed in these sections. The no suitable ThunkSection case is for the small number of edge cases that don't fit the model, for example in Thumb there is one very rare branch that has a much shorter range so it may need to use this, but there aren't enough of them to make reuse opportunities that likely.

createInitialThunkSections() looks to me a very good global heuristic. Without it, Relocations.cpp:1669 getISDThunkSec is making local decisions that may not be very good. I think it this way. There are three positions to create the thunk:

  • IS->OutSecOff: current status
  • IS->OutSecOff + IS->getSize(): this patch. It improves sharing slightly.
  • The last InputSection whose start address is before IS->OutputSecOff + getThunkSectionSpacing(). This is the best place to insert the thunk. (This is heuristically achieved by the proactive createInitialThunkSections().

If you think the marginal benefit is not bad to have, I'll update the tests:)

I don't think it will do any harm, but it is probably not going to make a lot of difference. The thunk code makes the assumption that there is one branch range for the thunk section spacing, the vast majority will get placed in these sections. The no suitable ThunkSection case is for the small number of edge cases that don't fit the model, for example in Thumb there is one very rare branch that has a much shorter range so it may need to use this, but there aren't enough of them to make reuse opportunities that likely.

createInitialThunkSections() looks to me a very good global heuristic. Without it, Relocations.cpp:1669 getISDThunkSec is making local decisions that may not be very good. I think it this way. There are three positions to create the thunk:

  • IS->OutSecOff: current status
  • IS->OutSecOff + IS->getSize(): this patch. It improves sharing slightly.
  • The last InputSection whose start address is before IS->OutputSecOff + getThunkSectionSpacing(). This is the best place to insert the thunk. (This is heuristically achieved by the proactive createInitialThunkSections().

    If you think the marginal benefit is not bad to have, I'll update the tests:)

I don't have any objections, just wanted to save you the work of updating the tests!

In an ideal world we would have something similar to IS->OutputSecOff + getThunkSectionSpacing() although it would need some modifications. For example the last case of IS->OutputSecOff + getThunkSectionSpacing() in its current form would not work for Thumb2. The getThunkSectionSpacing() uses a singler per architecture value, which is 16 Megabytes for Thumb2 representing the branch range of the unconditional branches, the (much rarer) conditional branch instruction has a range of 1 Megabyte. We would need a more sophisticated implementation that took into account the specific branch range of the type of the relocation when calculating the ThunkSectionSpacing.

MaskRay abandoned this revision.May 7 2019, 3:05 AM

I may revive it after the ppc64 tests are fixed by D61610 :)