This is an archive of the discontinued LLVM Phabricator instance.

ELF: Place ordered sections in the middle of the unordered section list on targets with limited-range branches.
Closed, Public

Authored by pcc on Mar 27 2018, 8:59 PM.

Details

Summary

It generally does not matter much where we place sections ordered
by --symbol-ordering-file relative to other sections. But if the
ordered sections are hot (which is the case already for some users
of --symbol-ordering-file, and is increasingly more likely to be
the case once profile-guided section layout lands) and the target
has limited-range branches, it is beneficial to place the ordered
sections in the middle of the output section in order to decrease
the likelihood that a range extension thunk will be required to call
a hot function from a cold function or vice versa.

That is what this patch does. After D44966 it reduces the size of
Chromium for Android's .text section by 60KB.

Event Timeline

pcc created this revision.Mar 27 2018, 8:59 PM
grimar added a subscriber: grimar.Mar 28 2018, 1:39 AM

That looks reasonable to me. Another heuristic that can be used, at least in C programs, is to find the sections with the greatest number of calls to them and place those in the centre. This tends to reduce the number of thunks, as many library functions are called from all over the program. However, I think the performance cost of going through a thunk means that moving the hot functions makes more sense.

ruiu added a comment.Mar 30 2018, 11:13 AM

I don't think I understand why this patch can reduce the size of generated executables. It may improve performance because we use short branches to jump to hot functions, but how can it reduce the size of thunks?

lld/ELF/Writer.cpp
1088

Can you add a comment?

1091

nit: I'd write it in two lines.

pcc added a comment.Mar 30 2018, 11:21 AM

Imagine that you have 8MB of hot code and 32MB of cold code, and that a branch can reach at most 16MB in either direction. If the layout is:

8MB hot
32MB cold

only the first 8-16MB of the cold code (depending on which hot function it is actually calling) can call the hot code without a range extension thunk. However, if we use this layout:

16MB cold
8MB hot
16MB cold

both the last 8-16MB of the first block of cold code and the first 8-16MB of the second block of cold code can call the hot code without a thunk. So we effectively double the amount of code that could potentially call into the hot code without a thunk, reducing the number of thunks that we need.

pcc updated this revision to Diff 140465.Mar 30 2018, 12:09 PM
  • Address review comments
pcc marked 2 inline comments as done.Mar 30 2018, 12:11 PM
ruiu added a comment.Mar 30 2018, 12:15 PM

Ah, so the assumption I was missing is that hot functions are called from many more places than cold functions. That's perhaps obvious, but that's not necessarily true. Could you add your explanation as a comment?

pcc updated this revision to Diff 140480.Mar 30 2018, 1:33 PM
  • Add comment
ruiu accepted this revision.Mar 30 2018, 2:31 PM

LGTM

Thanks!

This revision is now accepted and ready to land.Mar 30 2018, 2:31 PM
This revision was automatically updated to reflect the committed changes.