[LLD][ELF] Pre-create ThunkSections at Target specific intervals
ClosedPublic

Authored by peter.smith on Jun 27 2017, 7:54 AM.

Details

Summary

When an OutputSection is larger than the branch range for a Target we need to place thunks such that they are always in range of their caller, and sufficiently spaced to maximise the number of callers that can use the thunk. We use the simple heuristic of placing the ThunkSection at intervals corresponding to a target specific branch range. If the OutputSection is small we put the thunks at the end of the executable sections.

The overall design for range thunks is for each InputSectionDescription pre-create ThunkSections at regularly spaced intervals in which we can place range extension thunks that are within range of their caller. This is similar to the stub-groups implemented by bfd and clang.

This is patch 6/11 of range thunks. It is dependent on D34688 and its dependencies.

Diff Detail

Repository
rL LLVM
There are a very large number of changes, so older changes are hidden. Show Older Changes

For whatever it's worth, I've been using these patches internally on a pretty large codebase, and they've worked really well; I haven't noticed any issues. It sounds like @ikudrin has been using them as well.

Rebased to account for r309311 Merge OutputSectionCommand and OutputSection. No other changes.

General ping for the remaining range thunks patches. If there is anything that I can do to make these easier to review please let me know? D34692 contains a large comment that explains the overall design.

  • D34689 Pre-create ThunkSections at Target specific intervals (this one)
  • D34691 Introduce range extension thunks for ARM
  • D34692 Add support for multiple passes to createThunks()

I'd like to say that I'm quite happy with this set of patches. We use them for Arm32 target with relatively large executables, and they seem to work well. It'd be great if these patches are ported to the release branch as well.

General ping for the remaining range thunks patches. D34692 contains a large comment that explains the overall design.

  • D34689 Pre-create ThunkSections at Target specific intervals (this one)
  • D34691 Introduce range extension thunks for ARM
  • D34692 Add support for multiple passes to createThunks()

There is also D34513, which is an optimization requested in D34692.

@peter.smith I believe you meant to link to D35413.

Also, seconding what @ikudrin said. These patches are pretty vital for my use case (linking large libraries and executables for arm), so I've had them cherry-picked locally for a while now. Peter has been kind enough to keep rebasing these regularly, so they haven't been much of a maintenance burden for others, but it would be really nice to get these merged.

Yes thank you; I seem to have developed a habit for transposing numbers when typing. I would obviously like to get this functionality in as well, and will be happy to make any requested changes or provide further explanation. In the meantime I'll keep rebasing them when they fail to merge.

General ping for the remaining range thunks patches. D34692 contains a large comment that explains the overall design.

  • D34689 Pre-create ThunkSections at Target specific intervals (this one)
  • D34691 Introduce range extension thunks for ARM
  • D34692 Add support for multiple passes to createThunks()

There is also D35413, which is an optimization requested in D34692.

emaste added inline comments.Aug 17 2017, 5:23 PM
ELF/Arch/ARM.cpp
66 ↗(On Diff #108614)

One more change to MiB needed here?

Thanks for the comment, I've updated the diff with the change from Mb to Mib.

Just a minor nit I noticed.

ELF/Relocations.cpp
987 ↗(On Diff #111627)

You can use llvm::erase_if which is equivalent to C.erase(remove_if(C, pred), C.end());

Thanks for pointing out llvm::erase_if, I've updated the diff to use it.

(Aside, for FreeBSD we are rather eager to have this support arrive in lld upstream and will from there backport it to the lld 5.0.0 in our src tree. As it stands we cannot link the full base system with lld because lldb has grown too large -- https://bugs.freebsd.org/215691)

General ping for the remaining range thunks patches. Is there anything more I can do to progress these?

D34692 contains a large comment that explains the overall design.

  • D34689 Pre-create ThunkSections at Target specific intervals (this one)
  • D34691 Introduce range extension thunks for ARM
  • D34692 Add support for multiple passes to createThunks()

There is also D35413, which is an optimization requested in D34692.

General ping for the remaining range thunks patches. If it helps at all I will be at a conference in the Bay area 25th to 28th September, but I could escape to talk these over in person if any of the maintainers happened to be within public transport range.

D34692 contains a large comment that explains the overall design.

  • D34689 Pre-create ThunkSections at Target specific intervals (this one)
  • D34691 Introduce range extension thunks for ARM
  • D34692 Add support for multiple passes to createThunks()

There is also D35413, which is an optimization requested in D34692.

ruiu added inline comments.Sep 5 2017, 6:45 PM
ELF/Arch/ARM.cpp
66 ↗(On Diff #111638)

nit: add blank lines before comments.

ELF/Relocations.h
138–148 ↗(On Diff #111638)

Can you add blank lines between multi-line declarations?

165–166 ↗(On Diff #111638)

This map is essentially a map from an InputSectionDescription to its thunks, right? Does it make more sense to add std::vector<ThunkSection *> to InputSectionDescription?

Thanks very much for the review comments. I've updated the diff with the extra spaces. I personally would prefer to stick with the map, but I'm happy to change it.

Sorry about all the nits, but I figured I'd try to get all the mechanical issues out of the way, at least :)

ELF/Arch/ARM.cpp
69–71 ↗(On Diff #114002)

These are the ranges in both the forward and backward direction, right? As in e.g. a Thumb BLX can branch 16 MiB forward or backward? It would probably be helpful to make that explicit (basically an ASCII-friendly equivalent to ±).

72 ↗(On Diff #114002)

Grammar nit: If a branch cannot reach.

73 ↗(On Diff #114002)

Grammar nit: should either be "the rare case of a Thumb 2 conditional branch" or "the rare case of Thumb 2 conditional branches".

74–75 ↗(On Diff #114002)

This is confusing me slightly. It says LLD is assuming support for ARMv6T2 and above, so that includes ARMv6T2, but then it says "if support is added for ARMv6T2", which contradicts the previous sentence. Is there a typo somewhere?

76 ↗(On Diff #114002)

Nit: period at end of comment.

77 ↗(On Diff #114002)

Is there a general preference for expressing such constants in hex? For me, 16 * 1024 * 1024 is much more obviously 16 MiB.

78 ↗(On Diff #114002)

If I'm understanding this correctly, the indented layout is:

|-------------------------------------------------------|
| Section: ThunkSectionSpacing - ThunkSectionSize bytes |
|-------------------------------------------------------|
|          ThunkSection: ThunkSectionSize bytes         |
|-------------------------------------------------------|
| Section: ThunkSectionSpacing - ThunkSectionSize bytes |
|-------------------------------------------------------|
|          ThunkSection: ThunkSectionSize bytes         |
|-------------------------------------------------------|
|                          ...                          |

When I was reading this originally, I got confused because I thought the intended layout was instead:

|-------------------------------------------------------|
|          Section: ThunkSectionSpacing bytes           |
|-------------------------------------------------------|
|          ThunkSection: ThunkSectionSize bytes         |
|-------------------------------------------------------|
|          Section: ThunkSectionSpacing bytes           |
|-------------------------------------------------------|
|          ThunkSection: ThunkSectionSize bytes         |
|-------------------------------------------------------|
|                          ...                          |

In hindsight, the name ThunkSectionSpacing makes the intended layout pretty clear, but it took me a while to grasp that. Maybe an explicit comment to that effect or an ASCII diagram could help?

79 ↗(On Diff #114002)

Grammar nit: within range of a branch

80 ↗(On Diff #114002)

Is the second part of the sentence supposed to imply that the start of the Section will be immediately after the end of the previous ThunkSection? It's not really clear to me.

81 ↗(On Diff #114002)

Nit: period at the end of comment.

82 ↗(On Diff #114002)

Same here; 16384 * 12 would be clearer to me.

ELF/Relocations.cpp
981 ↗(On Diff #114002)

Nit: period at the end of the comment.

1021 ↗(On Diff #114002)

Period nit.

1027 ↗(On Diff #114002)

There's a run-on sentence; it should probably be "by creating a new one in range; for now, it is unreachable."

1062 ↗(On Diff #114002)

Grammar nit: For an InputSectionRange that is smaller than the range, a single ThunkSection at the end of the range will do.

1066–1069 ↗(On Diff #114002)

Can't all of these be moved inside the lambda?

1074 ↗(On Diff #114002)

Could ISR be made a const pointer?

1077 ↗(On Diff #114002)

If I'm understanding this code correctly, this if will only be entered in the first iteration of the loop. If that's the case, it's probably cleaner to just pull the body out and place it above the loop? That should also allow you to get rid of PrevISR.

(If ISR can be empty, you'd also need to check for that, of course.)

1078 ↗(On Diff #114002)

Isn't this assignment dead, given that you unconditionally assign Off on line 1084?

1086 ↗(On Diff #114002)

What happens if e.g. you have a single IS in this ISR, and its Off >= Limit? If I'm understanding this code correctly, you'll create a thunk at the start of the IS (i.e. IS->OutSecOff), and you won't create one at the end (because NeedTrailingTS will get set to false), but isn't the one at the end also needed?

More generally, I'm not sure about:

  • Why do we create a thunk section at the start of the first IS, whereas all other thunk sections are at the end of ISs?
  • How do we handle ISs which require multiple thunk sections?

I feel like I'm missing something pretty obvious, but it doesn't hurt to ask :)

1096–1097 ↗(On Diff #114002)

Remove the check for ISR->back() == IS and move this after the loop?

1103 ↗(On Diff #114002)

Could this parameter be made a const pointer?

1135 ↗(On Diff #114002)

You can get rid of the braces on this if now.

peter.smith marked 15 inline comments as done.

Thanks very much for the review comments, especially those surrounding createInitialThunkSections(), I've made an attempt at applying the suggestions.

I've added some inline comments to show what I changed.

ELF/Arch/ARM.cpp
74–75 ↗(On Diff #114002)

I've reworded the sentence a bit. I've done some work downstream to support the older architectures (needs D36823 before I can upstream) so I've got a bit more information on what changes are needed.

77 ↗(On Diff #114002)

I don't know if there is a preferred style in lld. From looking around it seems to be mostly decimal and sometimes hex. I've not seen the form 16 * 1024 * 1024 in the lld codebase. I've left the constants as they are for now but I'll happily change them if the consensus is to express them in another way.

78 ↗(On Diff #114002)

Thanks for the suggestions. I've moved a description of the two parameters to the top and I've put in a diagram.

ELF/Relocations.cpp
1066–1069 ↗(On Diff #114002)

Yes they can, I've rewritten the function to account for your suggestions. Thanks very much I think that this simplifies it a bit.

1086 ↗(On Diff #114002)

If I've understood correctly, for Off to be >= Limit then IS->getSize() must be >= (ThunkSectionSpacing - ThunkSectionSize), i.e. a gigantic InputSection. These are difficult to handle, and in some cases impossible (if there is a branch in the middle of a Section so large it can't escape the Section). I've put an answer for the 2nd bullet.

Why do we create a thunk section at the start of the first IS, whereas all other thunk sections are at the end of ISs?

I don't have a good answer for that so I've made it consistent. Initially the code was written when there was simultaneously OutputSections->Sections and InputSectionDescriptions and I probably haven't been aggressive enough at refactoring the function.

How do we handle ISs which require multiple thunk sections?

At present (with the latest modifications) we'll create a ThunkSection after the InputSection. This will be out of range of branches in the earlier part of the InputSection, but in range of the later part. In later patches (D34691 and D34692) we check to see if a ThunkSection is in range, if it isn't we create a new one prior to the InputSection, this should be in range of branches of the earlier part of the InputSection. We can't do anything if the Section is so large that a branch towards the middle can't escape. This might happen with an LTO build of a very large program that isn't compiled with -ffunction-sections so the whole program ends up in a single .text section, but I think it would be rare.

1103 ↗(On Diff #114002)

I don't think so, the ISR is the key in the map which usually means it can be const, but in mergeThunks we insert the ThunkSections into ISR so it needs to be non-const at that point. Let me know if I've missed a way of doing it though.

ELF/Relocations.h
165–166 ↗(On Diff #111638)

The reasons I had to keep the mapping local were:

  • The mapping is only needed when generating Thunks and as most Targets don't need them I wanted to avoid modifying the shared data structures that are visible to the end of the link.
  • When there are multiple passes we need to remember not just the Thunks we've added but the ones that we have just added in the latest pass (See D34692) we would either need a map or to add yet another vector to InputSectionDescription.

I'm happy to change it if you'd prefer to store the fields in InputSectionDescription and remove the maps.

Since the branch range we're supporting in this patch is ±16 MiB, in theory the thunk sections could be spaced 32 MiB apart instead of 16, with input sections jumping either forward or backward to thunks, correct? I understand it's probably not worth the additional complexity to support that, and I'm not suggesting changing it; I just wanted to confirm my understanding :)

ELF/Relocations.cpp
1086 ↗(On Diff #114002)

Yup, I was thinking of the case where IS->getSize() >= ThunkSectionSpacing - ThunkSectionSize. It makes sense that we can't handle that case completely, and a comment stating that would probably be helpful.

1103 ↗(On Diff #114002)

You're right; I'd missed that.

1090–1091 ↗(On Diff #114182)

As I understand, the design relies on ThunkSections being used by InputSections prior to them. In that case, don't we always need a final trailing ThunkSection, to service the InputSections at the end?

In other words, with the current code, I believe an example layout could end up being something like:

InputSection
ThunkSection
InputSection
InputSection
InputSection
ThunkSection
InputSection
InputSection

I'm wondering if you need a ThunkSection at the end too, to service the last two InputSections in my example.

smeenai added inline comments.Sep 7 2017, 10:37 PM
ELF/Relocations.cpp
1086 ↗(On Diff #114002)
  • Why do we create a thunk section at the start of the first IS, whereas all other thunk sections are at the end of ISs?

I think I just figured my own question out. Conceptually, if the end of the current InputSection is beyond Limit, you'll add a ThunkSection at the end of the previous InputSection. For the very first InputSection, there's no previous one, in which case treating the start of the first InputSection as the end of the non-existent previous InputSection makes sense. You probably wanna go back to that behavior, sorry.

Instead of storing the entire PrevIS, how about just having a variable that keeps track of the end address of the previous section? I think that'd make the intent more immediately obvious, and its initialization to ISR->front()->OutSecOff could have a brief comment explaining the special casing for the first InputSection.

Thanks very much for the comments. Yes you are correct that the ThunkSpacing could be increased to ~32Mb after the first one, however I thought I'd try and keep it simple for the first iteration.

I've incorporated your suggestions into createInitialThunkSections(), in summary:

  • I've removed PrevISR and moved to storing the previous insertion point, this removes the need for the conditional expression.
  • I've renamed Off, Limit in favour of ISLimit and ThunkUpperBound, I'm hoping that this makes the intent clearer.
  • I've removed the NeedsTrailingTS as it doesn't hurt to just add the trailing InputSection anyway as we'll just remove it if it isn't used.
smeenai added inline comments.Sep 8 2017, 10:08 AM
ELF/Relocations.cpp
1080 ↗(On Diff #114370)

Should this be > instead of >=? ISLimit is one past the ending address of the current IS, and ThunkUpperBound is the maximum starting address of the thunk, so in the case where ISLimit == ThunkUpperBound, the thunk should be placed immediately after that IS, which is what a > comparison would accomplish.

ELF/Relocations.cpp
1080 ↗(On Diff #114370)

I think you are most likely correct. I've got to leave the office right now, will pick this back up on Monday. Thanks very much again for the comments.

Yes you are correct, A < is better than <= as the ThunkSectionSpacing already lowers the start of the ThunkSection to be within range. I've updated the diff with the change and to account for D37627

ruiu added inline comments.Sep 11 2017, 5:27 PM
ELF/Relocations.h
165–166 ↗(On Diff #111638)

I think I prefer adding all these stuffs to InputSectionDescription because (1) InputSectionDescription should be the central place to describe an input section and (2) doing hash table lookups that often could be slow.

Thanks for all the updates! I tested the updated set of patches against our internal builds, and everything is still looking good.

I think I prefer adding all these stuffs to InputSectionDescription because (1) InputSectionDescription should be the central place to describe an input section and (2) doing hash table lookups that often could be slow.

I've implemented just the change to use InputSectionDescription in a separate refactoring review D37743.

Updated diff to rebase on top of the refactoring done in D37743 no other changes made.

Very small nit noticed on another look: there are couple of instances of Mib that should be MiB

Thanks for pointing the typos in the comment out, I've uploaded a new patch to fix them.

At the moment I've got 5 remaining reviews in the RangeThunks series. It would be great if I could get some feedback on how to progress these towards either approval or rejection as they have been present in much the same form (modulo refactoring) for several Months now. I'm wondering if there is anything more that I could or should be doing beyond replying to comments. Many thanks to everyone for all the comments received so far and I do appreciate that it is difficult to review a large Target specific change. I'd like to make it any easier to review, but I need a bit more feedback on the blocking areas as I've probably been staring at this too long.

The outstanding reviews can be committed in sequence without regressing from the previous patch.

  • D37743 Record created ThunkSections in InputSectionDescription (Refactoring only)
  • D34689 Pre-create ThunkSections at Target specific intervals
  • D34691 Introduce range extension thunks for ARM
  • D34692 Add support for multiple passes to createThunks
  • D35413 Add DefinedInThunk to SymbolBody to remove need for hash lookup (Refactoring only)

For what it is worth I've created https://git.linaro.org/people/peter.smith/range-thunks-arm.git/ that contains all the range-thunks patches. I'll keep this in synch with the reviews.

Rebase in light of recent refactoring. No other changes.

Rebase on top of recent refactoring, no other changes.

ruiu added inline comments.Wed, Oct 25, 8:35 PM
ELF/Arch/ARM.cpp
91–92 ↗(On Diff #118761)

It is not clear to me if you really need this variable. If we understand correctly, we initially assume that we have thunks of 0x30000 bytes for each 0x1000000 bytes. But they are just rough initial estimation and doesn't have to be accurate, right? I wonder if we just assume that the size of thunk is zero. Or, if you really need it, can you assume that we have thunks of size zero for each 0x1000000 minus 0x30000 bytes?

Strictly speaking we don't need the ThunkSectionSize variable. I think it is worth having ThunkSectionSpacing less than the maximum branch range as it makes it more likely that all the thunks can be created in one pass. Does the value we reduce ThunkSectionSpacing by need to be as large as 16,384 Thunks, most likely not. It is a very conservative upper bound. I can change it but it will mean updating some of the tests in later approved reviews so I'd prefer not to do it here.

What I have done is to merge ThunkSectionSize into ThunkSectionSpacing by including the ThunkSectionSize in ThunkSectionSpacing, I've updated the comment to try and explain it a bit better as well.

ruiu accepted this revision.Thu, Oct 26, 2:01 PM

LGTM

This revision is now accepted and ready to land.Thu, Oct 26, 2:01 PM
This revision was automatically updated to reflect the committed changes.