This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Optimize getISDThunkSec() to amortized O(1) in thunk creation
AbandonedPublic

Authored by MaskRay on May 8 2019, 1:52 AM.

Details

Summary

The time complexity of getISDThunkSec() is O(|ISD->ThunkSections|).
When thunk section spacing is enabled, ISD->ThunkSections can be in the
scale of size(ISD)/getThunkSectionSpacing(). In one large powerpc64le
executable, this number for .text is 25980.

It is very slow if every created thunk in the current
InputSectionDescription goes through the whole array.

Sort relocations by offset and leverage this property to optimize it to O(1):

If a thunk is located before the current relocation offset, once it
can't be reached, any subsequent relocation can't reach this thunk,
either.

Event Timeline

MaskRay created this revision.May 8 2019, 1:52 AM

If I understand correctly I think this could prematurely skip over ThunkSections if there is more than one branch relocation range. For example consider a case where there are two relocation ranges S (short) and L (long). If we search for a ThunkSection for S where the ThunkSection spacing is L then there is a chance that at a relocation with range S at offset O will be out of range of one of more ThunkSections, but these would have been in range had it been R. The next relocation with range L at offset O2 will not even check some ThunkSections that it would have been in range of. The chances of this happening are much greater when S is much smaller than L.

Short-branch-needing-thunk
Long-branch-needing-thunk
...
ThunkSection out of range of short-branch, but in range of long
...

I may be missing something though? If I'm right I think that you would need to store a ThunkSecIdx per relocation range for this to give the same results as the existing code. Another possibility is that ThunkSecIdx is only written back to for the largest relocation range.

MaskRay abandoned this revision.May 8 2019, 3:00 AM

if there is more than one branch relocation range.

Good point - there can be more than one branch types. I didn't know the code well when I created this patch.

If I choose 0x2000000 for ppc64le thunk section spacing (I tried several number slightly smaller than 0x2000000; they have no differences), ThunkSections.size() is at most 22. I'll abandon this.