This is an archive of the discontinued LLVM Phabricator instance.

[LLD] [COFF] Alternative ARM range thunk algorithm
ClosedPublic

Authored by mstorsjo on Sep 16 2018, 2:30 PM.

Details

Summary

This is an alternative to D51089, implementing a simpler thunk addition algorithm with a simpler but less graceful fallback in case the first attempt at thunks didn't cut it.

Diff Detail

Repository
rL LLVM

Event Timeline

mstorsjo created this revision.Sep 16 2018, 2:30 PM
ruiu added a comment.Sep 17 2018, 4:45 PM

I think I prefer this way, as it is conceptually a bit simpler than the iterative algorithm you've implemented in another patch.

COFF/Writer.cpp
369–371 ↗(On Diff #165697)

I'd write these three lines one line.

385 ↗(On Diff #165697)

The relationship between section and its thunk is 1:N. I believe that's why you are using a map to manage thunk -> symbol mapping.

However, I think you visit sections in the increasing address order, so if the last thunk is not reachable, no other thunks are reachable. Can you use the fact?

My best suggestion for a test that will need exceed the margin is to use input sections with high alignment requirements. If these are on a natural boundary prior to adding thunks then adding a thunk will result in a large amount of bytes inserted to maintain section alignment. As long as these padding bytes are in between the source and destination of a branch and exceed the margin then a test case can be constructed without needing to insert hundreds of Thunks.

COFF/Writer.cpp
485 ↗(On Diff #165697)

Given that we expect most programs not to need any thunks at all then would moving the verify part ahead of createThunks make sense? Something like

while (true) {
  if (verifyRanges)
    // were done exit
  else {
    if pass 0 initialise margin else double it.
    create thunks()
    assignAddresses()
    ++ThunkPasses;
  }
}

That would mean one more call to verifyRanges if thunks are needed, but that is expected to be the exception.

491 ↗(On Diff #165697)

Can you move this to line 500?

My best suggestion for a test that will need exceed the margin is to use input sections with high alignment requirements. If these are on a natural boundary prior to adding thunks then adding a thunk will result in a large amount of bytes inserted to maintain section alignment. As long as these padding bytes are in between the source and destination of a branch and exceed the margin then a test case can be constructed without needing to insert hundreds of Thunks.

The biggest section alignment that COFF supports is 8 KB. Using that probably can reduce the testcase to requiring a few dozen of thunks, and that's probably manageable to write a testcase for. Thanks for the idea! I looked into alignments before, for creating the whole gap to cause the need for thunks, but it wasn't enough as you can't do a 16 MB alignment.

COFF/Writer.cpp
369–371 ↗(On Diff #165697)

Will do

385 ↗(On Diff #165697)

The relationship between section and its thunk is 1:N. I believe that's why you are using a map to manage thunk -> symbol mapping.

For every target Symbol, there can be N thunks that point to it, yes.

However, I think you visit sections in the increasing address order, so if the last thunk is not reachable, no other thunks are reachable. Can you use the fact?

Yes - I'm using exactly this fact. This map allows me to look up the last thunk pointing to each specific RVA. (In the previous version, I had something similar to DenseMap<uint64_t, vector<Defined *>>, to allow finding all thunks for a specific target. But as you say, as we progress linearly we only need to keep track of the last thunk for each target.)

485 ↗(On Diff #165697)

That sounds like a good idea. That also avoids the case of needlessly adding thunks for the case when the margin would cause them to be added, even if the total image size still is slightly below the maximum branch range.

491 ↗(On Diff #165697)

Oh, indeed - I don't need it after the loop any longer.

mstorsjo updated this revision to Diff 165931.Sep 18 2018, 4:11 AM
mstorsjo edited the summary of this revision. (Show Details)

Applied the given suggestions, added test cases based on @peter.smith's idea of using large alignments for testing cases when multiple passes are needed. This should now again be in a good-to-commit state, if everybody are happy with it.

Thanks for the updates. I don't have any more comments.

@ruiu Any more comments?

ruiu added inline comments.Sep 24 2018, 11:22 AM
COFF/Writer.cpp
385 ↗(On Diff #165697)

I wonder if you can eliminate map lookups (which is slow). Since the relocation is 1:N, you can add a Defined * to a symbol to represent the relationship, no?

mstorsjo added inline comments.Sep 24 2018, 11:40 AM
COFF/Writer.cpp
385 ↗(On Diff #165697)

Oh, yes, that should also work, I think. And it seems like sizeof(Defined) is significantly less than sizeof(SymbolUnion), so it shouldn't grow the total memory consumption.

But in that case, on restarts, we need to iterate over all symbols in order to reset it. That's probably fine performance-wise since that's not expected to happen in real life - but is that even possible? SymbolTable only contains global symbols right?

If that's not possible, we'd need to store Defined * and the pass number it was added in.

ruiu added a comment.Sep 24 2018, 12:30 PM

I believe you can iterate over all input files to visit all symbols including local ones.

I believe you can iterate over all input files to visit all symbols including local ones.

Ah, yes. I have this working now - the speedup isn't significant, but it turned out I looked at the wrong numbers for the sizes. While Defined itself is significantly smaller than SymbolUnion, Defined itself has got lots of subclasses, and adding this extra field does grow SymbolUnion from 56 to 64 bytes. Do you still think it's worth it?

ruiu added a comment.Sep 24 2018, 1:21 PM

Ah, yes. I have this working now - the speedup isn't significant, but it turned out I looked at the wrong numbers for the sizes. While Defined itself is significantly smaller than SymbolUnion, Defined itself has got lots of subclasses, and adding this extra field does grow SymbolUnion from 56 to 64 bytes. Do you still think it's worth it?

Hmm, then maybe my idea wasn't that good.

ruiu added a comment.Sep 24 2018, 5:18 PM

This patch looks pretty good.

COFF/Writer.cpp
298–300 ↗(On Diff #165931)

I would inline this function because it is too small, and I don't see a need to abstract this predicate this way at the moment.

333 ↗(On Diff #165931)

Can you explain the algorithm of thunk creation, that is, we create thunks if a jump target looks too far with some margin, and then after creating all thunks, verify that all jump targets are within their ranges, and if not, redo with a wider margin.

425 ↗(On Diff #165931)

nit: we usually define only one variable in one line.

mstorsjo added inline comments.Sep 25 2018, 12:10 AM
COFF/Writer.cpp
298–300 ↗(On Diff #165931)

Ok - at some point I think I had more uses of it and the inline function made the check clearer, but yes, with the one current use there's little point in it - I'll inline it.

333 ↗(On Diff #165931)

Ok, will do.

425 ↗(On Diff #165931)

Will fix

mstorsjo updated this revision to Diff 166819.Sep 25 2018, 12:11 AM

Did the suggested changes, added code comments to functions that were lacking comments.

ruiu accepted this revision.Sep 25 2018, 3:47 AM

LGTM

Thanks!

This revision is now accepted and ready to land.Sep 25 2018, 3:47 AM
This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.
lld/trunk/COFF/Writer.cpp