Page MenuHomePhabricator

[ELF2] Add support for Gnu Hash section
ClosedPublic

Authored by ikudrin on Oct 16 2015, 9:57 AM.

Details

Summary

This patch implements --hash-style command line switch.

  • By default, or with "sysv" or "both" parameters the linker generates a standard ELF hash section.
  • With "gnu" or "both" it produces Gnu-style hash section.

Support for that section requires symbols in the dynamic symbol table section to be divided in
two groups (not hashed and hashed symbols) and the last group have to be sorted to correspond
the hash table of the Gnu Hash section.

The division function, as well as estimations for the section's parameters, are just the first rough
variants and the subjects for further adjustments.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rafael added inline comments.Oct 18 2015, 7:02 AM
ELF/OutputSections.h
48 ↗(On Diff #37685)

Optimizing the sysv hash is a good thing, but lets do it in another patch (before or after this one).

152 ↗(On Diff #37685)

Independent change, no? If so, please commit it and rebase.

ikudrin added inline comments.Oct 18 2015, 10:12 AM
ELF/OutputSections.cpp
194 ↗(On Diff #37685)

I consider the main reason to remove this function as it is not a hash table's responsibility to fill up a dynamic symbol table.

I like the idea to sort entries of a symbol table within the symbol table class, but in case of the Gnu hash section, it imposes not only some requirements to the order of entries, but also the oreder itself, as it is calculated using hash values and NBuckets parameter. I can't find a better solution than sorting the symbols which belongs to Gnu Hash table within the GnuHashTableSection class.

ikudrin added inline comments.Oct 18 2015, 10:51 AM
ELF/OutputSections.cpp
194 ↗(On Diff #37685)

One more thing. If we implement one method to sort the entire dynamic symbol table and set indexes of the entries, we have to decide, when it should be called. The right place to call thist sort function might be the finalize() method of the symbol table class. But the sort function will require some information, which can only be provided by the Gnu hash table class, and the right place to calculate it is the finalize() method of this class. I don't think it's a good idea to rely on the order of calling of finalize() methods.

I like the idea to sort entries of a symbol table within the symbol table class, but in case of the Gnu hash section, it imposes not only some requirements to the order of entries, but also the oreder itself, as it is calculated using hash values and NBuckets parameter. I can't find a better solution than sorting the symbols which belongs to Gnu Hash table within the GnuHashTableSection class.

One more thing. If we implement one method to sort the entire dynamic symbol table and set indexes of the entries, we have to decide, when it should be called. The right place to call thist sort function might be the finalize() method of the symbol table class. But the sort function will require some information, which can only be provided by the Gnu hash table class, and the right place to calculate it is the finalize() method of this class. I don't think it's a good idea to rely on the order of calling of finalize() methods.

What about:

  • In the current loop: Out<ELFT>::DynSymTab->addSymbol(...)
  • Sort the DynSymTab if needed: if (Out<ELFT>::GnuHashTab) GnuHashTable<ELFT>::sortDynamicSymbols()
  • Walk the DynSymTab, calling setDynamicSymbolTableIndex and adding

the symbol to one or both hash tables.

What about:

  • In the current loop: Out<ELFT>::DynSymTab->addSymbol(...)
  • Sort the DynSymTab if needed: if (Out<ELFT>::GnuHashTab) GnuHashTable<ELFT>::sortDynamicSymbols()
  • Walk the DynSymTab, calling setDynamicSymbolTableIndex and adding the symbol to one or both hash tables.

I've almost done a new variant which is close to your suggestion, I hope. Let's look at it.

ikudrin marked an inline comment as done.Oct 19 2015, 8:09 AM
ikudrin updated this revision to Diff 37764.Oct 19 2015, 9:24 AM
ikudrin edited edge metadata.
  • Implemented the new approach to sort symbols within a symbol table. All symbols are added to symbol tables which are responsible for sorting them in the correct order, taking into account all requirements.

NB. If this approach is recognized as acceptable, I think it will be better to commit it in a separate patch.

ikudrin marked 2 inline comments as done and an inline comment as not done.Oct 19 2015, 9:25 AM
ruiu added inline comments.Oct 19 2015, 9:44 AM
ELF/OutputSections.cpp
232 ↗(On Diff #37685)

My previous comment is probably wrong. You may want to use uint8_t here.

245–246 ↗(On Diff #37685)

OK, thank you for the explanation. I'd prefer Is64Bits ? 0 : 4, as that's what we want.

247 ↗(On Diff #37685)

This is always 2, no? Is this correct?

256–274 ↗(On Diff #37685)

Remove comments and write them in one line (or two), and add a comment saying that these numbers are the smallest numbers that are not smaller than 2^n.

300 ↗(On Diff #37685)

Use llvm::array_lengthof().

306 ↗(On Diff #37685)

Remove ';'

309 ↗(On Diff #37685)
HashedSymbols.begin(), HashedSymbols.end(),
311–313 ↗(On Diff #37685)
return L.Hash % NBuckets < R.Hash % NBuckets;
341 ↗(On Diff #37685)

CHAR_BIT is always 8, so write 8. Drop 'static const' -- it is static const but the compiler can easily figure that out.

346 ↗(On Diff #37685)

You can directly write (MaskWords - 1) here.

347–348 ↗(On Diff #37685)

You can just write 1 because RHS's type of << is the same as the result's type.

356–357 ↗(On Diff #37685)

Drop const.

357 ↗(On Diff #37685)

auto -> unsigned

359 ↗(On Diff #37685)

Please add a reference to the data structure this code is trying to create.

359–361 ↗(On Diff #37685)

signed -> int

363 ↗(On Diff #37685)

signed -> int

367 ↗(On Diff #37685)

1u -> 1

371 ↗(On Diff #37685)

HashMask -> ~1

ELF/OutputSections.h
262 ↗(On Diff #37685)

Rename this Symbols because all symbols handled by this class are subject of calculation of hash values (so it is redundant).

299 ↗(On Diff #37685)

Hm, makes sense.

ELF/Writer.cpp
486 ↗(On Diff #37685)

Why do we have to call setDynamicSymbolTableIndex here? That's not obvious.

ikudrin added inline comments.Oct 19 2015, 10:39 AM
ELF/OutputSections.cpp
247 ↗(On Diff #37685)

Hmm, no. It's 8 on 64-bit platform and 4 on 32-bit.

357 ↗(On Diff #37685)

It's not unsigned but Elf_Word.

ELF/Writer.cpp
486 ↗(On Diff #37685)

Please see my last patch, it's straightforward on that question.

ruiu added inline comments.Oct 19 2015, 11:01 AM
ELF/OutputSections.cpp
241 ↗(On Diff #37764)

Can you then use uintX_t instead of Elf_Off? Elf_Off is, strictly speaking, probably better type, but I think in practice we don't want to distinguish them.

243–244 ↗(On Diff #37764)

Can you do this?

ELF/OutputSections.h
150 ↗(On Diff #37764)

This name doesn't feel a correct one. Does this return the number of dynamically-linked symbols? Does this return the GNU-style hash table size?

251–253 ↗(On Diff #37764)

Sort

ikudrin added inline comments.Oct 19 2015, 11:13 AM
ELF/OutputSections.cpp
243–244 ↗(On Diff #37764)

I'll make the change in the next update.

ikudrin added inline comments.Oct 19 2015, 11:24 AM
ELF/OutputSections.cpp
241 ↗(On Diff #37764)

I don't think sizeof(uintX_t) is relevant here as we don't use that type to store values. Maybe it's better to use (Is64Bins ? 8 : 4) to be in sync with the previous line.

ELF/OutputSections.h
150 ↗(On Diff #37764)

Is "getNumGnuHashed()" a better variant? Any suggestions are welcome.

ikudrin updated this revision to Diff 37831.Oct 20 2015, 12:50 AM

Addressed review comments.

ikudrin marked 16 inline comments as done.Oct 20 2015, 12:57 AM

Rebased patch attached. I did a merge to review the patch.

rafael added inline comments.Oct 20 2015, 11:38 AM
ELF/OutputSections.cpp
237 ↗(On Diff #37831)

Given that you are using a uint32_t, this is a nop, no?

270 ↗(On Diff #37831)

The expression is always bigger than 1, no?
Also, given the comment, should it be

RoundUpToAlignment(NumHashed, sizeof(Elf_Off))/sizeof(Elf_Off)?

292 ↗(On Diff #37831)

Maybe pass a reference to writeHeader and others so that you don't have to also increment the buf pointer in here?

335 ↗(On Diff #37831)

This means that the entries in Bucket are inedxes into the Value vector.

According to https://blogs.oracle.com/ali/entry/gnu_hash_elf_sections, they are indexes into the dynamic symbol table, so in here you would have to add the number of non-hashed symbols.

Is that reference wrong?

760 ↗(On Diff #37831)

You are also sorting the non dynamic symbols, right?

I t is probably cleaner if there are two independent sorts, with just the constraints that each section has

if (!StrTabSec.isDynamic()) {
simple sort on just getBinding() == STB_LOCAL.
return;
}
The more complicated sort for the dynamic table.

907 ↗(On Diff #37831)

Adding getSymbolBinding is a nice implement on its own. Please commit and rebase.

ikudrin added inline comments.Oct 20 2015, 12:25 PM
ELF/OutputSections.cpp
237 ↗(On Diff #37831)

Thanks for reminding. The reference implementation uses uint_fast32_t, I think we should do the same.

907 ↗(On Diff #37831)
ruiu added inline comments.Oct 20 2015, 12:50 PM
ELF/Driver.cpp
159 ↗(On Diff #37831)

I'd name this just "S".

ELF/OutputSections.cpp
237 ↗(On Diff #37831)

Unlike uint_fast32_t, uint32_t is always 32-bit. You don't need this mask.

246 ↗(On Diff #37831)

Yeah, this looks better than before.

256 ↗(On Diff #37831)

Let's remove this line. This implementation should be good enough (not only as a first try).

258–259 ↗(On Diff #37831)

No need to align them vertically.

315–317 ↗(On Diff #37831)

It's probably better to split this into multiple statements.

size_t Pos = (Item.GnuHash / C) & (MaskWords - 1);
Elf_Off V = uintX_t(1) << (Item.GnuHash % C) | uintX_t(1) << ((Item.GnuHash >> Shift2) % C);
Mask[Pos] |= V;
ELF/OutputSections.h
62 ↗(On Diff #37831)

Agreed.

280 ↗(On Diff #37831)

Gnu -> GNU

ikudrin added inline comments.Oct 20 2015, 12:58 PM
ELF/OutputSections.cpp
237 ↗(On Diff #37831)

Yep, but I'd prefer to replace the H's type with uint_fast32_t. What do you think?

258–259 ↗(On Diff #37831)

It's what clang-format does.

ruiu added inline comments.Oct 20 2015, 1:07 PM
ELF/OutputSections.cpp
237 ↗(On Diff #37831)

I don't prefer to use uint_fast32_t. I don't expect that would make any observable difference because I think the type would be mapped to uint32_t. In addition to that, the type is used so rarely that that would make the code look a bit odd. (If you are a reader, you'd have to stop and think about the type.)

ikudrin added inline comments.Oct 20 2015, 1:09 PM
ELF/OutputSections.cpp
237 ↗(On Diff #37831)

OK.

ikudrin updated this revision to Diff 37929.Oct 20 2015, 3:06 PM
ikudrin updated this object.

The part of the patch was extracted as a separate change http://reviews.llvm.org/D13911.

  • Rebased on the top.
  • Bugfix: index of the symbol in the hash values part.
  • Addressed the other comments.
ikudrin marked 10 inline comments as done.Oct 20 2015, 3:14 PM
ikudrin added inline comments.
ELF/OutputSections.cpp
324 ↗(On Diff #37929)

Thank you for the suggestion, it looks much better now.

389 ↗(On Diff #37929)

Shame on me, you are absolutely correct.

ruiu added inline comments.Oct 20 2015, 7:47 PM
ELF/OutputSections.cpp
367–368 ↗(On Diff #37929)

It's better to parenthesize uintX_t(1) << (Item.GnuHash % C) and uintX_t(1) << ((Item.GnuHash >> Shift2) % C). Not everybody remembers the operator precedence table precisely.

379–396 ↗(On Diff #37929)

This part of code is a bit hard to read because too many local variables are used and they look similar. Let's use less variables and make their name shorter.

int PrevHash = -1;
int I = 0;
for (const typename SymbolTableSection<ELFT>::SymbolData &Item :
     Out<ELFT>::DynSymTab->getGnuHashSymbols()) {
  int Hash = Item.GnuHash % NBuckets;
  if (Hash != PrevHash) {
    if (I > 0)
      Values[I - 1] |= 1;
    Buckets[Hash] = Item.Body->getDynamicSymbolTableIndex();
    PrevHash = Hash;
  }
  Values[I] = Item.GnuHash & ~1;
  ++I;
}
if (I > 0)
  Values[I - 1] |= 1;
ELF/OutputSections.h
185 ↗(On Diff #37929)

I'd name this HasGnuHash.

ikudrin updated this revision to Diff 37970.Oct 21 2015, 1:53 AM
ikudrin marked 2 inline comments as done.
  • Corrected calculation of MaskWords.
  • Simplified hash fill loop.
  • Addressed other comments.
ikudrin marked 2 inline comments as done.Oct 21 2015, 1:57 AM
ikudrin added inline comments.
ELF/OutputSections.cpp
324 ↗(On Diff #37970)

On the fresh look, I've forgotten about the requirement for this value to be a power of 2.

rafael added inline comments.Oct 21 2015, 8:23 AM
ELF/OutputSections.cpp
324 ↗(On Diff #37970)

Why the -1?
Why the static_cast?

ikudrin added inline comments.Oct 21 2015, 8:53 AM
ELF/OutputSections.cpp
324 ↗(On Diff #37970)

For 64-bit target: calcMaskWords(0..8) = 1, calcMaskWords(9..16) = 2, calcMaskWords(17..32) = 4... I feel like it'd be better to have a unit test here.

The return type of NextPowerOf2 is uint64_t. I try to avoid a potential warning here about value truncation.

rafael added inline comments.Oct 21 2015, 9:09 AM
ELF/OutputSections.cpp
324 ↗(On Diff #37970)

OK, I missed the fact that NextPowerOf returns the power of 2 that is grater, not greater or equal.

Can you add your example as a comment?

ikudrin updated this revision to Diff 38028.Oct 21 2015, 10:33 AM

Added an explanatory comment for calcMaskWords() method.

ikudrin marked an inline comment as done.Oct 21 2015, 10:35 AM
ruiu accepted this revision.Oct 21 2015, 1:00 PM
ruiu edited edge metadata.

LGTM

There might be room to improve code even more, but this is I think already good enough. As long as readers understand the structure of .gnu.hash section, this code feels straightforward.

This revision is now accepted and ready to land.Oct 21 2015, 1:00 PM
rafael accepted this revision.Oct 21 2015, 2:19 PM
rafael edited edge metadata.

LGTM for me too.

This revision was automatically updated to reflect the committed changes.

Thank you for all the comments.