This is an archive of the discontinued LLVM Phabricator instance.

[ELF2] Add support for Gnu Hash section
ClosedPublic

Authored by ikudrin on Oct 16 2015, 9:57 AM.

Details

Summary

This patch implements --hash-style command line switch.

  • By default, or with "sysv" or "both" parameters the linker generates a standard ELF hash section.
  • With "gnu" or "both" it produces Gnu-style hash section.

Support for that section requires symbols in the dynamic symbol table section to be divided in
two groups (not hashed and hashed symbols) and the last group have to be sorted to correspond
the hash table of the Gnu Hash section.

The division function, as well as estimations for the section's parameters, are just the first rough
variants and the subjects for further adjustments.

Diff Detail

Event Timeline

ikudrin updated this revision to Diff 37606.Oct 16 2015, 9:57 AM
ikudrin retitled this revision from to [ELF2] Add support for Gnu Hash section.
ikudrin updated this object.
ikudrin added reviewers: rafael, ruiu.
ikudrin added a project: lld.
ikudrin added a subscriber: llvm-commits.
ruiu edited edge metadata.Oct 16 2015, 12:36 PM

First round of comments.

ELF/Config.h
54–55

Let's name them GnuHash and SysvHash.

ELF/OutputSections.cpp
235

Remove unsigned.

248–249

What is this condition for?

257–278

Where does this implementation come from?

287–288

Can you return here if empty?

848–855

What is this change for?

ELF/OutputSections.h
322

Remove this line.

331

I'd name this Primes.

332

Remove this. You can use sizeof(Primes) / sizeof(Prime[0]).

354

Rename this SysvHashTab.

emaste added a subscriber: emaste.Oct 16 2015, 12:52 PM
ruiu added a comment.Oct 16 2015, 1:46 PM

Re-submitting to let Phab to send out a mail...

ruiu added inline comments.Oct 16 2015, 3:11 PM
ELF/OutputSections.h
59–61

I'm wondering if we really need two distinct functions for the traditional hash table and GNU's. GNU hash table is smaller than the traditional one because it omits some symbols. Technically, such symbols can be removed from the traditional hash table without hurting correctness.

ikudrin added inline comments.Oct 16 2015, 11:01 PM
ELF/OutputSections.cpp
248–249

It may be rewritten as (Is64Bits ? 0 : 4), but my variant shows the reason for this, I hope.
See http://sourceware.org/ml/binutils/2006-10/msg00377.html:

The .gnu.hash section uses sh_entsize 4 for 32-bit ELF and sh_entsize 0
for 64-bit ELF (as it has then non-uniform entity size; 4 32-bit words,
variable count of ELFCLASS sized words and then variable count of 32-bit
words; it doesn't repeat the historic mistakes on Alpha/s390x with .hash).

257–278

I've just selected prime numbers which are not greater than lower bound of each range. Haven't done any deep investigations, it just looks reasonable for the first try.

848–855

Gnu hash section requires reordering of dynamic symbol table entries. If Gnu hash section is in action, it sets correct indexes in its finalize() method; otherwise, indexes just follow the original order. So, in case of a dynamic symbol table, we write the entry right into its place without any additional sorting.

ELF/OutputSections.h
59–61

It sounds reasonable, but I tried to make this patch to solve just one problem. You may notice that all existent tests are still intact. We may optimize a traditional hash section in a separate patch.

ikudrin updated this revision to Diff 37685.Oct 17 2015, 9:35 AM
ikudrin edited edge metadata.
  • Rebased to the top.
  • Addressed comments.
ikudrin marked 7 inline comments as done.Oct 17 2015, 9:36 AM
rafael edited edge metadata.Oct 18 2015, 6:44 AM
rafael added a subscriber: rafael.

+template <class ELFT>
+const unsigned GnuHashTableSection<ELFT>::NBucketsEstimation[] = {
+ 1, 0.. 1
+ 1,
2
+ 3, 3.. 4
+ 3,
5.. 8
+ 7, 9.. 16
+ 13,
17.. 32
+ 31, 33.. 64
+ 61,
65.. 128
+ 127, 129.. 256
+ 251,
257.. 512
+ 509, 513.. 1024
+ 1021,
1025.. 2048
+ 2039, 2049.. 4096
+ 4093,
4097.. 8192
+ 8191, 8193.. 16384
+ 16381,
16385.. 32768
+ 32749, 32769.. 65536
+ 65521,
65537..131072
+ 131071 // 131073..
+};

+

ruiu wrote:

Where does this implementation come from?

I've just selected prime numbers which are not greater than lower bound of each range. Haven't done any deep investigations, it just looks reasonable for the first try.

Please add that as a comment.

Cheers,
Rafael

rafael added inline comments.Oct 18 2015, 7:02 AM
ELF/OutputSections.cpp
199

This is being removed because the symbols have to be sorted, right?

It is a bit untidy that we now call setDynamicSymbolTableIndex from two places.

Maybe organize this as:

  • In the current loop: Out<ELFT>::DynSymTab->addSymbol(...)
  • Sort the DynSymTab if needed.
  • Walk the DynSymTab, calling setDynamicSymbolTableIndex and adding the symbol to one or both hash tables.
ELF/OutputSections.h
338

Most of these members are named after the section name in the output file, so I would keep the old name.

rafael added inline comments.Oct 18 2015, 7:02 AM
ELF/OutputSections.h
62

Optimizing the sysv hash is a good thing, but lets do it in another patch (before or after this one).

177

Independent change, no? If so, please commit it and rebase.

ikudrin added inline comments.Oct 18 2015, 10:12 AM
ELF/OutputSections.cpp
199

I consider the main reason to remove this function as it is not a hash table's responsibility to fill up a dynamic symbol table.

I like the idea to sort entries of a symbol table within the symbol table class, but in case of the Gnu hash section, it imposes not only some requirements to the order of entries, but also the oreder itself, as it is calculated using hash values and NBuckets parameter. I can't find a better solution than sorting the symbols which belongs to Gnu Hash table within the GnuHashTableSection class.

ikudrin added inline comments.Oct 18 2015, 10:51 AM
ELF/OutputSections.cpp
199

One more thing. If we implement one method to sort the entire dynamic symbol table and set indexes of the entries, we have to decide, when it should be called. The right place to call thist sort function might be the finalize() method of the symbol table class. But the sort function will require some information, which can only be provided by the Gnu hash table class, and the right place to calculate it is the finalize() method of this class. I don't think it's a good idea to rely on the order of calling of finalize() methods.

I like the idea to sort entries of a symbol table within the symbol table class, but in case of the Gnu hash section, it imposes not only some requirements to the order of entries, but also the oreder itself, as it is calculated using hash values and NBuckets parameter. I can't find a better solution than sorting the symbols which belongs to Gnu Hash table within the GnuHashTableSection class.

One more thing. If we implement one method to sort the entire dynamic symbol table and set indexes of the entries, we have to decide, when it should be called. The right place to call thist sort function might be the finalize() method of the symbol table class. But the sort function will require some information, which can only be provided by the Gnu hash table class, and the right place to calculate it is the finalize() method of this class. I don't think it's a good idea to rely on the order of calling of finalize() methods.

What about:

  • In the current loop: Out<ELFT>::DynSymTab->addSymbol(...)
  • Sort the DynSymTab if needed: if (Out<ELFT>::GnuHashTab) GnuHashTable<ELFT>::sortDynamicSymbols()
  • Walk the DynSymTab, calling setDynamicSymbolTableIndex and adding

the symbol to one or both hash tables.

What about:

  • In the current loop: Out<ELFT>::DynSymTab->addSymbol(...)
  • Sort the DynSymTab if needed: if (Out<ELFT>::GnuHashTab) GnuHashTable<ELFT>::sortDynamicSymbols()
  • Walk the DynSymTab, calling setDynamicSymbolTableIndex and adding

the symbol to one or both hash tables.

I've almost done a new variant which is close to your suggestion, I hope. Let's look at it.

ikudrin marked an inline comment as done.Oct 19 2015, 8:09 AM
ikudrin updated this revision to Diff 37764.Oct 19 2015, 9:24 AM
ikudrin edited edge metadata.
  • Implemented the new approach to sort symbols within a symbol table. All symbols are added to symbol tables which are responsible for sorting them in the correct order, taking into account all requirements.

NB. If this approach is recognized as acceptable, I think it will be better to commit it in a separate patch.

ikudrin marked 2 inline comments as done and an inline comment as not done.Oct 19 2015, 9:25 AM
ruiu added inline comments.Oct 19 2015, 9:44 AM
ELF/OutputSections.cpp
235

My previous comment is probably wrong. You may want to use uint8_t here.

248–249

OK, thank you for the explanation. I'd prefer Is64Bits ? 0 : 4, as that's what we want.

250

This is always 2, no? Is this correct?

259–277

Remove comments and write them in one line (or two), and add a comment saying that these numbers are the smallest numbers that are not smaller than 2^n.

303

Use llvm::array_lengthof().

309

Remove ';'

312
HashedSymbols.begin(), HashedSymbols.end(),
314–316
return L.Hash % NBuckets < R.Hash % NBuckets;
344

CHAR_BIT is always 8, so write 8. Drop 'static const' -- it is static const but the compiler can easily figure that out.

349

You can directly write (MaskWords - 1) here.

350–351

You can just write 1 because RHS's type of << is the same as the result's type.

359–360

Drop const.

360

auto -> unsigned

362

Please add a reference to the data structure this code is trying to create.

362–364

signed -> int

366

signed -> int

370

1u -> 1

374

HashMask -> ~1

ELF/OutputSections.h
331

Rename this Symbols because all symbols handled by this class are subject of calculation of hash values (so it is redundant).

338

Hm, makes sense.

ELF/Writer.cpp
502

Why do we have to call setDynamicSymbolTableIndex here? That's not obvious.

ikudrin added inline comments.Oct 19 2015, 10:39 AM
ELF/OutputSections.cpp
250

Hmm, no. It's 8 on 64-bit platform and 4 on 32-bit.

360

It's not unsigned but Elf_Word.

ELF/Writer.cpp
502

Please see my last patch, it's straightforward on that question.

ruiu added inline comments.Oct 19 2015, 11:01 AM
ELF/OutputSections.cpp
246

Can you then use uintX_t instead of Elf_Off? Elf_Off is, strictly speaking, probably better type, but I think in practice we don't want to distinguish them.

248–249

Can you do this?

ELF/OutputSections.h
164

This name doesn't feel a correct one. Does this return the number of dynamically-linked symbols? Does this return the GNU-style hash table size?

282–284

Sort

ikudrin added inline comments.Oct 19 2015, 11:13 AM
ELF/OutputSections.cpp
248–249

I'll make the change in the next update.

ikudrin added inline comments.Oct 19 2015, 11:24 AM
ELF/OutputSections.cpp
246

I don't think sizeof(uintX_t) is relevant here as we don't use that type to store values. Maybe it's better to use (Is64Bins ? 8 : 4) to be in sync with the previous line.

ELF/OutputSections.h
164

Is "getNumGnuHashed()" a better variant? Any suggestions are welcome.

ikudrin updated this revision to Diff 37831.Oct 20 2015, 12:50 AM

Addressed review comments.

ikudrin marked 16 inline comments as done.Oct 20 2015, 12:57 AM

Rebased patch attached. I did a merge to review the patch.

rafael added inline comments.Oct 20 2015, 11:38 AM
ELF/OutputSections.cpp
237

Given that you are using a uint32_t, this is a nop, no?

270

The expression is always bigger than 1, no?
Also, given the comment, should it be

RoundUpToAlignment(NumHashed, sizeof(Elf_Off))/sizeof(Elf_Off)?

292

Maybe pass a reference to writeHeader and others so that you don't have to also increment the buf pointer in here?

335

This means that the entries in Bucket are inedxes into the Value vector.

According to https://blogs.oracle.com/ali/entry/gnu_hash_elf_sections, they are indexes into the dynamic symbol table, so in here you would have to add the number of non-hashed symbols.

Is that reference wrong?

760

You are also sorting the non dynamic symbols, right?

I t is probably cleaner if there are two independent sorts, with just the constraints that each section has

if (!StrTabSec.isDynamic()) {
simple sort on just getBinding() == STB_LOCAL.
return;
}
The more complicated sort for the dynamic table.

907

Adding getSymbolBinding is a nice implement on its own. Please commit and rebase.

ikudrin added inline comments.Oct 20 2015, 12:25 PM
ELF/OutputSections.cpp
237

Thanks for reminding. The reference implementation uses uint_fast32_t, I think we should do the same.

907
ruiu added inline comments.Oct 20 2015, 12:50 PM
ELF/Driver.cpp
159

I'd name this just "S".

ELF/OutputSections.cpp
237

Unlike uint_fast32_t, uint32_t is always 32-bit. You don't need this mask.

246

Yeah, this looks better than before.

256

Let's remove this line. This implementation should be good enough (not only as a first try).

258–259

No need to align them vertically.

315–317

It's probably better to split this into multiple statements.

size_t Pos = (Item.GnuHash / C) & (MaskWords - 1);
Elf_Off V = uintX_t(1) << (Item.GnuHash % C) | uintX_t(1) << ((Item.GnuHash >> Shift2) % C);
Mask[Pos] |= V;
ELF/OutputSections.h
62

Agreed.

280

Gnu -> GNU

ikudrin added inline comments.Oct 20 2015, 12:58 PM
ELF/OutputSections.cpp
237

Yep, but I'd prefer to replace the H's type with uint_fast32_t. What do you think?

258–259

It's what clang-format does.

ruiu added inline comments.Oct 20 2015, 1:07 PM
ELF/OutputSections.cpp
237

I don't prefer to use uint_fast32_t. I don't expect that would make any observable difference because I think the type would be mapped to uint32_t. In addition to that, the type is used so rarely that that would make the code look a bit odd. (If you are a reader, you'd have to stop and think about the type.)

ikudrin added inline comments.Oct 20 2015, 1:09 PM
ELF/OutputSections.cpp
237

OK.

ikudrin updated this revision to Diff 37929.Oct 20 2015, 3:06 PM
ikudrin updated this object.

The part of the patch was extracted as a separate change http://reviews.llvm.org/D13911.

  • Rebased on the top.
  • Bugfix: index of the symbol in the hash values part.
  • Addressed the other comments.
ikudrin marked 10 inline comments as done.Oct 20 2015, 3:14 PM
ikudrin added inline comments.
ELF/OutputSections.cpp
270

Thank you for the suggestion, it looks much better now.

335

Shame on me, you are absolutely correct.

ruiu added inline comments.Oct 20 2015, 7:47 PM
ELF/OutputSections.cpp
313–314

It's better to parenthesize uintX_t(1) << (Item.GnuHash % C) and uintX_t(1) << ((Item.GnuHash >> Shift2) % C). Not everybody remembers the operator precedence table precisely.

325–342

This part of code is a bit hard to read because too many local variables are used and they look similar. Let's use less variables and make their name shorter.

int PrevHash = -1;
int I = 0;
for (const typename SymbolTableSection<ELFT>::SymbolData &Item :
     Out<ELFT>::DynSymTab->getGnuHashSymbols()) {
  int Hash = Item.GnuHash % NBuckets;
  if (Hash != PrevHash) {
    if (I > 0)
      Values[I - 1] |= 1;
    Buckets[Hash] = Item.Body->getDynamicSymbolTableIndex();
    PrevHash = Hash;
  }
  Values[I] = Item.GnuHash & ~1;
  ++I;
}
if (I > 0)
  Values[I - 1] |= 1;
ELF/OutputSections.h
177

I'd name this HasGnuHash.

ikudrin updated this revision to Diff 37970.Oct 21 2015, 1:53 AM
ikudrin marked 2 inline comments as done.
  • Corrected calculation of MaskWords.
  • Simplified hash fill loop.
  • Addressed other comments.
ikudrin marked 2 inline comments as done.Oct 21 2015, 1:57 AM
ikudrin added inline comments.
ELF/OutputSections.cpp
270

On the fresh look, I've forgotten about the requirement for this value to be a power of 2.

rafael added inline comments.Oct 21 2015, 8:23 AM
ELF/OutputSections.cpp
270

Why the -1?
Why the static_cast?

ikudrin added inline comments.Oct 21 2015, 8:53 AM
ELF/OutputSections.cpp
270

For 64-bit target: calcMaskWords(0..8) = 1, calcMaskWords(9..16) = 2, calcMaskWords(17..32) = 4... I feel like it'd be better to have a unit test here.

The return type of NextPowerOf2 is uint64_t. I try to avoid a potential warning here about value truncation.

rafael added inline comments.Oct 21 2015, 9:09 AM
ELF/OutputSections.cpp
270

OK, I missed the fact that NextPowerOf returns the power of 2 that is grater, not greater or equal.

Can you add your example as a comment?

ikudrin updated this revision to Diff 38028.Oct 21 2015, 10:33 AM

Added an explanatory comment for calcMaskWords() method.

ikudrin marked an inline comment as done.Oct 21 2015, 10:35 AM
ruiu accepted this revision.Oct 21 2015, 1:00 PM
ruiu edited edge metadata.

LGTM

There might be room to improve code even more, but this is I think already good enough. As long as readers understand the structure of .gnu.hash section, this code feels straightforward.

This revision is now accepted and ready to land.Oct 21 2015, 1:00 PM
rafael accepted this revision.Oct 21 2015, 2:19 PM
rafael edited edge metadata.

LGTM for me too.

This revision was automatically updated to reflect the committed changes.

Thank you for all the comments.