This is an archive of the discontinued LLVM Phabricator instance.

[ELF] .gnu.hash bloom filter: use Shift2 = 26 instead of 6
ClosedPublic

Authored by MaskRay on Dec 20 2018, 4:21 PM.

Details

Summary

For the 2-bit bloom filter, we currently pick the bits Hash%64 and Hash>>6%64 (Shift2=6), but bits [6:...] are also used to select a word, causing a loss of precision.

In this patch, we choose Shift2=26, with is suggested by Ambrose Feinstein.

Note, Shift2 is computed as maskbitslog2 in bfd/elflink.c and gold/dynobj.cc

Event Timeline

MaskRay created this revision.Dec 20 2018, 4:21 PM
ruiu added a comment.Dec 20 2018, 4:24 PM

Can you include more benchmark numbers or something that support your claim in the commit message?

MaskRay updated this revision to Diff 179201.Dec 20 2018, 4:30 PM
MaskRay edited the summary of this revision. (Show Details)

ref of gold and bfd

ruiu added a comment.Dec 20 2018, 4:57 PM

Discussed with Ambrose and I'm convinced that this change makes sense.

So, this is the code to create the bitmap.

for (const Entry &Sym : Symbols) {
  size_t I = (Sym.Hash / C) & (MaskWords - 1);
  uint64_t Val = readUint(Buf + I * Config->Wordsize);
  // We choose Shift2 = 26 to give us high 6 bits for the 64-bit case,
  // (nearly) independent from the low 6 bits, and thus less collision.
  Val |= uint64_t(1) << (Sym.Hash % C);
  Val |= uint64_t(1) << ((Sym.Hash >> Shift2) % C);
  writeUint(Buf + I * Config->Wordsize, Val);
}

Assume a 64-bit target, then C is 64 (or 1 << 6). We compute I which is bit [N:6] of Sym.Hash where N is some integer. (Sym.Hash % C) is bit [0:5]. ((Sym.Hash >> Shift2) % C) is bit [11:6]. Thus, I and the last expression are likely to compute the same value. That results in setting the same bit again and again for each word, so even though we have two bits for each word, we effectively has only one bit if that happens.

ruiu added inline comments.Dec 20 2018, 5:01 PM
ELF/SyntheticSections.cpp
2183–2184

This is perhaps not the best place to write a comment because it's a bit too deep inside a loop. You probably should give a broader view for a reader.

I'd explain that we choose a word using bit [MaskWords+6 : 6] and set 1 to two bits in the word using bit [0:5] and [26:31].

MaskRay updated this revision to Diff 179216.Dec 20 2018, 5:53 PM
MaskRay edited the summary of this revision. (Show Details)

.

MaskRay updated this revision to Diff 179217.Dec 20 2018, 5:54 PM
MaskRay retitled this revision from [ELF] .gnu.hash bloom filter: use Shift2 = 26 instead of 2 to [ELF] .gnu.hash bloom filter: use Shift2 = 6 instead of 2.
MaskRay edited the summary of this revision. (Show Details)

Update description

MaskRay updated this revision to Diff 179219.Dec 20 2018, 6:00 PM

Better comment

MaskRay updated this revision to Diff 179221.Dec 20 2018, 6:11 PM

Sorry for whatever reason I thought 2 was chosen... Fixed the test

MaskRay retitled this revision from [ELF] .gnu.hash bloom filter: use Shift2 = 6 instead of 2 to [ELF] .gnu.hash bloom filter: use Shift2 = 26 instead of 6.Dec 20 2018, 7:15 PM
MaskRay edited the summary of this revision. (Show Details)
ruiu accepted this revision.Dec 21 2018, 1:16 PM

LGTM

ELF/SyntheticSections.cpp
2180

"It is important to be independent." is a bit too vague or too obvious. I'd just remove that sentence.

This revision is now accepted and ready to land.Dec 21 2018, 1:16 PM
This revision was automatically updated to reflect the committed changes.
MaskRay marked 2 inline comments as done.Dec 21 2018, 2:03 PM
This comment was removed by MaskRay.