This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readobj] - Add a few warnings for --gnu-hash-table.
ClosedPublic

Authored by grimar on Jan 23 2020, 5:36 AM.

Details

Summary

The current implementation stops dumping in case of a single error
it handles, though we can continue dumping.
This patch refines it: it adds a few warnings and a few test cases.

Diff Detail

Event Timeline

grimar created this revision.Jan 23 2020, 5:36 AM
Herald added a project: Restricted Project. · View Herald Transcript
MaskRay added inline comments.Jan 23 2020, 9:10 PM
llvm/test/tools/llvm-readobj/ELF/gnuhash.test
205

what we do

we emit a warning when the dynamic symbol table is empty. A valid dynamic symbol table should have at least one symbol: the symbol with index 0.

llvm/tools/llvm-readobj/ELFDumper.cpp
2525

GnuHashTable->symndx == 1 is probably sufficient.

The other conditions are redundant.

grimar marked an inline comment as done.Jan 24 2020, 12:33 AM
grimar added inline comments.
llvm/tools/llvm-readobj/ELFDumper.cpp
2525

GnuHashTable->symndx == 1 is probably sufficient.

It is not, because it can be > 1. For example when we have an object with an
undefined dynamic symbol and use lld:

1.s:
.global bar

as 1.s -o 1.o
lld 1.o -o 1.so -shared --hash-style=gnu
llvm-readelf --gnu-hash-table 1.so
GnuHashTable {
  Num Buckets: 1
  First Hashed Symbol Index: 2
  Num Mask Words: 1
  Shift Count: 26
  Bloom Filter: [0x0]
  Buckets: [0]
  Values: []
}

For GNU ld the output is different, btw (but it does not seem to be critical to fix LLD here, I guess):

ld.bfd 1.o -o 1.so -shared --hash-style=gnu
llvm-readelf --gnu-hash-table 1.so
GnuHashTable {
  Num Buckets: 1
  First Hashed Symbol Index: 1
  Num Mask Words: 1
  Shift Count: 0
  Bloom Filter: [0x0]
  Buckets: [0]
  Values: [0x0]
}
grimar updated this revision to Diff 240147.Jan 24 2020, 3:37 AM
grimar marked an inline comment as done.
  • Addressed review comment.
jhenderson added inline comments.Jan 24 2020, 8:15 AM
llvm/test/tools/llvm-readobj/ELF/gnuhash.test
259

Linker -> either "Linkers" or "A linker"

264–267

I'm not sure I follow why we shouldn't report this in this specific case? Surely the linker should just set the index to 0, not 1?

llvm/tools/llvm-readobj/ELFDumper.cpp
2506

I was going to suggest that this should be reportUniqueWarning, but I guess that it can only ever be hit once per file, right?

2507

I'd move "found" to after "section". I might also change section -> table here, since the data does not technically need a section header, but follow whatever we say elsewhere.

2512

unsigned -> size_t?

2521–2524

Similar comment to the test. Why is it okay for linkers to do this? At least explain this.

2530–2531

I'm not sure I agree with this statement. The first symbol is the symbol with index 0. Saying we have 0 dynamic symbols is incorrect when there is just that symbol. After all the size of dynamic_symbols() will be 1.

MaskRay added inline comments.Jan 24 2020, 9:37 AM
llvm/tools/llvm-readobj/ELFDumper.cpp
2525

OK, !Buckets.empty() && Buckets[0] == 0 is sufficient.

The number of buckets does not matter. The size of the bloom filter does not matter (it can preclude some symbols. Since there is no defined dynamic symbol, it shouldn't matter if the bloom filter precludes anything).

grimar marked an inline comment as done.Jan 25 2020, 9:25 AM
grimar added inline comments.
llvm/tools/llvm-readobj/ELFDumper.cpp
2525

As far I understand, the first thing that any dynamic loader tries to do during a symbol lookup is that it checks agains the Bloom filter,
When we have a zeroed Bloom filter of any size, it means the object can't contain any symbol.

Then it means that what is really important for an empty gnu hash table is that the Bloom filter is zeroed.
And that I believe is what any linker produces and what we might want to check.
Anything else probably is not important.

The code to check against the filter can be:
(https://blogs.oracle.com/solaris/gnu-hash-elf-sections-v2)

/* Test against the Bloom filter */
  c = sizeof (BloomWord) * 8;
  n = (h1 / c) & os->os_maskwords_bm;
  bitmask = (1 << (h1 % c)) | (1 << (h2 % c));
  if ((os->os_bloom[n] & bitmask) != bitmask)
          return (NULL);

So it uses maskwords and bloom filter words.
So we want to check that maskwords==1 and that the Bloom filter is empty.

Hence, I'd simplify this to:

bool IsEmptyHashTable = `GnuHashTable->maskwords == 1 && BloomFilter.size() == 1 && BloomFilter.front() == 0`;

Are you OK with this?

MaskRay added inline comments.Jan 25 2020, 9:49 AM
llvm/tools/llvm-readobj/ELFDumper.cpp
2525

The bloom filter saying "true" is a necessary condition. It the first thing to check, but it is not required to return "false" when there is a question "can symbol foo exist". It can say "true", and let the bucket/chain filter to say "false".
So, `GnuHashTable->maskwords == 1 && BloomFilter.size() == 1 && BloomFilter.front() == 0;` is indeed one way to make .gnu.hash reject all input, but it is not the only way.

All buckets being zeros (so buckets[hash % num_buckets] = 0) is a sufficient condition to say "false". This is another way to make .gnu.hash reject all input.

(Now I think the warning is probably not useful.)

grimar marked an inline comment as done.Jan 25 2020, 11:43 AM
grimar added inline comments.
llvm/tools/llvm-readobj/ELFDumper.cpp
2525

All buckets being zeros (so buckets[hash % num_buckets] = 0) is a sufficient condition to say "false". This is another way to make .gnu.hash reject all input.

LLD creates a single dummy bucket just because of an issue in the android loader I think:

// Note that we don't want to create a zero-sized hash table because
// Android loader as of 2018 doesn't like a .gnu.hash containing such
// table. If that's the case, we create a hash table with one unused
// dummy slot.
nBuckets = std::max<size_t>((v.end() - mid) / 4, 1);

I see no other reason to create a bucket.
So the condition should actually probably be (!Buckets.empty() && Buckets[0] = 0) || Buckets.empty() then.

I think checking the Bloom filter is a more stable way?

(Now I think the warning is probably not useful.)

I've introduced it because of
W.printHexList("Values", GnuHashTable->values(NumSyms));

which has a dangerous implementation:

ArrayRef<Elf_Word> values(unsigned DynamicSymCount) const {
  return ArrayRef<Elf_Word>(buckets().end(), DynamicSymCount - symndx);
}

Unfortunately we can't know the GNU hash table size, we might have no section headers and there is no dynamic tag that holds such a value. So it might overflow even with this patch. For example when we have N dynamic symbols and that N is larger than the number of values. I do not think there is a way to detect this.

So, this warning itself not needed, but the whole branch that exits early might help a bit.

MaskRay added inline comments.Jan 25 2020, 2:18 PM
llvm/tools/llvm-readobj/ELFDumper.cpp
2525

So the condition should actually probably be (!Buckets.empty() && Buckets[0] = 0) || Buckets.empty() then.

Maybe we can use all_of(Buckets, [](... V) { return !V; });

grimar planned changes to this revision.Jan 27 2020, 1:48 AM
grimar updated this revision to Diff 240829.Jan 28 2020, 3:41 AM
grimar marked 11 inline comments as done.
  • Addressed review comments.
llvm/test/tools/llvm-readobj/ELF/gnuhash.test
264–267

In real world they do not do that.
For example if you have an empty assembly file and do

as test.s -o test.o
ld.bfd test.o -o test.so -shared --hash-style=gnu

Then you'll see that GNU ld sets index to 1:

GnuHashTable {
  Num Buckets: 1
  First Hashed Symbol Index: 1
  Num Mask Words: 1
  Shift Count: 0
  Bloom Filter: [0x0]
  Buckets: [0]
  Values: []
}

While we have buckets empty or a Bloom filter empy it is enough for
the dynamic loader to know that there is no symbol it tries to lookup.
So linkers just do not care much about the rest.

And I am not sure we even can think about this as about a bug:
the index value should point to the first hashed symbol index, but we hash nothing.
I am thinking about it as about unspecified behavior.

llvm/tools/llvm-readobj/ELFDumper.cpp
2506

Yes. We take the GnuHashTable from the DT_GNU_HASH entry:

case ELF::DT_GNU_HASH:
  GnuHashTable = reinterpret_cast<const Elf_GnuHash *>(
      toMappedAddr(Dyn.getTag(), Dyn.getPtr()));

and even if there are more than one of them (a possible broken case), we still will print only the last one.

Perhaps we want to change all of reportWarning to reportUniqueWarning. The benefit I see is that people
who will look the code around and use copy-paste will copy-paste the right piece. But I am not sure
it should be done in this patch. Perhaps there should be one separate patch that changes all warnings and
adds missing test cases. I have no strong feeling about it, but I'd prefer a comment for each untested
reportUniqueWarning call then (like one we could have here).

2507

We use "table" usually it seems, so I've switched to it.

2530–2531

OK

jhenderson added inline comments.Jan 29 2020, 2:00 AM
llvm/test/tools/llvm-readobj/ELF/gnuhash.test
266–267

what -> which

What is "it" that we should normally report? I think that needs clarifying a little. I'd split the whole part from "and we..." into a new sentence, which more explicitly explains what we normally report, and say something like "For empty tables however, this value is unimportant and can be ignored."

llvm/tools/llvm-readobj/ELFDumper.cpp
2506

All sounds reasonable to me. I'd actually make reportWarning report unique warnings if possible, with a possible extra argument (defaulted to true) that says whether to make them unique or not.

2525

when they finds that the -> when the

2526

skips -> skip
Delete "then"
And I'd change "and so linkers do not care much too" and the next sentence to say something like, in a new sentence, "In such cases, the symndx value is irrelevant and we should not report a warning."

grimar updated this revision to Diff 241105.Jan 29 2020, 3:52 AM
grimar marked 4 inline comments as done.
  • Addressed review comments.
jhenderson accepted this revision.Jan 30 2020, 1:14 AM

LGTM, with two nits.

llvm/test/tools/llvm-readobj/ELF/gnuhash.test
266

that -> than

llvm/tools/llvm-readobj/ELFDumper.cpp
2523

the dummy -> dummy

This revision is now accepted and ready to land.Jan 30 2020, 1:14 AM
This revision was automatically updated to reflect the committed changes.
grimar marked an inline comment as done.