This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] - Fix a st_name of the first symbol table entry.
ClosedPublic

Authored by grimar on Mar 18 2019, 9:40 AM.

Details

Summary

Spec says about the first symbol table entry that index 0 both designates the first entry in the table
and serves as the undefined symbol index. It should have zero value.
Hence the first symbol table entry has no name. And so has to have a st_name == 0.
(http://refspecs.linuxbase.org/elf/gabi4+/ch4.symtab.html)

Currently, we do not emit zero value for the first symbol table entry.
That happens because we add empty strings to the string builder, which
for each such case adds a zero byte:
(https://github.com/llvm-mirror/llvm/blob/master/lib/MC/StringTableBuilder.cpp#L185)
After the string optimization performed it might return non zero indexes for the
empty string requested.

It was revealed during the development and review of D59488.
The patch fixes this issue for the case above and other sections with no names.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.Mar 18 2019, 9:40 AM

Perhaps you should add a test for other empty-named symbols?

test/tools/llvm-objcopy/ELF/symtab-null-entry.test
27 ↗(On Diff #191110)

did not reproduce -> is not reproduced by this test.

tools/llvm-objcopy/ELF/Object.cpp
498 ↗(On Diff #191110)

e.g special first symbol entry -> e.g. the null symbol entry

for consistency with our wording elsewhere, I believe.

499 ↗(On Diff #191110)

sections -> section

jakehehrlich added inline comments.Mar 18 2019, 12:21 PM
tools/llvm-objcopy/ELF/Object.cpp
301 ↗(On Diff #191110)

Seems like that shouldn't be an error

476 ↗(On Diff #191110)

Can't findIndex just return zero in such a case?

grimar updated this revision to Diff 191268.Mar 19 2019, 3:36 AM
grimar marked 6 inline comments as done.
  • Addressed review comments.

Perhaps you should add a test for other empty-named symbols?

I added a test for a section symbol.
Since we have 2 noticeable possible cases here: null symbols and all others, I think it should be enough,
probably no need to test all kinds of symbols that can have empty names.

tools/llvm-objcopy/ELF/Object.cpp
301 ↗(On Diff #191110)

Does not seem to make sense to add an empty string to a string builder too though.
(while StringTableSection::findIndex explicitly returns 0 for an empty string case)

Are you ok with the new code?

476 ↗(On Diff #191110)

OK

498 ↗(On Diff #191110)

Thanks. This comment was moved.

jhenderson added inline comments.Mar 19 2019, 4:09 AM
test/tools/llvm-objcopy/ELF/symbol-empty-name.test
31 ↗(On Diff #191268)

Nit: Could you reduce the number of spaces between the tag and the value, please, here and below? I'm finding it a little hard to follow what maps to what! Keep them lined up within the block of tags, but remove all but one space for the longest tag name like this:

Class:   ELFCLASS64
Data:    ELFDATA2LSB
Type:    ET_EXEC
Machine: EM_X86_64
tools/llvm-objcopy/ELF/Object.cpp
308–309 ↗(On Diff #191268)

I'm thinking that the StrTabBuilder class should actually store an empty string explicitly, rather than have an implicit one as it currently does. That would allow you to do findIndex with an empty string, with no problem.

301 ↗(On Diff #191110)

Personally, I quite like the behaviour in the latest version of the patch.

grimar marked an inline comment as done.Mar 19 2019, 6:45 AM
grimar added inline comments.
tools/llvm-objcopy/ELF/Object.cpp
308–309 ↗(On Diff #191268)

I am not sure I understand. StrTabBuilder is an instance of llvm/MC/StringTableBuilder class.
It is used for enum Kind { ELF, WinCOFF, MachO, RAW, DWARF }; string tables.

Now it works fine for adding the empty string or any other strings.
Empty string's offset is not guaranteed to have a zero offset after the optimization and it seems
fine to me. Why would string builder need to know we need to have a zero offset for no-string (empty) ELF only case?

jhenderson added inline comments.Mar 19 2019, 7:21 AM
tools/llvm-objcopy/ELF/Object.cpp
308–309 ↗(On Diff #191268)

Ah, I think I see a problem. I imagined that the ELF kind of StringTableBuilder would add a null string to its string mapping during initialisation. This would mean that its getOffset method could be called with empty string without having to explicitly add an empty string. Adding it explicitly (e.g. in the constructor) though could cause optimization to move the initial empty string to another location (which would be wrong).

Basically, I find it strange that we have to guard against an empty string lookup, when we know that there is always such a string in the StringTableBuilder instance for ELF kinds (i.e. I don't like the need for the check, just to prevent an assertion for a string not being found).

The current implementation in this change was what I had in mind but I have questions now. I think this implementation is wrong

  1. Is a zero index not ensured to point to the empty string according to the standard?
  2. If you need a symbol with an empty string for a name are you expected to use a non-zero index or a zero index to represent the empty string?
  3. Regardless of what string table is present, does the zero index always represent the empty string? That seems impossible since you'd always need a dummy character at the start which might as well be a zero. I think we're expecting this behavior right now and it doesn't seem right to me. Consider "bat\0" as a string table with a symbol named "" and a symbol named "bat\0". I think the symbol named "" should have index 3 not 0 and that the symbol named "bat" should have index 0.

Maybe the correct implementation is to special case the dummy symbol and then to leave the string table implementation unchanged.

grimar added a comment.EditedMar 20 2019, 2:28 AM

I need a bit more time to investigate this I think.

Spec says that "The first byte, which is index zero, holds a null character.
Likewise, a string table's last byte holds a null character, ensuring null termination for all strings.
A string whose index is zero specifies either no name or a null name, depending on the context."
(https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-73709.html)

So looks like I was wrong and string table builder just really should not optimize the first zero byte for ELF tables.
I.e. we should have 2 zero bytes, at the start and at the end: "\0bat\0".

grimar updated this revision to Diff 191496.Mar 20 2019, 7:54 AM
grimar marked an inline comment as done.
  • Updated implementation.

So StringTableBuilder reserves a null byte at start for ELF tables:
https://github.com/llvm-mirror/llvm/blob/master/lib/MC/StringTableBuilder.cpp#L38

Seems what we want is just to use it explicitly. For that, I reassign the index of an empty string after finalization.

jhenderson added inline comments.Mar 20 2019, 8:04 AM
lib/MC/StringTableBuilder.cpp
163 ↗(On Diff #191496)

Whilst I think the change is good, I can't find anything in the specification that requires empty strings to have index zero, only that the first byte must be an empty string. I think this comment needs to be updated to say something about "the first byte must always be an empty string" etc.

grimar marked an inline comment as done.Mar 20 2019, 8:26 AM
grimar added inline comments.
lib/MC/StringTableBuilder.cpp
163 ↗(On Diff #191496)

What about something like the following?

"The first byte in ELF string table must be null. In according to ELF specification a string whose index is zero specifies either
no name or a null name, depending on the context. In 'initSize()' we reserve the first byte to hold null for that and here we use it.
That also allows 'getOffset()' called for any empty string to always return a fixed zero offset."

jhenderson added inline comments.Mar 20 2019, 8:34 AM
lib/MC/StringTableBuilder.cpp
163 ↗(On Diff #191496)

I don't think we need to be that complicated. How about this:

"The first byte in an ELF string table must be null, according to the ELF specification. In 'initSize()' we reserved the first byte to hold null for this purpose and here we actually add the string to allow 'getOffset()' to be called on an empty string."

grimar updated this revision to Diff 191503.Mar 20 2019, 8:43 AM
grimar marked an inline comment as done.
  • Updated the comment.
lib/MC/StringTableBuilder.cpp
163 ↗(On Diff #191496)

Looks good, thanks!

This revision is now accepted and ready to land.Mar 20 2019, 9:06 AM

@jakehehrlich, are you OK with this?

jakehehrlich accepted this revision.Mar 21 2019, 2:38 PM

LGTM as well. Thanks for checking the spec!

rupprecht accepted this revision.Mar 21 2019, 3:44 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2019, 3:28 AM