This is an archive of the discontinued LLVM Phabricator instance.

[lldb/COFF] Remove strtab zeroing hack
ClosedPublic

Authored by labath on Jul 15 2020, 8:30 AM.

Details

Summary

This code (recently responsible for a unaligned access sanitizer
failure) claims that the string table offset zero should result in an
empty string.

I cannot find any mention of this detail in the Microsoft COFF
documentation, and the llvm COFF parser also does not handle offset zero
specially. This code was introduced in 0076e7159, which also does not go
into specifics, citing "various bugfixes".

Given that this is obviously a hack, and does not cause tests to fail, I
think we should just delete it.

Diff Detail

Event Timeline

labath created this revision.Jul 15 2020, 8:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2020, 8:30 AM

Yes, getting rid of this hack looks like a good idea. If it was actually necessary, there should have been a test on it, and the comments should have been clearer.

See my inline comment, though. It looks like this might back out only part of the change.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
642

The +4 at the end of this expression is from the same patch. I wonder if it was an attempt to make space for the four bytes of zeros at offset 0 that you're eliminating?

I suggest removing the +4 and trying the tests again unless it's obvious to you why it's still necessary. The comment above seems like it might be trying to explain it, but that comment came later.

Yes, getting rid of this hack looks like a good idea. If it was actually necessary, there should have been a test on it, and the comments should have been clearer.

Well in general, I wouldn't agree with that logic - especially if the test coverage for PECOFF in lldb has been weak to begin with. However I do agree with the conclusion that we can try to get rid of it.

So with COFF symbol tables, you can either have a <= 8 chars symbol name embedded in the struct, or have an offset into the string table (where the first 4 bytes of the string table contains the length of the string table). Now the existing code seems to imply that one potentially could have a symbol with an all-zeros symbol name field (offset), which according to this code should be interpreted as an offset into the string table (but without the current hack would end up interpreting the binary size field as text).

I haven't heard of such symbol table entries, and COFFObjectFile::getSymbolName, https://github.com/llvm/llvm-project/blob/master/llvm/lib/Object/COFFObjectFile.cpp#L994-L997, seems to just blindly call COFFObjectFile::getString(), https://github.com/llvm/llvm-project/blob/master/llvm/lib/Object/COFFObjectFile.cpp#L980-L986, which doesn't seem to have a special case for that. So if there are object files out there with this pattern, COFFObjectFile wouldn't be able to handle them correctly either.

So with that in mind, I agree that it probably is ok to remove this hack.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
642

No, the +4 here was present before 0076e71592a6a (if viewing that commit, view it with a bit more than the default git context size and you'll find "// Include the 4 bytes string table size at the end of the symbols" existing already before that.

The +4 here can't be eliminated; without it, the const uint32_t strtab_size = symtab_data.GetU32(&offset) two lines below would be out of range. So this first reads the symbol table and the 4 byte size of the string table, and if the string table turns out to be nonzero in size, it also loads a separate DataExtractor with the string table contents, with the two DataExtractors overlapping for that size field. It doesn't have anything to do with overwriting the start, just with the code layout in general.

672

Kind of unrelated question: Does this treat the assigned symbol as always 8 bytes long, or does it scan the provided 8 bytes for potential trailing terminators? Object/COFFObjectFile has got a bit of extra logic to distinguish between those cases: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Object/COFFObjectFile.cpp#L999-L1004

689

Unrelated to the patch at hand: I believe this should be offset += symbol.naux * symbol_size;. I could try to make a patch to exercise that case at some later time...

amccarth accepted this revision.Jul 16 2020, 11:46 AM

Thanks @mstorsjo for clarifying for me.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
642

Sorry, I mixed up strtab_data and symtab_data when comparing to the old patch, which is why I didn't see the comment where I expected it.

The old patch actually _removed_ a +4 when computing the offset for strtab_data. It seemed weird this patch didn't have to restore that in order to back out this part of the change. But I think I understand it now.

This revision is now accepted and ready to land.Jul 16 2020, 11:46 AM
labath marked 2 inline comments as done.Jul 17 2020, 4:03 AM

Thanks for checking this out. Sorry for forgetting you Martin, I'll have to remember to add you to my future coff patches.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
672

It always treats is as 8 bytes, but then the string is accessed via .c_str() on line 679. which picks up the first nul character in the string, which is either one of the embedded ones, or the one that std::string appends internally.

I don't know if that was intentional -- it's kinda neat, but also quite scary.

689

Yeah, that looks like a bug. If it is, we could still manage to cherry-pick that for 11.0.

This revision was automatically updated to reflect the committed changes.