This is an archive of the discontinued LLVM Phabricator instance.

Differential D83881

[lldb/COFF] Remove strtab zeroing hack
ClosedPublic

Authored by labath on Jul 15 2020, 8:30 AM.

Details

Reviewers

amccarth
markmentovai

Commits

rGede7c02b38c0: [lldb/COFF] Remove strtab zeroing hack

Summary

This code (recently responsible for an unaligned access sanitizer
failure) claims that the string table offset zero should result in an
empty string.

I cannot find any mention of this detail in the Microsoft COFF
documentation, and the llvm COFF parser also does not handle offset zero
specially. This code was introduced in 0076e7159, which also does not go
into specifics, citing "various bugfixes".

Given that this is obviously a hack, and does not cause tests to fail, I
think we should just delete it.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

labath created this revision. Jul 15 2020, 8:30 AM

Herald added a project: Restricted Project. Jul 15 2020, 8:30 AM

Harbormaster completed remote builds in B64365: Diff 278201. Jul 15 2020, 9:01 AM

amccarth wrote:

Yes, getting rid of this hack looks like a good idea. If it was actually necessary, there should have been a test on it, and the comments should have been clearer.

See my inline comment, though. It looks like this might back out only part of the change.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
Line 642: The `+4` at the end of this expression is from the same patch. I wonder if it was an attempt to make space for the four bytes of zeros at offset 0 that you're eliminating? I suggest removing the `+4` and trying the tests again unless it's obvious to you why it's still necessary. The comment above seems like it might be trying to explain it, but that comment came later.

mstorsjo wrote:

In D83881#2153687, @amccarth wrote:

> Yes, getting rid of this hack looks like a good idea. If it was actually necessary, there should have been a test on it, and the comments should have been clearer.

Well, in general I wouldn't agree with that logic, especially if the test coverage for PECOFF in lldb has been weak to begin with. However, I do agree with the conclusion that we can try to get rid of it.

So with COFF symbol tables, a symbol name is either embedded in the struct (if it is at most 8 characters long) or stored as an offset into the string table (where the first 4 bytes of the string table contain the length of the string table). The existing code seems to imply that a symbol could have an all-zeros name field, which this code treats as offset 0 into the string table; without the current hack, that would end up interpreting the binary size field as text.

I haven't heard of such symbol table entries, and COFFObjectFile::getSymbolName (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Object/COFFObjectFile.cpp#L994-L997) seems to just blindly call COFFObjectFile::getString() (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Object/COFFObjectFile.cpp#L980-L986), which doesn't seem to have a special case for that. So if there are object files out there with this pattern, COFFObjectFile wouldn't be able to handle them correctly either.

So with that in mind, I agree that it probably is ok to remove this hack.
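
As a rough illustration of the two name encodings described above, here is a minimal sketch of resolving a name from one symbol record. It is not LLDB's or LLVM's code: `ResolveCoffName` and its parameters are hypothetical, and `strtab` is assumed to hold the whole string table including its leading 4-byte size field.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not an LLDB or LLVM API). `name_field` is the 8-byte
// name field of one symbol record; `strtab` is the whole string table,
// including its leading 4-byte size field.
std::string ResolveCoffName(const unsigned char name_field[8],
                            const std::vector<unsigned char> &strtab) {
  const bool is_long_name = name_field[0] == 0 && name_field[1] == 0 &&
                            name_field[2] == 0 && name_field[3] == 0;
  if (!is_long_name) {
    // Short name: up to 8 bytes, NUL-padded but not necessarily terminated.
    size_t len = 0;
    while (len < 8 && name_field[len] != '\0')
      ++len;
    return std::string(reinterpret_cast<const char *>(name_field), len);
  }

  // Long name: bytes 4..7 are a little-endian offset into the string table.
  // Decoding byte by byte avoids unaligned loads and host-endianness issues.
  const uint32_t offset = static_cast<uint32_t>(name_field[4]) |
                          static_cast<uint32_t>(name_field[5]) << 8 |
                          static_cast<uint32_t>(name_field[6]) << 16 |
                          static_cast<uint32_t>(name_field[7]) << 24;
  // No special case for offset 0 in this sketch: such an entry would read
  // the size field's bytes as text, which is the situation discussed above.
  if (offset >= strtab.size())
    return std::string();
  size_t end = offset;
  while (end < strtab.size() && strtab[end] != '\0')
    ++end;
  return std::string(reinterpret_cast<const char *>(strtab.data()) + offset,
                     end - offset);
}
```

With this shape, a name field of `main` followed by four NUL bytes resolves inline, while a field of four zero bytes followed by the offset 4 resolves to the first string stored after the size field.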

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
Line 642: No, the `+4` here was present before 0076e71592a6a (if viewing that commit, view it with a bit more than the default git context size and you'll find "// Include the 4 bytes string table size at the end of the symbols" already existing before that). The `+4` here can't be eliminated; without it, the `const uint32_t strtab_size = symtab_data.GetU32(&offset)` two lines below would be out of range. So this first reads the symbol table and the 4-byte size of the string table, and if the string table turns out to be nonzero in size, it also loads a separate DataExtractor with the string table contents, with the two DataExtractors overlapping for that size field. It doesn't have anything to do with overwriting the start, just with the code layout in general.
Line 672: Kind of unrelated question: Does this treat the assigned symbol as always 8 bytes long, or does it scan the provided 8 bytes for potential trailing terminators? Object/COFFObjectFile has a bit of extra logic to distinguish between those cases: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Object/COFFObjectFile.cpp#L999-L1004
Line 689: Unrelated to the patch at hand: I believe this should be `offset += symbol.naux * symbol_size;`. I could try to make a patch to exercise that case at some later time...
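
To make the line 689 point concrete: each symbol record is 18 bytes (`symbol_size` in the diff), its last byte is the auxiliary record count, and a symbol may be followed by several auxiliary records of the same size. The sketch below uses hypothetical names and a raw byte buffer standing in for `symtab_data`; it walks the table while skipping every auxiliary record rather than just one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kSymbolSize = 18; // matches `symbol_size` in the diff
constexpr size_t kNauxOffset = 17; // last byte of a record: aux record count

// Hypothetical helper: returns the byte offsets of the primary (non-aux)
// symbol records in `symtab`, which holds `num_syms` 18-byte records.
std::vector<size_t> PrimarySymbolOffsets(const std::vector<uint8_t> &symtab,
                                         uint32_t num_syms) {
  std::vector<size_t> offsets;
  size_t offset = 0;
  for (uint32_t i = 0; i < num_syms; ++i) {
    if (offset + kSymbolSize > symtab.size())
      break; // truncated table
    offsets.push_back(offset);
    const uint8_t naux = symtab[offset + kNauxOffset];
    offset += kSymbolSize;
    // Skip all auxiliary records, both in the index and in the byte offset.
    i += naux;
    offset += static_cast<size_t>(naux) * kSymbolSize;
  }
  return offsets;
}
```

Advancing the byte offset by a single record while advancing the index by `naux` would leave the two out of step whenever a symbol has more than one auxiliary record, which is the mismatch the comment describes.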

amccarth wrote:

Thanks, @mstorsjo, for clarifying for me.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
Line 642: Sorry, I mixed up `strtab_data` and `symtab_data` when comparing to the old patch, which is why I didn't see the comment where I expected it. The old patch actually _removed_ a `+4` when computing the offset for `strtab_data`. It seemed weird this patch didn't have to restore that in order to back out this part of the change. But I think I understand it now.
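
The layout that the line 642 thread settles on can be sketched as follows. This is an illustration under stated assumptions, not the LLDB implementation: `StrtabLocation` and `LocateStringTable` are hypothetical names, and `file` is assumed to hold the whole object file in memory. Per the Microsoft PE/COFF specification, the size field counts its own 4 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct StrtabLocation {
  uint64_t file_offset; // where the string table (size field included) starts
  uint32_t size;        // value of the size field; counts its own 4 bytes
};

// Hypothetical helper. `symoff` and `num_syms` come from the COFF header.
// Returns {0, 0} if the size field would lie outside the file.
StrtabLocation LocateStringTable(const std::vector<uint8_t> &file,
                                 uint32_t symoff, uint32_t num_syms) {
  const uint64_t symbol_data_size = uint64_t{num_syms} * 18;
  // The size field sits directly after the last symbol record, which is why
  // the symbol table read in the diff covers `symbol_data_size + 4` bytes.
  const uint64_t size_field = symoff + symbol_data_size;
  if (size_field + 4 > file.size())
    return {0, 0};
  // Little-endian decode, byte by byte, so alignment does not matter.
  const uint32_t size = static_cast<uint32_t>(file[size_field]) |
                        static_cast<uint32_t>(file[size_field + 1]) << 8 |
                        static_cast<uint32_t>(file[size_field + 2]) << 16 |
                        static_cast<uint32_t>(file[size_field + 3]) << 24;
  // The string table starts at the size field itself, so a read of `size`
  // bytes from here overlaps the last 4 bytes of the symbol table read.
  return {size_field, size};
}
```

Because the string table begins at its own size field, string offsets stored in symbol records are relative to that field, and the first real string lives at offset 4.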

This revision is now accepted and ready to land. Jul 16 2020, 11:46 AM

labath wrote:

Thanks for checking this out. Sorry for forgetting you, Martin; I'll have to remember to add you to my future COFF patches.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
Line 672: It always treats it as 8 bytes, but then the string is accessed via `.c_str()` on line 679, which picks up the first nul character in the string, which is either one of the embedded ones, or the one that std::string appends internally. I don't know if that was intentional -- it's kinda neat, but also quite scary.
Line 689: Yeah, that looks like a bug. If it is, we could still manage to cherry-pick that for 11.0.
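
The behaviour described for line 672 can be reproduced in isolation. The following is a minimal sketch with a hypothetical helper name, using a plain `char` array in place of the symbol table bytes; it only mirrors the assign-then-`c_str()` pattern and is not the code under review.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the pattern in the diff: copy all 8 bytes of
// the name field, then read the result back through c_str().
std::string ShortNameViaCStr(const char (&name_field)[8]) {
  std::string symbol_name;
  symbol_name.assign(name_field, sizeof(name_field)); // always 8 bytes
  // symbol_name.size() is 8 here, embedded NULs included. A consumer of
  // c_str() stops at the first NUL: either padding inside the 8 bytes or
  // the terminator that std::string keeps after its contents.
  return std::string(symbol_name.c_str());
}
```

A padded field such as `main` plus four NUL bytes comes back as `main`, and a full 8-character field comes back intact because the terminator after the string's contents ends the scan.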

Closed by commit rGede7c02b38c0: [lldb/COFF] Remove strtab zeroing hack (authored by labath). Jul 17 2020, 4:26 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path    Size
lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp    12 lines

Diff 278721

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp

[537 lines not shown]

 DataExtractor ObjectFilePECOFF::ReadImageData(uint32_t offset, size_t size) {
   if (!size)
     return {};

   if (m_data.ValidOffsetForDataOfSize(offset, size))
     return DataExtractor(m_data, offset, size);

-  if (m_file) {
-    // A bit of a hack, but we intend to write to this buffer, so we can't
-    // mmap it.
-    auto buffer_sp = MapFileData(m_file, size, offset);
-    return DataExtractor(buffer_sp, GetByteOrder(), GetAddressByteSize());
-  }
   ProcessSP process_sp(m_process_wp.lock());
   DataExtractor data;
   if (process_sp) {
     auto data_up = std::make_unique<DataBufferHeap>(size, 0);
     Status readmem_error;
     size_t bytes_read =
         process_sp->ReadMemory(m_image_base + offset, data_up->GetBytes(),
                                data_up->GetByteSize(), readmem_error);

[80 lines not shown; enclosing scope: if (m_symtab_up == nullptr) {]

       const uint32_t num_syms = m_coff_header.nsyms;

       if (m_file && num_syms > 0 && m_coff_header.symoff > 0) {
         const uint32_t symbol_size = 18;
         const size_t symbol_data_size = num_syms * symbol_size;
         // Include the 4-byte string table size at the end of the symbols
         DataExtractor symtab_data =
             ReadImageData(m_coff_header.symoff, symbol_data_size + 4);
[Inline comments on line 642 from amccarth and mstorsjo; see Event Timeline]
         lldb::offset_t offset = symbol_data_size;
         const uint32_t strtab_size = symtab_data.GetU32(&offset);
         if (strtab_size > 0) {
           DataExtractor strtab_data = ReadImageData(
               m_coff_header.symoff + symbol_data_size, strtab_size);

-          // First 4 bytes should be zeroed after strtab_size has been read,
-          // because it is used as offset 0 to encode a NULL string.
-          uint32_t *strtab_data_start = const_cast<uint32_t *>(
-              reinterpret_cast<const uint32_t *>(strtab_data.GetDataStart()));
-          ::memset(&strtab_data_start[0], 0, sizeof(uint32_t));
-
           offset = 0;
           std::string symbol_name;
           Symbol *symbols = m_symtab_up->Resize(num_syms);
           for (uint32_t i = 0; i < num_syms; ++i) {
             coff_symbol_t symbol;
             const uint32_t symbol_offset = offset;
             const char *symbol_name_cstr = nullptr;
             // If the first 4 bytes of the symbol string are zero, then they
             // are followed by a 4-byte string table offset. Else these
             // 8 bytes contain the symbol name
             if (symtab_data.GetU32(&offset) == 0) {
               // Long string that doesn't fit into the symbol table name, so
               // now we must read the 4 byte string table offset
               uint32_t strtab_offset = symtab_data.GetU32(&offset);
               symbol_name_cstr = strtab_data.PeekCStr(strtab_offset);
               symbol_name.assign(symbol_name_cstr);
             } else {
               // Short string that fits into the symbol table name which is 8
               // bytes
               offset += sizeof(symbol.name) - 4; // Skip remaining
               symbol_name_cstr = symtab_data.PeekCStr(symbol_offset);
               if (symbol_name_cstr == nullptr)
                 break;
               symbol_name.assign(symbol_name_cstr, sizeof(symbol.name));
[Inline comments on line 672 from mstorsjo and labath; see Event Timeline]
             }
             symbol.value = symtab_data.GetU32(&offset);
             symbol.sect = symtab_data.GetU16(&offset);
             symbol.type = symtab_data.GetU16(&offset);
             symbol.storage = symtab_data.GetU8(&offset);
             symbol.naux = symtab_data.GetU8(&offset);
             symbols[i].GetMangled().SetValue(ConstString(symbol_name.c_str()));
             if ((int16_t)symbol.sect >= 1) {
               Address symbol_addr(sect_list->FindSectionByID(symbol.sect),
                                   symbol.value);
               symbols[i].GetAddressRef() = symbol_addr;
               symbols[i].SetType(MapSymbolType(symbol.type));
             }

             if (symbol.naux > 0) {
               i += symbol.naux;
               offset += symbol_size;
[Inline comments on line 689 from mstorsjo and labath; see Event Timeline]
             }
           }
         }
       }

       // Read export header
       if (coff_data_dir_export_table < m_coff_header_opt.data_dirs.size() &&
           m_coff_header_opt.data_dirs[coff_data_dir_export_table].vmsize > 0 &&

[544 lines not shown]