This is an archive of the discontinued LLVM Phabricator instance.

Unconditionally accept symbol sizes from elf
ClosedPublic

Authored by tberghammer on Jan 14 2016, 5:52 AM.

Details

Commits

rG8c6996f737fe: Unconditionally accept symbol sizes from elf
rG6b2322fb4ca7: Unconditionally accept symbol sizes from elf
rLLDB258113: Unconditionally accept symbol sizes from elf
rLLDB258040: Unconditionally accept symbol sizes from elf
rL258113: Unconditionally accept symbol sizes from elf
rL258040: Unconditionally accept symbol sizes from elf

Summary

Unconditionally accept symbol sizes from elf

The ELF symbol table always contains the size of each symbol, so we
don't have to guess it from the address of the next symbol (that
guess is only needed for mach-o).

The change fixes an issue that occurs when a symbol is removed after
a 0 size symbol (e.g. because the second one is not public).
Previously this caused the symbol lookup algorithm to report the
0 size symbol even for later addresses that are not part of any
symbol. That symbol lookup error can confuse the user and also
confuses the current stack unwinder.
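
To make the failure mode concrete, here is a minimal sketch in C++. It is not LLDB code: `ToySymbol` and `Lookup` are hypothetical names, and the lookup is a plain linear scan over a table sorted by address. With `trust_elf_size` set to false it mimics the old behaviour (a 0 size symbol is stretched up to the next symbol); with true it mimics the behaviour proposed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified symbol record (not LLDB's Symbol class).
struct ToySymbol {
    std::string name;
    uint64_t addr;
    uint64_t size; // st_size as read from the ELF symbol table
};

// Returns the symbol covering `addr`, or nullptr if there is none.
// `sorted_syms` must be sorted by address.
inline const ToySymbol *
Lookup(const std::vector<ToySymbol> &sorted_syms, uint64_t addr, bool trust_elf_size)
{
    for (size_t i = 0; i < sorted_syms.size(); ++i) {
        const ToySymbol &s = sorted_syms[i];
        uint64_t size = s.size;
        if (size == 0 && !trust_elf_size && i + 1 < sorted_syms.size()) {
            // Old behaviour: guess the size from the start of the next symbol.
            size = sorted_syms[i + 1].addr - s.addr;
        }
        // An exact hit on the start address always matches, even for size 0.
        if (addr == s.addr || (addr >= s.addr && addr - s.addr < size))
            return &s;
    }
    return nullptr;
}
```

For a table holding `foo` at 0x1000 with size 0 and `baz` at 0x2000 with size 0x40 (a private symbol that used to sit between them has been stripped), looking up 0x1018 yields `foo` under the old behaviour and no symbol when the ELF size is trusted.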

Diff Detail

Repository: rL LLVM

Event Timeline

tberghammer updated this revision to Diff 44872.Jan 14 2016, 5:52 AM

tberghammer retitled this revision to Unconditionally accept symbol sizes from .dynsym.

tberghammer updated this object.

tberghammer added a reviewer: clayborg.

tberghammer added a subscriber: lldb-commits.

So does this mean any symbols whose byte size is zero will always have a byte size of zero when parsed from a .dynsym section? What kinds of symbols have a byte size of zero in the .dynsym? Seems like function symbols should always have a valid byte size, no? Why is this confusing the unwinder? How does the unwinder treat symbols differently when they have a byte size of zero after your fix, and how does this fix your issues?

Yes, it will prevent the artificial size calculation for symbols with 0 size in .dynsym. I think all function and data symbols should have a valid size, so I am not sure why we have this symbol size calculation in the first place, but I assume it is to work around some old compiler bug.

You can end up with a 0 size symbol if you specify a symbol in assembly without specifying its type and size (I found it in some Android framework code).

.global foo
foo:
<any data or code>

The unwinder treats symbols with 0 size the same way as it treats symbols with a valid size. The difference is that it treats an address differently depending on whether we have a symbol for it or not. In the first case it calculates the unwind info for the given function, while in the second case it tries to find unwind information fitting the given address.

The current implementation, which extends the length of the symbol, has a few problems:

We will display the wrong function name to the user in the backtrace, which is annoying (worse than not showing a function name)
If we end up with an incorrect symbol name then we will use the unwind information for that symbol, which won't be valid at the location where the code is currently stopped

It would be interesting to see if there are any relocations or any other hints to help make correct function bounds from a stripped (.dynsym only) ELF file. In MachO we have a LC_FUNCTION_STARTS load command (kind of like an ELF note) that contains all start addresses of all functions even if we have all private symbols stripped. It might be worth checking if there is anything in ELF that could help us determine function starts for a given binary, then this wouldn't be an issue right? A few things I can think of are the EH frame FDEs can give you all function bounds for all functions that have unwind info. Relocations might be able to help you, but they might just be noise.

I looked through the sections we have in a stripped ELF file and none of them have any information that would tell us the start address of the functions (it isn't needed at runtime so it is removed to decrease the size).

Relocations won't really help because they will only reference the start of the public/global functions where we already have the start address from dynsym.

.eh_frame is a useful information source, but it isn't complete for several reasons. It is only present for non-leaf functions (if the compiler is smart enough) and one eh_frame entry can belong to several functions (very common on ARM where we use .ARM.exidx instead of .eh_frame). Currently, if we don't have a symbol name for a function then we try to create a fake symbol for it based on the eh_frame, but this behavior is blocked by the extension of the 0 sized symbols because the extended symbol will cover addresses where we originally didn't have any symbol. By leaving the 0 size symbols valid I would like to get this behavior back to work (it works when the last symbol before the address had a non 0 size).
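
The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not LLDB's implementation: `Range`, `NamedSymbol`, `Resolved`, `ResolveAddress` and the `<unnamed@0x...>` naming scheme are all hypothetical, and the FDE ranges are assumed to have been parsed from `.eh_frame` already.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical address range: [base, base + size).
struct Range {
    uint64_t base;
    uint64_t size;
    bool Contains(uint64_t addr) const { return addr >= base && addr - base < size; }
};

struct NamedSymbol {
    std::string name;
    Range range; // size taken from the ELF symbol table, possibly 0
};

struct Resolved {
    std::string name;
    Range range;
    bool synthesized; // true if the symbol was made up from an FDE range
};

// Try the real symbols first; only if none covers `addr`, fall back to the
// FDE ranges and synthesize a placeholder symbol for the matching range.
inline bool
ResolveAddress(const std::vector<NamedSymbol> &symbols,
               const std::vector<Range> &fde_ranges,
               uint64_t addr, Resolved &out)
{
    for (const NamedSymbol &s : symbols) {
        if (addr == s.range.base || s.range.Contains(addr)) {
            out = Resolved{s.name, s.range, false};
            return true;
        }
    }
    for (const Range &r : fde_ranges) {
        if (r.Contains(addr)) {
            std::ostringstream name;
            name << "<unnamed@0x" << std::hex << r.base << ">"; // hypothetical naming
            out = Resolved{name.str(), r, true};
            return true;
        }
    }
    return false;
}
```

If the 0 size symbol were stretched to the next symbol, the first loop would always succeed for the addresses behind it and the FDE based fallback would never be reached, which is the regression described in the comment.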

Do you remember what was the original reason for changing the size of the symbols from 0 to the address of the next symbol? I think it would be a good idea to completely remove that logic but I am not sure if it would break somebody or not.

In D16186#327081, @tberghammer wrote:

I looked through the sections we have in a stripped ELF file and none of them have any information that would tell us the start address of the functions (it isn't needed at runtime so it is removed to decrease the size).

Relocations won't really help because they will only reference the start of the public/global functions where we already have the start address from dynsym.

.eh_frame is a useful information source, but it isn't complete for several reasons. It is only present for non-leaf functions (if the compiler is smart enough) and one eh_frame entry can belong to several functions (very common on ARM where we use .ARM.exidx instead of .eh_frame). Currently, if we don't have a symbol name for a function then we try to create a fake symbol for it based on the eh_frame, but this behavior is blocked by the extension of the 0 sized symbols because the extended symbol will cover addresses where we originally didn't have any symbol. By leaving the 0 size symbols valid I would like to get this behavior back to work (it works when the last symbol before the address had a non 0 size).

Should we try to use the EH frame info to set the size of any zero-sized symbols?

Do you remember what was the original reason for changing the size of the symbols from 0 to the address of the next symbol? I think it would be a good idea to completely remove that logic but I am not sure if it would break somebody or not.

This was done for MachO because we don't have sizes in our symbol table. You can probably just remove it and say that the size is valid always because ELF does have a symbol size. We needed to do this in MachO, but it isn't necessary in ELF. So try removing it and running the test suite, if all passes, then just make it so for all symbols (both .symtab and .dynsym).

This revision now requires changes to proceed.Jan 14 2016, 11:07 AM

I disabled the symbol size calculation for ELF and it worked for all configurations I tested (Linux x86_64/i386, clang-3.5/gcc-4.8.4), so I think we should go ahead with this solution. We can look for something different if we hit any issues.

See inlined comments.

source/Symbol/Symtab.cpp
974–978 ↗	(On Diff #44981)	You can remove this if statement right? Symbol byte size will always be valid no?
1070–1071 ↗	(On Diff #44981)	Why do we need to check this? Won't "m_file_addr_to_index.FindEntryThatContains(file_addr);" already check this for us?

This revision now requires changes to proceed.Jan 15 2016, 9:49 AM

tberghammer added inline comments.Jan 15 2016, 9:59 AM

source/Symbol/Symtab.cpp
974–978 ↗	(On Diff #44981)	In the case of ELF all symbol sizes will be valid, but I think this code is used for mach-o as well, where they won't be. The condition is kind of saying "if (mach-o)"
1070–1071 ↗	(On Diff #44981)	The m_file_addr_to_index is initialized by extending every 0 byte entry until the next entry so it can handle mach-o as well, and because of this an entry can cover a larger address range than its symbol. An alternative implementation would be to sort the symbols based on address, then calculate the size for all of them if they are not valid (mach-o), and finally generate the entries based on that. That way we can get rid of this condition, but Symtab::InitAddressIndexes would become more complicated and most likely a bit less efficient.
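
The alternative mentioned in the last inline comment (sort, size only the symbols whose size is not valid, then build the entries) could look roughly like the sketch below. It is a simplified illustration, not the actual `Symtab::InitAddressIndexes`: `Sym`, `Entry`, `BuildAddressIndex` and the single `section_end` bound are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for LLDB's Symbol and range map entry.
struct Sym {
    std::string name;
    uint64_t addr;
    uint64_t size;
    bool size_is_valid; // true for ELF, false where the format has no sizes
};

struct Entry {
    uint64_t base;
    uint64_t size;
    size_t index; // index into the sorted symbol vector
};

// Sort by address, synthesize a size only for symbols without a valid one,
// then emit one entry per symbol that covers exactly the symbol's range.
inline std::vector<Entry>
BuildAddressIndex(std::vector<Sym> &syms, uint64_t section_end)
{
    std::sort(syms.begin(), syms.end(),
              [](const Sym &a, const Sym &b) { return a.addr < b.addr; });
    std::vector<Entry> entries;
    for (size_t i = 0; i < syms.size(); ++i) {
        Sym &s = syms[i];
        if (!s.size_is_valid) {
            const uint64_t next = (i + 1 < syms.size()) ? syms[i + 1].addr : section_end;
            s.size = next > s.addr ? next - s.addr : 0;
        }
        entries.push_back({s.addr, s.size, i});
    }
    return entries;
}

// Binary search for the entry covering `addr`; no extra check against the
// symbol is needed because entries never extend past their symbol.
inline const Entry *
FindEntryThatContains(const std::vector<Entry> &entries, uint64_t addr)
{
    auto it = std::upper_bound(entries.begin(), entries.end(), addr,
                               [](uint64_t a, const Entry &e) { return a < e.base; });
    if (it == entries.begin())
        return nullptr;
    --it;
    if (addr == it->base || addr - it->base < it->size)
        return &*it;
    return nullptr;
}
```

The trade-off is the one noted in the comment: the lookup becomes simpler, but index construction has to sort and size the symbols before any entry can be generated.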

Never mind, sorry, missed that this was in Symbol.cpp, I was still thinking of the ObjectFileELF...

This revision is now accepted and ready to land.Jan 15 2016, 10:24 AM

Closed by commit rL258040: Unconditionally accept symbol sizes from elf (authored by tberghammer). Jan 18 2016, 2:42 AM

This revision was automatically updated to reflect the committed changes.

jevinskie added a subscriber: jevinskie.Jan 27 2016, 7:55 AM

Revision Contents

Path	Size
lldb/trunk/include/lldb/Symbol/Symbol.h	3 lines
lldb/trunk/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp	6 lines
lldb/trunk/source/Symbol/Symbol.cpp	7 lines
lldb/trunk/source/Symbol/Symtab.cpp	23 lines

Diff 45160

lldb/trunk/include/lldb/Symbol/Symbol.h

@@ ... @@ GetInstructions (const ExecutionContext &exe_ctx,
                      bool prefer_file_cache);

     bool
     GetDisassembly (const ExecutionContext &exe_ctx,
                     const char *flavor,
                     bool prefer_file_cache,
                     Stream &strm);

+    bool
+    ContainsFileAddress (lldb::addr_t file_addr) const;
+
 protected:
     // This is the internal guts of ResolveReExportedSymbol, it assumes reexport_name is not null, and that module_spec
     // is valid. We track the modules we've already seen to make sure we don't get caught in a cycle.

     Symbol *
     ResolveReExportedSymbolInModuleSpec (Target &target,
                                          ConstString &reexport_name,
                                          lldb_private::ModuleSpec &module_spec,

lldb/trunk/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp

@@ ... @@ for (i = 0; i < num_symbols; ++i)
             is_global,              // Is this globally visible?
             false,                  // Is this symbol debug info?
             false,                  // Is this symbol a trampoline?
             false,                  // Is this symbol artificial?
             AddressRange(
                 symbol_section_sp,  // Section in which this symbol is defined or null.
                 symbol_value,       // Offset in section or symbol value.
                 symbol.st_size),    // Size in bytes of this symbol.
-            symbol.st_size != 0,    // Size is valid if it is not 0
+            true,                   // Symbol size is valid
             has_suffix,             // Contains linker annotations?
             flags);                 // Symbol flags.
         symtab->AddSymbol(dc_symbol);
     }
     return i;
 }

 unsigned
-ObjectFileELF::ParseSymbolTable(Symtab *symbol_table, user_id_t start_id, lldb_private::Section *symtab)
+ObjectFileELF::ParseSymbolTable(Symtab *symbol_table,
+                                user_id_t start_id,
+                                lldb_private::Section *symtab)
 {
     if (symtab->GetObjectFile() != this)
     {
         // If the symbol table section is owned by a different object file, have it do the
         // parsing.
         ObjectFileELF *obj_file_elf = static_cast<ObjectFileELF *>(symtab->GetObjectFile());
         return obj_file_elf->ParseSymbolTable (symbol_table, start_id, symtab);
     }

lldb/trunk/source/Symbol/Symbol.cpp

@@ ... @@ Symbol::GetDisassembly (const ExecutionContext &exe_ctx,
         {
             const bool show_address = true;
             const bool show_bytes = false;
             disassembler_sp->GetInstructionList().Dump (&strm, show_address, show_bytes, &exe_ctx);
             return true;
         }
     return false;
 }
+
+bool
+Symbol::ContainsFileAddress (lldb::addr_t file_addr) const
+{
+    return m_addr_range.ContainsFileAddress(file_addr);
+}

lldb/trunk/source/Symbol/Symtab.cpp

@@ ... @@ if (!m_file_addr_to_index_computed && !m_symbols.empty())
                     SectionSP section_sp (m_objfile->GetSectionList()->FindSectionContainingFileAddress (entry.GetRangeBase()));
                     if (section_sp)
                     {
                         const lldb::addr_t end_section_file_addr = section_sp->GetFileAddress() + section_sp->GetByteSize();
                         const lldb::addr_t symbol_file_addr = entry.GetRangeBase();
                         if (end_section_file_addr > symbol_file_addr)
                         {
                             Symbol &symbol = m_symbols[entry.data];
-                            symbol.SetByteSize(end_section_file_addr - symbol_file_addr);
-                            symbol.SetSizeIsSynthesized(true);
+                            if (!symbol.GetByteSizeIsValid())
+                            {
+                                symbol.SetByteSize(end_section_file_addr - symbol_file_addr);
+                                symbol.SetSizeIsSynthesized(true);
+                            }
                         }
                     }
                 }
                 // Sort again in case the range size changes the ordering
                 m_file_addr_to_index.Sort();
             }
         }
 }

 void
 Symtab::CalculateSymbolSizes ()
@@ ... @@ Symtab::FindSymbolContainingFileAddress (addr_t file_addr, const uint32_t* indexes, uint32_t num_indexes)
     if (info.match_symbol)
     {
         if (info.match_offset == 0)
         {
             // We found an exact match!
             return info.match_symbol;
         }

-        const size_t symbol_byte_size = info.match_symbol->GetByteSize();
-
-        if (symbol_byte_size == 0)
+        if (!info.match_symbol->GetByteSizeIsValid())
         {
-            // We weren't able to find the size of the symbol so lets just go
-            // with that match we found in our search...
+            // The matched symbol dosn't have a valid byte size so lets just go with that match...
             return info.match_symbol;
         }

         // We were able to figure out a symbol size so lets make sure our
         // offset puts "file_addr" in the symbol's address range.
-        if (info.match_offset < symbol_byte_size)
+        if (info.match_offset < info.match_symbol->GetByteSize())
             return info.match_symbol;
     }
     return nullptr;
 }

 Symbol *
 Symtab::FindSymbolContainingFileAddress (addr_t file_addr)
 {
     Mutex::Locker locker (m_mutex);

     if (!m_file_addr_to_index_computed)
         InitAddressIndexes();

     const FileRangeToIndexMap::Entry *entry = m_file_addr_to_index.FindEntryThatContains(file_addr);
     if (entry)
-        return SymbolAtIndex(entry->data);
+    {
+        Symbol* symbol = SymbolAtIndex(entry->data);
+        if (symbol->ContainsFileAddress(file_addr))
+            return symbol;
+    }
     return nullptr;
 }

 void
 Symtab::SymbolIndicesToSymbolContextList (std::vector<uint32_t> &symbol_indexes, SymbolContextList &sc_list)
 {
     // No need to protect this call using m_mutex all other method calls are
     // already thread safe.