This is an archive of the discontinued LLVM Phabricator instance.

Ignore mapping symbols on aarch64
ClosedPublic

Authored by tberghammer on Apr 1 2015, 9:40 AM.

Details

Reviewers

clayborg
omjavaid

Commits

rG83544cf660d9: Ignore mapping symbols on aarch64
rLLDB234307: Ignore mapping symbols on aarch64
rL234307: Ignore mapping symbols on aarch64

Summary

Ignore mapping symbols on aarch64

ELF symbol tables on aarch64 may contain mapping symbols. They
provide information about the underlying data but interfere with lldb's
symbol look-up. They are already ignored on arm32. With this CL they
will also be ignored on aarch64.

Diff Detail

Event Timeline

tberghammer updated this revision to Diff 23067. Apr 1 2015, 9:40 AM

tberghammer retitled this revision to Ignore mapping symbols on aarch64.

tberghammer updated this object.

tberghammer edited the test plan for this revision.

tberghammer added reviewers: clayborg, omjavaid.

tberghammer added a subscriber: Unknown Object (MLST).

Herald added subscribers: aemerson, rengolin, tberghammer. Apr 1 2015, 9:40 AM

So I would like to see this patch improved before we just remove all these symbols.

For 32 bit ARM, all of the magic $a $t $d symbols used to be preceded by a $m symbol and that symbol says how many $a/$t/$d symbols follow the $m symbol. So the question is: does this same thing exist for arm32 and also arm64? If so we should just modify this patch to skip $m symbols ahead. If not, we are not skipping all symbols correctly. From the ARM ELF specification we see:

$a labels the first byte of a sequence of ARM instructions. Its type is STT_FUNC.
$b labels a Thumb BL instruction. Its type is STT_FUNC.
$d labels the first byte of a sequence of data items. Its type is STT_OBJECT.
$f labels a function pointer constant (static pointer to code). Its type is STT_OBJECT.
$p labels the final, PC-modifying instruction of an indirect function call. Its type is STT_FUNC.
(An indirect call is a call through a function pointer variable). $p does not label the PC-modifying instruction of a function return sequence.
$t labels the first byte of a sequence of Thumb instructions. Its type is STT_FUNC.

You should consult the aarch64 ELF specification and make sure we aren't missing any and also see if there is a $m symbol that can help us skip the symbols.

The next thing I would like to see fixed is we need to parse these symbols and make an AddressClass map for the ELF file so we can use this map in the following function:

AddressClass
ObjectFileELF::GetAddressClass (addr_t file_addr)

The current implementation is a bit weak:

AddressClass
ObjectFileELF::GetAddressClass (addr_t file_addr)
{
    auto res = ObjectFile::GetAddressClass (file_addr);

    if (res != eAddressClassCode)
        return res;

    ArchSpec arch_spec;
    GetArchitecture(arch_spec);
    if (arch_spec.GetMachine() != llvm::Triple::arm)
        return res;

    auto symtab = GetSymtab();
    if (symtab == nullptr)
        return res;

    auto symbol = symtab->FindSymbolContainingFileAddress(file_addr);
    if (symbol == nullptr)
        return res;

    // Thumb symbols have the lower bit set in the flags field so we just check
    // for that.
    if (symbol->GetFlags() & ARM_ELF_SYM_IS_THUMB)
        res = eAddressClassCodeAlternateISA;

    return res;
}

This should really be changed to use the AddressClass map that we create when parsing the symbol table when we run into these magic $ variables. We should not be adding these $ symbols to the symbol table, but we should be using them to create an AddressClass map. Something like:

ObjectFileELF.h:

typedef std::map<lldb::addr_t, AddressClass> FileAddressToAddressClassMap;
FileAddressToAddressClassMap m_address_class_map;

Then in the symbol table parsing code when we detect one of the magic $ variables, we should parse the symbols and help create the map:

if (arch == arm)
{
    if (name == "$a")
        m_address_class_map[symbol.st_value] = eAddressClassCode;
    else if (name == "$t" || name == "$b")
        m_address_class_map[symbol.st_value] = eAddressClassCodeAlternateISA;
    else if (name == "$d")
        m_address_class_map[symbol.st_value] = eAddressClassData;
}
else if (arch == aarch64 || arch == arm64)
{
    ....
}

Then we should use this m_address_class_map inside ObjectFileELF::GetAddressClass().

This revision now requires changes to proceed. Apr 1 2015, 10:13 AM

Change based on request in review.

I checked the current ELF spec for ARM, and the $m symbol is present in neither arm32 nor aarch64 (it was removed from arm32 with EABI), so we can't rely on it ($b, $f, and $p were also removed). I think we can detect the mapping symbols reliably by searching for symbols that start with a mapping symbol prefix (e.g. $d\0 or $d.) and checking that they have STB_LOCAL binding. All symbols that have STB_LOCAL binding and start with a $ are reserved, so we have to remove them from the symbol table anyway.
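The detection rule described in this comment can be sketched in isolation. This is a minimal, standalone illustration and not the code from the patch: the helper names HasMappingPrefix and IsReservedLocalSymbol and the kStbLocal/kStbGlobal constants are hypothetical stand-ins (the patch itself uses llvm::StringRef and symbol.getBinding()).

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the ELF binding values (STB_LOCAL is 0 and
// STB_GLOBAL is 1 in the ELF specification).
constexpr std::uint8_t kStbLocal = 0;
constexpr std::uint8_t kStbGlobal = 1;

// True when `name` is exactly `prefix` (e.g. "$d") or `prefix` followed by
// a '.' and an arbitrary suffix (e.g. "$d.42"). A name such as "$data"
// does not match.
bool HasMappingPrefix(const std::string &name, const std::string &prefix)
{
    if (name.compare(0, prefix.size(), prefix) != 0)
        return false;
    return name.size() == prefix.size() || name[prefix.size()] == '.';
}

// True when the symbol is in the reserved namespace: local binding and a
// name that starts with '$'. Such symbols are kept out of the symbol table.
bool IsReservedLocalSymbol(const std::string &name, std::uint8_t binding)
{
    return binding == kStbLocal && !name.empty() && name[0] == '$';
}
```

Checking for an exact match or a '.' after the prefix, rather than a bare prefix match, avoids treating an unrelated name that merely begins with the same two characters as a mapping symbol.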

Remove wrong assert

Very nice. Just fix the bad std::map code as mentioned in the inline comment and we are good to go.

source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp

899–907

You need to check if "file_addr" is equal to pos->first before you decrement. This code should be:

if (ub->first == file_addr)
    return ub->second;

if (ub == m_address_class_map.begin())
{
    // No entry in the address class map before the address. Return
    // default address class for an address in a code section.
    return eAddressClassCode;
}
--ub;

This revision now requires changes to proceed. Apr 2 2015, 10:33 AM

tberghammer requested a review of this revision. Apr 2 2015, 10:37 AM

tberghammer edited edge metadata.

tberghammer added inline comments.

source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp

899–907

upper_bound returns an iterator to the first element greater than the key, so they will never be equal.

You are correct. I always use lower_bound. Code looks good then.
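The point settled in this exchange can be checked with a small standalone sketch of the lookup. It assumes a simplified map and enum: AddrClass, AddrClassMap, and LookupAddressClass are hypothetical stand-ins for lldb's AddressClass, m_address_class_map, and ObjectFileELF::GetAddressClass.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for lldb's AddressClass enum.
enum class AddrClass { Code, CodeAlternateISA, Data };

using AddrClassMap = std::map<std::uint64_t, AddrClass>;

// Returns the class recorded by the closest mapping-symbol entry at or
// before file_addr, or Code when no entry precedes the address.
AddrClass LookupAddressClass(const AddrClassMap &map, std::uint64_t file_addr)
{
    // upper_bound yields the first entry whose key is strictly greater than
    // file_addr, so an entry whose key equals file_addr sits just before it.
    auto ub = map.upper_bound(file_addr);
    if (ub == map.begin())
        return AddrClass::Code;
    --ub;
    return ub->second;
}
```

With lower_bound the extra equality check would be required, because lower_bound returns the first entry whose key is not less than the search key and can therefore point at an exact match.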

This revision is now accepted and ready to land. Apr 2 2015, 10:45 AM

Closed by commit rL234307: Ignore mapping symbols on aarch64 (authored by tberghammer). Apr 7 2015, 3:46 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

source/

Plugins/

ObjectFile/

ELF/

ObjectFileELF.h

5 lines

ObjectFileELF.cpp

127 lines

Diff 23143

source/Plugins/ObjectFile/ELF/ObjectFileELF.h

Show First 20 Lines • Show All 213 Lines • ▼ Show 20 Lines	private:
      typedef std::vector<ELFSectionHeaderInfo> SectionHeaderColl;
      typedef SectionHeaderColl::iterator SectionHeaderCollIter;
      typedef SectionHeaderColl::const_iterator SectionHeaderCollConstIter;

      typedef std::vector<elf::ELFDynamic> DynamicSymbolColl;
      typedef DynamicSymbolColl::iterator DynamicSymbolCollIter;
      typedef DynamicSymbolColl::const_iterator DynamicSymbolCollConstIter;

+     typedef std::map<lldb::addr_t, lldb::AddressClass> FileAddressToAddressClassMap;
+
      /// Version of this reader common to all plugins based on this class.
      static const uint32_t m_plugin_version = 1;
      static const uint32_t g_core_uuid_magic;

      /// ELF file header.
      elf::ELFHeader m_header;

      /// ELF build ID.
Show All 17 Lines	private:
      mutable std::unique_ptr<lldb_private::FileSpecList> m_filespec_ap;

      /// Cached value of the entry point for this module.
      lldb_private::Address m_entry_point_address;

      /// The architecture detected from parsing elf file contents.
      lldb_private::ArchSpec m_arch_spec;

+     /// The address class for each symbol in the elf file
+     FileAddressToAddressClassMap m_address_class_map;
+
      /// Returns a 1 based index of the given section header.
      size_t
      SectionIndex(const SectionHeaderCollIter &I);

      /// Returns a 1 based index of the given section header.
      size_t
      SectionIndex(const SectionHeaderCollConstIter &I) const;

▲ Show 20 Lines • Show All 164 Lines • Show Last 20 Lines

source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp

Show First 20 Lines • Show All 889 Lines • ▼ Show 20 Lines
  AddressClass
  ObjectFileELF::GetAddressClass (addr_t file_addr)
  {
      auto res = ObjectFile::GetAddressClass (file_addr);

      if (res != eAddressClassCode)
          return res;

-     ArchSpec arch_spec;
-     GetArchitecture(arch_spec);
-     if (arch_spec.GetMachine() != llvm::Triple::arm)
-         return res;
-
-     auto symtab = GetSymtab();
-     if (symtab == nullptr)
-         return res;
-
-     auto symbol = symtab->FindSymbolContainingFileAddress(file_addr);
-     if (symbol == nullptr)
-         return res;
-
-     // Thumb symbols have the lower bit set in the flags field so we just check
-     // for that.
-     if (symbol->GetFlags() & ARM_ELF_SYM_IS_THUMB)
-         res = eAddressClassCodeAlternateISA;
-
-     return res;
+     auto ub = m_address_class_map.upper_bound(file_addr);
+     if (ub == m_address_class_map.begin())
+     {
+         // No entry in the address class map before the address. Return
+         // default address class for an address in a code section.
+         return eAddressClassCode;
+     }
+
+     // Move iterator to the address class entry preceding address
+     --ub;
+
+     return ub->second;
  }

  size_t
  ObjectFileELF::SectionIndex(const SectionHeaderCollIter &I)
  {
      return std::distance(m_section_headers.begin(), I) + 1u;
  }

▲ Show 20 Lines • Show All 942 Lines • ▼ Show 20 Lines	for (i = 0; i < num_symbols; ++i)
                   sect_name == rodata1_section_name ||
                   sect_name == bss_section_name)
          {
              symbol_type = eSymbolTypeData;
          }
      }
  }

- ArchSpec arch;
  int64_t symbol_value_offset = 0;
  uint32_t additional_flags = 0;

- if (GetArchitecture(arch) &&
-     arch.GetMachine() == llvm::Triple::arm)
- {
-     // ELF symbol tables may contain some mapping symbols. They provide
-     // information about the underlying data. There are three of them
-     // currently defined:
-     //   $a[.<any>]* - marks an ARM instruction sequence
-     //   $t[.<any>]* - marks a THUMB instruction sequence
-     //   $d[.<any>]* - marks a data item sequence (e.g. lit pool)
-     // These symbols interfere with normal debugger operations and we
-     // don't need them. We can drop them here.
-
-     static const llvm::StringRef g_armelf_arm_marker("$a");
-     static const llvm::StringRef g_armelf_thumb_marker("$t");
-     static const llvm::StringRef g_armelf_data_marker("$d");
-     llvm::StringRef symbol_name_ref(symbol_name);
-
-     if (symbol_name &&
-         (symbol_name_ref.startswith(g_armelf_arm_marker) ||
-          symbol_name_ref.startswith(g_armelf_thumb_marker) ||
-          symbol_name_ref.startswith(g_armelf_data_marker)))
-         continue;
-
-     // THUMB functions have the lower bit of their address set. Fixup
-     // the actual address and mark the symbol as THUMB.
-     if (symbol_type == eSymbolTypeCode && symbol.st_value & 1)
-     {
-         // Substracting 1 from the address effectively unsets
-         // the low order bit, which results in the address
-         // actually pointing to the beginning of the symbol.
-         // This delta will be used below in conjuction with
-         // symbol.st_value to produce the final symbol_value
-         // that we store in the symtab.
-         symbol_value_offset = -1;
-         additional_flags = ARM_ELF_SYM_IS_THUMB;
-     }
- }
+ ArchSpec arch;
+ if (GetArchitecture(arch))
+ {
+     if (arch.GetMachine() == llvm::Triple::arm)
+     {
+         if (symbol.getBinding() == STB_LOCAL && symbol_name && symbol_name[0] == '$')
+         {
+             // These are reserved for the specification (e.g.: mapping
+             // symbols). We don't want to add them to the symbol table.
+
+             llvm::StringRef symbol_name_ref(symbol_name);
+             if (symbol_name_ref == "$a" || symbol_name_ref.startswith("$a."))
+             {
+                 // $a[.<any>]* - marks an ARM instruction sequence
+                 m_address_class_map[symbol.st_value] = eAddressClassCode;
+             }
+             else if (symbol_name_ref == "$b" || symbol_name_ref.startswith("$b.") ||
+                      symbol_name_ref == "$t" || symbol_name_ref.startswith("$t."))
+             {
+                 // $b[.<any>]* - marks a THUMB BL instruction sequence
+                 // $t[.<any>]* - marks a THUMB instruction sequence
+                 m_address_class_map[symbol.st_value] = eAddressClassCodeAlternateISA;
+             }
+             else if (symbol_name_ref == "$d" || symbol_name_ref.startswith("$d."))
+             {
+                 // $d[.<any>]* - marks a data item sequence (e.g. lit pool)
+                 m_address_class_map[symbol.st_value] = eAddressClassData;
+             }
+
+             continue;
+         }
+     }
+     else if (arch.GetMachine() == llvm::Triple::aarch64)
+     {
+         if (symbol.getBinding() == STB_LOCAL && symbol_name && symbol_name[0] == '$')
+         {
+             // These are reserved for the specification (e.g.: mapping
+             // symbols). We don't want to add them to the symbol table.
+
+             llvm::StringRef symbol_name_ref(symbol_name);
+             if (symbol_name_ref == "$x" || symbol_name_ref.startswith("$x."))
+             {
+                 // $x[.<any>]* - marks an A64 instruction sequence
+                 m_address_class_map[symbol.st_value] = eAddressClassCode;
+             }
+             else if (symbol_name_ref == "$d" || symbol_name_ref.startswith("$d."))
+             {
+                 // $d[.<any>]* - marks a data item sequence (e.g. lit pool)
+                 m_address_class_map[symbol.st_value] = eAddressClassData;
+             }
+
+             continue;
+         }
+     }
+
+     if (arch.GetMachine() == llvm::Triple::arm)
+     {
+         // THUMB functions have the lower bit of their address set. Fixup
+         // the actual address and mark the symbol as THUMB.
+         if (symbol_type == eSymbolTypeCode && symbol.st_value & 1)
+         {
+             // Substracting 1 from the address effectively unsets
+             // the low order bit, which results in the address
+             // actually pointing to the beginning of the symbol.
+             // This delta will be used below in conjuction with
+             // symbol.st_value to produce the final symbol_value
+             // that we store in the symtab.
+             symbol_value_offset = -1;
+             additional_flags = ARM_ELF_SYM_IS_THUMB;
+         }
+     }
+ }

  // If the symbol section we've found has no data (SHT_NOBITS), then check the module section
  // list. This can happen if we're parsing the debug file and it has no .text section, for example.
  if (symbol_section_sp && (symbol_section_sp->GetFileSize() == 0))
  {
      ModuleSP module_sp(GetModule());
      if (module_sp)
      {
▲ Show 20 Lines • Show All 1,000 Lines • Show Last 20 Lines
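
The THUMB fixup that this hunk keeps for 32-bit ARM can be illustrated on its own. This is a minimal sketch and not lldb code: FixedSymbol and FixupThumbAddress are hypothetical names, and the boolean is_thumb stands in for the ARM_ELF_SYM_IS_THUMB flag.

```cpp
#include <cstdint>

// Hypothetical result type: the corrected address plus a THUMB marker.
struct FixedSymbol
{
    std::uint64_t address;
    bool is_thumb;
};

// For code symbols whose st_value has the low bit set, clear that bit
// (the equivalent of adding an offset of -1) and mark the symbol as THUMB.
// All other symbols are returned unchanged.
FixedSymbol FixupThumbAddress(std::uint64_t st_value, bool is_code)
{
    if (is_code && (st_value & 1))
        return {st_value - 1, true};
    return {st_value, false};
}
```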
