This is an archive of the discontinued LLVM Phabricator instance.

Ignore mapping symbols on aarch64
ClosedPublic

Authored by tberghammer on Apr 1 2015, 9:40 AM.

Details

Summary

Ignore mapping symbols on aarch64

ELF symbol tables on aarch64 may contains some mapping symbols. They
provide information about the underlying data but interfere with symbol
look-up of lldb. They are already ignored on arm32. With this CL they
will be ignored on aarch64 also.

Diff Detail

Event Timeline

tberghammer updated this revision to Diff 23067.Apr 1 2015, 9:40 AM
tberghammer retitled this revision from to Ignore mapping symbols on aarch64.
tberghammer updated this object.
tberghammer edited the test plan for this revision. (Show Details)
tberghammer added reviewers: clayborg, omjavaid.
tberghammer added a subscriber: Unknown Object (MLST).
clayborg requested changes to this revision.Apr 1 2015, 10:13 AM
clayborg edited edge metadata.

So I would like to see this patch improved before we just remove all these symbols.

For 32 bit ARM, all of the magic $a $t $d symbols used to be preceded by a $m symbol and that symbol says how many $a/$t/$d symbols follow the $m symbol. So the question is: does this same thing exist for arm32 and also arm64? If so we should just modify this patch to skip $m symbols ahead. If not, we are not skipping all symbols correctly. From the ARM ELF specification we see:

$a labels the first byte of a sequence of ARM instructions. Its type is STT_FUNC.
$b labels a Thumb BL instruction. Its type is STT_FUNC.
$d labels the first byte of a sequence of data items. Its type is STT_OBJECT.
$f labels a function pointer constant (static pointer to code). Its type is STT_OBJECT.
$p labels the final, PC-modifying instruction of an indirect function call. Its type is STT_FUNC.
(An indirect call is a call through a function pointer variable). $p does not label the PC-modifying instruction of a function return sequence.
$t labels the first byte of a sequence of Thumb instructions. Its type is STT_FUNC.

You should consult the aarch64 ELF specification and make sure we aren't missing any and also see if there is a $m symbol that can help us skip the symbols.

The next thing I would like to see fixed is we need to parse these symbols and make an AddressClass map for the ELF file so we can use this map in the following function:

AddressClass
ObjectFileELF::GetAddressClass (addr_t file_addr)

This currently implementation is a bit weak:

AddressClass
ObjectFileELF::GetAddressClass (addr_t file_addr)
{
    auto res = ObjectFile::GetAddressClass (file_addr);

    if (res != eAddressClassCode)
        return res;

    ArchSpec arch_spec;
    GetArchitecture(arch_spec);
    if (arch_spec.GetMachine() != llvm::Triple::arm)
        return res;

    auto symtab = GetSymtab();
    if (symtab == nullptr)
        return res;

    auto symbol = symtab->FindSymbolContainingFileAddress(file_addr);
    if (symbol == nullptr)
        return res;

    // Thumb symbols have the lower bit set in the flags field so we just check
    // for that.
    if (symbol->GetFlags() & ARM_ELF_SYM_IS_THUMB)
        res = eAddressClassCodeAlternateISA;

    return res;
}

This should really be changed to use the AddressClass map that we create when parsing the symbol table when we run into these magic $ variables. We should not be adding these $ symbols to the symbol table, but we should be using them to create a AddressClass map. Something like:

ObjectFileELF.h:

typedef std::map<lldb::addr_t, AddressClass> FileAddressToAddressClassMap;
FileAddressToAddressClassMap m_address_class_map;

Then in the symbol table parsing code when we detect one of the magic $ variables, we should parse the symbols and help create the map:

if (arch == arm)
{
    if (name == "$a")
        m_address_class_map[symbol.st_value] = eAddressClassCode;
    else if (name == "$t" || name == "$b")
        m_address_class_map[symbol.st_value] = eAddressClassCodeAlternateISA;
    else if (name == "$d")
        m_address_class_map[symbol.st_value] = eAddressClassData;
}
else if (arch == aarch64 || arch == arm64)
{
    ....
}

Then we should use this m_address_class_map inside ObjectFileELF::GetAddressClass().

This revision now requires changes to proceed.Apr 1 2015, 10:13 AM
tberghammer updated this revision to Diff 23138.Apr 2 2015, 4:08 AM
tberghammer edited edge metadata.

Change based on request in review.

I checked the current ELF spec for arm and the $m symbol isn't present in neither in arm32 nor in aarch64 (they got removed from arm32 with EABI) so we can't rely on that ($b, $f, $p also got removed). I think we can get a reliable detection of the mapping symbols if we search for symbols starting with a mapping symbol prefix (e.g.: $d\0 or $d.)and checking that it have STB_LOCAL binding as all symbol starting with STB_LOCAL and starts with a $ are reserved so we have to remove them from the symbol table anyway.

tberghammer updated this revision to Diff 23143.Apr 2 2015, 5:16 AM
tberghammer edited edge metadata.

Remove wrong assert

clayborg requested changes to this revision.Apr 2 2015, 10:33 AM
clayborg edited edge metadata.

Very nice. Just fix the bad std::map code as mentioned in the inline comment and we are good to go.

source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
899–907

You need to check if "file_addr" is equal to pos->first before you decrement. This code should be:

if (ub->first == file_addr)
    return ub->second;

if (ub == m_address_class_map.begin())
{
    // No entry in the address class map before the address. Return
    // default address class for an address in a code section.
    return eAddressClassCode;
}
--ub;
This revision now requires changes to proceed.Apr 2 2015, 10:33 AM
tberghammer requested a review of this revision.Apr 2 2015, 10:37 AM
tberghammer edited edge metadata.
tberghammer added inline comments.
source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
899–907

Upper bounds returns an iterator to the first element greater then the key so they will never be equal.

clayborg accepted this revision.Apr 2 2015, 10:45 AM
clayborg edited edge metadata.

You are correct. I always use lower_bound. Code looks good then.

This revision is now accepted and ready to land.Apr 2 2015, 10:45 AM
This revision was automatically updated to reflect the committed changes.