Page MenuHomePhabricator

Add a method to get the "base" file address of an object file
Needs ReviewPublic

Authored by labath on Dec 6 2018, 1:32 AM.

Details

Summary

Object files generally have a "base" address, which is their preferred
address to be loaded into memory (in the sense that if they are loaded
at this address, then no runtime relocation is required). This is often
the lowest address corresponding to that object file, and various other
addresses are expressed relative to this base (relative virtual
addresses (RVAs) in PE/COFF parlor).

All three of our main object file formats already had such a concept
(it's usually needed to implement SetLoadAddress), but it was not
exposed in any way. The motivation for making this public is that the
symbol address in the breakpad format are written relative to this
address.

Event Timeline

labath created this revision.Dec 6 2018, 1:32 AM
amccarth accepted this revision.Dec 6 2018, 11:04 AM
amccarth added a subscriber: amccarth.

LGTM.

lit/Modules/MachO/basic-info.yaml
11

Just so I understand, this 0x10000000 is supposed to correspond to the vmaddr of the TEXT section, right? There are a few other vmaddrs and addrs in the YAML that correspond to other addresses.

This revision is now accepted and ready to land.Dec 6 2018, 11:04 AM
clayborg requested changes to this revision.Dec 6 2018, 12:09 PM

This is already available with:

virtual lldb_private::Address ObjectFile::GetHeaderAddress();

It return a lldb_private::Address which is section offset, but nothing stopping us from returning a lldb_private::Address with no section and just the file address. For mach-o the mach header is in the __TEXT segment, but not true for other file formats. I am ok if we need to rename "GetHeaderAddress()" to something else.

This revision now requires changes to proceed.Dec 6 2018, 12:09 PM

Marked as requesting changes in case "GetBaseFileAddress() == GetHeaderAddress().GetFileAddress()" in all cases. If the base file address differs from where the object file header is located, then this change would work. Else we should use GetHeaderAddress() and possibly rename it if we want to

lemo added inline comments.Dec 6 2018, 2:10 PM
include/lldb/Symbol/ObjectFile.h
569 ↗(On Diff #176937)

"file address" can mean an offset within a container file.

to avoid any confusion I'd use "base address" (or "image base address")

clayborg added inline comments.Dec 6 2018, 2:12 PM
include/lldb/Symbol/ObjectFile.h
569 ↗(On Diff #176937)

"file address" is the correct LLDB term so I would leave this as is. File addresses in LLDB are addresses as they are represented within the object file itself. Load addresses at unique and map to a single address in a live process where shared libraries are slid around. So "file address" is the correct LLDB term.

labath planned changes to this revision.Dec 7 2018, 3:26 AM

This is already available with:

virtual lldb_private::Address ObjectFile::GetHeaderAddress();

It return a lldb_private::Address which is section offset, but nothing stopping us from returning a lldb_private::Address with no section and just the file address. For mach-o the mach header is in the __TEXT segment, but not true for other file formats. I am ok if we need to rename "GetHeaderAddress()" to something else.

Thanks for pointing that out. I was very surprised that this functionality is not available already. It looks like all/most uses of GetHeaderAddress don't really care that there is a header located at that address, but are rather interested in it's property of being a base address for various calculations (the same thing that I am). So I've created D55422 to rename it to something more appropriate.

labath updated this revision to Diff 177717.Dec 11 2018, 8:10 AM

Rebase after comitting D55422. Now this patch just implements GetBaseAddress for
ELF and PECOFF object files, and adds tests.

Actually, this now causes an lldb-mi test to fail, but it's not clear to me if the problem is in the test, or this patch. This issue happens when lldb-mi is printing the "library loaded" message after a module gets added to a not-yet-running target. It tries to print the load address by first getting the base address and then converting that to a load address.

Before this patch, that would always fail, because well.. ELF and PECOFF had this function unimplemented, and for MachO the base address was section-relative, and so it wasn't resolved to a load address without the section being loaded. However, with this patch, in the ELF (and presumably PECOFF) case, the load address is not section-relative and so the GetLoadAddress function happily returns the address.

Is this the expected behavior here? (i.e., object_file->GetLoadAddress().GetLoadAddress(target) returning a valid value even though the target is not running)

Actually, this now causes an lldb-mi test to fail, but it's not clear to me if the problem is in the test, or this patch. This issue happens when lldb-mi is printing the "library loaded" message after a module gets added to a not-yet-running target. It tries to print the load address by first getting the base address and then converting that to a load address.

Before this patch, that would always fail, because well.. ELF and PECOFF had this function unimplemented, and for MachO the base address was section-relative, and so it wasn't resolved to a load address without the section being loaded. However, with this patch, in the ELF (and presumably PECOFF) case, the load address is not section-relative and so the GetLoadAddress function happily returns the address.

Is this the expected behavior here? (i.e., object_file->GetLoadAddress().GetLoadAddress(target) returning a valid value even though the target is not running)

Not unless someone has manually set the section load address in the test?

Actually, this now causes an lldb-mi test to fail, but it's not clear to me if the problem is in the test, or this patch. This issue happens when lldb-mi is printing the "library loaded" message after a module gets added to a not-yet-running target. It tries to print the load address by first getting the base address and then converting that to a load address.

Before this patch, that would always fail, because well.. ELF and PECOFF had this function unimplemented, and for MachO the base address was section-relative, and so it wasn't resolved to a load address without the section being loaded. However, with this patch, in the ELF (and presumably PECOFF) case, the load address is not section-relative and so the GetLoadAddress function happily returns the address.

Is this the expected behavior here? (i.e., object_file->GetLoadAddress().GetLoadAddress(target) returning a valid value even though the target is not running)

Not unless someone has manually set the section load address in the test?

But even if the test is setting the section load address, this won't work for anything else other than Darwin, so the test would need to be fixed,

Actually, this now causes an lldb-mi test to fail, but it's not clear to me if the problem is in the test, or this patch. This issue happens when lldb-mi is printing the "library loaded" message after a module gets added to a not-yet-running target. It tries to print the load address by first getting the base address and then converting that to a load address.

Before this patch, that would always fail, because well.. ELF and PECOFF had this function unimplemented, and for MachO the base address was section-relative, and so it wasn't resolved to a load address without the section being loaded. However, with this patch, in the ELF (and presumably PECOFF) case, the load address is not section-relative and so the GetLoadAddress function happily returns the address.

Is this the expected behavior here? (i.e., object_file->GetLoadAddress().GetLoadAddress(target) returning a valid value even though the target is not running)

Not unless someone has manually set the section load address in the test?

The test is not setting the load address manually. This simply falls out of how Address::GetLoadAddress is implemented:

addr_t Address::GetLoadAddress(Target *target) const {
  SectionSP section_sp(GetSection());
  if (section_sp) {
    ...
  } else {
    // We don't have a section so the offset is the load address
    return m_offset;
  }
}

So, where's the bug here? It's not clear to me how to change Address::GetLoadAddress to do something different than what it is doing now.

Actually, this now causes an lldb-mi test to fail, but it's not clear to me if the problem is in the test, or this patch. This issue happens when lldb-mi is printing the "library loaded" message after a module gets added to a not-yet-running target. It tries to print the load address by first getting the base address and then converting that to a load address.

Before this patch, that would always fail, because well.. ELF and PECOFF had this function unimplemented, and for MachO the base address was section-relative, and so it wasn't resolved to a load address without the section being loaded. However, with this patch, in the ELF (and presumably PECOFF) case, the load address is not section-relative and so the GetLoadAddress function happily returns the address.

Is this the expected behavior here? (i.e., object_file->GetLoadAddress().GetLoadAddress(target) returning a valid value even though the target is not running)

Not unless someone has manually set the section load address in the test?

The test is not setting the load address manually. This simply falls out of how Address::GetLoadAddress is implemented:

addr_t Address::GetLoadAddress(Target *target) const {
  SectionSP section_sp(GetSection());
  if (section_sp) {
    ...
  } else {
    // We don't have a section so the offset is the load address
    return m_offset;
  }
}

So, where's the bug here? It's not clear to me how to change Address::GetLoadAddress to do something different than what it is doing now.

So this is a bug really. When there is no section, we should specify what the m_offset is using lldb_private::AddressType in maybe a new ivar name "m_offset_type". Then GetBaseAddress() would specify eAddressTypeFile. And the above code would become:

addr_t Address::GetLoadAddress(Target *target) const {
  SectionSP section_sp(GetSection());
  if (section_sp) {
    ...
  } else if (m_offset_type == eAddressTypeLoad) {
    // We don't have a section so the offset is the load address
    return m_offset;
  }
}

We just need to be careful and see if we can not make lldb_private::Address get any bigger byte size wise if we can at all avoid it.

Actually, this now causes an lldb-mi test to fail, but it's not clear to me if the problem is in the test, or this patch. This issue happens when lldb-mi is printing the "library loaded" message after a module gets added to a not-yet-running target. It tries to print the load address by first getting the base address and then converting that to a load address.

Before this patch, that would always fail, because well.. ELF and PECOFF had this function unimplemented, and for MachO the base address was section-relative, and so it wasn't resolved to a load address without the section being loaded. However, with this patch, in the ELF (and presumably PECOFF) case, the load address is not section-relative and so the GetLoadAddress function happily returns the address.

Is this the expected behavior here? (i.e., object_file->GetLoadAddress().GetLoadAddress(target) returning a valid value even though the target is not running)

Not unless someone has manually set the section load address in the test?

The test is not setting the load address manually. This simply falls out of how Address::GetLoadAddress is implemented:

addr_t Address::GetLoadAddress(Target *target) const {
  SectionSP section_sp(GetSection());
  if (section_sp) {
    ...
  } else {
    // We don't have a section so the offset is the load address
    return m_offset;
  }
}

So, where's the bug here? It's not clear to me how to change Address::GetLoadAddress to do something different than what it is doing now.

So this is a bug really. When there is no section, we should specify what the m_offset is using lldb_private::AddressType in maybe a new ivar name "m_offset_type". Then GetBaseAddress() would specify eAddressTypeFile. And the above code would become:

addr_t Address::GetLoadAddress(Target *target) const {

SectionSP section_sp(GetSection());
if (section_sp) {
  ...
} else if (m_offset_type == eAddressTypeLoad) {
  // We don't have a section so the offset is the load address
  return m_offset;
}

}

We just need to be careful and see if we can not make lldb_private::Address get any bigger byte size wise if we can at all avoid it.

I must be missing something. If there's a file around so that we can express this address relative to the file, why would it ever not be expressed as a section + offset? If there's not a file around, then what does it mean to say this address ie eAddressTypeFile but we don't know the file?