Page MenuHomePhabricator

[LLDB] Set the right address size on output DataExtractors from ObjectFile
ClosedPublic

Authored by mstorsjo on Fri, Nov 29, 4:03 AM.

Details

Summary

If filling in a DataExtractor from an ObjectFile, e.g. via the ReadSectionData method, the output DataExtractor gets the address size from the m_data member.

ObjectFile's m_data member is initialized without knowledge about the address size (so the address size is set based on the host's sizeof(void*), and at that point within ObjectFile's constructor, virtual methods implemented in subclasses (like GetAddressByteSize()) can't be called, therefore fix it up when filling in external DataExtractors.

This makes sure that line tables from executables with a different address size are parsed properly; previously this tripped up DWARFDebugLine::LineTable::parse for 32 bit executables on a 64 bit host, as the address size in the line table (4) didn't match the one set in the DWARFDataExtractor.

Diff Detail

Event Timeline

mstorsjo created this revision.Fri, Nov 29, 4:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptFri, Nov 29, 4:03 AM
Herald added a subscriber: aprantl. · View Herald Transcript
labath accepted this revision.Mon, Dec 2, 6:28 AM
This revision is now accepted and ready to land.Mon, Dec 2, 6:28 AM
This revision was automatically updated to reflect the committed changes.
clayborg added inline comments.
lldb/source/Symbol/ObjectFile.cpp
480–486

I would vote to make this happen within DataExtractor::SetData(const DataExtractor &...)

mstorsjo marked an inline comment as done.Mon, Dec 2, 11:31 PM
mstorsjo added inline comments.
lldb/source/Symbol/ObjectFile.cpp
480–486

Do you mean that we'd extend DataExtractor::SetData(const DataExtractor &...) to take a byte address size parameter, or that we'd update m_data's byte address size before doing data.SetData() here?

Ideally I'd set the right byte address size in m_data as soon as it is known and available. It's not possible to do this in ObjectFile's constructor, as that is called before the subclass is constructed and its virtual methods are available, but is there a better point in the lifetime where we could update it?

labath added a comment.Wed, Dec 4, 5:29 AM

I just realized I didn't press "sumbit" yesterday...

lldb/source/Symbol/ObjectFile.cpp
480–486

I too would prefer that, but I couldn't see a way to achieve that (which is why I stamped this version).

Today, with a fresh set of eyes, I think it may be reasonable to have this happen in ObjectFile::ParseHeader. After the header has been parsed, all object formats (I hope) should be able to determine their address size and endianness, and the operating invariant would be that the address size is only valid after the ParseHeader has been called. WDYT?

labath added a comment.Wed, Dec 4, 5:32 AM

BTW, I don't know if you've noticed, but the new test is failing on windows: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/11217. It looks like a path separator problem. We actually already have code which tries to guess the path style of a compile unit, but it is keyed off of the DW_AT_comp_dir attribute, which it looks like you stripped from the test case. My guess is that adding this attribute back will fix this problem...

mstorsjo marked an inline comment as done.Wed, Dec 4, 11:03 AM

BTW, I don't know if you've noticed, but the new test is failing on windows: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/11217. It looks like a path separator problem. We actually already have code which tries to guess the path style of a compile unit, but it is keyed off of the DW_AT_comp_dir attribute, which it looks like you stripped from the test case. My guess is that adding this attribute back will fix this problem...

I did notice it, and tried to fix it (https://reviews.llvm.org/rG7d019d1a3be252a885e8db1ee7af11c90), but I forgot to check that it actually worked - sorry about that. Apparently I got the backslash escaping wrong (from some clang test). I'll see if DW_AT_comp_dir helps (and remove that failed regex fix), or fix the backslash matching.

lldb/source/Symbol/ObjectFile.cpp
480–486

Sounds sensible! I'll give it a shot.

mstorsjo marked an inline comment as done.Wed, Dec 4, 1:06 PM
mstorsjo added inline comments.
lldb/source/Symbol/ObjectFile.cpp
480–486

ObjectFile::ParseHeader is currently pure virtual, and thus the subclass concrete implementations of it don't call the base class version of the method. Do you want me to make the base class method non-pure and add calls to it in the subclasses, or add calls to m_data.SetAddressByteSize(GetAddressByteSize()); in all of the subclasses ParseHeader?

labath added inline comments.Thu, Dec 5, 4:28 AM
lldb/source/Symbol/ObjectFile.cpp
480–486

Yeah.. I don't know... I am starting to wonder if this really is a good idea.. From the looks of it, there doesn't seem to be anyone (outside of the object file classes themselves) who is calling this method, so it's not clear to me why is it even present on the base class...

Overall, the scheme that I think I'd like the most would be something like:

class ObjectFile {
  /*nonvirtual*/ uint32_t GetAddressByteSize() { ParseHeader(); return m_data.GetAddressByteSize(); }
  // same for byte order
  
  /*nonvirtual*/ void ParseHeader() { std::tie(address_size, byte_order) = DoParseHeader(); m_data.SetAddressByteSize(address_size); m_data.SetByteOrder(byte_order); }
  virtual ??? DoParseHeader() = 0;
};

However, I don't know how hard would it be to get to that point, so I think I'd settle for just making sure that each subclass calls m_data.SetByteOrder internally. It looks like ObjectFileELF does that already. I haven't checked, but I wouldn't be surprised if MachO does the same...

mstorsjo marked an inline comment as done.Fri, Dec 6, 3:30 AM
mstorsjo added inline comments.
lldb/source/Symbol/ObjectFile.cpp
480–486

I checked, and both MachO and ELF seem to be calling both m_data.SetByteOrder() and m_data.SetAddressByteSize(), while PECOFF only calls m_data.SetByteOrder(). So that makes the fix rather straightforward.