This is an archive of the discontinued LLVM Phabricator instance.

[ClangASTContext] Extract VTable pointers from C++ objects
ClosedPublic

Authored by aleksandr.urakov on Oct 22 2018, 9:14 AM.

Details

Summary

This patch processes the case of retrieving a virtual base when the object is already read from the debuggee memory.

To achieve that ValueObject::GetCPPVTableAddress was removed (btw, it really returned not a C++ VTable address but an object's address, which is a C++ VTable pointer address for Itanium, but have nothing to do with VTable address for MSVC) and was reimplemented in ClangASTContext (because access to the process is needed to retrieve the VTable pointer in general, and because this is the only place that used old version of ValueObject::GetCPPVTableAddress).

This patch allows to use real object's VTable instead of searching virtual bases by offsets restored by MicrosoftRecordLayoutBuilder. PDB has no enough info to restore VBase offsets properly, so we have to read real VTable instead.

This patch depends on D53497

Diff Detail

Repository
rL LLVM

Event Timeline

PDB has no enough info to restore VBase offsets properly, so we have to read real VTable instead.

What's missing that you're unable to restore the VBase offset properly?

What's missing that you're unable to restore the VBase offset properly?

If I understand correctly, in the PDB there is only info about offset to VTablePtr and index in VTable, so there is enough info to retrieve VBase offset fairly, and we do it in that way. But there's no info in PDB about offset to VBase directly from object. This info is used when the "fair" doesn't work (e.g. at line 6640). This patch just makes the "fair" way to work in more cases.

granata.enrico resigned from this revision.Oct 22 2018, 9:41 AM

What's missing that you're unable to restore the VBase offset properly?

If I understand correctly, in the PDB there is only info about offset to VTablePtr and index in VTable, so there is enough info to retrieve VBase offset fairly, and we do it in that way. But there's no info in PDB about offset to VBase directly from object. This info is used when the "fair" doesn't work (e.g. at line 6640). This patch just makes the "fair" way to work in more cases.

My understanding of record layout with virtual bases is still sketchy (it's very confusing), and it's even worse with DIA because the API is so general and poorly documented, so let's go to the low-level CodeView records.

typedef struct lfVBClass {
    unsigned short  leaf;           // LF_VBCLASS (virtual base) | LV_IVBCLASS (indirect virtual base)
    CV_fldattr_t    attr;           // attribute
    CV_typ_t        index;          // type index of direct virtual base class
    CV_typ_t        vbptr;          // type index of virtual base pointer
    unsigned char   vbpoff[CV_ZEROLEN];       // virtual base pointer offset from address point
                                    // followed by virtual base offset from vbtable
} lfVBClass;

This is what we have access to reading directly from the raw pdb file, which is sometimes more information than what we have access to using DIA. Of course, we also have to interpret whether this actually means what we think it means by inspecting the bytes of a C++ object in a debugger and comparing the layout to what the debug info tells us.

So, the point is, just because we don't have access to the info via DIA doesn't mean we won't have access to the info once the native pdb plugin is complete. Just something to think about.

So, the point is, just because we don't have access to the info via DIA doesn't mean we won't have access to the info once the native pdb plugin is complete. Just something to think about.

Yes, I also try to see low-level PDB dumps now (since the case with FPO :) ), but in this case it seems that we still have no required info to retrieve VBase offsets. Consider the following example:

struct A { int a = 1; };
struct B : virtual A { int b = 2; };
struct C : virtual A { int c = 3; };
struct D : virtual B, virtual C { int d = 4; };

int main() {
  D d{};
  return 0;
}

Here for D we have the next LF_FIELDLIST (assume that the application is MSVC compiled):

0x1016 | LF_FIELDLIST [size = 96, hash = 0x415A]
         - LF_VBCLASS
           base = 0x1002, vbptr = 0x1004, vbptr offset = 0, vtable index = 2
           attrs = public
         - LF_VBCLASS
           base = 0x1005, vbptr = 0x1004, vbptr offset = 0, vtable index = 3
           attrs = public
         - LF_IVBCLASS
           base = 0x1006, vbptr = 0x1004, vbptr offset = 0, vtable index = 1
           attrs = public
         - LF_MEMBER [name = `d`, Type = 0x0074 (int), offset = 4, attrs = public]
         - LF_METHOD [name = `D`, # overloads = 3, overload list = 0x1011]
         - LF_METHOD [name = `operator=`, # overloads = 2, overload list = 0x1015]
0x1017 | LF_STRUCTURE [size = 32, hash = 0xC4D] `D`
         unique name: `.?AUD@@`
         vtable: <no type>, base list: <no type>, field list: 0x1016
         options: has ctor / dtor | has unique name | overloaded operator | overloaded operator=, sizeof 28

vbptr offset here is an offset to VBTable pointer, not to VBase. MicrosoftRecordLayoutBuilder requires exactly offsets to VBases as if they were non-virtual base classes. In general it is wrong, but it works ok if we know a real full type of the variable. ClangASTContext when retrieving a virtual base offset tries at first to retrieve it in the "fair" way:

  • retrieve the pointer to VTable;
  • read the offset to the VBase from VTable.

If it fails somewhere, then it gets the VBase offset from the record layout. It is an "unfair" VBase offset, which was put there by MicrosoftRecordLayoutBuilder. And what I mean is that we have no info about it in PDB (to give it to MicrosoftRecordLayoutBuilder).

What this patch makes is it allows the "fair" way to work in the case, when an object is already read from the debuggee. Then we will have eAddressTypeHost as an address type of ValueObject, and it used to lead to the situation when the "fair" way is failing and the "unfair" way is used. This patch allows to still process it by the "fair" way.

To make things more clear consider the structures layout:

class A	size(4):
	+---
 0	| a
	+---

class B	size(12):
	+---
 0	| {vbptr}
 4	| b
	+---
	+--- (virtual base A)
 8	| a
	+---

class C	size(12):
	+---
 0	| {vbptr}
 4	| c
	+---
	+--- (virtual base A)
 8	| a
	+---

class D	size(28):
	+---
 0	| {vbptr}
 4	| d
	+---
	+--- (virtual base A)
 8	| a
	+---
	+--- (virtual base B)
12	| {vbptr}
16	| b
	+---
	+--- (virtual base C)
20	| {vbptr}
24	| c
	+---

MicrosoftRecordsLayoutBuilder waits that we will give it 8 as the VBase offset for A in B, C and D. In D for B it wants 12, and for C it wants 20. It's an info we are missing in PDB.

Also this example can show how this patch should improve behavior on non-Windows too. If you will stand on return 0 and call print d, then an invalid value should be printed for A inside B or C. If you call frame variable d then the value printed should be ok. It's because in the first case an object is already fully read from the debuggee, and without the patch the "unfair" way works, and it uses the offset to A from B (or C) as if B (or C) would be a real type of the variable (for the layout above it would use 8). But the real type is D, so offsets to A from B (or C) inside D are different (for the layout above it would be -4 from B and -12 from C). That's why "unfair" way doesn't work in this case. This patch should also fix it.

Ping! Can you take a look at this, please?

Looks fine to me. Make sure no one else has any objections.

Ping! Is it ok to proceed with it? Does anyone have objections?

This revision was not accepted when it landed; it landed in state Needs Review.Nov 12 2018, 8:26 AM
This revision was automatically updated to reflect the committed changes.