This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Parse UDT symbols and pointers to members
AbandonedPublic

Authored by asmith on Jul 16 2018, 6:59 PM.

Details

Diff Detail

Event Timeline

asmith created this revision.Jul 16 2018, 6:59 PM
labath added a subscriber: labath.Jul 17 2018, 1:26 AM

I have no opinion on the implementation, but I like that you wrote a non-execution test for this, as this will enable us one day to run it on non-windows hosts too.

lit/SymbolFile/PDB/class-layout.test
51–54

Should these actually be check-DAGs? I am assuming the member order will impact the final class layout, so you need to match the source code order here. (I don't know how hard would it be, but it might be interesting to change the dumping code to produce final element offsets, so you can assert the exact structure layout).

Ah, we need some way to synchronize our work :)

I think in this case it will be better to appreciate the pros and cons of our patches, choose the better variant and improve that with the features available in another variant. I will investigate your patch to understand the differences, can you look at my patch too, please?

I agree with Pavel, the test in your patch looks better, than in my one. I think there must be non-execute test in the result patch too.

It seems that you forgot to add the testing *.cpp files from the Input directory to the patch...

I think that this patch is more solid and covers more cases, than D49368, especially in CreateLLDBTypeFromPDBType part. But D49368 has also following:

  • Filling of a layout info. It allows use of packed types, bitfield structs with unnamed fields (unnamed fields themselves are not processed, but named fields are located correctly) etc., you can look at D49368 test case;
  • Adding of method overrides for CXXRecordType.

In my opinion, it's not so difficult to add these features to this patch. What do you think about it?

source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
647–648

It's not so important, but I think that these lines can be deleted if arguments of subsequent if will be flipped.

696

If I understand correctly, base_of_class must be false for structs. May be check the kind of pdb_udt here?

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
587

I don't fully understand, please, explain me, when does this can be false?

588

What are the advantages and disadvantages of immediate (here) and deferred (at the CompleteType function) completions? I thought that deferred completion is a kind of lazy optimization, so we lost its advantages?

asmith updated this revision to Diff 155904.Jul 17 2018, 9:38 AM

Adding missing inputs

zturner added inline comments.Jul 17 2018, 9:42 AM
lit/SymbolFile/PDB/class-layout.test
4

If you change this to lld-link this test may be able to run on non windows today.

source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
186

I wonder if this would be better as an llvm_unreachable. Being able to assume this function returns a sane value would simplify calling code.

198

And here.

246

This looks unusual. Did you clang-format it?

271–272

So is this access == None a thing that can actually happen? AFAICT it's impossible for getAccess() to return anything other than public, protected, or private, in which case this code path will never get executed, so we can just delete it.

291–295

Can you remind me what this means? How would an entire type be marked const or volatile? There's no instance variable in play here, these are just types as they are declared in the source code, if I'm not mistaken. Something like const class Foo {int x; }; doesn't make any sense. So I'm curious under what circumstances these 2 if statements would evaluate to true.

source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
291–295

If I understand right, we may find ourselves here during processing of argument types of some method.

Don't you mind if I try to combine this with D49368? To avoid doing the work twice.

Hui added a subscriber: Hui.Jul 18 2018, 9:53 PM
Hui added inline comments.
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
587

All types are parsed by symbol->getSymIndexId() except PDBSymbolBaseClass and PDBSymbolFunc symbols they are parsed by symbol->getTypeId(). After completion, Clang type of PDBSymbolBaseClass is reset with symbol->getSymIndexId().

Hui added inline comments.Jul 18 2018, 9:57 PM
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
696

Struct/Union can't have base class. Only class udt is handled. I think it is safe to set it ture.

Hui added inline comments.Jul 18 2018, 10:02 PM
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
588

I think the initial purpose was that the 'incomplete' can be detected immediately after 'first' parsing and then solved immediately.

The deferred type completion can encounter a drawback (maybe be worked around by using lock)

a. the deferred procedure could happen during the 'm_types' (TypeList)" dumping
b. the deferred output could add new types to the 'm_types'.
c. see below. The dump implementation of TypeList.

It will result in undefined behavior if a list is expanded while accessing it with iterator.

void TypeList::Dump(Stream *s, bool show_context) {

for (iterator pos = m_types.begin(), end = m_types.end(); pos != end; ++pos) {
  pos->get()->Dump(s, show_context);
}

}

In this sense, https://reviews.llvm.org/D49368 has a better solution.

Hui added inline comments.Jul 18 2018, 10:06 PM
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
246

Just tried clang-format this file with llvm style. They seemed good in shape.

Hui added inline comments.Jul 18 2018, 10:33 PM
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
271–272

Chance is that udt->getAccess() could return PDB_MemberAccess() (ctor) if the DIA call fails. Need to check if CodeView generates default access for a nested udt or not. I think.

Hui added inline comments.Jul 18 2018, 10:49 PM
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
647–648

member udt or base class shall have all been completed before their parent 's completion. The 'if' is a double insurance? seeing a hard assert at line #659.

source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
696

I think that it must be false for following code:

class B {
  char _b;
};

struct S : B {
  int _s;
};
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
186

PDBSymbolTypeUDT::getType() returns PDB_MemberAccess that equals 0, so we can't make it llvm_unreachable.

lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp
37

Here is a problem. MicrosoftRecordLayoutBuilder asserts every field or base offset, but in our case fields a and b are treated as struct Complex's fields, not union's, so lldb crashes in debug on this. I can't find enough info in PDB to restore the unnamed union here. Do you have any ideas about it?

source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
273

Yes, we can occur here, as I have mentioned at the line 186. But if I understand correctly for following code:

class C {
  struct S {
  };
};

we will retrieve public as the access for S, but must retrieve private, right? May be we need to get the default accessibility of the class parent?

lit/SymbolFile/PDB/class-layout.test
4

It seems that with lld-link the GetDeclarationForSymbol function can't find declarations for UDTs.

zturner added inline comments.Jul 19 2018, 9:49 AM
lit/SymbolFile/PDB/class-layout.test
4

Heh, I know exactly what that is actually. Ignore my request, it's easy to fix in lld-link but I don't have the time to do it right now.

lit/SymbolFile/PDB/class-layout.test
4

Ok, thanks!

Hui added inline comments.Jul 20 2018, 1:01 AM
lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp
37

Based on MSVC cl yielded PDB, you could have full information to restore the unnamed UDT.

From my experience, PDB yielded by clang-cl (/Z7 or /Zi) is slightly different from the one by cl.

Both contain information about forwarded unnamed UDT.
However PDB yielded by clang-cl more or less lacks the member information. See below.

The CodeView info is good. Maybe you need to look at LLC?

CodeView

FieldList (0x1044) {
    TypeLeafKind: LF_FIELDLIST (0x1203)
    DataMember {
      TypeLeafKind: LF_MEMBER (0x150D)
      AccessSpecifier: Public (0x3)
      Type: int (0x74)
      FieldOffset: 0x0
      Name: a
    }
    DataMember {
      TypeLeafKind: LF_MEMBER (0x150D)
      AccessSpecifier: Public (0x3)
      Type: float (0x40)
      FieldOffset: 0x0
      Name: b
    }
  }

Union (0x1045) {
    TypeLeafKind: LF_UNION (0x1506)
    MemberCount: 2
    Properties [ (0x608)
      HasUniqueName (0x200)
      Nested (0x8)
      Sealed (0x400)
    ]
    FieldList: <field list> (0x1044)
    SizeOf: 4
    Name: Complex::<unnamed-tag>
    LinkageName: .?AT<unnamed-type-$S2>@Complex@@
  }

llvm-pdbutil pdb (clang-cl /z7)

(found unnamed symbol, however size = 0, they will be just ignored. See PDBASTParser.cpp #259
The size should not be zero)

   struct Complex::<unnamed-tag> [sizeof = 0] {}

   union Complex::<unnamed-tag> [sizeof = 0] {}

struct Complex [sizeof = 728] {
     data +0x00 [sizeof=720] _List* array[90]
     data +0x2d0 [sizeof=4] int x
     data +0x2d4 [sizeof=4] int a
     data +0x2d4 [sizeof=4] float b
   }

llvm-pdbutil pdb ( cl /z7)

( you have full information to restore unnamed)

struct Complex [sizeof = 728] {
      data +0x00 [sizeof=720] _List* array[90]
      data +0x2d0 [sizeof=4] int x
      data +0x2d4 [sizeof=4] int a
      data +0x2d4 [sizeof=4] float b
    }

    Total padding 3 bytes (25% of class size)
    Immediate padding 3 bytes (25% of class size)

    struct Complex::<unnamed-tag> [sizeof = 4] {
      data +0x00 [sizeof=4] int x
    }

    union Complex::<unnamed-tag> [sizeof = 4] {
      data +0x00 [sizeof=4] int a
      data +0x00 [sizeof=4] float b
    }
Hui added inline comments.Jul 20 2018, 1:04 AM
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
696

Agree.

lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp
37

Thank you! But what means LLC?

I think the problem is in lld, we don’t emit S_UDT symbols because it
caused weird problems in WinDbg. There’s a comment explaining it in
PDB.cpp. But after some recent fixes this may just work. So we should
probably try again to emit the S_UDT in lld, I think that should fix it

lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp
37

I have figured that out, sorry. I usually use disassembly tools for this purpose.

lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp
37

I have just dumped two PDBs, one was produced with cl and link, and other with clang-cl and lld-link with the same keys (/Zi /GS- /c for compilation, /nodefaultlib /debug:full /entry:main for linking). I have compiled the source:

struct S {
  struct {
    char a;
    short b;
  };
  short c;
  union {
    short d;
    int e;
  };
};

int main() {
  S ss[sizeof(S)];
  return 0;
}

and have retrieved identical type infos from PDBs:

struct S::<unnamed-tag> [sizeof = 0] {}

union S::<unnamed-tag> [sizeof = 0] {}

struct S [sizeof = 12] {
  data +0x00 [sizeof=1] char a
  <padding> (1 bytes)
  data +0x02 [sizeof=2] short b
  data +0x04 [sizeof=2] short c
  <padding> (2 bytes)
  data +0x08 [sizeof=2] short d
  data +0x08 [sizeof=4] int e
}
Total padding 3 bytes (25% of class size)
Immediate padding 3 bytes (25% of class size)

struct S::<unnamed-tag> [sizeof = 4] {
  data +0x00 [sizeof=1] char a
  <padding> (1 bytes)
  data +0x02 [sizeof=2] short b
}
Total padding 1 bytes (25% of class size)
Immediate padding 1 bytes (25% of class size)

union S::<unnamed-tag> [sizeof = 4] {
  data +0x00 [sizeof=2] short d
  data +0x00 [sizeof=4] int e
}

So it seems that both cl and clang emit enough info to restore the layout of unnamed unions or structs. But we also need to:

  • Find a location of unnamed fields of such types in the outer structure (S in our case);
  • Somehow drop fields a, b, d and e from it (because we will place the unnamed fields of the unnamed types there).

And I can't find enough info for that. How do you think, is it possible?

zturner added a subscriber: asmith.Jul 25 2018, 3:25 AM

When you have a DIA interface for struct S, can you just call
findChildren<PDBSymbolTypeUDT>()? Will that enumerate tge unnamed struct?
The fact that pdbutil doesn’t is only an indication of how the printing
code behaves, you shouldn’t interpret anything about what information is
available from it

When you have a DIA interface for struct S, can you just call
findChildren<PDBSymbolTypeUDT>()? Will that enumerate tge unnamed struct?

I have checked this again with cl and clang-cl (but I have enumerated all children, not only UDTs). For the cl-emitted PDB there are only Data children (for fields a, b, c, d and e). The clang-emitted PDB also contains all the Data children, but it also contains two unnamed UDT children (for the unnamed struct and the unnamed union). I.e. clang emits more info than cl.

But I can't understand, how will this info help us to resolve two points I've mentioned in the previous message? Can you explain, please?

Thanks!

The fact that pdbutil doesn’t is only an indication of how the printing
code behaves, you shouldn’t interpret anything about what information is
available from it

Wouldn’t the location of the unnamed struct be the location of its first
member? We already print the offsets of the individual members, so that
part is solvable (although that code is horrendously complex). The second
part is figuring out how to correlate the member back to the unnamed struct
it came from. If you can find the unnamed struct by enumerating children of
S, this isn’t too bad. S has a data member named a and a child unnamed
struct with a member named a, therefore they must be the same a. If the
unnamed struct doesn’t show up when enumerating children though, then I’m
not sure .

Yes, I have understood that, thank you. It may be not trivial in a case of multiple nested unnamed unions/structs, but it's solvable for clang-emitted PDBs. But I'm not sure about cl-emitted PDBs too.

Can we leave the things as in this commit (and allow the fields overlapping in MicrosoftRecordLayoutBuilder)? So fields of nested unnamed UDTs will be treated as outer structure's fields. It will allow us to support both clang and cl PDBs, but what are the drawbacks of such solution?

zturner accepted this revision.Jul 26 2018, 8:22 PM

I am not sure what are the drawbacks of such a solution are. I guess the best way to find out is to try it and see if you observe any strange behavior :) I'm ok with that, for now.

This revision is now accepted and ready to land.Jul 26 2018, 8:22 PM

Thanks!

I have created the corresponding patch for Clang (D49871, @rnk have accepted it. Can you commit this, please? I have no commit access).

But current patch intersects with D49368, and I'm preparing the combined patch (we were talking about it some earlier in the thread), so can you, please, NOT commit current patch now? :) I'll submit the combined patch on Monday (I have a day off today), it's almost ready.

In the meantime, perhaps you could request commit access :)

It would be great, thank you! I'll also do it on Monday, if it's already possible.

asmith abandoned this revision.Aug 28 2018, 6:39 PM