This is an alternative to https://reviews.llvm.org/D49368
Details
Diff Detail
Event Timeline
I have no opinion on the implementation, but I like that you wrote a non-execution test for this, as this will enable us one day to run it on non-windows hosts too.
lit/SymbolFile/PDB/class-layout.test | ||
---|---|---|
51–54 | Should these actually be check-DAGs? I am assuming the member order will impact the final class layout, so you need to match the source code order here. (I don't know how hard would it be, but it might be interesting to change the dumping code to produce final element offsets, so you can assert the exact structure layout). |
Ah, we need some way to synchronize our work :)
I think in this case it will be better to appreciate the pros and cons of our patches, choose the better variant and improve that with the features available in another variant. I will investigate your patch to understand the differences, can you look at my patch too, please?
I agree with Pavel, the test in your patch looks better, than in my one. I think there must be non-execute test in the result patch too.
It seems that you forgot to add the testing *.cpp files from the Input directory to the patch...
I think that this patch is more solid and covers more cases, than D49368, especially in CreateLLDBTypeFromPDBType part. But D49368 has also following:
- Filling of a layout info. It allows use of packed types, bitfield structs with unnamed fields (unnamed fields themselves are not processed, but named fields are located correctly) etc., you can look at D49368 test case;
- Adding of method overrides for CXXRecordType.
In my opinion, it's not so difficult to add these features to this patch. What do you think about it?
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
647–648 | It's not so important, but I think that these lines can be deleted if arguments of subsequent if will be flipped. | |
696 | If I understand correctly, base_of_class must be false for structs. May be check the kind of pdb_udt here? | |
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp | ||
587 | I don't fully understand, please, explain me, when does this can be false? | |
588 | What are the advantages and disadvantages of immediate (here) and deferred (at the CompleteType function) completions? I thought that deferred completion is a kind of lazy optimization, so we lost its advantages? |
lit/SymbolFile/PDB/class-layout.test | ||
---|---|---|
4 | If you change this to lld-link this test may be able to run on non windows today. | |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
186 | I wonder if this would be better as an llvm_unreachable. Being able to assume this function returns a sane value would simplify calling code. | |
198 | And here. | |
246 | This looks unusual. Did you clang-format it? | |
271–272 | So is this access == None a thing that can actually happen? AFAICT it's impossible for getAccess() to return anything other than public, protected, or private, in which case this code path will never get executed, so we can just delete it. | |
291–295 | Can you remind me what this means? How would an entire type be marked const or volatile? There's no instance variable in play here, these are just types as they are declared in the source code, if I'm not mistaken. Something like const class Foo {int x; }; doesn't make any sense. So I'm curious under what circumstances these 2 if statements would evaluate to true. |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
291–295 | If I understand right, we may find ourselves here during processing of argument types of some method. |
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp | ||
---|---|---|
587 | All types are parsed by symbol->getSymIndexId() except PDBSymbolBaseClass and PDBSymbolFunc symbols they are parsed by symbol->getTypeId(). After completion, Clang type of PDBSymbolBaseClass is reset with symbol->getSymIndexId(). |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
696 | Struct/Union can't have base class. Only class udt is handled. I think it is safe to set it ture. |
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp | ||
---|---|---|
588 | I think the initial purpose was that the 'incomplete' can be detected immediately after 'first' parsing and then solved immediately. The deferred type completion can encounter a drawback (maybe be worked around by using lock) a. the deferred procedure could happen during the 'm_types' (TypeList)" dumping It will result in undefined behavior if a list is expanded while accessing it with iterator. void TypeList::Dump(Stream *s, bool show_context) { for (iterator pos = m_types.begin(), end = m_types.end(); pos != end; ++pos) { pos->get()->Dump(s, show_context); } } In this sense, https://reviews.llvm.org/D49368 has a better solution. |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
246 | Just tried clang-format this file with llvm style. They seemed good in shape. |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
271–272 | Chance is that udt->getAccess() could return PDB_MemberAccess() (ctor) if the DIA call fails. Need to check if CodeView generates default access for a nested udt or not. I think. |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
647–648 | member udt or base class shall have all been completed before their parent 's completion. The 'if' is a double insurance? seeing a hard assert at line #659. |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
696 | I think that it must be false for following code: class B { char _b; }; struct S : B { int _s; }; |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
186 | PDBSymbolTypeUDT::getType() returns PDB_MemberAccess that equals 0, so we can't make it llvm_unreachable. |
lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp | ||
---|---|---|
37 | Here is a problem. MicrosoftRecordLayoutBuilder asserts every field or base offset, but in our case fields a and b are treated as struct Complex's fields, not union's, so lldb crashes in debug on this. I can't find enough info in PDB to restore the unnamed union here. Do you have any ideas about it? |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
273 | Yes, we can occur here, as I have mentioned at the line 186. But if I understand correctly for following code: class C { struct S { }; }; we will retrieve public as the access for S, but must retrieve private, right? May be we need to get the default accessibility of the class parent? |
lit/SymbolFile/PDB/class-layout.test | ||
---|---|---|
4 | It seems that with lld-link the GetDeclarationForSymbol function can't find declarations for UDTs. |
lit/SymbolFile/PDB/class-layout.test | ||
---|---|---|
4 | Heh, I know exactly what that is actually. Ignore my request, it's easy to fix in lld-link but I don't have the time to do it right now. |
lit/SymbolFile/PDB/class-layout.test | ||
---|---|---|
4 | Ok, thanks! |
lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp | ||
---|---|---|
37 | Based on MSVC cl yielded PDB, you could have full information to restore the unnamed UDT. From my experience, PDB yielded by clang-cl (/Z7 or /Zi) is slightly different from the one by cl. Both contain information about forwarded unnamed UDT. The CodeView info is good. Maybe you need to look at LLC? CodeView FieldList (0x1044) { TypeLeafKind: LF_FIELDLIST (0x1203) DataMember { TypeLeafKind: LF_MEMBER (0x150D) AccessSpecifier: Public (0x3) Type: int (0x74) FieldOffset: 0x0 Name: a } DataMember { TypeLeafKind: LF_MEMBER (0x150D) AccessSpecifier: Public (0x3) Type: float (0x40) FieldOffset: 0x0 Name: b } } Union (0x1045) { TypeLeafKind: LF_UNION (0x1506) MemberCount: 2 Properties [ (0x608) HasUniqueName (0x200) Nested (0x8) Sealed (0x400) ] FieldList: <field list> (0x1044) SizeOf: 4 Name: Complex::<unnamed-tag> LinkageName: .?AT<unnamed-type-$S2>@Complex@@ } llvm-pdbutil pdb (clang-cl /z7) (found unnamed symbol, however size = 0, they will be just ignored. See PDBASTParser.cpp #259 struct Complex::<unnamed-tag> [sizeof = 0] {} union Complex::<unnamed-tag> [sizeof = 0] {} struct Complex [sizeof = 728] { data +0x00 [sizeof=720] _List* array[90] data +0x2d0 [sizeof=4] int x data +0x2d4 [sizeof=4] int a data +0x2d4 [sizeof=4] float b } llvm-pdbutil pdb ( cl /z7) ( you have full information to restore unnamed) struct Complex [sizeof = 728] { data +0x00 [sizeof=720] _List* array[90] data +0x2d0 [sizeof=4] int x data +0x2d4 [sizeof=4] int a data +0x2d4 [sizeof=4] float b } Total padding 3 bytes (25% of class size) Immediate padding 3 bytes (25% of class size) struct Complex::<unnamed-tag> [sizeof = 4] { data +0x00 [sizeof=4] int x } union Complex::<unnamed-tag> [sizeof = 4] { data +0x00 [sizeof=4] int a data +0x00 [sizeof=4] float b } |
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp | ||
---|---|---|
696 | Agree. |
lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp | ||
---|---|---|
37 | Thank you! But what means LLC? |
I think the problem is in lld, we don’t emit S_UDT symbols because it
caused weird problems in WinDbg. There’s a comment explaining it in
PDB.cpp. But after some recent fixes this may just work. So we should
probably try again to emit the S_UDT in lld, I think that should fix it
lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp | ||
---|---|---|
37 | I have figured that out, sorry. I usually use disassembly tools for this purpose. |
lit/SymbolFile/PDB/Inputs/ClassLayoutTest.cpp | ||
---|---|---|
37 | I have just dumped two PDBs, one was produced with cl and link, and other with clang-cl and lld-link with the same keys (/Zi /GS- /c for compilation, /nodefaultlib /debug:full /entry:main for linking). I have compiled the source: struct S { struct { char a; short b; }; short c; union { short d; int e; }; }; int main() { S ss[sizeof(S)]; return 0; } and have retrieved identical type infos from PDBs: struct S::<unnamed-tag> [sizeof = 0] {} union S::<unnamed-tag> [sizeof = 0] {} struct S [sizeof = 12] { data +0x00 [sizeof=1] char a <padding> (1 bytes) data +0x02 [sizeof=2] short b data +0x04 [sizeof=2] short c <padding> (2 bytes) data +0x08 [sizeof=2] short d data +0x08 [sizeof=4] int e } Total padding 3 bytes (25% of class size) Immediate padding 3 bytes (25% of class size) struct S::<unnamed-tag> [sizeof = 4] { data +0x00 [sizeof=1] char a <padding> (1 bytes) data +0x02 [sizeof=2] short b } Total padding 1 bytes (25% of class size) Immediate padding 1 bytes (25% of class size) union S::<unnamed-tag> [sizeof = 4] { data +0x00 [sizeof=2] short d data +0x00 [sizeof=4] int e } So it seems that both cl and clang emit enough info to restore the layout of unnamed unions or structs. But we also need to:
And I can't find enough info for that. How do you think, is it possible? |
When you have a DIA interface for struct S, can you just call
findChildren<PDBSymbolTypeUDT>()? Will that enumerate tge unnamed struct?
The fact that pdbutil doesn’t is only an indication of how the printing
code behaves, you shouldn’t interpret anything about what information is
available from it
I have checked this again with cl and clang-cl (but I have enumerated all children, not only UDTs). For the cl-emitted PDB there are only Data children (for fields a, b, c, d and e). The clang-emitted PDB also contains all the Data children, but it also contains two unnamed UDT children (for the unnamed struct and the unnamed union). I.e. clang emits more info than cl.
But I can't understand, how will this info help us to resolve two points I've mentioned in the previous message? Can you explain, please?
Thanks!
The fact that pdbutil doesn’t is only an indication of how the printing
code behaves, you shouldn’t interpret anything about what information is
available from it
Wouldn’t the location of the unnamed struct be the location of its first
member? We already print the offsets of the individual members, so that
part is solvable (although that code is horrendously complex). The second
part is figuring out how to correlate the member back to the unnamed struct
it came from. If you can find the unnamed struct by enumerating children of
S, this isn’t too bad. S has a data member named a and a child unnamed
struct with a member named a, therefore they must be the same a. If the
unnamed struct doesn’t show up when enumerating children though, then I’m
not sure .
Yes, I have understood that, thank you. It may be not trivial in a case of multiple nested unnamed unions/structs, but it's solvable for clang-emitted PDBs. But I'm not sure about cl-emitted PDBs too.
Can we leave the things as in this commit (and allow the fields overlapping in MicrosoftRecordLayoutBuilder)? So fields of nested unnamed UDTs will be treated as outer structure's fields. It will allow us to support both clang and cl PDBs, but what are the drawbacks of such solution?
I am not sure what are the drawbacks of such a solution are. I guess the best way to find out is to try it and see if you observe any strange behavior :) I'm ok with that, for now.
Thanks!
I have created the corresponding patch for Clang (D49871, @rnk have accepted it. Can you commit this, please? I have no commit access).
But current patch intersects with D49368, and I'm preparing the combined patch (we were talking about it some earlier in the thread), so can you, please, NOT commit current patch now? :) I'll submit the combined patch on Monday (I have a day off today), it's almost ready.
Here is a problem. MicrosoftRecordLayoutBuilder asserts every field or base offset, but in our case fields a and b are treated as struct Complex's fields, not union's, so lldb crashes in debug on this. I can't find enough info in PDB to restore the unnamed union here. Do you have any ideas about it?