Preivously we would output S_UDT records for typedefs that pointed to forward decls. This does not match the behavior of MSVC, so we update S_UDT records to always point to full decls. However, there are some cases where a full decl is not available, so in this case we now drop these S_UDT records entirely since they are of no help. Some tests indicate that this patch closes about 20% of the gap between clang-cl and MSVC in terms of /DEBUG:FASTLINK PDB size.
Details
Diff Detail
Event Timeline
It matches the behavior in C++ mode. I haven't tested C mode. Is there something special about C mode?
Update to be a little big more aggressive. From the bug:
Making progress. I found that in ELFObjectWriter.cpp.obj, we were emitting about 1,100 S_UDT records that referred to LF_POINTER types, while MSVC was emitting only 9. Almost all of them were from STL template instantiations.
I dug into this and found that of the first 8 of the LF_POINTERs that I looked at, they all were pointers to types that did not have a complete decl in the debug info. So, it sounds like my original optimization was correct, but I just needed to be more aggressive. Instead of dropping it if it is "just" a typedef to a type with incomplete debug info, we need to also drill through const, volatile, pointer, reference, etc until we get to the real type.
This is a very marked improvement.
before after msvc Gap Closed FileCheck.pdb 4,452,352 4,165,632 5,165,056 -40.23% opt.pdb 466,702,336 421,957,632 365,957,120 44.41% libclang.pdb 1,304,383,488 1,177,718,784 935,817,216 34.37% clang.pdb 1,098,010,624 977,195,008 822,833,152 43.90% lld.pdb 508,645,376 454,995,968 404,697,088 51.61% llvm-tblgen.pdb 59,822,080 51,105,792 38,465,536 40.81% clang-tblgen.pdb 15,085,568 13,332,480 13,545,472 113.83%
So we're now about 50% of the way to parity. Now that I'm on the right track hopefully I can figure out the rest of them pretty easily.
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp | ||
---|---|---|
2204 | This isn't attached to the null check. It would look better as a function level comment (///). | |
2215 | You should be able to dyn_cast to DICompositeType and call ->isForwardDecl(). Generally speaking, when translating one representation to another, it's better to look at the input and not the output whenever possible. Actually, is there any reason you can't do that during addToUdts? What does the TI == CTI comparision achieve? | |
2217–2219 | You can simplify this to return the condition. |
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp | ||
---|---|---|
2215 | We want to emit a UDT if:
If we just say return !T->isForwardDecl(); this doesn't handle the case where T is a forward decl but a complete decl exists that we could remap to. Also, for typedefs, addToUdts(Ty) can happen before we know if there will ever be a complete decl for Ty so this can't happen until the end. |
Previous patch was slightly wrong. With this new update we're about 70% to parity with MSVC's fastlink PDBs.
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp | ||
---|---|---|
2215 | Re: typedefs, that's what I'm getting at. We don't have to wait to the end, because we should have the DICompositeType, and it already knows if it's complete or forward. |
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp | ||
---|---|---|
2205 | All this discussion of what we have and haven't seen doesn't make sense to me. As long as we keep looking at the DI metadata, we know up front what types will be complete and what types used forward decls. Clang replaces forward decls with complete DICompositeTypes when it thinks it needs one. We should be able to look at a DIType before we add it to the UDT list and say, "after looking through pointers and qualifiers, does this refer to a complete class or struct type?" If yes, add it, otherwise, don't. |
So maybe you're right. I was under the impression that you could end up with IR that contained two DICompositeTypes. one with DW_TAG_structure_type and DIFlagFwdDecl, and a second without the forward decl flag. Then I assumed you could have a DW_TAG_typedef that referred to the forward decl, even though the complete decl appeared separately later in the IR. It seems this is not the case though and for a given linkage name, the generated IR contains either a forward decl or a complete decl, but not both. Is that right?
I bet we can simplify this and get rid of the std::string by computing it later, but that seems like a separate patch.