This is a split of D113724. Calling TypeSystemClang::AddMethodToCXXRecordType
to create function decls for class methods.
lldb-test symbols --dump-ast also triggers that crash.
Differential D113930
[LLDB][NativePDB] Fix function decl creation for class methods zequanwu on Nov 15 2021, 12:25 PM. Authored by
Details This is a split of D113724. Calling TypeSystemClang::AddMethodToCXXRecordType lldb-test symbols --dump-ast also triggers that crash.
Diff Detail
Event TimelineComment Actions Tagging Shafik in case he wants to look at the AST usage, but overall, this seems to be in line with the discussions on the other patch.
Comment Actions I realized that the test case here is incorrect, because the output of lldb-test symbols dump-ast ... shows void virtual_method() twice. The void virtual_method() without virtual is created by the new code. Because lldb-test symbols dump-ast ... calls Module::ParseAllDebugSymbols() which calls SymbolFileNativePDB::ParseTypes(CompileUnit &comp_unit). It will construct all debug info inside TagDecls found from TPI stream, so the second void virtual_method() shows up. I tried to use this for testing: lldb-test symbols --find=function --name=full::name --function-flags=full ..., but it doesn't even shows information for function main. I'm working on another way to fix it. Comment Actions Use lldb-test symbols --find=function --name=full::name --function-flags=full for testing.
Comment Actions Fix style.
Comment Actions Add test with lldb-test symbols --dump-ast though it printed the methods twice. It is considered a separate bug that could be fixed later. Comment Actions I am somewhat confused by this deduplication logic. I've read https://reviews.llvm.org/D113930#3136404, so I understand why the duplication is happening, but I don't understand _should_ it be happening. IIUC, SymbolFileNativePDB::ParseTypes tries to force the completion of all types. If it succeeds, then there should be no need to add anything else to that type. In fact, adding methods to an already-complete type sounds like a pretty bad idea. While I'm not very familiar with PDBs or the process of clang ast creation, it sounds to me like there is something wrong here, and I want to make sure we're not using the deduplication to cover up some other issue. Were you able to understand (and ideally, explain) why we have the two parsing methods and why they end up trying to add the same method twice? Comment Actions In ModuleParseAllDebugSymbols, these two functions (NativePDB::ParseFunctions and NativePDB::ParseTypes) are called with NativePDB plugin. NativePDB::ParseFunctions parses pdb debug stream, which contains information for those methods existing in the binary. Methods defined in the source files but removed by optimizations will not show up in it. This is where those methods get created at first time. NativePDB::ParseTypes parses pdb type stream, which contains type information for methods even if they are removed by optimizations. All class members are in the class's FieldList type record. ParseTypes visits all classes' FieldList to complete their TagRecord. This is where those methods get created the second time. We can refer to a function's type record (from type stream) using its symbol record (from debug stream) but not the opposite, since it's common multiple functions have the same signature, which will be just one type record for them. If we can change the order of function calls so that ParseFunctions is after ParseTypes, we check if a method is already created or not, not sure if it's okay. Comment Actions Thanks for the explanation. This makes things a lot clearer (though it doesn't convince me that this is the right approach). I still think that the piecemeal addition of member functions is a problem. While I don't think that changing the parsing order in Module::ParseAllDebugSymbols would be a problem (that function is only called from lldb-test), I think that would only hide the problem. Looking at what happens in the DWARF case, I /think/ (the code is fairly messy, so I could be wrong) that we don't create a clang ast representation for the function in ParseFunctions. We only create an lldb_private::Function (in DWARFASTParserClang::ParseFunctionFromDWARF). The ast entity only gets created in SymbolFileDWARF::ParseTypes (->ParseType->DWARFASTParserClang->ParseTypeFromDWARF). This kinda makes sense since we only need the ast representation of the method when we start dealing its class. So I have a feeling that the real solution here is to avoid creating an ast entry for the function in the first phase. Then you should be able to create the full class declaration in the second phase, as you will have complete information there. Does that make sense? Comment Actions From what I tested on dwarf plugin, lldb-test symbols -dump-ast doesn't print any ast decl, except a skeleton class decl. After adding some logs at (https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp#L1026), it doesn't get executed. $ lldb-test symbols -dump-ast ast-methods.exe Module: ast-methods.exe struct Struct; Nevertheless I think it works in dwarf because it creates the decls if die is DW_TAG_subprogram/DW_TAG_inlined_subroutine /DW_TAG_subroutine_type (https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp#L522). We can associate a decl with a die. But we can't associate a type record with a decl when parsing type stream in PDB.
If we don't create ast entries at ParseFunctions, non-class-member functions will not have ast decls. If we just create ast decls for non-class-member functions in ParseFunctions, we are expecting that ParseTypes is always called after ParseFunctions to complete the decl creation for member functions, which is not right. Comment Actions I am still not sure this is the right solution, but a) I don't know what *is* the right solution; and (b) I don't think this makes the situation worse; so I am hitting accept. I don't think this is as bad as it may sound at first. For lldb-test and ParseAllDebugSymbols purposes, I would even say its perfectly fine. Overall I am not too worried about what happens in the ParseAllDebugSymbols scenario. I am more interested in what happens during "normal" operation when all of this is parsed lazily, in response to FindFunctions/FindTypes/... queries. (*) If everything works fine there then I think this would even be preferable, since you generally want to parse only the minimal amount of information. But I don't really even know how all of this works in DWARF, so I don't want to force you to redesign everything. (*) A corollary of this is that I don't really trust that SymbolFileDWARF is doing "the right thing" in response to ParseAllDebugSymbols, since the main use case is lazy parsing, and ParseAllDebugSymbols only started being used relatively recently. |
As a style thing, this class is quite large, I think I would prefer to see this at global scope in an anonymous namespace.