I cannot update the patch of D32167 so I have created this new revision for that.
Details
Diff Detail
- Repository
- rLLDB LLDB
Event Timeline
source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp | ||
---|---|---|
268 ↗ | (On Diff #174520) | I have added these 4 lines - decl_ctx copy - which fixes the -fdebug-types-section regression described before as: There is a _dwarf -> _dwarf_type_units regression for: -f TestCppTypeLookup.test_namespace_only_dwarf_type_units ./bin/clang -o 1 ../llvm-git/tools/lldb/packages/Python/lldbsuite/test/lang/cpp/type_lookup/main.cpp -g -fdebug-types-section;./bin/lldb -o 'settings set target.experimental.inject-local-vars false' -o 'target create "./1"' -o 'b 66' -o r -o 'p *((in_contains_type *)&i)' Expected: (lldb) p *((in_contains_type *)&i) error: use of undeclared identifier 'in_contains_type' error: expected expression Actual: (lldb) p *((in_contains_type *)&i) (in_contains_type) $0 = (aaa = 123) The regression was due to DWARFASTParserClang::GetClangDeclContextForDIE calling ResolveType but then still GetCachedClangDeclContextForDIE was returning NULL decl_ctx. That was because in this code above the recursive ParseTypeFromDWARF call properly did set LinkDeclContextToDIE but only for the referenced DIE, not for the referring DIE. But then the caller asked for DeclContext of the referring DIE which has not been found. Compilation Unit @ offset 0x84: Signature: 0x368b3ff5e79ee8d1 <0><9b>: Abbrev Number: 1 (DW_TAG_type_unit) <1><a2>: Abbrev Number: 2 (DW_TAG_namespace) <a3> DW_AT_name : (indirect string, offset: 0x149): nsp_a <2><a7>: Abbrev Number: 6 (DW_TAG_structure_type) HERE: Referenced DIE. <a9> DW_AT_name : (indirect string, offset: 0x169): contains_type Compilation Unit @ offset 0xb2: Signature: 0xb27591880ea58868 <0><c9>: Abbrev Number: 1 (DW_TAG_type_unit) <1><d0>: Abbrev Number: 2 (DW_TAG_namespace) <d1> DW_AT_name : (indirect string, offset: 0x149): nsp_a <2><d5>: Abbrev Number: 7 (DW_TAG_structure_type) HERE: Referring DIE. <d6> DW_AT_declaration : 1 <d6> DW_AT_signature : signature: 0x368b3ff5e79ee8d1 <3><de>: Abbrev Number: 3 (DW_TAG_structure_type) <e0> DW_AT_name : (indirect string, offset: 0x166): in_contains_type <4><e7>: Abbrev Number: 4 (DW_TAG_member) <e8> DW_AT_name : (indirect string, offset: 0x177): aaa |
@clayborg You have already approved this patchset merged with D51578 in D51578 three weeks ago. I have split it afterwards to:
1of2 | D51578 | the section merge which you have approved; but it is not much useful without any code using it. |
2of2 | this D54670 | your original .debug_types patch but with my added few lines of code to fix its regression when using .debug_types. These few lines have not been approved yet. |
Should I check it in as you wrote it or is it OK with those few lines of code added by me?
Few style nits
lldb/source/Plugins/SymbolFile/DWARF/DIERef.cpp | ||
---|---|---|
44–45 | Invert and make this an early return. | |
51 | This auto is not entirely obvious for me, same for the next line. | |
53 | How about if (auto tu = ...)? | |
59 | How about cu_offset = dwarf_cu->GetBaseObjOffset() != DW_INVALID_OFFSET ? dwarf_cu->GetBaseObjOffset() : dwarf_cu->GetOffset(); | |
64 | Move this up. | |
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp | ||
261 | How about if (auto signature_die = ...)? | |
lldb/source/Plugins/SymbolFile/DWARF/DWARFTypeUnit.cpp | ||
45 | Let's make this a sentence with a full stop at the end. | |
lldb/source/Plugins/SymbolFile/DWARF/DWARFTypeUnit.h | ||
75 | /// | |
lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp | ||
945 | Full sentence. |
I am still worried about the divergence from llvm's dwarf reader here. We now have a relatively nice Object API, which presents a view of the object file that everybody is used to (i.e., as a bunch of sections). This includes the llvm object and dwarf readers, and the DWARF spec (!). But now we are drilling a hole in that API, which basically reverts the sections and gives us back the a view of the object file as a stream of bytes. And if the object file happens to be compressed, then we have to mock up a new uncompressed buffer with all of the data, even though the compression is supposed to be transparent and unobtrusive (and it mostly is, if you stick to viewing the object file as a collection of sections). All because our dwarf reader cannot handle the fact that the debug info can come from multiple sections? How come nobody else has this problem?
Treating dwarf data as a single buffer with various chunks starting at random offsets should essentially be isomorphous with treating it as a collection of independent buffers. We are here introducing a function called get_debug_types_offset. Why not have get_debug_types_data_extractor() instead? That may mean some functions need to be changed so that they take an additional buffer argument, instead of just assuming everything refers to the single grand buffer, but I think that's a good thing, because it brings us closer to reality instead of continuing to fabricate ever more complicated fiction. BTW, I don't see a get_debug_info_offset(). AFAIK, there's no requirement that .debug_info has to come first in an object file. What would happen if the order of the sections in the object file is reversed?
I am still going to investigate your DW_FORM_* dispatching suggestion. It looks too simple to work, maybe it is the right way forward.
the debug info can come from multiple sections? How come nobody else has this problem?
Because GDB does not need to return back to DWARF after it is parsed - but then for LLDB the multiple-sections dispatching could be done only for user_id_t
Why not have get_debug_types_data_extractor() instead?
Because DataExtractor accesses always start at offset 0.
BTW, I don't see a get_debug_info_offset(). AFAIK, there's no requirement that .debug_info has to come first in an object file. What would happen if the order of the sections in the object file is reversed?
It would fall back to the section-reading (instead of section-mmapping) less performing fallback in get_debug_info_data(). If there are any such files out there sure the code could be extended to handle both orders of .debug_info and .debug_types but personally I do not think it is worth the code complication given there is the fallback anyway. I can put a warning there for such case so that it would be easily noticed in the future - but then maybe one could already rather extend the code, any suggestions what to choose?
Thank you for doing that. Please don't feel like I'm taking this all out to you, I know that you are following the direction that has already been set out. My comment is directed at everyone who has interest in the state of lldb's dwarf parser.
And I am fully aware that my proposal is a gross oversimplification, and that doing that might be hard, but I believe the end goal is worth it, and that we can reach it incrementally.
One of the problems we'll probably have to tackle is the encoding of the user_id_t, which is currently very wastefull. Despite it being 64 bits, we can only encode 4GB of debug info. That's still a fairly large number, but it's still not that hard to surpass it. If we slice one bit out of that, we'll get 2GB, which will be just too small. So we'll either have to increase the size of that type, use a more rational encoding (we already have compile units and dies sitting in a vector, so maybe index into that vector?), or use some concatenation scheme just for the purposes of user_id_t, but then bail out and decompose that very early on.
Why not have get_debug_types_data_extractor() instead?
Because DataExtractor accesses always start at offset 0.
Yes, that's part of the idea. :) Starting at zero is the right thing to do here, as that's the base any intra-section references will be using. (and for inter-section refs, you have to manually fiddle with the offsets anyway)
BTW, I don't see a get_debug_info_offset(). AFAIK, there's no requirement that .debug_info has to come first in an object file. What would happen if the order of the sections in the object file is reversed?
It would fall back to the section-reading (instead of section-mmapping) less performing fallback in get_debug_info_data(). If there are any such files out there sure the code could be extended to handle both orders of .debug_info and .debug_types but personally I do not think it is worth the code complication given there is the fallback anyway. I can put a warning there for such case so that it would be easily noticed in the future - but then maybe one could already rather extend the code, any suggestions what to choose?
Hm... ok, I'm glad you've thought about all of the possibilities. Though I would say that this just show that we have an incorrect abstraction here. :)
Of the two options I would say only adding the warning makes sense, since I believe making everything aware of the fact that .debug_info can start at a non-zero offset is equivalent to teaching it that it can start in a different section.
Invert and make this an early return.