This is an archive of the discontinued LLVM Phabricator instance.

03/03: .debug_types: Update of D32167 (.debug_types) on top of D51578 (section concatenation)
AbandonedPublic

Authored by jankratochvil on Nov 17 2018, 3:00 PM.

Details

Summary

I cannot update the patch of D32167 so I have created this new revision for that.

Diff Detail

Repository
rLLDB LLDB

Event Timeline

jankratochvil created this revision.Nov 17 2018, 3:00 PM
jankratochvil retitled this revision from Update of D32167 on top of D51578 to .debug_types: Update of D32167 on top of D51578 (.debug_types section concatenation).Nov 17 2018, 3:01 PM
jankratochvil added inline comments.Nov 17 2018, 3:12 PM
source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
268 ↗(On Diff #174520)

I have added these 4 lines - decl_ctx copy - which fixes the -fdebug-types-section regression described before as:

There is a _dwarf -> _dwarf_type_units regression for: -f TestCppTypeLookup.test_namespace_only_dwarf_type_units

./bin/clang -o 1 ../llvm-git/tools/lldb/packages/Python/lldbsuite/test/lang/cpp/type_lookup/main.cpp -g -fdebug-types-section;./bin/lldb -o 'settings set target.experimental.inject-local-vars false' -o 'target create "./1"' -o 'b 66' -o r -o 'p *((in_contains_type *)&i)'

Expected:

(lldb) p *((in_contains_type *)&i)
error: use of undeclared identifier 'in_contains_type'
error: expected expression

Actual:

(lldb) p *((in_contains_type *)&i)
(in_contains_type) $0 = (aaa = 123)

The regression was due to DWARFASTParserClang::GetClangDeclContextForDIE calling ResolveType but then still GetCachedClangDeclContextForDIE was returning NULL decl_ctx. That was because in this code above the recursive ParseTypeFromDWARF call properly did set LinkDeclContextToDIE but only for the referenced DIE, not for the referring DIE. But then the caller asked for DeclContext of the referring DIE which has not been found.

  Compilation Unit @ offset 0x84:
   Signature:     0x368b3ff5e79ee8d1
 <0><9b>: Abbrev Number: 1 (DW_TAG_type_unit)
 <1><a2>: Abbrev Number: 2 (DW_TAG_namespace)
    <a3>   DW_AT_name        : (indirect string, offset: 0x149): nsp_a
 <2><a7>: Abbrev Number: 6 (DW_TAG_structure_type)
HERE: Referenced DIE.
    <a9>   DW_AT_name        : (indirect string, offset: 0x169): contains_type
  Compilation Unit @ offset 0xb2:
   Signature:     0xb27591880ea58868
 <0><c9>: Abbrev Number: 1 (DW_TAG_type_unit)
 <1><d0>: Abbrev Number: 2 (DW_TAG_namespace)
    <d1>   DW_AT_name        : (indirect string, offset: 0x149): nsp_a
 <2><d5>: Abbrev Number: 7 (DW_TAG_structure_type)
HERE: Referring DIE.
    <d6>   DW_AT_declaration : 1
    <d6>   DW_AT_signature   : signature: 0x368b3ff5e79ee8d1
 <3><de>: Abbrev Number: 3 (DW_TAG_structure_type)
    <e0>   DW_AT_name        : (indirect string, offset: 0x166): in_contains_type
 <4><e7>: Abbrev Number: 4 (DW_TAG_member)
    <e8>   DW_AT_name        : (indirect string, offset: 0x177): aaa

@clayborg You have already approved this patchset merged with D51578 in D51578 three weeks ago. I have split it afterwards to:

1of2D51578the section merge which you have approved; but it is not much useful without any code using it.
2of2this D54670your original .debug_types patch but with my added few lines of code to fix its regression when using .debug_types. These few lines have not been approved yet.

Should I check it in as you wrote it or is it OK with those few lines of code added by me?

jankratochvil retitled this revision from .debug_types: Update of D32167 on top of D51578 (.debug_types section concatenation) to 03/03: .debug_types: Update of D32167 (.debug_types) on top of D51578 (section concatenation).Feb 17 2019, 11:15 AM

I am definitely ready for this to go in ASAP if we are good on this!

clayborg accepted this revision.Feb 19 2019, 12:53 PM
This revision is now accepted and ready to land.Feb 19 2019, 12:53 PM

Few style nits

lldb/source/Plugins/SymbolFile/DWARF/DIERef.cpp
44–45

Invert and make this an early return.

51

This auto is not entirely obvious for me, same for the next line.

53

How about if (auto tu = ...)?

59

How about

cu_offset = dwarf_cu->GetBaseObjOffset() != DW_INVALID_OFFSET ? dwarf_cu->GetBaseObjOffset() : dwarf_cu->GetOffset();
64

Move this up.

lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
261

How about if (auto signature_die = ...)?

lldb/source/Plugins/SymbolFile/DWARF/DWARFTypeUnit.cpp
45

Let's make this a sentence with a full stop at the end.

lldb/source/Plugins/SymbolFile/DWARF/DWARFTypeUnit.h
75

///

lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
945

Full sentence.

I am still worried about the divergence from llvm's dwarf reader here. We now have a relatively nice Object API, which presents a view of the object file that everybody is used to (i.e., as a bunch of sections). This includes the llvm object and dwarf readers, and the DWARF spec (!). But now we are drilling a hole in that API, which basically reverts the sections and gives us back the a view of the object file as a stream of bytes. And if the object file happens to be compressed, then we have to mock up a new uncompressed buffer with all of the data, even though the compression is supposed to be transparent and unobtrusive (and it mostly is, if you stick to viewing the object file as a collection of sections). All because our dwarf reader cannot handle the fact that the debug info can come from multiple sections? How come nobody else has this problem?

Treating dwarf data as a single buffer with various chunks starting at random offsets should essentially be isomorphous with treating it as a collection of independent buffers. We are here introducing a function called get_debug_types_offset. Why not have get_debug_types_data_extractor() instead? That may mean some functions need to be changed so that they take an additional buffer argument, instead of just assuming everything refers to the single grand buffer, but I think that's a good thing, because it brings us closer to reality instead of continuing to fabricate ever more complicated fiction. BTW, I don't see a get_debug_info_offset(). AFAIK, there's no requirement that .debug_info has to come first in an object file. What would happen if the order of the sections in the object file is reversed?

I am still worried about the divergence from llvm's dwarf reader here.

I am still going to investigate your DW_FORM_* dispatching suggestion. It looks too simple to work, maybe it is the right way forward.

the debug info can come from multiple sections? How come nobody else has this problem?

Because GDB does not need to return back to DWARF after it is parsed - but then for LLDB the multiple-sections dispatching could be done only for user_id_t

Why not have get_debug_types_data_extractor() instead?

Because DataExtractor accesses always start at offset 0.

BTW, I don't see a get_debug_info_offset(). AFAIK, there's no requirement that .debug_info has to come first in an object file. What would happen if the order of the sections in the object file is reversed?

It would fall back to the section-reading (instead of section-mmapping) less performing fallback in get_debug_info_data(). If there are any such files out there sure the code could be extended to handle both orders of .debug_info and .debug_types but personally I do not think it is worth the code complication given there is the fallback anyway. I can put a warning there for such case so that it would be easily noticed in the future - but then maybe one could already rather extend the code, any suggestions what to choose?

I am still worried about the divergence from llvm's dwarf reader here.

I am still going to investigate your DW_FORM_* dispatching suggestion. It looks too simple to work, maybe it is the right way forward.

Thank you for doing that. Please don't feel like I'm taking this all out to you, I know that you are following the direction that has already been set out. My comment is directed at everyone who has interest in the state of lldb's dwarf parser.

And I am fully aware that my proposal is a gross oversimplification, and that doing that might be hard, but I believe the end goal is worth it, and that we can reach it incrementally.

One of the problems we'll probably have to tackle is the encoding of the user_id_t, which is currently very wastefull. Despite it being 64 bits, we can only encode 4GB of debug info. That's still a fairly large number, but it's still not that hard to surpass it. If we slice one bit out of that, we'll get 2GB, which will be just too small. So we'll either have to increase the size of that type, use a more rational encoding (we already have compile units and dies sitting in a vector, so maybe index into that vector?), or use some concatenation scheme just for the purposes of user_id_t, but then bail out and decompose that very early on.

Why not have get_debug_types_data_extractor() instead?

Because DataExtractor accesses always start at offset 0.

Yes, that's part of the idea. :) Starting at zero is the right thing to do here, as that's the base any intra-section references will be using. (and for inter-section refs, you have to manually fiddle with the offsets anyway)

BTW, I don't see a get_debug_info_offset(). AFAIK, there's no requirement that .debug_info has to come first in an object file. What would happen if the order of the sections in the object file is reversed?

It would fall back to the section-reading (instead of section-mmapping) less performing fallback in get_debug_info_data(). If there are any such files out there sure the code could be extended to handle both orders of .debug_info and .debug_types but personally I do not think it is worth the code complication given there is the fallback anyway. I can put a warning there for such case so that it would be easily noticed in the future - but then maybe one could already rather extend the code, any suggestions what to choose?

Hm... ok, I'm glad you've thought about all of the possibilities. Though I would say that this just show that we have an incorrect abstraction here. :)

Of the two options I would say only adding the warning makes sense, since I believe making everything aware of the fact that .debug_info can start at a non-zero offset is equivalent to teaching it that it can start in a different section.

jankratochvil planned changes to this revision.Mar 11 2019, 12:13 PM
jankratochvil abandoned this revision.Jun 27 2019, 3:53 AM