This is an archive of the discontinued LLVM Phabricator instance.

02/03: Contiguous sections (.debug_info+.debug_types) for D54670==D32167 (.debug_types)
AbandonedPublic

Authored by jankratochvil on Sep 2 2018, 1:29 PM.

Details

Summary

A different trivial approach is to use the mmapped memory range <.debug_info ... garbage ... .debug_types>, then no dispatching is needed and it has no performance impact.
With this patch one could revert the new GetData() from D46606 but the code may be cleaner with it anyway. It is just no longer needed to be virtual so this patch removes its virtuality.
The core is new SymbolFileDWARF::get_debug_info_data().

It has no regressions.

That new class and inheritance of SectionReader (ObjectFileELF::SectionReaderCompressed, ObjectFileJIT::SectionReaderJIT) copying inheritance of ObjectFile' is there because SectionReader` may have local state (such as Decompressor) which is needed for each Section. Additionally ObjectFileELF may contain both uncompressed and compressed sections so it can choose the proper class for instantiation.

Diff Detail

Repository
rLLDB LLDB

Event Timeline

jankratochvil created this revision.Sep 2 2018, 1:29 PM

Overall I am ok with minimal regression in speed if a few percent is all that it is costing us. I am generally ok with this patch. A few questions below.

Is there a reason we need DWARFConcatenatingDataExtractor? Can we just put all functionality into DWARFDataExtractor? Then the only place we need to modify if the place that parses the old CU and the type units from .debug_types.

Also is there a reason we need to pass the block data in code such as:

const DWARFDataExtractor &debug_info_data = die.GetData().GetDWARFDataExtractor(form_value.BlockData());
jankratochvil added a comment.EditedSep 13 2018, 8:55 AM

Is there a reason we need DWARFConcatenatingDataExtractor? Can we just put all functionality into DWARFDataExtractor?

OK, I will try it that way.

Also is there a reason we need to pass the block data in code such as:

const DWARFDataExtractor &debug_info_data = die.GetData().GetDWARFDataExtractor(form_value.BlockData());

EDIT: This case cannot be solved transparently as DWARFConcatenatingDataExtractor (no matter how it is called) has discontiguous data. Existing code does not expect that:

uint32_t block_offset = form_value.BlockData() - debug_info_data.GetDataStart();

What should be GetDataStart() for a discontiguous set of memory data areas?
Still I will try to refactor it named DWARFDataExtractor but this specific change may need to remain there.

I am sorry I will be now for 2 weeks away.

Patch concat = original posting with DWARFConcatenatingDataExtractor used only for .debug_info(+.debug_types) section. download
Patch concat2 = used DWARFDataExtractor with integrated concatenation for all DWARF sections. download

master  real=0.996s user=14.744s sys=1.050s
concat  real=0.994s user=15.524s sys=1.117s
concat2 real=1.001s user=15.580s sys=1.334s

concat vs. concat2 benchmark difference was mostly a measurement error. It is now 5% slowdown compared to trunk - even for former concat which I do not understand now, I will investigate it more as my original measured slowdown was just 1% before.
concat2 sometimes needs to cast DWARFDataExtractor to DataExtractor by .AsDataExtractor() as some functions expected a contiguous block of memory. That is a method applicable only for non-.debug_info sections (it would assert on .debug_info). It needs to be called as DWARFDataExtractor can no longer inherit DataExtractor. I did not use operator DataExtractor & as it could accidentally by applied to .debug_info which may be difficult to catch when not tested with -fdebug-types-section.
DataExtractorConcat is a separate class but it is never used separately without DWARFDataExtractor inheriting it.
Greg, do you mean it somehow way or do I miss something? Personally I may like a bit more concat but then the patch size is almost the same for both.

I got an idea to try making the .debug_info+.debug_types memory contiguous. It cannot be mmapped then but LLDB does not seem to do that now anyway (except maybe MachO but despite some comment I haven't found a mmap() even there). Besides that there could be even .debug_info+gap+.debug_types layout.

jankratochvil updated this revision to Diff 172549.
jankratochvil retitled this revision from DWARFConcatenatingDataExtractor for D32167 to Contiguous .debug_info+.debug_types for D32167.
jankratochvil edited the summary of this revision. (Show Details)

Just a few nits, but _very_ close.

source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
65 ↗(On Diff #172549)

Use llvm::DenseMap here?

source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
786

Should this just be "if (GetUnitDIEOnly().Tag() != DW_TAG_type_unit)"? Partial units and many others can be top level DIEs.

jankratochvil edited the summary of this revision. (Show Details)
jankratochvil marked 2 inline comments as done.Nov 10 2018, 3:22 PM

There is a _dwarf -> _dwarf_type_units regression for: -f TestCppTypeLookup.test_namespace_only_dwarf_type_units

./bin/clang -o 1 ../llvm-git/tools/lldb/packages/Python/lldbsuite/test/lang/cpp/type_lookup/main.cpp -g -fdebug-types-section;./bin/lldb -o 'settings set target.experimental.inject-local-vars false' -o 'target create "./1"' -o 'b 66' -o r -o 'p *((in_contains_type *)&i)'

Expected:

(lldb) p *((in_contains_type *)&i)
error: use of undeclared identifier 'in_contains_type'
error: expected expression

Actual:

(lldb) p *((in_contains_type *)&i)
(in_contains_type) $0 = (aaa = 123)

I will check it more.

source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
65 ↗(On Diff #172549)

Yes, I agree that is a good idea - changed it (BTW this is your part of the patch).

source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
786

When you ask then I disagree. Currently GetAsCompileUnit() is used only for checking DW_TAG_partial_unit::DW_AT_name (from D11003) which is never present in files by DWZ tool. DWZ tool never puts DIEs referencing any code into DW_TAG_partial_unit so D11003 should not be needed; although DWZ intentionally sometimes uses DW_TAG_partial_unit::DW_AT_stmt_list but I haven't found when/how it could be useful.
> many others
Top level DIEs can be according to DWARF-5 7.5 just DW_TAG_compile_unit, DW_TAG_partial_unit or DW_TAG_type_unit.
So personally I would keep GetAsCompileUnit() to do what its name exactly says.

jankratochvil marked 2 inline comments as done.
clayborg accepted this revision.Nov 12 2018, 9:56 AM
clayborg added inline comments.
source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
65 ↗(On Diff #173535)

yeah, I kept getting people commenting on my patches that told me to use DenseMap and StringMap!

This revision is now accepted and ready to land.Nov 12 2018, 9:56 AM
jankratochvil edited the summary of this revision. (Show Details)

I have left here only the sections concatenation. The .debug_types itself by Greg Clayton is now updated on top of this patch in D32167.

Updated against trunk due to new getSectionType from rL348936.

This update fixes a regression after D52403, the code is simplified - eSectionTypeDWARFDebugInfo is no longer hardcoded in this patch.

OK for check-in with this new code?

clayborg added inline comments.Feb 4 2019, 10:04 AM
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
693–707 ↗(On Diff #184919)

So if we have 3GB of .debug_info and 1GB of .debug_types, we are expecting to allocate a heap based buffer and copy all the data into this? This will fail.

jankratochvil retitled this revision from Contiguous .debug_info+.debug_types for D32167 to 02/03: Contiguous sections (.debug_info+.debug_types) for D54670==D32167 (.debug_types).Feb 17 2019, 11:16 AM
jankratochvil edited the summary of this revision. (Show Details)

So if we have 3GB of .debug_info and 1GB of .debug_types, we are expecting to allocate a heap based buffer and copy all the data into this? This will fail.

That fallback code path is used only when there is no way to access .debug_info and .debug_types from mmapped area. I was thinking that fallback would be used for example for Linux vDSO which needs to be read by Process::ReadModuleFromMemory() but that has only about 16KB so its size does not matter. But maybe on OSX (or for GDB JIT modules) the data can be bigger so in this patch I have implemented the most effective merging of non-mmapped sections.

jankratochvil marked 2 inline comments as done.Feb 17 2019, 11:29 AM
jankratochvil added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
693–707 ↗(On Diff #184919)
jankratochvil requested review of this revision.Feb 18 2019, 1:09 AM
jankratochvil edited the summary of this revision. (Show Details)
jankratochvil marked an inline comment as done.
clayborg accepted this revision.Feb 19 2019, 12:58 PM
This revision is now accepted and ready to land.Feb 19 2019, 12:58 PM
JDevlieghere requested changes to this revision.Feb 19 2019, 2:21 PM

Few comments inline.

lldb/include/lldb/Symbol/ObjectFile.h
781 ↗(On Diff #187169)

Please clang-format the patch before landing this.

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
3400 ↗(On Diff #187169)

Rather than returning 0 when there's no decompressor, why not return llvm::Optional<uint64_t> to differentiate between with an actual size of zero?

3409 ↗(On Diff #187169)

I don't think having an expected as a member makes sense. Maybe you want to use an optional? Is the goal to abstract over the decompression not being available? Or should the class assume the the decompressor is there (and itself be wrapped in an expected when that's not the case?). I'm not too familiar with this code but it seems a rather odd design.

3413 ↗(On Diff #187169)

Trailing underscore?

3467 ↗(On Diff #187169)

You could early return here (return 0;).

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
684 ↗(On Diff #187169)

Can we split this up? There's no way I remember the first two clauses by the time I'm reading the third.

This revision now requires changes to proceed.Feb 19 2019, 2:21 PM
jankratochvil abandoned this revision.Jun 27 2019, 3:52 AM