This is an archive of the discontinued LLVM Phabricator instance.

[WIP] Support multiple compile units per OSO entry in SymbolFileDWARFDebugMap
AbandonedPublic

Authored by sgraenitz on Sep 21 2018, 12:58 PM.

Details

Summary

So far SymbolFileDWARFDebugMap did only parse one compile unit per OSO entry. Thus, LLDB failed to resolve breakpoint locations in LTO objects with embedded DWARF.

This is a first minimal patch, that primairily fixes my bug. The central issues are:

  • SymbolFileDWARFDebugMap::InitOSO() -- For each OSO entry we need to parse all compile units and store their individual offsets.
  • SymbolFileDWARF::ParseCompileUnit() -- Parse the compile unit that matches the given dwarf_cu. At this point there is no context information except offset, so use offset mapping from InitOSO(). (The cu_idx parameter is the default from SymbolFileDWARF::GetCompUnitForDWARFCompUnit() in this execution path.)
  • DWARFUnit::BuildAddressRangeTable() -- Subsequent TAG_compile_unit's have no AT_ranges. Passing true for check_hi_lo_pc enables fallback to construct the range from AT_low/high_pc. It is not fully clear to me whether the missing tags are expected and the fallback reliable.
  • SymbolFileDWARFDebugMap::HasCompileUnits() -- New function with only the subset of InitOSO() required for CalculateAbilities(). Full init should only run once we're sure this is the right plugin.

Missing in the first patch:

  • tests that reproduce the issue
  • cleanup of dead code I found
  • refactorings for involved code and data structures

FIXMEs indicate further changes I consider necessary, but I am quite new to the code base, so please don't hesitate to point out what I failed to see.

A part of this was shortly discussed on the mailing list:
http://lists.llvm.org/pipermail/lldb-dev/2018-September/014160.html

Event Timeline

sgraenitz created this revision.Sep 21 2018, 12:58 PM
sgraenitz added a subscriber: Restricted Project.Sep 22 2018, 5:20 AM
sgraenitz removed a subscriber: JDevlieghere.
JDevlieghere added inline comments.Sep 24 2018, 3:40 AM
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
438–439

Please give this bool a name, either by assigning it to a named variable or by prefixing it with a comment like /* check_hi_lo_pc */. I think the LLDB code base prefer the former?

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
786

What is "it" here?

800

You can use llvm_unreachable here.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
1338

Yes, otherwise we have invalid DWARF, which I don't think it's worth supporting something like that here.

sgraenitz updated this revision to Diff 166698.Sep 24 2018, 8:51 AM

Address Jonas' feedback + minor additions

sgraenitz marked 4 inline comments as done.Sep 24 2018, 8:51 AM
sgraenitz updated this revision to Diff 166699.Sep 24 2018, 8:55 AM

Address Jonas' feedback #2

sgraenitz edited the summary of this revision. (Show Details)Sep 24 2018, 9:40 AM
sgraenitz added inline comments.
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
438–439

@JDevlieghere Thanks for your feedback!

@everyone:
This parameter is, actually, one of the key questions in this review: If there is no DW_AT_range attribute for the compile unit, can we safely fall back to DW_AT_low/high_pc?

clayborg added inline comments.Sep 24 2018, 2:36 PM
source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
298–322
static constexpr uint32_t kOSOSymbolFlags = 0x660001u;
328

kOSOSymbolFlags

source/Symbol/Symtab.cpp
512–520

This function is a bit specific. Might be nice to have a method like:

void Symtab::ForEachSymbolWithType(
  lldb::SymbolType symbol_type, 
  llvm::function_ref<bool(const Symbol*)> lambda) const;

The lambda returns false, then stop iterating. Else keep going. See ModuleList::ForEach as an example. Then the code that was calling HasSymbolWithTypeAndFlags would do something like:

symtab->ForEachSymbolWithType(eSymbolTypeObjectFile, [&this->m_has_compile_unit_infos](Symbol *symbol)->bool {
  if (symbol->GetFlags() == kOSOSymbolFlags) {
    m_has_compile_unit_infos = true;
    return false; // Stop iterating
  }
  return true; // Keep iterating
});
clayborg added inline comments.Sep 24 2018, 3:11 PM
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
438–439

I have seen DWARF where the DW_AT_ranges is incorrect and I have seen cases where the low pc is just set to zero and the high pc is set to the max address. So you might have many overlapping ranges between different CUs in the latter case. We might need to be more vigilant with high/low pc values though and that was the reason that we previously ignored high/low pc values. If the stuff is missing, it should index the DWARF and generate the address ranges table manually right? What was the reasoning behind this change?

sgraenitz updated this revision to Diff 166985.Sep 25 2018, 1:47 PM

Revert check_hi_lo_pc and comment out call to AddOSOARanges() for now

sgraenitz added inline comments.Sep 25 2018, 1:59 PM
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
438–439

I had a closer look at the individual fallbacks and you are right. After fixing SymbolFileDWARF::ParseCompileUnit() we have the correct CompileUnit pointers in DWARFUnit::m_user_data for all CUs. Thus, the below die->BuildAddressRangeTable(dwarf, this, debug_aranges); now correctly constructs the ranges! Great, with this we can keep check_hi_lo_pc=false.

I didn't notice this while testing, because of the following issue that kept my tests failing: 2 of my LTO object's CUs have no code remaining after optimization, so we step into the next fallback assuming 'a line tables only situation' and eventually call AddOSOARanges(), which adds to debug_aranges all entries of the CU's FileRangeMap with a zero offset.

Thus, my debug_aranges had 28000 entries too much and most requests would continue to return 0 as before. For now I commented out the call below.

What do you think is a good way to avoid this for empty CUs?
IIUC they should have no DW_AT_low/high_pc right? :-D

clayborg added inline comments.Sep 25 2018, 3:16 PM
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
425

Following from inline comment below, you can abort if this CU is empty:

// If this DWARF unit has no children, we have no address ranges to add. If
// something is built with line tables only, there will still be function DIEs with
// address ranges as children.
if (!die->HasChildren())
  return;
439

Empty CUs won't have any children. Easy to skip by getting the CU die and doing:

if (!die->HasChildren())
488–489

revert if the "if (!die->hasChildren())" trick works.

sgraenitz updated this revision to Diff 167109.Sep 26 2018, 4:56 AM
sgraenitz marked an inline comment as done.

Skip AddOSOARanges() if the OSO debug map entry has multiple compile units.

sgraenitz added inline comments.Sep 26 2018, 4:58 AM
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
439

Sorry if that was too unspecific. With "no code left" I meant it has no instructions. The DIE does have child DIEs for declarations (e.g. TAG_namespace, TAG_enumeration_type, TAG_imported_declaration), but nothing that would form a range.

I could traverse the DWARF and filter for DIEs with instructions (e.g. TAG_subprogram), but it seems pretty much what die->BuildAddressRangeTable() does already.

488–489

revert if the "if (!die->hasChildren())" trick works.

As noted above, unfortunately, it doesn't. The exploration below got quite verbose.

It looks like a conceptual clash to me: For OSO debug map entries with multiple CUs, we can't map between symtab entries and CUs. Thus, each CU's FileRangeMap will be populated from the entire symtab range of that OSO debug map entry and the call to AddOSOARanges() here will add all of that to debug_aranges for each CU again.

We should have one FileRangeMap per OSO debug map entry and not one per CU.

Even if I fixed AddOSOARanges() to write the current CU offset instead of dwarf2Data->GetID() and FileRangeMap existed per OSO debug map entry (or the huge number of duplicates was acceptable), it would still add entries from other CUs of that OSO debug map entry. This cannot work with the current approach in SymbolFileDWARF::ResolveSymbolContext() because we will have #CUs candidates here, but we only look for a single result:

const dw_offset_t cu_offset =
    debug_info->GetCompileUnitAranges().FindAddress(file_vm_addr);

In order to keep the current AddOSOARanges() approach and support multiple CUs per OSO debug map entry, SymbolFileDWARF::ResolveSymbolContext() must be changed to test all candidate CUs. While these changes may be important and the code may really benefit from a cleanup, it goes far beyond the scope of this bug fix.

The only reasonable alternative I see is to skip AddOSOARanges() for OSO debug map entries with multiple CUs.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
1530

Skipping AddOSOARanges() here.

vsk added a subscriber: vsk.Sep 26 2018, 9:13 AM
vsk added inline comments.
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
438–439

+1 to @clayborg's suggestion here that we either use AT_ranges, or build up the ranges by parsing the low/high pc's in subprograms. ISTM that it'd be acceptable to implement the fallback path as a follow-up, provided that doing so isn't a regression.

489

Could you leave an in-source comment explaining why this is commented out?

sgraenitz added inline comments.Sep 26 2018, 11:45 AM
source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
1530

Could you leave an in-source comment explaining why this is commented out?

@vsk AddOSOARanges() was commented out only temporarily. I reverted it in the latest patch and added these lines with a comment. Is that what you were asking for?

vsk added inline comments.Sep 26 2018, 12:27 PM
source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
1530

Ah sure sure, sorry, I missed the broader context of the change.

sgraenitz added inline comments.Sep 26 2018, 2:37 PM
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
468

@clayborg Any idea how to reproduce this code path? I expected test_lt as below would do, but debug_aranges is read from DIE even there. Is it (only) for backwards compatibility?

clang -c -gline-tables-only -o main_lt.o main.c
clang -c -gline-tables-only -o cu1_lt.o cu1.c
clang -gline-tables-only main_lt.o cu1_lt.o -o test_lt
source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
1530

No worries, thanks for having a look.

sgraenitz updated this revision to Diff 167335.Sep 27 2018, 8:48 AM

Add test for breakpoint resolution in DWARF debug map, exercising different LTO and line-tables-only combinations

clayborg added inline comments.Sep 27 2018, 9:29 AM
source/Symbol/Symtab.cpp
520

This function is still pretty specific. Any comments on my above inlined comment?

sgraenitz added inline comments.Sep 27 2018, 10:46 AM
source/Symbol/Symtab.cpp
520

Yes absolutely, but the same holds for most functions around here. This one is just a stripped-down version of the above AppendSymbolIndexesWithTypeAndFlagsValue(). Both of them are used exactly once (in CalculateAbilities() and InitOSO() respectively).

Of course this is all overly specific. Instead it should be HasSymbol(Predicate) and AppendSymbol(Predicate, Collection&), but I don't think we gain anything here by making one of them generic.

My fix is very loosely connected to Symtab at all. IMHO it's preferable to keep the existing practice here. This code should be improved consistently in a refactoring commit with no semantic changes.

clayborg accepted this revision.Sep 27 2018, 11:00 AM

Looks good then as long as all tests pass on Darwin.

This revision is now accepted and ready to land.Sep 27 2018, 11:00 AM
sgraenitz added inline comments.Sep 27 2018, 11:22 AM
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
468

Found repros in the test suite. Some _dwarf and _gmodule variants seem to be failing due to my fix. Will investigate and post changes if necessary. Thanks for the review.

aprantl added inline comments.Sep 27 2018, 12:35 PM
packages/Python/lldbsuite/test/macosx/debug_map/Makefile
13 ↗(On Diff #167335)

This is a noop. You can remove it without affecting anything.

22 ↗(On Diff #167335)

same here, just remove it

24 ↗(On Diff #167335)

If you do this, then you should also the NO_DEBUG_INFO_TESTCASE=True otherwise you're going to build dwarf and dsym variants of the test and they will be identical.

25 ↗(On Diff #167335)

this was for debugging? Also remove it

30 ↗(On Diff #167335)

$(CC) -c $(CFLAGS) $(CFLAGS_MAIN) -o $@ $^

-g should already be part of CFLAGS, no?

42 ↗(On Diff #167335)

$(LD) $^ -L. $(LDFLAGS) -o a.out

or even shorter, remove the entire line and just declare the dependencies, because the default rule from Makefile.rules will kick in then.

dexonsmith added inline comments.Sep 27 2018, 2:33 PM
packages/Python/lldbsuite/test/macosx/debug_map/Makefile
22 ↗(On Diff #167335)

I haven't read the full patch to see if this is relevant, but my maybe-out-of-date-Makefile-fu says this is not no-op. It defines C_SOURCES as a variable, as opposed to a function-like macro (which would be C_SOURCES =, no colon). IIRC, with the former, any changes to C_SOURCES (like C_SOURCES += ...) are evaluated immediately when it's a variable, lazily (on use) when it's a macro. Usually the behaviour you want is a variable.

aprantl added inline comments.Sep 27 2018, 3:24 PM
packages/Python/lldbsuite/test/macosx/debug_map/Makefile
22 ↗(On Diff #167335)

What Duncan says is correct. For this particular Makefile though, that difference shouldn't matter and it would be better to keep the file short and simple.

sgraenitz updated this revision to Diff 167593.Sep 29 2018, 5:16 AM
sgraenitz marked 8 inline comments as done.

Address feedback from Adrian and Duncan

packages/Python/lldbsuite/test/macosx/debug_map/Makefile
13 ↗(On Diff #167335)

It was meant to indicate that the python script can override these parameters, but you are right. It's probably more confusing than helpful. Above comment should be enough.

30 ↗(On Diff #167335)

-g should already be part of CFLAGS, no?

Yes, right.

$(CC) -c $(CFLAGS) $(CFLAGS_MAIN) -o $@ $^

Actually, I'd prefer my version with -o main.o $(SRCDIR)/main.c. It's more verbose, but clearly states what happens here. $@ $^ is hard-to-read Makefile magic (to me).

42 ↗(On Diff #167335)

Not sure why, but removing the line causes a lot of compile errors, like
clang-8: error: no input files and
Undefined symbols for architecture x86_64: "_main", ...

I guess I had to declare my OBJECTS then, but it's cleared in Makefile.rules. In the end, it's one line more, but it's not unreasonable right? Maybe I could just keep it.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
400

It turned out, that adding the actual number of compile units to this vector, has a number of scary side effects, which fail tests on the other side of LLDB. I had a closer look at one issue (parsing variables in stack frames for expression evaluation). I could fix this one, but it needs extra code changes. I can imagine these to introduce new silent failures, etc.

I will try again to get the patch more conservative, but it looks more and more like this needs a comprehensive refactoring.

aprantl added inline comments.Sep 29 2018, 9:26 AM
packages/Python/lldbsuite/test/macosx/debug_map/Makefile
30 ↗(On Diff #167335)

One benefit of $^ over spelling the filename out is that it automatically works with the VPATH environment those Makefiles are run in by dotest:

main.o: main.c
    $(CC) -c main.c -o main.o # doesn't work, you need to say $(SRCDIR)/main.c

SRCDIR is a Makefile.rules implementation detail that you need to know about to make sense of the rule, whereas $^ is a automatic variable (https://www.gnu.org/software/make/manual/html_node/Automatic-Variables.html) whose meaning is understandable to everyone familiar with make.

I understand that it looks more cryptic at first, but it keeps the rules shorter and less error-prone in the environment the Makefile is being evaluated in.

sgraenitz abandoned this revision.May 9 2019, 2:35 AM