This is an archive of the discontinued LLVM Phabricator instance.

Add split dwarf support to SymbolFileDWARF
ClosedPublic

Authored by tberghammer on Aug 24 2015, 10:08 AM.

Details

Summary

Add basic fission support to SymbolFileDWARF

  • Create new dwo symbol file class
  • Add handling for .dwo sections
  • Change indexes in SymbolFileDWARF to store compile unit offset next to DIE offset
  • Propagate queries from dwarf compile unit to the dwo compile unit where applicable

Diff Detail

Repository
rL LLVM

Event Timeline

tberghammer retitled this revision from to Add split dwarf support to SymbolFileDWARF.
tberghammer updated this object.
tberghammer added reviewers: labath, clayborg.
tberghammer added a subscriber: lldb-commits.
clayborg requested changes to this revision.Aug 24 2015, 1:40 PM
clayborg edited edge metadata.

I see the need for a lot of this, but I feel like there are way too many places where we do this:

SymbolFileDWARFDwo* dwo_symbol_file = dwarf_cu->GetDwoSymbolFile();
if (dwo_symbol_file)
    return dwo_symbol_file->XXXX (...);

I would like us to more cleanly abstract ourselves from SymbolFileDWARFDwo. Ideally we wouldn't even see "SymbolFileDWARFDwo" anywhere inside SymbolFileDWARF because it has been abstracted behind DWARFCompileUnit and DWARFDebugInfoEntry and any other low level classes that need to know about them.

So I would like to see if the following is possible:

  • DWARFCompileUnit should return the DIEs from the DWO file when DWARFCompileUnit::GetCompileUnitDIEOnly() or DWARFCompileUnit::DIE() is called.
  • Any code that was doing the pattern from above that was calling dwarf_cu->GetDwoSymbolFile() and deferring to the DWO symbol file should just work normally if the correct DIEs are being handed out.
  • All references to SymbolFileDWARFDwo should be removed from SymbolFileDWARF if we abstract things correctly.
source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.h
16 ↗(On Diff #32972)

This shouldn't be needed here, just a forward declaration for SymbolFileDWARFDwo should do and then include this in DWARFCompileUnit.cpp. No one outside of this class should need to know anything about SymbolFileDWARFDwo. It should be an implementation detail that only DWARFCompileUnit and DWARFDebugInfoEntry need to know about.

208–214 ↗(On Diff #32972)

Make protected and hide this from anyone but DWARFCompileUnit and DWARFDebugInfoEntry.

source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
330 ↗(On Diff #32972)

I don't think the cast is needed anymore now that "cu" isn't const.

1437–1438 ↗(On Diff #32972)

GetAttributeValue(...) should just "do the right thing" and look into the DWO file if needed. Then this change isn't needed.

1458–1459 ↗(On Diff #32972)

GetAttributeValue(...) should just "do the right thing" and look into the DWO file if needed. Then this change isn't needed.

1480–1481 ↗(On Diff #32972)

GetAttributeValue(...) should just "do the right thing" and look into the DWO file if needed. Then this change isn't needed.

1494–1515 ↗(On Diff #32972)

GetAttributeValue(...) should just "do the right thing" and look into the DWO file if needed. This code will change a bit after that, but the check for the DWO file shouldn't be needed.

1535–1536 ↗(On Diff #32972)

GetAttributeValue(...) should just "do the right thing" and look into the DWO file if needed. Then this change isn't needed.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
3997–3999 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
66–67 ↗(On Diff #32972)

Hopefully we don't need this if we abstract things correctly so that the DWARFCompileUnit hands out the correct DIEs from the DWO file and SymbolFileDWARF shouldn't need to know anything about DWO files.

419–423 ↗(On Diff #32972)

Hopefully we don't need this if we abstract things correctly so that the DWARFCompileUnit hands out the correct DIEs from the DWO file and SymbolFileDWARF shouldn't need to know anything about DWO files.

478 ↗(On Diff #32972)

Hopefully we don't need this if we abstract things correctly so that the DWARFCompileUnit hands out the correct DIEs from the DWO file and SymbolFileDWARF shouldn't need to know anything about DWO files.

source/Symbol/ObjectFile.cpp
604–625 ↗(On Diff #32972)

Can you explain why this is needed?

clayborg added inline comments.Aug 24 2015, 1:40 PM
include/lldb/Symbol/ObjectFile.h
372 ↗(On Diff #32972)

Why is this needed? Can you clarify. I don't like this change and would rather avoid it if possible.

source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
806–818 ↗(On Diff #32972)

Shouldn't this be:

if (form_value.Form() == DW_FORM_addr || form_value.Form() == DW_FORM_GNU_addr_index)
    hi_pc = form_value.Address(dwarf2Data);
else
    hi_pc = form_value.Unsigned();
if (hi_pc != LLDB_INVALID_ADDRESS)
{
    if (lo_pc == LLDB_INVALID_ADDRESS)
        do_offset = hi_pc != LLDB_INVALID_ADDRESS;
    else
        hi_pc += lo_pc; // DWARF 4 introduces <offset-from-lo-pc> to save on relocations
}
source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
371–384 ↗(On Diff #32972)

Hopefully we don't need this if we abstract things correctly so that the DWARFCompileUnit hands out the correct DIEs from the DWO file and SymbolFileDWARF shouldn't need to know anything about DWO files.

404–417 ↗(On Diff #32972)

Hopefully we don't need this if we abstract things correctly so that the DWARFCompileUnit hands out the correct DIEs from the DWO file and SymbolFileDWARF shouldn't need to know anything about DWO files.

1069–1074 ↗(On Diff #32972)

Shouldn't:

const DWARFDebugInfoEntry* cu_die = dwarf_cu->GetCompileUnitDIEOnly ();

already return the right cu_die? Should clients of DWARFCompileUnit be required to do this kind of very specific check? I really want this to be hidden from us by abstracting this kind of thing inside DWARFCompileUnit.

1157–1159 ↗(On Diff #32972)

Why is this necessary? If we hand out the correct DIE from the DWO file in the first place, this should just work with no modifications right?

1626–1631 ↗(On Diff #32972)

It would be nice if this weren't linear. I think we have the same problem int SymbolFileDWARFDebugMap, but not being linear would be nice at some point.

1922–1925 ↗(On Diff #32972)

It would be nice to abstract this behind DWARFCompileUnit. Below we end up calling dwarf_cu->LookupAddress(...). This seems like a better place to do this kind of thing...

2281–2286 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

2405–2410 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

3017–3023 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

3059–3061 ↗(On Diff #32972)

If .debug_info wasn't there, then SymbolFileDWARF wouldn't have been instantiated. Is this truly needed? Did something change when DWO files are now used?

3086–3087 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

3149–3155 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

3265–3270 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

3723–3728 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

3909–3911 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

3934–3936 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

3975–3978 ↗(On Diff #32972)

Not sure why this is needed? If each DWARFCompileUnit hands out the correct DIEs, then Index() will point us to the right things and this shouldn't be needed.

This revision now requires changes to proceed.Aug 24 2015, 1:40 PM

In the current version of the patch the compile units in the main object file hands out only the compile unit DIE with the information what is available in the main object file. I considered the other approach (hand out all DIEs by the DWARF compile unit in the main object file) but I dropped it for the following reasons:

  • If we hand out DIEs from a dwo symbol file then each DIE have to store a pointer to the symbol file (or compile unit) it belongs to what is a significant memory overhead (we have more DIEs then Symbols) if we ever want to store all DIE in memory. Even worse is that we have to change all name to DIE index to contain a pointer (compile unit / dwo symbol file) and an offset to find the DIE belongs to (compared to just an offset now) what is more entry then the number DIEs.
  • In an average debug session run from an IDE the user usually sets the breakpoints based on file name + line number, display some stack traces, some variables and do some stepping. If we can index each dwo file separately then we can handle all of these features without parsing the full debug info what can give us some significant speed benefits at debugger startup time.
source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.h
208–214 ↗(On Diff #32972)

I don't really like friend classes and have the opinion that inside a plugin we can let the visibility a bit more open, but don't feel to strongly about it. I can change it if zou prefer that way.

source/Symbol/ObjectFile.cpp
604–625 ↗(On Diff #32972)

We create all of the dwo object files with the same module what we used for the executable object file (belongs to them). Because of it we have several section with the same section name (several dwo file + object file) what can't be stored inside a single module.

This check is to disable adding the sections in the dwo object file into the modules section list because they will be fetched by SymbolFileDWARFDwo::GetCachedSectionData what handles it correctly.

abidh added a subscriber: abidh.Aug 25 2015, 2:35 AM
abidh added inline comments.Aug 25 2015, 2:49 AM
source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.cpp
348 ↗(On Diff #32972)

You are not using the DW_AT_GNU_dwo_id. Is this intentional or an oversight?

tberghammer edited edge metadata.
tberghammer marked 10 inline comments as done.

Fix the minor refactors requested in the review.

I haven't changed the approach to return all DIEs from the dwo file when indexing the main compile unit because I would like to hear your opinion about my concerns (see previous comment) about speed and memory usage first.

I also question why Symbo

In the current version of the patch the compile units in the main object file hands out only the compile unit DIE with the information what is available in the main object file. I considered the other approach (hand out all DIEs by the DWARF compile unit in the main object file) but I dropped it for the following reasons:

  • If we hand out DIEs from a dwo symbol file then each DIE have to store a pointer to the symbol file (or compile unit) it belongs to what is a significant memory overhead (we have more DIEs then Symbols) if we ever want to store all DIE in memory. Even worse is that we have to change all name to DIE index to contain a pointer (compile unit / dwo symbol file) and an offset to find the DIE belongs to (compared to just an offset now) what is more entry then the number DIEs.

Can't we just store the SymbolFile inside the compile unit? We always need the compile unit to really do anything with a DIE anyway. We currently store the DWO file inside the compile unit so it seems that we could just store it once in the compile unit and avoid any extra cost.

  • In an average debug session run from an IDE the user usually sets the breakpoints based on file name + line number, display some stack traces, some variables and do some stepping. If we can index each dwo file separately then we can handle all of these features without parsing the full debug info what can give us some significant speed benefits at debugger startup time.

I don't see how we can ever just index one DWO file? If we index one, we must index them all within an executable otherwise the index will be incomplete. If you set a file + line breakpoint, you can't rely on the file matching the compile unit because there could be inlined functions so you would always need to index all of them. Likewise with setting a breakpoint by function name, you will need to index all DWO files.

Maybe we can:

  • Have a new class that we hand out for a DIE, maybe named DWARFDIE that contains:
class DWARFDIE
{
    DWARFCompileUnit *m_cu;
    DWARFDebugInfoEntry *m_die;
};

Then change all of the places we currently use a "DWARFCompileUnit *cu, DWARFDebugInfoEntry* die" (we always pass them around together) to use a DWARFDIE instead. This allows us to store the DIEs efficiently, yet pass them around in a slightly larger container for usage. This would allow our memory usage to stay very close to where it is (an extra pointer in the compile unit).

Then we modify DWARFCompileUnit to store the "SymbolFileDWARF *" that the compile unit comes from. We can still store the DWO file in the compile unit as well as you are already doing, we would just need to add a "SymbolFileDWARF *m_dwarf;" member variable for the non DWO case (and also for digging up the DW_TAG_compile_unit attributes that aren't in the DWO DW_TAG_compile_unit).

Then we just make sure that all code that hands out DIEs actually hands out DWARFDIE instances instead of returning a "DWARFDebugInfoEntry *" and also having an out parameter that fills in the compile unit.

Thoughts?

I also question why Symbo

In the current version of the patch the compile units in the main object file hands out only the compile unit DIE with the information what is available in the main object file. I considered the other approach (hand out all DIEs by the DWARF compile unit in the main object file) but I dropped it for the following reasons:

  • If we hand out DIEs from a dwo symbol file then each DIE have to store a pointer to the symbol file (or compile unit) it belongs to what is a significant memory overhead (we have more DIEs then Symbols) if we ever want to store all DIE in memory. Even worse is that we have to change all name to DIE index to contain a pointer (compile unit / dwo symbol file) and an offset to find the DIE belongs to (compared to just an offset now) what is more entry then the number DIEs.

Can't we just store the SymbolFile inside the compile unit? We always need the compile unit to really do anything with a DIE anyway. We currently store the DWO file inside the compile unit so it seems that we could just store it once in the compile unit and avoid any extra cost.

In the name to DIE indexes (in SymbolFileDWARF) currently we store only a DIE offset and we find the compile unit based on the fact that the DIE should be inside the range of the compile unit. The compile units leave in the dwo symbol files all start at address 0 so just a DIE offset isn't enough to find the compile unit. We can store the offset in 4 byte (we already do it, but I am not sure it is a good idea) and the compile unit index in another 4 byte what isn't a major overhead, but it can matter for large inferiors. Storing the symbol file in the DIE might be avoidable but then the DIE have to ask the compile unit for the correct symbol file when somebody queries it for an attribute (we don't want the caller of the GetAttribute* function to know about dwo files).

  • In an average debug session run from an IDE the user usually sets the breakpoints based on file name + line number, display some stack traces, some variables and do some stepping. If we can index each dwo file separately then we can handle all of these features without parsing the full debug info what can give us some significant speed benefits at debugger startup time.

I don't see how we can ever just index one DWO file? If we index one, we must index them all within an executable otherwise the index will be incomplete. If you set a file + line breakpoint, you can't rely on the file matching the compile unit because there could be inlined functions so you would always need to index all of them. Likewise with setting a breakpoint by function name, you will need to index all DWO files.

I made a few measurements a few weeks ago and setting a file + line breakpoint re

Maybe we can:

  • Have a new class that we hand out for a DIE, maybe named DWARFDIE that contains:
class DWARFDIE
{
    DWARFCompileUnit *m_cu;
    DWARFDebugInfoEntry *m_die;
};

Then change all of the places we currently use a "DWARFCompileUnit *cu, DWARFDebugInfoEntry* die" (we always pass them around together) to use a DWARFDIE instead. This allows us to store the DIEs efficiently, yet pass them around in a slightly larger container for usage. This would allow our memory usage to stay very close to where it is (an extra pointer in the compile unit).

Then we modify DWARFCompileUnit to store the "SymbolFileDWARF *" that the compile unit comes from. We can still store the DWO file in the compile unit as well as you are already doing, we would just need to add a "SymbolFileDWARF *m_dwarf;" member variable for the non DWO case (and also for digging up the DW_TAG_compile_unit attributes that aren't in the DWO DW_TAG_compile_unit).

Then we just make sure that all code that hands out DIEs actually hands out DWARFDIE instances instead of returning a "DWARFDebugInfoEntry *" and also having an out parameter that fills in the compile unit.

Thoughts?

source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.cpp
348 ↗(On Diff #32972)

I just forgot to check if (fixed).

source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
806–818 ↗(On Diff #32972)

Nice catch

1437–1438 ↗(On Diff #32972)

I have to return a SymbolFileDWARF* from GetAttributeValue so Address and String fetching know where it have to find the referenced sections, but it made the code significantly simpler.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
3059–3061 ↗(On Diff #32972)

You are right. I removed it

Sorry, first I manage to submit my comments without finishing them.

I also question why Symbo

In the current version of the patch the compile units in the main object file hands out only the compile unit DIE with the information what is available in the main object file. I considered the other approach (hand out all DIEs by the DWARF compile unit in the main object file) but I dropped it for the following reasons:

  • If we hand out DIEs from a dwo symbol file then each DIE have to store a pointer to the symbol file (or compile unit) it belongs to what is a significant memory overhead (we have more DIEs then Symbols) if we ever want to store all DIE in memory. Even worse is that we have to change all name to DIE index to contain a pointer (compile unit / dwo symbol file) and an offset to find the DIE belongs to (compared to just an offset now) what is more entry then the number DIEs.

Can't we just store the SymbolFile inside the compile unit? We always need the compile unit to really do anything with a DIE anyway. We currently store the DWO file inside the compile unit so it seems that we could just store it once in the compile unit and avoid any extra cost.

In the name to DIE indexes (in SymbolFileDWARF) currently we store only a DIE offset and we find the compile unit based on the fact that the DIE should be inside the range of the compile unit. The compile units leave in the dwo symbol files all start at address 0 so just a DIE offset isn't enough to find the compile unit. We can store the offset in 4 byte (we already do it, but I am not sure it is a good idea) and the compile unit index in another 4 byte what isn't a major overhead, but it can matter for large inferiors. Storing the symbol file in the DIE might be avoidable but then the DIE have to ask the compile unit for the correct symbol file when somebody queries it for an attribute (we don't want the caller of the GetAttribute* function to know about dwo files).

  • In an average debug session run from an IDE the user usually sets the breakpoints based on file name + line number, display some stack traces, some variables and do some stepping. If we can index each dwo file separately then we can handle all of these features without parsing the full debug info what can give us some significant speed benefits at debugger startup time.

I don't see how we can ever just index one DWO file? If we index one, we must index them all within an executable otherwise the index will be incomplete. If you set a file + line breakpoint, you can't rely on the file matching the compile unit because there could be inlined functions so you would always need to index all of them. Likewise with setting a breakpoint by function name, you will need to index all DWO files.

I made a few measurements a few weeks ago and to set a file + line breakpoint we only need to parse the line table what is reasonably fast while parsing all DIEs is significantly slower. Setting a breakpoint based on function name require a full dwarf parsing, but if you use an IDE you almost never want to do it.

Maybe we can:

  • Have a new class that we hand out for a DIE, maybe named DWARFDIE that contains:
class DWARFDIE
{
    DWARFCompileUnit *m_cu;
    DWARFDebugInfoEntry *m_die;
};

Then change all of the places we currently use a "DWARFCompileUnit *cu, DWARFDebugInfoEntry* die" (we always pass them around together) to use a DWARFDIE instead. This allows us to store the DIEs efficiently, yet pass them around in a slightly larger container for usage. This would allow our memory usage to stay very close to where it is (an extra pointer in the compile unit).

Then we modify DWARFCompileUnit to store the "SymbolFileDWARF *" that the compile unit comes from. We can still store the DWO file in the compile unit as well as you are already doing, we would just need to add a "SymbolFileDWARF *m_dwarf;" member variable for the non DWO case (and also for digging up the DW_TAG_compile_unit attributes that aren't in the DWO DW_TAG_compile_unit).

Then we just make sure that all code that hands out DIEs actually hands out DWARFDIE instances instead of returning a "DWARFDebugInfoEntry *" and also having an out parameter that fills in the compile unit.

Thoughts?

I like the idea about passing them around together especially as we already do it in a lot of case and it will have only a small overhead, but it don't help on the fact that the name to DIE indexes have to store a compile unit pointer (or a compile unit index). I am not sure how much we want to worry about the memory usage increase because I estimate it to be under 10% increase (without any measurements to prove it), but it will be significantly bigger then the effect of some changes in the Symbol class where you were quite concerned about increasing the size of it.

For our DWARF in .o files, I have SymbolFileDWARFDebugMap which loads the DWARF from .o files. Each .o file is loaded in a completely unchanged version of SymbolFileDWARF. Any lldb::user_id_t that are generated use:

lldb::user_id_t
SymbolFileDWARF::MakeUserID (dw_offset_t die_offset) const
{
    return GetID() | die_offset;
}

For a normal DWARF file GetID() returns 0. When used under a SymbolFileDWARFDebugMap, we set the ID to the index of the DWARF file. This encodes the SymbolFileDWARF's index in SymbolFileDWARFDebugMap's array as the high 32 bits of any IDs that are passed around and allow us to hand out unique IDs where the DIE offset ORed with the SymbolFileDWARF index is the ID. The IDs are then trimmed down to the low 32 bits before the SymbolFileDWARF looks them up. Actually dw_offset_t is 32 bit, so even if you pass a 64 bit value to anything that looks up a DIE, it will get correctly truncated.

So this should be possible. If desired, I can first make a patch that implements the DWARFDIE stuff, then we can update this patch to deal with that after that is done?

If I understand you correctly then you are suggesting to create a class like SymbolFileDWARFDebugMap for handling object files with dwo files. I think that approach is practically have the same code flow as the current one (one symbol file which one stores a list of sub symbol files). The 2 main difference is having a separate top level symbol file class for it makes the abstraction a bit cleaner (we have to refer to the debug map symfile several place in SymbolFileDWARF) but also require more code where some of it is code duplication. With dwo files it is a bit more complicated to solve is because there is no easy way do find out if a file contains dwo entries or not and it is possible for an object file to have some compile unit with dwo entry and some compile unit what is present in the main symbol file (I don't think it is a common scenario).

If we would like to go in the direction with abstracting out the dwo handling more, then I would suggest to change the NameToDIE arrays to store lldb::user_id_t where the 32MSB represents the dwarf compile unit index and 32LSB is the DIE offset. Then each function what currently find a (dwarf) compile unit based on die offset can use the compile unit index while the rest of the dwo handling can be abstracted out into the DWARCompileUnit/DWARFDebugInfoEntry classes (+ possibly a few more DWARF class). My only concern with this approach is that we start storing 2 different information in the higher bits of user_id_t what can cause problems. (Is there a reason we want to abstract this feature out considering that it is almost standard dwarf?)

All in all I don't want to have a separate top level symbol file (referenced by a SymbolVendor) for handling dwo files because I think it is more work to maintain it and don't give us too much benefits. Next to it I am happy with most possible approach.

If you would like to help me with implementing the split dwarf handling then I welcome any help but currently I can focus this issue in most of my time and would like to get it working (with 100% test pass rate) before mid September so I would like to try to avoid situations where we start waiting on each other.

Let me get the DWARFDIE abstraction in and we can see where we are after I get this in as we will see more of what is possible to abstract when this is done.

I though a bit more about the abstraction you plan to introduce (DWARFDIE) and I started to believe we don't need it at all. If we use lldb::user_id_t in the NameToDIE indexes where the first 4 byte is the compile unit offset (for dwo) or the compile unit index (for debug map) and the last 4 byte is the die offset then we will have all of the information we need.

In this setup DWARFCompileUnit::Index have to be changed to index the dwo dwarf files also and all function of DWARFDebugInfoEntry have to be changed to check if it was given the compile unit belongs to the actual DIE or it got a pointer to the DIE in the main object file. In the second case it re-calls itself with the correct SymbolFileDwarf and DWARCompileUnit objects. I think this approach will keep the dwo file handling in the DWARFCompileUnit and in the DWARFDebugInfoEntry classes.

What do you think?

I though a bit more about the abstraction you plan to introduce (DWARFDIE) and I started to believe we don't need it at all. If we use lldb::user_id_t in the NameToDIE indexes where the first 4 byte is the compile unit offset (for dwo) or the compile unit index (for debug map) and the last 4 byte is the die offset then we will have all of the information we need.

Even if we don't need it to implement the DWO feature, we still need it. There are many places where we pass around a DWARFCompileUnit and a DWARFDebugInfoEntry as two arguments and there are many places where the wrong DWARFCompileUnit might be being used for a DWARFDebugInfoEntry. So even if this doesn't help the DWO stuff, I really want to get this change in so this kind of error doesn't happen.

We should still pull the CU index + DWARF offset tricks you mention for sure.

In this setup DWARFCompileUnit::Index have to be changed to index the dwo dwarf files also and all function of DWARFDebugInfoEntry have to be changed to check if it was given the compile unit belongs to the actual DIE or it got a pointer to the DIE in the main object file. In the second case it re-calls itself with the correct SymbolFileDwarf and DWARCompileUnit objects. I think this approach will keep the dwo file handling in the DWARFCompileUnit and in the DWARFDebugInfoEntry classes.

Sounds good, but lets build this on top of my DWARFDIE stuff so we can guarantee that we can hand out a DWARFDIE and always have the right DWARFCompileUnit and DWARFDebugInfoEntry. I am almost done with my DWARFDIE changes. Should be later today.

I got my DWARFDIE stuff in with:

% svn commit
Sending        include/lldb/Core/dwarf.h
Sending        include/lldb/Symbol/ClangASTContext.h
Sending        include/lldb/Symbol/TypeSystem.h
Sending        lldb.xcodeproj/project.pbxproj
Sending        source/Plugins/DynamicLoader/Hexagon-DYLD/DynamicLoaderHexagonDYLD.h
Sending        source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
Sending        source/Plugins/SymbolFile/DWARF/CMakeLists.txt
Adding         source/Plugins/SymbolFile/DWARF/DWARFAttribute.cpp
Sending        source/Plugins/SymbolFile/DWARF/DWARFAttribute.h
Sending        source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.cpp
Sending        source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.h
Adding         source/Plugins/SymbolFile/DWARF/DWARFDIE.cpp
Adding         source/Plugins/SymbolFile/DWARF/DWARFDIE.h
Sending        source/Plugins/SymbolFile/DWARF/DWARFDIECollection.cpp
Sending        source/Plugins/SymbolFile/DWARF/DWARFDIECollection.h
Sending        source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
Sending        source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
Sending        source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
Sending        source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h
Sending        source/Plugins/SymbolFile/DWARF/DWARFDebugPubnames.cpp
Sending        source/Plugins/SymbolFile/DWARF/DWARFDebugRanges.cpp
Sending        source/Plugins/SymbolFile/DWARF/DWARFDebugRanges.h
Sending        source/Plugins/SymbolFile/DWARF/DWARFDeclContext.h
Sending        source/Plugins/SymbolFile/DWARF/DWARFFormValue.cpp
Sending        source/Plugins/SymbolFile/DWARF/DWARFFormValue.h
Sending        source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
Sending        source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
Sending        source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.cpp
Sending        source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDebugMap.h
Sending        source/Plugins/SymbolFile/DWARF/UniqueDWARFASTType.cpp
Sending        source/Plugins/SymbolFile/DWARF/UniqueDWARFASTType.h
Sending        source/Symbol/ClangASTContext.cpp
Sending        source/Symbol/TypeSystem.cpp
Sending        source/Target/Target.cpp
Transmitting file data ..................................
Committed revision 246100.
tberghammer updated this revision to Diff 33676.Sep 1 2015, 4:57 AM
tberghammer updated this object.
tberghammer edited edge metadata.

[RFC] DO NOT COMMIT!

Update the design based on the discussion for the previous diff.

The current version have ~35 failures with split dwarf (most of them hit an assertion in clang) and causes a minor regression in non-split dwarf handling. I am working on these issues, but would like to get a feedback about the current design because a major change there can shuffle up everything.

Added some inline comments to explain some implementation decisions

include/lldb/Symbol/ObjectFile.h
372 ↗(On Diff #33676)

We create all of the dwo object files with the same module what we used for the executable object file (belongs to them). Because of it we have several section with the same section name (several dwo file + object file) what can't be stored inside a single module.

This check is to disable adding the sections in the dwo object file into the modules section list because they will be fetched by SymbolFileDWARFDwo::GetCachedSectionData what handles it correctly.

If we want to avoid it we have to create a separate module for each dwo file and then somehow link them together in a way that we can search for a section in both modules (because the dwo file uses sections from the main object file). I think implementing it that way would be significantly worse and I don't see any good option to handle section list fetching in a better way without creating an ObjectFileELFDwo class just because of it (what will have to be constructed separately because the plugin system have no efficient way to select it).

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
3706–3708 ↗(On Diff #33676)

This check is necessary because we will call this function on a SymbolFileDWARF fetched from the symbol vendor held by the module what will be the object file while the DWARFDIE will come from a variable's user_id_t.

Pushing this call up to the call stack is not possible (this is the first point in the call chain where this check make sense). If we really want to avoid this check here, then we have to move this function into DWARFCompileUnit or into DWARFDIE where we will have exactly the same check, but the reference for the SymbolFileDWARF members will be a bit more verbose.

clayborg requested changes to this revision.Sep 1 2015, 10:56 AM
clayborg edited edge metadata.

See inlined comments.

Main fix themes:

  • It would be great to try and avoid any manual function calls where we supply the compile unit and a DIE offset that we assume comes from that compile unit. I identified many places where this was happening and we might have more. It is OK in DWARFDebugInfoEntry in places, but everywhere else we should try to avoid. Also anywhere where we grab an attribute, we should store a FormValue instead of a dw_offset_t (or a DIERef) and use that.
  • Modify DWARFDebugInfoEntry::GetAttributeValue() to handle looking through the DW_AT_specification and DW_AT_abstract_origin for attributes with a new default parameter "bool check_specification_or_abstract_origin = false". Add this parameter to all DWARFDebugInfoEntry::GetAttributeValueAs...() functions.
source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
218 ↗(On Diff #33676)

DIERef can and should be initialized with only the "form_value". It contains enough info to extract the correct reference. No need to pass in the die.GetCU()->GetOffset().

1148 ↗(On Diff #33676)

We should store "specification_die_offset" as a FormValue so we don't possibly use the wrong compile unit here. FormValue contains all that is needed to decode a value. If the DW_AT_specification was actually from a DW_AT_abstract_origin where the abstract origin DIE contained a DW_AT_specification, this would use the wrong compile unit and be an invalid DIERef (or point to the wrong thing). In general, we need to avoid this kind of manual compile unit + offset to avoid these issues everywhere.

1170 ↗(On Diff #33676)

Store abstract_origin_die_offset as a FormValue to avoid issues as mentioned above. No manual CU + offset anywhere.

source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.cpp
316–328 ↗(On Diff #33676)

Is DW_AT_GNU_dwo_name always a relative path?

660 ↗(On Diff #33676)

Don't we need the compile unit to be encoded into the DIERef here?

source/Plugins/SymbolFile/DWARF/DWARFDIE.cpp
161 ↗(On Diff #33676)

We probably should swift m_die->LookupAddress to take a "DWARFDIE *function_die, DWARFDIE *block_die" instead of "DWARFDebugInfoEntry* function_die, DWARFDebugInfoEntry* block_die". Then we can avoid doing the manual combination of CU + offset here.

184 ↗(On Diff #33676)

The DWARFCompileUnit::GetID() already shifts the cu_offset over by 32, so we will just lose the compile unit bits here.

source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
610 ↗(On Diff #33676)

I would like to avoid the manual CU + offset here if possible. We could make die_offsets into an array of DIERef objects.

1250 ↗(On Diff #33676)

Not sure what you were thinking here? You init this with nullptr and then try to use it below? Did you mean to init this with dwarf2Data? If so then we don't need this? Otherwise, please fix

1262 ↗(On Diff #33676)

This will crash due to attribute_symbol_file always being nullptr.

1273–1279 ↗(On Diff #33676)

This is currently always nullptr, so this is dead code

1298 ↗(On Diff #33676)

We should probably just modify DWARFDebugInfoEntry::GetAttributeValue() to take an extra parameter like:

bool check_specification_or_abstract_origin

If this is true DWARFDebugInfoEntry::GetAttributeValue() should look through the DW_AT_specification or DW_AT_abstract_origin. I think we have many many places that do this sort of:

  • check this die for the attribute
  • check the DW_AT_specification for the attribute
  • check the DW_AT_abstract_origin for the attribute

If you do do this, we could make the extra parameter default to "bool check_specification_or_abstract_origin = false" so all code continues to behave as it currently does, then opt it where we need to. The more smarts we put into DWARFDebugInfoEntry::GetAttributeValue() the better off we will be seeing as the compile unit in the main executable file when using DWO files has some attributes in the main executable DW_TAG_compile_unit and some in the other. I don't believe we need anything from the other, but again, the more we centralize this in DWARFDebugInfoEntry::GetAttributeValue() the less problems we will have in the future.

We would also need to add this "bool check_specification_or_abstract_origin" parameter to all of the DWARFDebugInfoEntry::GetAttributeValueAsXXX() functions.

1336–1347 ↗(On Diff #33676)

not sure if we are OK with not checking the DW_AT_specification or DW_AT_abstract_origin here? Seems like a bug that would be fixed with our new DWARFDebugInfoEntry::GetAttributeValue() with the "bool check_specification_or_abstract_origin" parameter...

1365–1387 ↗(On Diff #33676)

not sure if we are OK with not checking the DW_AT_specification or DW_AT_abstract_origin here? Seems like a bug that would be fixed with our new DWARFDebugInfoEntry::GetAttributeValue() with the "bool check_specification_or_abstract_origin" parameter...

1427 ↗(On Diff #33676)

We don't check DW_AT_specification or DW_AT_abstract_origin...

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
3732 ↗(On Diff #33676)

Please store as a FormValue. We have a branch where we need this as a FormValue so this will ease merging.

This revision now requires changes to proceed.Sep 1 2015, 10:56 AM
tberghammer updated this revision to Diff 33931.Sep 3 2015, 5:15 AM
tberghammer edited edge metadata.
tberghammer marked 18 inline comments as done.

Address review comments and fix other bugs found during testing

With the current version of the CL there is no regression under Linux-x86_64 without split dwarf and if split dwarf is enabled we have only 3 test failures + all MI test case (most likely a test case issue). If you are happy with the change in it's current form, then I plan to commit it in as it is now and continue iterating on the remaining failures upstream.

source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.cpp
316–328 ↗(On Diff #33676)

I haven't seen absolute path in it so far, but it can be absolute. Added code to handle it.

660 ↗(On Diff #33676)

No, we don't want to specify the compile unit here. We already checked that the specified DIE offset isn't belongs to the current compile unit. In this case we should try to get a DIE based on offset from the current symbol file. This can happen when a DIE refers to a value in a different compile unit inside the same file based on DIE offset

source/Plugins/SymbolFile/DWARF/DWARFDIE.cpp
161 ↗(On Diff #33676)

I would prefer to leave it as it is now because we have to do the manual combination at some point (because the returned DIE might be in a different CU then the current DIE) and moving it into LookupAddress will make that function even more complicated.

184 ↗(On Diff #33676)

SymbolFileDWARF does it, but DWARFCompileUnit isn't. With shifting by 32 we lose the SymbolFileDWARF id, but that one isn't considered to be part of the DIE ID.

source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
1250 ↗(On Diff #33676)

The function isn't used at all now, so I just removed it completely.

(I made the change here to fetch the block data from the correct symbol file, but one of the merge made it broken)

clayborg requested changes to this revision.Sep 3 2015, 11:44 AM
clayborg edited edge metadata.

A few changes.

I think this might create problems for DWARF in .o files on MacOSX. Are you able to test on MacOSX? We used to store the compile unit index in the upper 32 bits of the type ID and all of our compile units from each .o file have 0xb as their cu_offset, so I would think that DIERef would always cause us to grab the first compile unit...

source/Plugins/SymbolFile/DWARF/DIERef.cpp
34–44 ↗(On Diff #33931)

For DWO files, won't every compile unit have offset 0x0000000b? Or do you use the actual compile unit DIE from the main executable?

I have doubts this will work for DWARF in .o files. We used to encode the compile unit index in the upper 32 bits because all of our compile units have a dw_offset_t that is 0xb..

source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
603–605 ↗(On Diff #33931)

These 3 lines would be nicer as:

for (const auto &die_ref : die_refs)
{
1025–1036 ↗(On Diff #33931)

This will fail if a DIE can have both a DW_AT_specification and a DW_AT_abstract_origin. Not sure if that can happen. If it can, we will need to check each one individually. Probably best to just check both individually.

1039–1055 ↗(On Diff #33931)

Don't you want to only do all of this code if:

if (Tag() == DW_TAG_compile_unit)
This revision now requires changes to proceed.Sep 3 2015, 11:44 AM
tberghammer updated this revision to Diff 34159.Sep 7 2015, 8:26 AM
tberghammer edited edge metadata.
tberghammer marked 4 inline comments as done.

Address review comments, fix dsym handling and update the xcode project

With this patch there is no regression on Linux-x86_64 and on OSX-x86_64.

source/Plugins/SymbolFile/DWARF/DIERef.cpp
34–44 ↗(On Diff #33931)

The concept of DIERef is to contain the offset of the compile unit in the main object file so based on a DIERef and a SymbolFileDWARF fro the main object file we can find the compile unit and the DIE. In case of dsym it changes a bit with storing the symbol file index in the cu_offset filed instead.

I fixed the implementation to set it up properly in case of dwo files

source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
1025–1036 ↗(On Diff #33931)

We haven't checked both of it previously either and I don't think we will ever have both a DW_AT_specification and a DW_AT_abstract_origin, but changed it to check both for just in case.

1039–1055 ↗(On Diff #33931)

Yes I do.

The beginning of the function (line 987-995) handles the case for non DW_TAG_compile_unit with forwarding the query to the right compile unit and this part of the code is to handle the case when some data about the compile unit is in the main object file and some data is in the dwo file. This can't happen for any other DIE.

clayborg requested changes to this revision.Sep 8 2015, 9:05 AM
clayborg edited edge metadata.

One minor IsValid() fix in DWARFDIE to avoid crashes and this is good to go.

source/Plugins/SymbolFile/DWARF/DWARFDIE.cpp
28–35 ↗(On Diff #34159)

Need to check IsValid() first and place all above code inside the if and have the else return a DIERef().

This revision now requires changes to proceed.Sep 8 2015, 9:05 AM
This revision was automatically updated to reflect the committed changes.