Page MenuHomePhabricator

CompileUnit: Use shared_ptr for storing support file lists
AbandonedPublic

Authored by labath on May 30 2019, 2:17 AM.

Details

Summary

This patch refactors the CompileUnit class to store the support files in
a shared_ptr, and changes the SymbolFiles to hand them out as such.

This allows the file lists to be shared between compile units, or
between other objects. The motivation for this is to enable
SymbolFileDWARF to keep a handle on the file lists it parses, so that it
can use it when processing type units (which will not be handed out as
lldb_private::CompileUnits). This will be implemented in follow-up
patch(es).

Since I was already changing the signature of this function, I made it
return an Expected<shared_ptr> and changed all the places that were
silently doing nothing to return an error. The CompileUnit class still
returns an empty file list in case the parsing failed, but it first
makes sure any errors get reported.

Event Timeline

labath created this revision.May 30 2019, 2:17 AM

This isn't strictly necessary. I could just have SymbolFileDWARF keep one copy a file list for each line table. That way each list would be stored exactly twice. However, it seemed like it would be nice to avoid that (though I don't have any data about how much this saves or anything).

Actually, I've run into a bit of a snag while trying to implement the full line table sharing. The thing is, before DWARF5, the line tables are not fully independent, and require the DW_AT_comp_dir of the compile unit in order to resolve the relative paths correctly. Type units do not have the DW_AT_comp_dir attribute, and so in order to correctly parse their line tables, I'd have to check their DW_AT_stmt_list attribute, find the matching compile unit, and fetch the compilation directory from there. This is relatively easy do to in regular DWARF, though it still requires some fairly elaborate dances if one wants to do it lazily and efficiently, etc.

However, things get a lot more complicated when split-dwarf comes into play. There the compiler will emit a separate line table into the dwo file for use by the type unit, and the compile unit will use the line table from the main file (located through DW_AT_stmt_list on the skeleton unit in the .o file). Here it is pretty much impossible to reliably connect the type unit to the relevant comp_dir. If the DWO file contains a single compilation unit, I might guess that I should look at this one, but in case of multiple compile units in a single dwo file we're out of luck.

So, this got me thinking that maybe this DW_AT_comp_dir business is not worth all the trouble it brings, and I'm now thinking of a simpler solution:

  • for each type unit, parse the line tables without the DW_AT_comp_dir attribute. Cache them, and share them between type units.
  • for each compile unit, parse the line tables *with* the DW_AT_comp_dir. Don't cache and don't share them (i.e., do exactly what we do now)

This means each line table will be parsed at most twice (once for its compile unit, and once for any type units that refer to it). It's not possible to share them between CUs and TUs without going through the complex matching process I described above, or without risking that the CU will get incomplete line tables if they have already been parsed via a type unit. It also means that for types in a TU we might not have complete declaration file names (in DWARF<=4), but I am not sure how many users will actually care about that (not even llvm-dwarfdump tries to do this matching). And this will still prevent the line table being parsed hundreds of times for each type unit that happens to refer to it...

I think that potentially parsing a line table (only it's file list, really) twice is worth the reduction in complexity that a full sharing would require. WDYT?

If you agree, then this patch is also not really necessary (though the llvm::Expected part is a nice refactor regardless IMO).

Actually, I've run into a bit of a snag while trying to implement the full line table sharing. The thing is, before DWARF5, the line tables are not fully independent, and require the DW_AT_comp_dir of the compile unit in order to resolve the relative paths correctly. Type units do not have the DW_AT_comp_dir attribute, and so in order to correctly parse their line tables, I'd have to check their DW_AT_stmt_list attribute, find the matching compile unit, and fetch the compilation directory from there. This is relatively easy do to in regular DWARF, though it still requires some fairly elaborate dances if one wants to do it lazily and efficiently, etc.

However, things get a lot more complicated when split-dwarf comes into play. There the compiler will emit a separate line table into the dwo file for use by the type unit, and the compile unit will use the line table from the main file (located through DW_AT_stmt_list on the skeleton unit in the .o file). Here it is pretty much impossible to reliably connect the type unit to the relevant comp_dir. If the DWO file contains a single compilation unit, I might guess that I should look at this one, but in case of multiple compile units in a single dwo file we're out of luck.

So, this got me thinking that maybe this DW_AT_comp_dir business is not worth all the trouble it brings, and I'm now thinking of a simpler solution:

  • for each type unit, parse the line tables without the DW_AT_comp_dir attribute. Cache them, and share them between type units.
  • for each compile unit, parse the line tables *with* the DW_AT_comp_dir. Don't cache and don't share them (i.e., do exactly what we do now)

This means each line table will be parsed at most twice (once for its compile unit, and once for any type units that refer to it). It's not possible to share them between CUs and TUs without going through the complex matching process I described above, or without risking that the CU will get incomplete line tables if they have already been parsed via a type unit. It also means that for types in a TU we might not have complete declaration file names (in DWARF<=4), but I am not sure how many users will actually care about that (not even llvm-dwarfdump tries to do this matching). And this will still prevent the line table being parsed hundreds of times for each type unit that happens to refer to it...

I think that potentially parsing a line table (only it's file list, really) twice is worth the reduction in complexity that a full sharing would require. WDYT?

If you agree, then this patch is also not really necessary (though the llvm::Expected part is a nice refactor regardless IMO).

That is my current solution in our internal LLDB fork, so yes, this works and is probably the best solution.

labath abandoned this revision.May 30 2019, 8:30 AM

Ok, abandoning this patch in that case.