This is an archive of the discontinued LLVM Phabricator instance.

Tolerate DWARF compile unit without filename.
ClosedPublic

Authored by dsrbecky on Jul 7 2015, 10:29 AM.

Details

Summary

The DW_AT_name attribute of compile unit is optional.
If it is missing, try to get filename from the debug_line section.
This allows the compile unit to be useful without the filename.

Diff Detail

Event Timeline

dsrbecky updated this revision to Diff 29190.Jul 7 2015, 10:29 AM
dsrbecky retitled this revision from to Tolerate DWARF compile unit without filename..
dsrbecky updated this object.
clayborg requested changes to this revision.Jul 7 2015, 11:16 AM
clayborg edited edge metadata.

Just create the compile unit first with an empty file spec, and then afterward, use CompileUnit::GetSupportFiles() so that we don't parse the line table files and throw them away. The exact code you need is in the inline comments. This way we won't end up parsing the support files once in SymbolFileDWARF::ParseCompileUnit() and then again in SymbolFileDWARF::ParseCompileUnitSupportFiles() when the lldb_private::CompileUnit parses them.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
992–993

remove this

995–1005

inside this if statement, adddo the following:

if (cu_sp)
{                    
    // If we just created a compile unit with an invalid file spec, try and get the
    // first entry in the supports files from the line table as that should be the
    // compile unit.
    if (!cu_file_spec)
    {
        cu_file_spec = cu_sp->GetSupportFiles().GetFileSpecAtIndex(0);
        if (cu_file_spec)
            (FileSpec &)(*cu_sp) = cu_file_spec;
    }
This revision now requires changes to proceed.Jul 7 2015, 11:16 AM
dsrbecky updated this revision to Diff 29201.Jul 7 2015, 12:07 PM
dsrbecky edited edge metadata.

Parse line table after creating the compilation unit.

dsrbecky marked an inline comment as done.Jul 7 2015, 12:17 PM

As a side note, what is the compilation unit filename needed for? The filenames in the line table are obviously used for PC->line mapping, and they may be referenced from the debug_info. However, it is not clear to me why compilation unit needs a filename. I believe it is perfectly valid to aggregate all debug information info to single compilation unit, and then there is no obvious primary source file.

Instead of taking the filename from line table, would it be valid to leave it empty? I did not do that since I was worried what might break.

I have noticed that the support files copy compilation units filename to index 0. I agree that the line table should be mapped to index 1 and further, but isn't 0 reserved as unknown/invalid?

clayborg requested changes to this revision.Jul 7 2015, 2:45 PM
clayborg edited edge metadata.

Looks like an extra } at line 991 should be removed right?

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
991

Shouldn't this be removed?

This revision now requires changes to proceed.Jul 7 2015, 2:45 PM
dsrbecky added inline comments.Jul 7 2015, 2:54 PM
source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
991

It is intended. I closes the if (cu_file_spec) above. Otherwise the if the cu_file_spec is empty, the CU will not be even created.

I would recommend viewing the path with with-space changes. It is more obvious.

clayborg accepted this revision.Jul 7 2015, 3:03 PM
clayborg edited edge metadata.

Ahh, I missed the removal of the } at line 1014 in the old code... Looks good.

This revision is now accepted and ready to land.Jul 7 2015, 3:03 PM

As a side note, what is the compilation unit filename needed for? The filenames in the line table are obviously used for PC->line mapping, and they may be referenced from the debug_info. However, it is not clear to me why compilation unit needs a filename. I believe it is perfectly valid to aggregate all debug information info to single compilation unit, and then there is no obvious primary source file.

The primary source filename is in the compilation unit's DW_AT_name and repeated in the .debug_line section's file table. However the current directory of the compilation unit is found only in the compilation unit's DW_AT_comp_dir; this implicitly becomes directory entry 0 in the line table. Sadly this means you cannot collapse all comp units because they can have different comp_dir values.

The only DWARF-specified method for identifying the primary source file in the line table is the name that matches DW_AT_name from the comp unit. If there's no DW_AT_name then you can guess it's the first entry; but that's just a guess.

Thank you for the answer. If you pardon my ignorance... what is the purpose of *primary* source file? How does it differ from the other files? I do not imagine that the entries in the line table care whether the source is primary. I imagine DW_AT_decl_file and similar do not care either. I guess there is a good reason, I just can not think of it.

Likewise, is there any reason for the directory entries other then saving of space? (so in the hypothetical collapsing filenames could be stored as absolute paths without any directory entries in the cu or line table)

As a human I am probably most interested in the "primary" source file associated with an object file. But as you say, primary-ness isn't relevant to how the file list is used within DWARF, or how debuggers would use it. Thus, the filename from the comp unit would seem to be not hugely important.

The directory list is exactly for saving space. It's not hard to come up with compilation units that have sourced in hundreds of include files, and the directory names they come from can be hundreds of characters long, but typically there are lots of include files in any given directory. So, factoring out the directory names is typically a win.

DWARF has the notion in its line tables that the primary source file is the first one in the file list of the line table prologue. Compile units are also how the compiles think of things so this is the way they were represented in the debug info. Things tend to be less clear when you apply link time optimizations where a number of .o files are optimized into new code and then debug info is written out for the newly optimized code, but in general, most compilers have the notion of one file being compiled at a time and that is one primary file. DWARF added the DW_AT_name to the DW_TAG_compile_unit so that you could see the source file by just looking at the DWARF in the .debug_info section. The DW_AT_stmt_list is an attribute that has an offset into the .debug_line section which contains the line table prologue (directory lists, file lists and more) and the line table rows (address -> file + line + column). It would be a pain to not include the name of the compile unit (the file path) in the .debug_info because then you would need to parse a separate section to show the complete info for the compile unit. So part of this is just organization of the debug info and how to make each section stand on its own (there might not be a line table and a compile unit might not have a DW_AT_stmt_list).

DWARF has the notion in its line tables that the primary source file is the first one in the file list of the line table prologue.

I had thought so too, but looking in the specs I don't see any language to confirm that. In fact DWARF 4 added explicit language that the primary file is the one whose name matches DW_AT_name from the comp unit. The order in the line table's file list is not specified.

I expect it's _common_ to put the primary file first, but DWARF does not specify that.

Well, you still need to parse the line table prologue to understand DW_AT_decl_file, etc, but I get your point. If you are doing some simple processing, it nice to have the extra copy so you do not have to bother.

Thanks for the (two) responses :-)

Interesting about the primary source file not being the first...

The compiler that is generating DW_TAG_compile_unit tags that have no DW_AT_name should have a bug file against it to get this fixed to avoid possibly getting this wrong.

Does it have any serious consequence if we get it wrong? (in this case there is only one entry in line table anyway)

No real consequence other than possibly misinformation, but we would like to get it right and have the compiler emit correct DWARF if we can get it to.

tberghammer accepted this revision.Jul 8 2015, 3:14 AM
tberghammer edited edge metadata.

Next time please add lldb-commits as a subscriber instead of llvm-commits

This revision was automatically updated to reflect the committed changes.