This is an archive of the discontinued LLVM Phabricator instance.

Read macro info from .debug_macro section and use it for expression evaluation.
ClosedPublic

Authored by sivachandra on Dec 10 2015, 6:18 PM.

Details

Summary

DWARF 5 proposes a reinvented .debug_macro section. This change follows
that spec.

Currently, only GCC produces the .debug_macro section and hence
the added test is annottated with expectedFailureClang.

Diff Detail

Repository
rL LLVM

Event Timeline

sivachandra retitled this revision from to Read macro info from .debug_macro section and use it for expression evaluation..
sivachandra updated this object.
sivachandra added a subscriber: lldb-commits.

The dwo_test_method fails for the added test as I think there is some GCC bug with the combination of -gsplit-dwarf and -g3.

I should also mention that I did not implement the complete spec as I do not know of any producer which produces all the flavors of the new spec.

tberghammer edited edge metadata.Dec 11 2015, 4:32 AM

The high level architecture looks reasonable but there are a few think I would like you to clean up a bit:

  • Several of your function uses a raw "new ...()" and then returns a pointer. Please put the data into a smart pointer (usually std::unique_ptr) and return that one instead to eliminate the risk of a memory leak.
  • I added a bunch of inline comments. Most of them are minor style issues but there are some more significant ones (e.g. unused data structures)

How much advantage do you expect from this change from the users point of view? Do you think we should try to make clang produce .debug_macros section also?

In case of split dwarf gcc produces several .debug_macro.dwo section for each compile unit what won't work with your current implementation. I haven't checked the spec if gcc-s behavior is correct or not but it might be a good idea to support it (not necessary with this change). If you decide to not support split dwarf then please mark the test as XFAIL(dwo) also

include/lldb/Symbol/DebugMacros.h
56 ↗(On Diff #42489)

(nit): "= default" (for all other similar function also)

131 ↗(On Diff #42489)

(nit): Please use nullptr (for all other similar case also)

137 ↗(On Diff #42489)

What is the benefit of storing unique pointers in the vector compared to the objects itself?

source/Expression/ExpressionSourceCode.cpp
106 ↗(On Diff #42489)

(nit): Please mark this function static and move it out from the anonymous namespace

122 ↗(On Diff #42489)

Should this function return an error in this case so the caller know the macro stream is only half ready?

source/Plugins/SymbolFile/DWARF/DWARFDebugMacro.cpp
56–58 ↗(On Diff #42489)

(nit): Please initialize

source/Plugins/SymbolFile/DWARF/DWARFDebugMacro.h
28 ↗(On Diff #42489)

Please move this enum out from the global namespace (preferably into one of the class) as we are currently leaking both the enum name and the enum fields into the global namespace.

38 ↗(On Diff #42489)

(nit): struct

47 ↗(On Diff #42489)

(nit): Member variable names should start with "m_"

48 ↗(On Diff #42489)

Why do we need this data structure? Currently nobody is using it.
Also should we store DWARFFormValue instead of llvm::dwarf::Form here? It have a lot of feature in it what makes it work with split dwarf (dwo).

clayborg requested changes to this revision.Dec 11 2015, 10:40 AM
clayborg edited edge metadata.

My main concern here is efficient storage. See my inline comment regarding DebugMacroEntryStorage and let me know what you think.

include/lldb/Symbol/DebugMacros.h
98 ↗(On Diff #42489)

You can probably drop this to uint32_t. No one will have a file with more than 4 billions lines of code. Then this class will pack better.

100 ↗(On Diff #42489)

m_file should be stored as an integer. CompileUnits has a CompileUnits::GetSupportFiles() which is the file list that is the same list from the DWARF line table file table. This doesn't mean that we have to hand it to consumers as an index, see my comments below about DebugMacroEntryStorage

101 ↗(On Diff #42489)

We are going to have a million of these objects and we should really store this as efficiently as possible:

DebugMacroEntryStorage
{
    uint32_t m_type:3;
    uint32_t m_line:29;
    uint32_t m_file_idx; // Index to file in CompileUnit::GetSupportFiles() FileSpecList
    ConstString m_str;
};

Then you can expand this into an instance of DebugMacroEntry and have that be what you hand out to external users and this DebugMacroEntry can be as big as you want. Can we inline the entries that are currently representing by adding the m_debug_macros_sp here? Do we really have many places in the macros for a file that include the same thing over and over? I would vote we get rid of the m_debug_macros_sp and just inline the entries where we need to unless we have a specific case where this would make us make way too many DebugMacroEntryStorage objects.

137 ↗(On Diff #42489)

I would rather see this as a vector of just the structs as well as tberghammer suggests. No need to scatter thousands of objects around the heap if we don't need to.

source/Expression/ExpressionSourceCode.cpp
213 ↗(On Diff #42489)

Remove this as it isn't needed. The StreamString has a std::string and it doesn't need to have anything in it to say it is empty.

This revision now requires changes to proceed.Dec 11 2015, 10:40 AM
sivachandra edited edge metadata.
sivachandra marked 11 inline comments as done.

Address comments.

I have addressed all the comments (with changes or my own comments).

About the problem with dwo, I have sent a mail to the GCC guys asking for guidance. Will fix it when I get clarity on that.

And yes, it would definitely be good if clang can also produce .debug_macro sections.

include/lldb/Symbol/DebugMacros.h
101 ↗(On Diff #42489)
  1. I have taken in other changes but did not remove m_debug_macros_sp.
  2. I have removed the file name / file index member completely as we are not using it now anyway!
  3. I did not understand the benefit of creating a new class/struct.
  4. For a simple source file like in the test, I see a 100+ defines inserted by default. If we inline them, and if a symbol file contains a 100 simple source files, we will be storing 10000+ duplicate entries. If each of the source files actually includes a standard library header, I would image there would be many more common entries exploding the number of duplicate entries we will be storing.
source/Expression/ExpressionSourceCode.cpp
122 ↗(On Diff #42489)

The return here does not mean it is "half ready". The function returns from here as macro entries after this appear on or after the current line in the main source file.

source/Plugins/SymbolFile/DWARF/DWARFDebugMacro.h
48 ↗(On Diff #42489)

I have refactored the code around this now. Since the entire header has to be read past to get to the macro entries, I have collected the data in a data-structure. After the refactor, it is only lives on the stack. While I am OK to take it off if you want, I see no harm in keeping it as a data structure.

Also, since the form value is unused anyway, I have now made it dw_form_t.

clayborg requested changes to this revision.Dec 11 2015, 3:27 PM
clayborg edited edge metadata.

Can we change DebugMacroEntry to contain a std::unique_ptr<DebugMacro> instead of a shared pointer? This would save us one full pointer in each DebugMacroEntry.

That is the only change I would like to see.

The expand on the benefit of having two structs like DebugMacroEntryStorage would be you can store it very efficiently with bitfields and store the macros inside DebugMacro as a vector of DebugMacroEntryStorage structs. But the API from DebugMacro could that gets an entry at index could return the bigger fatter version of this. The two structs below would be an example:

We store macro entries inside DebugMacro as:

class DebugMacroEntryStorage
{
    EntryType m_type:3;
    uint32_t m_line:29;
    uint32_t m_file;
    ConstString m_str;
    DebugMacrosSP m_debug_macros_sp;
};

But the function:

DebugMacroEntry
GetMacroEntryAtIndex(const size_t index) const

would return something like:

class DebugMacroEntryStorage
{
    CompileUnit *m_comp_unit;
    EntryType m_type;
    uint32_t m_line;
    FileSpec m_file;
    ConstString m_str;
    DebugMacrosSP m_debug_macros_sp;
};

We would resolve the "m_file" into a FileSpec for public consumption by getting the correct file from the compile unit's support files FileSpecList and not compress the storage and also fill in the m_comp_unit if needed.

This doesn't need to be done, but this is what I was thinking. I did this in LineTable.h where I store line entries as a lldb_private::LineTable::Entry objects, but I hand them out to clients as lldb_private::LineEntry objects.

This revision now requires changes to proceed.Dec 11 2015, 3:27 PM
clayborg accepted this revision.Dec 11 2015, 3:29 PM
clayborg edited edge metadata.

Can we change DebugMacroEntry to contain a std::unique_ptr<DebugMacro> instead of a shared pointer? This would save us one full pointer in each DebugMacroEntry.

That is the only change I would like to see.

Actually never mind. I believe the whole reason this shared pointer is here it to share the object between multiple other objects...

This revision is now accepted and ready to land.Dec 11 2015, 3:29 PM

One area of concern is if you are not tracking file, how can you get the right defines for a given source file line? If you have:

main.c:

#include <foo.h>
#define FOO printf

int main ()
{
    return 0; // Stop here and run: FOO("hello world\n")
}

#undef FOO

But in foo.h you have:
#define FOO puts

nothing
nothing
nothing
nothing

#undef FOO

Do you currently correctly have the following in your macros if stopped at the return statement in main?

#define FOO puts <<< from foo.h
#undef FOO <<< from foo.h
#define FOO print <<< from main.c

Both foo definitions will be on line 1, and we will be stopped on line 5 of main.c, but if we don't know that the entries from foo.h come from foo.h we might accidentally include these statements?

The spec says that, for a given compile unit, the macro entries in the debug info will be in the order in which the compiler processes them. Hence, at line 5 in your example, all these are relevant:

#define FOO puts <<< from foo.h
#undef FOO <<< from foo.h
#define FOO print <<< from main.c

I have actually verified that it works as expected; FOO resolves to printf and not puts at line 5.

You have tickled one aspect that I have ignored though. What if one is stopped in an inlined function/method defined in an included file? I think taking care of this is a fairly straightforward change over what I now have. I will need to bring back the line index in DebugMacroEntry and use it. Will send an update soon.

sivachandra edited edge metadata.

Use file names.

PTAL. I have now covered the case I missed. Added test for that as well.

tberghammer added inline comments.Dec 14 2015, 3:43 AM
source/Plugins/SymbolFile/DWARF/DWARFDebugMacro.cpp
38–41 ↗(On Diff #42614)

If we can get rid of all OperandTable related data then this can become a call to the SkipOperandTable function. (What is the purpose of the operand table if you can process everything without using it?).

source/Plugins/SymbolFile/DWARF/DWARFDebugMacro.h
49 ↗(On Diff #42614)

Constructing it is quite expensive as we insert a new entry for each operand into a std::map what will involve a binary search and most likely a heap allocation. As we don't use it for anything I would prefer to remove it because we don't want to pay the computation cost required to create it for nothing. We can add it back when we have a use case for it.

Going forward in this route I don't see any code reading DWARFDebugMacroHeader::m_operand_table either. I might miss something but if nobody is reading it then we can remove DWARFDebugMacroHeader::m_operand_table, DWARFDebugMacroHeader::OperandTable, DWARFDebugMacroHeader::OperandTableEntry and the call to DWARFDebugMacroHeader::OperandTable::ReadOperandTable what would simplify the code and also improve the efficiency as we don't do a lot of useless work.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
563 ↗(On Diff #42614)

Should we use std::map or std::unordered_map here?

sivachandra marked 2 inline comments as done.

Remove storing of an unused data structure.

Use std::unordered_map instead of std::map.

tberghammer accepted this revision.Dec 15 2015, 3:36 AM
tberghammer edited edge metadata.

LGTM

sivachandra closed this revision.Dec 15 2015, 4:25 PM
This revision was automatically updated to reflect the committed changes.