This is an archive of the discontinued LLVM Phabricator instance.

Enhance SymbolFileDWARF::ParseDeclsForContext performance
ClosedPublic

Authored by guiandrade on Aug 30 2019, 2:52 PM.

Details

Reviewers
clayborg
labath
Summary

This implements DWARFASTParserClang::EnsureAllDIEsInDeclContextHaveBeenParsed so as to provide a faster way to ensure all DIEs linked to a certain declaration context have been parsed.

Currently, we rely on SymbolFileDWARF::ParseDeclsForContext calling DWARFASTParserClang::GetDIEForDeclContext, and only then DWARFASTParserClang::GetDeclForUIDFromDWARF. This change shortcuts that logic and removes redundant calls to DWARFASTParserClang:: GetClangDeclForDIE by deleting DIEs from the m_decl_ctx_to_die map once they have been parsed.

Event Timeline

guiandrade created this revision.Aug 30 2019, 2:52 PM
guiandrade added a comment.EditedAug 30 2019, 2:53 PM

Hey guys,

This change is more for me to get to know what you think. I've noticed that the GetDeclForUIDFromDWARF() calls inside SymbolFileDWARF::ParseDeclsForContext (https://github.com/llvm/llvm-project/blob/ef82098a800178a1f973abb8af86eaa690a29734/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp#L1216) often end up having no side effect besides generating a llvm::DenseMap::find invocation (https://github.com/llvm/llvm-project/blob/f07b4aff06d83c6ad25d95f456fbc12b2d2a0a0c/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp#L3322), especially when we evaluate multiple expressions inside the same scope. So I was wondering if there's a way to improve that. Do you guys think we could do something like what this change proposes, or am I missing something important? Thanks!

labath added a comment.Sep 2 2019, 7:07 AM

I am not very familiar with this code, but I don't see a reason why what you're doing could not work. However, it looks like the implementation of it could be done in a better way. For instance, this function seems to be the only caller of GetDIEForDeclContext. So, we could replace it with something like ast_parser->ParseAllDiesForContext(decl_ctx). This would avoid materializing the std::vector, and the ast parser could store the list of processed dies in a more intelligent (memory friendly) fashion.

The thing that's not clear to me is under what circumstances (if any) can a new DIE appear in a decl_context after ParseDeclsForContext has been called. @clayborg, do you have any idea?

clayborg added a comment.EditedSep 3 2019, 7:07 AM

Hey guys,

This change is more for me to get to know what you think. I've noticed that the GetDeclForUIDFromDWARF() calls inside SymbolFileDWARF::ParseDeclsForContext (https://github.com/llvm/llvm-project/blob/ef82098a800178a1f973abb8af86eaa690a29734/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp#L1216) often end up having no side effect besides generating a llvm::DenseMap::find invocation (https://github.com/llvm/llvm-project/blob/f07b4aff06d83c6ad25d95f456fbc12b2d2a0a0c/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp#L3322), especially when we evaluate multiple expressions inside the same scope. So I was wondering if there's a way to improve that. Do you guys think we could do something like what this change proposes, or am I missing something important? Thanks!

So this function gets called when clang wants to make sure that all decls inside of a function, class, struct, union are parsed. Parsed means DWARF has been converted into clang AST in the TypeSystem for the current DWARF file. This means if the DeclContext that is being used is something the clang expression parser wants to use and search, we can make sure all decls inside of a DeclContext are parsed so they will be visible. It will cause all DIEs to be parsed and all clang AST counterparts to be created for each DIE that gets returned. I really am not following what this patch does. The fact that you see:

clang::Decl *DWARFASTParserClang::GetClangDeclForDIE(const DWARFDIE &die)

seem like it is always just grabbing an entry from the cache (https://github.com/llvm/llvm-project/blob/f07b4aff06d83c6ad25d95f456fbc12b2d2a0a0c/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp#L3322) is only because this DWARFDIE has converted DWARF into a clang AST type already. The first time a decl gets parsed, it will fall through to the code below which converts the DWARF to clang AST types.

So, original code is doing what is intended. More comments below about how we can make this more efficient.

I am not very familiar with this code, but I don't see a reason why what you're doing could not work. However, it looks like the implementation of it could be done in a better way. For instance, this function seems to be the only caller of GetDIEForDeclContext. So, we could replace it with something like ast_parser->ParseAllDiesForContext(decl_ctx). This would avoid materializing the std::vector, and the ast parser could store the list of processed dies in a more intelligent (memory friendly) fashion.

The thing that's not clear to me is under what circumstances (if any) can a new DIE appear in a decl_context after ParseDeclsForContext has been called. @clayborg, do you have any idea?

No new DIEs should appear in a decl context in the future.

Speaking to efficiency we can do a few things:
1 - keep a cache of CompilerDeclContext's that have already been expanded. This way we can avoid the need to get a list of DIE, just to iterate through them and do nothing.

We can declare a new ivar in SymbolFileDWARF:

std::set<CompilerDeclContext> m_parsed_decls_for_decl_ctx;

And then use this std::set. If the pair that is returned has "second" set to "false", then it has already been parsed, else it will set it and fall through and parse all decls one time:

void SymbolFileDWARF::ParseDeclsForContext(CompilerDeclContext decl_ctx) {
  if (!m_parsed_decls_for_decl_ctx.insert(decl_ctx).second)
    return;
  // Fall through and do what we did before

2 - Change GetDIEForDeclContext to take a std::function or llvm::function that will get called with each DWARFDIE we need. This will avoid making a std::vector which we just iterate through. If we go this route we should rename this to ForEachDeclContextDIE(...).

Change:

std::vector<DWARFDIE> DWARFASTParserClang::GetDIEForDeclContext(lldb_private::CompilerDeclContext decl_context);

to:

void DWARFASTParserClang::ForEachDIEInDeclContext(lldb_private::CompilerDeclContext decl_context, std::function<void(DWARFDIE die)> const &callback);

"callback" will get called with each DIE instead of us making a vector.

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1210–1211

delete these two lines

1212

Why the reference to 'offset'? Also we shouldn't need to make a key, just use decl_ctx:

uint32_t offset = m_decl_context_to_offset_map[decl_ctx];

Also, m_decl_context_to_offset_map is not populated anywhere. Not sure what this will return as this was never assigned. It will probably just return a default constructed "uint32_t"??

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
493

Any reason we are just using a lldb_private::CompilerDeclContext? Remove this line?

494
typedef std::map<CompilerDeclContext, uint32_t> DeclContextToOffsetMap;
495

I don't see 'm_decl_context_to_offset_map' being populated anywhere?

guiandrade updated this revision to Diff 218564.Sep 3 2019, 4:36 PM
guiandrade marked 6 inline comments as done.
guiandrade retitled this revision from Skip getting declarations for repeated DIEs (WIP) to Cache expanded CompilerDeclContext's to avoid parsing them multiple times.
guiandrade edited the summary of this revision. (Show Details)

Thank you so much for the thorough explanation and ideas, @labath and @clayborg.

I forgot to mention that the initial motivation behind this change was to reduce those find calls as I saw they add up to a lot (10^5 calls debugging a Unreal engine sample at certain breakpoints).

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1212

The idea was to use that reference to populate the map at line 1222. However, I was only using this offset map because I thought DIEs could appear in a decl context. Given that this is not the case, I agree that using a set as you suggested is much better.

guiandrade updated this revision to Diff 218581.Sep 3 2019, 7:18 PM
guiandrade retitled this revision from Cache expanded CompilerDeclContext's to avoid parsing them multiple times to Enhance SymbolFileDWARF::ParseDeclsForContext performance.

Renaming the test

lldb/unittests/SymbolFile/DWARF/DWARFASTParserClangTests.cpp
48–49

I think we should not expect any particular order, right?

Thanks for the clarification Greg.

Functionally, this patch seems fine, but I am still wondering if we shouldn't make this more efficient. std::set is not the most memory-efficient structure, and so creating a std::set entry just to store one bit seems wasteful, particularly as there is already a container which uses the context as a key (m_decl_ctx_to_die). Could just shove this bit into the m_decl_ctx_to_die map (probably by changing it to something functionally equivalent to map<DeclContext, pair<bool, vector<DWARFDie>>>)? Given that the sole "raison d´être" of this map is to enable the ParseDeclsForContext functionality, I don't think that having that flag be stored there should be awkward. Quite the opposite, it would remove the awkward back-and-forth between the SymbolFileDWARF and the DWARFASTParsers (symbol file calls ForEachDIEInDeclContext, passing it a callback; then ast parser calls the callback; finally the callback immediately re-enters the parser via GetDeclForUIDFromDWARF) -- instead we could just have the SymbolFileDWARF do ast_parser->PleaseParseAllDiesInThisContextIfTheyHaventBeenParsedAlready(ctx) and have everything happen inside the parser.

So, overall, it seems to me that this way, we could improve the code readability while reducing the time, and avoiding a bigger memory increase. In fact, since we now will not be using the list of dies after they have been parsed once, we could even free up the vector<DWARFDie> after the parsing is complete, and thereby reduce the memory footprint as well. That would be a win-win-win scenario. :)

WDYT?

lldb/unittests/SymbolFile/DWARF/DWARFASTParserClangTests.cpp
48–49

Probably not. A more succinct way of expressing that would be EXPECT_THAT(die_list, testing::UnorderedElementsAre(die2, die3))

Thanks for the clarification Greg.

Functionally, this patch seems fine, but I am still wondering if we shouldn't make this more efficient. std::set is not the most memory-efficient structure, and so creating a std::set entry just to store one bit seems wasteful, particularly as there is already a container which uses the context as a key (m_decl_ctx_to_die). Could just shove this bit into the m_decl_ctx_to_die map (probably by changing it to something functionally equivalent to map<DeclContext, pair<bool, vector<DWARFDie>>>)? Given that the sole "raison d´être" of this map is to enable the ParseDeclsForContext functionality, I don't think that having that flag be stored there should be awkward. Quite the opposite, it would remove the awkward back-and-forth between the SymbolFileDWARF and the DWARFASTParsers (symbol file calls ForEachDIEInDeclContext, passing it a callback; then ast parser calls the callback; finally the callback immediately re-enters the parser via GetDeclForUIDFromDWARF) -- instead we could just have the SymbolFileDWARF do ast_parser->PleaseParseAllDiesInThisContextIfTheyHaventBeenParsedAlready(ctx) and have everything happen inside the parser.

So, overall, it seems to me that this way, we could improve the code readability while reducing the time, and avoiding a bigger memory increase. In fact, since we now will not be using the list of dies after they have been parsed once, we could even free up the vector<DWARFDie> after the parsing is complete, and thereby reduce the memory footprint as well. That would be a win-win-win scenario. :)

WDYT?

Great idea, but this idea gave me the idea to use "m_decl_ctx_to_die" as is, and just remove the entry once we have parse all decls. Then we free the memory and don't need a bit. If there is no entry in the m_decl_ctx_to_die map, then ForEachDIEInDeclContext will just not iterate at all?

Great idea, but this idea gave me the idea to use "m_decl_ctx_to_die" as is, and just remove the entry once we have parse all decls. Then we free the memory and don't need a bit. If there is no entry in the m_decl_ctx_to_die map, then ForEachDIEInDeclContext will just not iterate at all?

Yes, that sounds even better!

guiandrade updated this revision to Diff 218779.Sep 4 2019, 1:25 PM

Migrating to EnsureAllDIEsInDeclContextHaveBeenParsed

Thanks for the clarification Greg.

Functionally, this patch seems fine, but I am still wondering if we shouldn't make this more efficient. std::set is not the most memory-efficient structure, and so creating a std::set entry just to store one bit seems wasteful, particularly as there is already a container which uses the context as a key (m_decl_ctx_to_die). Could just shove this bit into the m_decl_ctx_to_die map (probably by changing it to something functionally equivalent to map<DeclContext, pair<bool, vector<DWARFDie>>>)? Given that the sole "raison d´être" of this map is to enable the ParseDeclsForContext functionality, I don't think that having that flag be stored there should be awkward. Quite the opposite, it would remove the awkward back-and-forth between the SymbolFileDWARF and the DWARFASTParsers (symbol file calls ForEachDIEInDeclContext, passing it a callback; then ast parser calls the callback; finally the callback immediately re-enters the parser via GetDeclForUIDFromDWARF) -- instead we could just have the SymbolFileDWARF do ast_parser->PleaseParseAllDiesInThisContextIfTheyHaventBeenParsedAlready(ctx) and have everything happen inside the parser.

So, overall, it seems to me that this way, we could improve the code readability while reducing the time, and avoiding a bigger memory increase. In fact, since we now will not be using the list of dies after they have been parsed once, we could even free up the vector<DWARFDie> after the parsing is complete, and thereby reduce the memory footprint as well. That would be a win-win-win scenario. :)

WDYT?

Great idea, but this idea gave me the idea to use "m_decl_ctx_to_die" as is, and just remove the entry once we have parse all decls. Then we free the memory and don't need a bit. If there is no entry in the m_decl_ctx_to_die map, then ForEachDIEInDeclContext will just not iterate at all?

Thanks for the ideas, I think they greatly simplify the code. Let me know what you guys think of my implementation. I think the main issue with it is that it makes it hard to unit test it...

guiandrade added inline comments.Sep 4 2019, 1:32 PM
lldb/unittests/SymbolFile/DWARF/DWARFASTParserClangTests.cpp
50

Once we agree on how to move forward, the idea is to delete/modify this, not leave it commented out

labath added a comment.Sep 5 2019, 4:56 AM

Thanks. The code looks fine to me. Losing the ability to unit test is a bit of a pity, but since test was pretty icky to begin with, I can't say I'm going to miss it too much...
One way we could possibly test this is with a full scale test via the "target modules dump ast" command. So, the idea would be to run to process to some point, evaluate some expressions, run "target modules dump ast" and check the things are in the output (or that they are *not* there).

It's hard to say what exactly you should be checking for, since the assumption is that we are not changing the behavior here, and so the existing tests should cover that in theory, but the "dump ast" command is a new thing, we don't have too many of tests for it, and so any tests you add will be useful, even if they're not specifically targeting the thing you fix in this patch. If you are able to test the thing that the previous patch fixed in this way, than that would be great though. I'm not 100% sure it can be done, but since (IIUC) the bug/problem was about parsing too many things, I would expect that would manifest itself in some additional entries showing up in the parsed ast.

Thanks. The code looks fine to me. Losing the ability to unit test is a bit of a pity, but since test was pretty icky to begin with, I can't say I'm going to miss it too much...
One way we could possibly test this is with a full scale test via the "target modules dump ast" command. So, the idea would be to run to process to some point, evaluate some expressions, run "target modules dump ast" and check the things are in the output (or that they are *not* there).

It's hard to say what exactly you should be checking for, since the assumption is that we are not changing the behavior here, and so the existing tests should cover that in theory, but the "dump ast" command is a new thing, we don't have too many of tests for it, and so any tests you add will be useful, even if they're not specifically targeting the thing you fix in this patch. If you are able to test the thing that the previous patch fixed in this way, than that would be great though. I'm not 100% sure it can be done, but since (IIUC) the bug/problem was about parsing too many things, I would expect that would manifest itself in some additional entries showing up in the parsed ast.

I'm trying to play with that command to see if anything changes there, but I haven't been able to do so yet. I'm using the following snippet.

// main.cpp
#include "Foo.h"
int main() { while (true) foo(); /*Breakpoint A*/ /*Breakpoint C*/ }

// Foo.h
#pragma once
void foo();

// Foo.cpp
#include "Foo.h"
namespace {
	struct C { void f() {} };
	struct C0 : public C {
		int c00, c01, c02;
	};
};
void foo() {
	C0 c0; // Breakpoint B
	c0.f();
}

Removing this patch and the other one, these are the GetClangDeclForDIE calls we make https://pastebin.com/AcKGCBz7.
After applying this patch, we cut that down to https://pastebin.com/BjwmbmE0.
Nevertheless, in either case our AST ends up looking the same at breakpoint 'C'.

TranslationUnitDecl 0x55fe81dff238 <<invalid sloc>> <invalid sloc>
|-FunctionDecl 0x55fe81dffb28 <<invalid sloc>> <invalid sloc> main 'int ()' extern
|-FunctionDecl 0x55fe81dffbf8 <<invalid sloc>> <invalid sloc> foo 'void ()' extern
|-NamespaceDecl 0x55fe81dffc98 <<invalid sloc>> <invalid sloc>
| `-CXXRecordDecl 0x55fe81dffd20 <<invalid sloc>> <invalid sloc> <undeserialized declarations> struct C0
`-UsingDirectiveDecl 0x55fe81dffec0 <<invalid sloc>> <invalid sloc> Namespace 0x55fe81dffc98 ''

I'm not sure if there's a way to have the AST reflect the extra calls (but I also know very little about how that part of the code works). Can you guys think of a piece of code that could do that?

Thanks!

This looks like a good approach to me. Pavel?

This looks like a good approach to me. Pavel?

Even without tests? Just to clarify my last comment, I could not find a way to use the command "target modules dump as" to ensure we won't have a regression in the future. Apparently, with or without the extra calls, that command's output looks the same.

This looks like a good approach to me. Pavel?

Even without tests? Just to clarify my last comment, I could not find a way to use the command "target modules dump as" to ensure we won't have a regression in the future. Apparently, with or without the extra calls, that command's output looks the same.

I would prefer to have tests. I was just saying that the code looked good, and it is now time to proceed with finding a way to test.

This looks like a good approach to me. Pavel?

Even without tests? Just to clarify my last comment, I could not find a way to use the command "target modules dump as" to ensure we won't have a regression in the future. Apparently, with or without the extra calls, that command's output looks the same.

I would prefer to have tests. I was just saying that the code looked good, and it is now time to proceed with finding a way to test.

Oh, got it. Thanks for the clarification.

guiandrade updated this revision to Diff 219372.Sep 9 2019, 9:41 AM
guiandrade edited the summary of this revision. (Show Details)

Falling back to the previous workaround to test EnsureAllDIEsInDeclContextHaveBeenParsed.

Maybe we could use the previous workaround until we find something better?

LGTM. Test seems about as good as we can do. Pavel, you ok with this now?

labath accepted this revision.Sep 11 2019, 2:40 AM

looks fine to me

This revision is now accepted and ready to land.Sep 11 2019, 2:40 AM

looks fine to me

Thank you, guys! Could you please submit it for me?

labath closed this revision.Sep 24 2019, 5:52 AM

Thank you, guys! Could you please submit it for me?

Landed as r370374. Sorry about the delay.