This is an archive of the discontinued LLVM Phabricator instance.

Make DWARFParsing more thread-safe
Needs Review · Public

Authored by friss on Jun 20 2018, 2:03 PM.

Details

Summary

Debug information is read lazily, so a lot of the reader's state is
constructed incrementally. There is also nothing preventing symbol
queries from being performed by multiple threads at the same time. This
led to a series of crashes where the data structures used for
bookkeeping got corrupted by concurrent modification.

This patch uses a brute-force approach where every such data structure
is converted to a thread-safe version. I'm happy to discuss
alternatives, and also to hear ideas about what performance testing
I can do to assess the impact of this patch.

Event Timeline

friss created this revision.Jun 20 2018, 2:03 PM

Long term I think one potentially interesting possibility for solving a lot of these threading and lazy evaluation issues is to make a task queue that runs all related operations on a single thread and returns a future you can wait on. This way, the core data structures themselves do not need synchronization because they are only ever accessed from one thread. I actually have a patch in progress to make this kind of infrastructure, which I actually needed for a different reason, but it would work equally well here.

Mostly just putting this out there as an idea for down the road, not that you should do it right now.
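
For illustration, here is a minimal sketch of the kind of serialized task queue described above, assuming only standard C++11 primitives; SerialTaskQueue and its members are hypothetical names for this sketch, not an existing LLDB/LLVM API:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical sketch: every submitted task runs on one worker thread and the
// caller gets a std::future back, so the data structures the tasks touch are
// only ever accessed from that single thread.
class SerialTaskQueue {
public:
  SerialTaskQueue() : m_worker([this] { Run(); }) {}

  ~SerialTaskQueue() {
    {
      std::lock_guard<std::mutex> guard(m_mutex);
      m_done = true;
    }
    m_cv.notify_one();
    m_worker.join();
  }

  template <typename Fn> auto Submit(Fn fn) -> std::future<decltype(fn())> {
    auto task =
        std::make_shared<std::packaged_task<decltype(fn())()>>(std::move(fn));
    auto result = task->get_future();
    {
      std::lock_guard<std::mutex> guard(m_mutex);
      m_tasks.push_back([task] { (*task)(); });
    }
    m_cv.notify_one();
    return result;
  }

private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_done || !m_tasks.empty(); });
        if (m_done && m_tasks.empty())
          return;
        task = std::move(m_tasks.front());
        m_tasks.pop_front();
      }
      task(); // Runs on the single worker thread.
    }
  }

  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::deque<std::function<void()>> m_tasks;
  bool m_done = false;
  std::thread m_worker; // Declared last so Run() only sees initialized members.
};
```

Note that in this naive form, a task that enqueues another task and then blocks on its future would deadlock, since both need the single worker thread; that ties into the question raised in the reply below.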

friss added a comment.Jun 20 2018, 3:14 PM

Long term I think one potentially interesting possibility for solving a lot of these threading and lazy evaluation issues is to make a task queue that runs all related operations on a single thread and returns a future you can wait on. This way, the core data structures themselves do not need synchronization because they are only ever accessed from one thread. I actually have a patch in progress to make this kind of infrastructure, which I actually needed for a different reason, but it would work equally well here.

Mostly just putting this out there as an idea for down the road, not that you should do it right now.

It's an interesting idea, which comes with a couple of tradeoffs:

  • We would essentially be serializing the DWARF parsing. I do not know whether efficient multi-threaded parsing needs to be a goal, but it is an obvious consequence that would need to be discussed.
  • I have worked on a codebase relying heavily on serialized thread queues and something equivalent to futures to schedule work. Let's say that it was... interesting to debug. If we did this, we wouldn't need to go to the same extremes, though. (Also, I'm wondering: would tasks running serially be allowed to enqueue other tasks and wait for them? Or would this be forbidden?)

I like the simplification aspect of it, though. It is basically equivalent to adding locks at a higher abstraction level than the per-data-structure locking I did here. I was pondering that too, but I wasn't sure 1/ which level would be best and 2/ how to be sure I was not going to deadlock the whole thing.

I am not sure this will actually solve the problems you are seeing. This may avoid corrupting the internal DenseMap data structures, but it will not make the algorithm using them actually correct.
For example the pattern in ParseTypeFromDWARF is:

  1. check the "already parsed map". If the DIE is already parsed then you're done.
  2. if the map contains the magic "DIE_IS_BEING_PARSED" key, abort (recursive dwarf references)
  3. otherwise, insert the "DIE_IS_BEING_PARSED" key into the map
  4. do the parsing, which potentially involves recursive ParseTypeFromDWARF calls
  5. insert the parsed type into the map

What you do is make each of the steps (1), (3), (5) atomic individually. However, the whole algorithm is not correct unless the whole sequence is atomic as a whole. Otherwise, if you have two threads trying to parse the same DIE (directly or indirectly), one of them could see the intermediate DIE_IS_BEING_PARSED and incorrectly assume that it encountered recursive types.

So, I think that locking at a higher level would be better. Doing that will certainly be tricky though...
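
To make the hazard concrete, here is a toy, self-contained rendering of that five-step pattern (illustrative names only, not the real ParseTypeFromDWARF code). Even if each individual map access is made atomic, the window between steps 3 and 5 is still visible to other threads:

```cpp
#include <map>
#include <memory>

// Toy model of the pattern described above; names are illustrative, not
// LLDB's real API.
struct Die { int id; };
struct Type {};
using TypePtr = std::shared_ptr<Type>;

// Sentinel standing in for the DIE_IS_BEING_PARSED marker.
static const TypePtr kDieIsBeingParsed = std::make_shared<Type>();

static std::map<int, TypePtr> g_die_to_type; // the "already parsed" map

TypePtr ParseTypeFromDie(const Die &die) {
  auto it = g_die_to_type.find(die.id);           // step 1
  if (it != g_die_to_type.end()) {
    if (it->second == kDieIsBeingParsed)          // step 2
      return nullptr; // assumes a recursive reference on *this* thread
    return it->second;                            // already parsed
  }
  g_die_to_type[die.id] = kDieIsBeingParsed;      // step 3
  // A second thread that looks up this DIE right now sees the sentinel and
  // bails out at step 2, even though no recursion is involved.
  TypePtr type = std::make_shared<Type>();        // step 4 (may recurse)
  g_die_to_type[die.id] = type;                   // step 5
  return type;
}
```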

I am not sure this will actually solve the problems you are seeing. This may avoid corrupting the internal DenseMap data structures, but it will not make the algorithm using them actually correct.
For example the pattern in ParseTypeFromDWARF is:

  1. check the "already parsed map". If the DIE is already parsed then you're done.
  2. if the map contains the magic "DIE_IS_BEING_PARSED" key, abort (recursive dwarf references)
  3. otherwise, insert the "DIE_IS_BEING_PARSED" key into the map
  4. do the parsing, which potentially involves recursive ParseTypeFromDWARF calls
  5. insert the parsed type into the map

What you do is make each of the steps (1), (3), (5) atomic individually. However, the whole algorithm is not correct unless the whole sequence is atomic as a whole. Otherwise, if you have two threads trying to parse the same DIE (directly or indirectly), one of them could see the intermediate DIE_IS_BEING_PARSED and incorrectly assume that it encountered recursive types.

We need to make #1 atomic.
#2 would need to somehow know if the type is already being parsed recursively by the current thread. If so, then do what we do now. If not, we need a way to wait on the completion of this type, so the other parsing thread can complete it and put it into the map, at which time we grab the right value from the map.
A step #6 would then need to be added so that, after we put the type into the map, we can notify any other threads that might be waiting.

So, I think that locking at a higher level would be better. Doing that will certainly be tricky though...
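
One way to picture the wait/notify scheme sketched in the reply above (a hedged sketch with toy names, not LLDB code): the in-progress marker records which thread owns the parse, and a condition variable supplies the step #6 notification.

```cpp
#include <condition_variable>
#include <map>
#include <memory>
#include <mutex>
#include <thread>

struct Type {};
using TypePtr = std::shared_ptr<Type>;

// Hypothetical bookkeeping: distinguishes "being parsed by me" from "being
// parsed by someone else" and lets waiters block until step 6 fires.
struct ParseState {
  std::thread::id parser;   // thread that owns the in-progress parse
  TypePtr type;             // null until parsing finishes
  bool in_progress = false;
};

static std::mutex g_mutex;
static std::condition_variable g_cv;
static std::map<int, ParseState> g_state;

TypePtr ParseType(int die_id) {
  std::unique_lock<std::mutex> lock(g_mutex);
  ParseState &st = g_state[die_id];
  if (st.type)                                        // step 1: already parsed
    return st.type;
  if (st.in_progress) {
    if (st.parser == std::this_thread::get_id())      // step 2: our recursion
      return nullptr;
    g_cv.wait(lock, [&] { return !st.in_progress; }); // wait for the owner
    return st.type;
  }
  st.in_progress = true;                              // step 3
  st.parser = std::this_thread::get_id();
  lock.unlock();
  TypePtr type = std::make_shared<Type>();            // step 4: the real work
  lock.lock();
  st.type = type;                                     // step 5
  st.in_progress = false;
  g_cv.notify_all();                                  // step 6: wake waiters
  return type;
}
```

As the next comment points out, this still deadlocks when two threads enter a reference cycle from different DIEs and end up waiting on each other's in-progress markers.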

zturner added a subscriber: friss.Jun 21 2018, 7:48 AM

Related question: Is the laziness done to save memory, startup time, or both?

I am not sure this will actually solve the problems you are seeing. This may avoid corrupting the internal DenseMap data structures, but it will not make the algorithm using them actually correct.
For example the pattern in ParseTypeFromDWARF is:

  1. check the "already parsed map". If the DIE is already parsed then you're done.
  2. if the map contains the magic "DIE_IS_BEING_PARSED" key, abort (recursive dwarf references)
  3. otherwise, insert the "DIE_IS_BEING_PARSED" key into the map
  4. do the parsing, which potentially involves recursive ParseTypeFromDWARF calls
  5. insert the parsed type into the map

What you do is make each of the steps (1), (3), (5) atomic individually. However, the whole algorithm is not correct unless the whole sequence is atomic as a whole. Otherwise, if you have two threads trying to parse the same DIE (directly or indirectly), one of them could see the intermediate DIE_IS_BEING_PARSED and incorrectly assume that it encountered recursive types.

We need to make #1 atomic.
#2 would need to somehow know if the type is already being parsed recursively by the current thread. If so, then do what we do now. If not, we need a way to wait on the completion of this type, so the other parsing thread can complete it and put it into the map, at which time we grab the right value from the map.
A step #6 would then need to be added so that, after we put the type into the map, we can notify any other threads that might be waiting.

It's even more complicated than that, in case you really have reference cycles, you can have multiple threads starting parsing from different points in that cycle, and getting deadlocked waiting for the DIE_IS_BEING_PARSED results from each other.

The only sane algorithm I can come up with right now is to make the list of parsed DIEs local to each thread/parsing entity (e.g. via a "visited" list), and only update the global map once the parsing has completed (successfully or not). This can potentially duplicate some effort where one thread parses a type only to find out that it has already been parsed, but hopefully that is not going to be the common case. The alternative is some complicated resource cycle detection scheme.
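
A rough sketch of that idea, under the assumption that recursion detection can live in a per-parse "visited" set and that only finished results ever reach the shared map (toy names, not LLDB's API):

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <set>

struct Type {};
using TypePtr = std::shared_ptr<Type>;

static std::mutex g_map_mutex;
static std::map<int, TypePtr> g_die_to_type; // shared, finished results only

// `visited` is owned by one parsing entity, so no other thread can ever
// observe an in-progress marker.
TypePtr ParseType(int die_id, std::set<int> &visited) {
  {
    std::lock_guard<std::mutex> guard(g_map_mutex);
    auto it = g_die_to_type.find(die_id);
    if (it != g_die_to_type.end())
      return it->second;                 // someone already finished this DIE
  }
  if (!visited.insert(die_id).second)
    return nullptr;                      // recursion within *this* parse
  TypePtr type = std::make_shared<Type>(); // real parsing would go here and
                                           // recurse with the same `visited`
  std::lock_guard<std::mutex> guard(g_map_mutex);
  // If another thread got here first, keep its entry so everyone agrees on
  // a single Type for this DIE.
  return g_die_to_type.emplace(die_id, type).first->second;
}
```

The caller would create one `visited` set per top-level parse. Two threads can still end up building the same type concurrently and throwing one copy away, which is the duplicated effort (and the C1/C2 concern) discussed further down the thread.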

It's even more complicated than that, in case you really have reference cycles, you can have multiple threads starting parsing from different points in that cycle, and getting deadlocked waiting for the DIE_IS_BEING_PARSED results from each other.

The only sane algorithm I can come up with right now is to make the list of parsed DIEs local to each thread/parsing entity (e.g. via a "visited" list), and only update the global map once the parsing has completed (successfully or not). This can potentially duplicate some effort where one thread parses a type only to find out that it has already been parsed, but hopefully that is not going to be the common case. The alternative is some complicated resource cycle detection scheme.

I gave this a shot in D52406 but I'm afraid it's too simple to be correct. Pavel, could you give it a look and let me know whether that was what you had in mind?

A little background might help here: The lldb_private::Module lock is used to prevent multiple queries into the DWARF from stomping on each other.

Multi-threaded DWARF parsing was primarily added to speed up indexing, and that is the only place it is used. Is that not true? Indexing is just creating name tables and the accelerator tables we need so that we can partially parse the DWARF later. No type parsing should be happening when indexing. All other accesses to the DWARF must be done via a SymbolFile API that takes the module lock to stop multiple threads from stomping on each other. So my main point is we need to use the module lock to avoid having multiple threads doing important work that will cause crashes. The only multi-threaded access to DWARF currently should be in the indexing only and no important parsing stuff should be going on. If we discover places where multiple threads are making accesses, we just need to ensure they take the module lock and everything should just work.

Related question: Is the laziness done to save memory, startup time, or both?

Both. Memory is the bigger issue here, though. We want to parse only what we need when we need it, so that if someone were to use LLDB for symbolication, they don't incur the cost of parsing all types in any compile units that get touched (as would happen with GDB). Since we convert all DWARF types into clang AST types, parsing more types than needed can cause a lot of memory to be used.

Thanks for the information, Greg!

A little background might help here: The lldb_private::Module lock is used to prevent multiple queries into the DWARF from stomping on each other.

Multi-threaded DWARF parsing was primarily added to speed up indexing, and that is the only place it is used. Is that not true? Indexing is just creating name tables and the accelerator tables we need so that we can partially parse the DWARF later. No type parsing should be happening when indexing. All other accesses to the DWARF must be done via a SymbolFile API that takes the module lock to stop multiple threads from stomping on each other. So my main point is we need to use the module lock to avoid having multiple threads doing important work that will cause crashes.

Looking at SymbolFileDWARF, we only have two methods that take the module lock: PreloadSymbols and CompleteType.

The only multi-threaded access to DWARF currently should be in the indexing only and no important parsing stuff should be going on.

Surely there can be a lot more going on, like two debuggers executing an expression at the same time?

If we discover places where multiple threads are making accesses, we just need to ensure they take the module lock and everything should just work.

So what you're saying is that we should add the module lock to every endpoint in SymbolFileDWARF that can potentially modify our internal data structures (i.e. parsing)?

The only sane algorithm I can come up with right now is to make the list of parsed DIEs local to each thread/parsing entity (e.g. via a "visited" list), and only update the global map once the parsing has completed (successfully or not). This can potentially duplicate some effort where one thread parses a type only to find out that it has already been parsed, but hopefully that is not going to be the common case. The alternative is some complicated resource cycle detection scheme.

I gave this a shot in D52406 but I'm afraid it's too simple to be correct. Pavel, could you give it a look and let me know whether that was what you had in mind?

Yeah, I don't think this will make things fully correct. This avoids the problem where two threads misinterpret the DIE_IS_BEING_PARSED markers left by the other thread. However, it is still not correct when two threads are parsing the same DIE concurrently. Imagine the following sequence of events:

  • thread A starts parsing DIE 1, checks that it still hasn't been parsed, and inserts DIE_IS_BEING_PARSED into the local map
  • thread B does the same
  • thread A finishes parsing DIE 1, constructs a clang AST (C1) for it, and sets it as the map value
  • thread B does the same and overwrites the value with C2
  • thread A carries on using C1
  • thread B carries on using C2

This sounds like something that is not supposed to happen, though I don't really know what the consequences of that are. Also I am not entirely convinced that the mere construction of C1 and C2 does not touch some global structures (which would not be protected by a lock).

I agree with Greg that it would be best to restrict things such that there is only one instance of parsing going on at any given time for a single module. I think this was pretty much the state we reached when this thread fizzled out the last time (there are some extra emails that are not showing in the Phabricator history; the interesting ones start around here: http://lists.llvm.org/pipermail/lldb-commits/Week-of-Mon-20180618/041937.html). I think the main part that needed to be resolved is whether we need to do anything special when resolving debug info references *between* modules (e.g. to prevent A/B deadlocks).

Thanks for the information, Greg!

A little background might help here: The lldb_private::Module lock is used to prevent multiple queries into the DWARF from stomping on each other.

Multi-threaded DWARF parsing was primarily added to speed up indexing, and that is the only place it is used. Is that not true? Indexing is just creating name tables and the accelerator tables we need so that we can partially parse the DWARF later. No type parsing should be happening when indexing. All other accesses to the DWARF must be done via a SymbolFile API that takes the module lock to stop multiple threads from stomping on each other. So my main point is we need to use the module lock to avoid having multiple threads doing important work that will cause crashes.

Looking at SymbolFileDWARF, we only have two methods that take the module lock: PreloadSymbols and CompleteType.

Most of the locking happens at the SymbolVendor level IIRC. But all of the virtual SymbolFile entry points need to take the module lock.

The only multi-threaded access to DWARF currently should be in the indexing only and no important parsing stuff should be going on.

Surely there can be a lot more going on, like two debuggers executing an expression at the same time?

That doesn't matter. One of them will index the DWARF. Both will be able to field requests via the virtual functions defined in lldb_private::SymbolFile (usually through the lldb_private::SymbolVendor, a class that can combine multiple object files/symbol files), so any query must use the module lock. This is no different than two threads making two different requests from a SymbolFileDWARF.

If we discover places where multiple threads are making accesses, we just need to ensure they take the module lock and everything should just work.

So what you're saying is that we should add the module lock to every endpoint in SymbolFileDWARF that can potentially modify our internal data structures (i.e. parsing)?

Yes, any virtual lldb_private::SymbolFile entry point should lock the recursive module mutex. Check SymbolVendor: no one directly grabs a SymbolFile and uses it; they all go through the module, which uses the symbol vendor. So the locking will often occur at the module level or the symbol vendor level.
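
As a rough illustration of that entry-point locking (a sketch only; GetModuleMutex() and ParseTypesForUnit() are placeholder names standing in for however the module exposes its recursive mutex, not LLDB's actual API):

```cpp
#include <mutex>

// Hedged sketch: every virtual SymbolFile entry point that can mutate the
// parser's bookkeeping takes the module's recursive mutex before doing any
// work. The recursive mutex is what lets entry points call each other on the
// same thread without self-deadlocking.
class ExampleSymbolFile {
public:
  size_t ParseTypesForUnit(int unit_id) {
    std::lock_guard<std::recursive_mutex> guard(GetModuleMutex());
    return DoParseTypes(unit_id); // safe to touch the bookkeeping maps here
  }

private:
  // Placeholder for the mutex that would really live on lldb_private::Module.
  std::recursive_mutex &GetModuleMutex() { return m_module_mutex; }
  size_t DoParseTypes(int /*unit_id*/) { return 0; /* real parsing elided */ }
  std::recursive_mutex m_module_mutex;
};
```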

I agree with Greg that it would be best to restrict things such that there is only one instance of parsing going on at any given time for a single module. I think this was pretty much the state we reached when this thread fizzled out the last time (there are some extra emails that are not showing in the Phabricator history; the interesting ones start around here: http://lists.llvm.org/pipermail/lldb-commits/Week-of-Mon-20180618/041937.html). I think the main part that needed to be resolved is whether we need to do anything special when resolving debug info references *between* modules (e.g. to prevent A/B deadlocks).

We don't have to worry about references between modules as no module loads _ANY_ debug info from another module. If a module only has a forward declaration to a type, we leave it as a forward declaration. Why do we do this? So we can cache modules in LLDB and use all the parsed state in the next debug session if we rerun. Let's say you have liba.so that has a type "struct Foo { int x; };" and libb.so has a "struct Foo; Foo *foo_ptr;". libb.so has debug info for "foo_ptr" that says it is a forward declaration. At expression evaluation time, we will end up digging up the correct definition for "struct Foo" from liba.so, but we must do that each time we evaluate an expression. If we change liba.so to "struct Foo { int x; int y; };" and re-run, we don't need to reload any debug info for libb.so (or keep track of which definitions we might have loaded from another file, so that when liba.so changes we know to invalidate any number of types in other shared libraries). So because of this, we don't run into the problem. The variable displaying infrastructure in ValueObject and the expression parser know how to locate types when they need to. This keeps everything clean in the debug info parsing code.

I agree with Greg that it would be best to restrict things such that there is only one instance of parsing going on at any given time for a single module. I think this was pretty much the state we reached when this thread fizzled out the last time (there are some extra emails that are not showing in the Phabricator history; the interesting ones start around here: http://lists.llvm.org/pipermail/lldb-commits/Week-of-Mon-20180618/041937.html). I think the main part that needed to be resolved is whether we need to do anything special when resolving debug info references *between* modules (e.g. to prevent A/B deadlocks).

We don't have to worry about references between modules as no module loads _ANY_ debug info from another module.

What about -gmodules? I don't remember where the module debug info is stored; I think it is not shared, so your statement would conceptually hold, but I'm not sure anymore.

I agree with Greg that it would be best to restrict things such that there is only one instance of parsing going on at any given time for a single module. I think this was pretty much the state we reached when this thread fizzled out the last time (there are some extra emails that are not showing in the Phabricator history; the interesting ones start around here: http://lists.llvm.org/pipermail/lldb-commits/Week-of-Mon-20180618/041937.html). I think the main part that needed to be resolved is whether we need to do anything special when resolving debug info references *between* modules (e.g. to prevent A/B deadlocks).

We don't have to worry about references between modules as no module loads _ANY_ debug info from another module.

What about -gmodules? I don't remember where the module debug info is stored; I think it is not shared, so your statement would conceptually hold, but I'm not sure anymore.

We load the -gmodules DWARF as a stand-alone file and then copy the AST types over into the AST for the SymbolFileDWARF that uses them. So we are good with -gmodules.