This is an archive of the discontinued LLVM Phabricator instance.

3/3: Fix DWARFUnit::GetUnitDIEPtrOnly stale pointer
ClosedPublic

Authored by jankratochvil on May 13 2018, 12:14 PM.

Details

Summary
+  // GetUnitDIEPtrOnly() needs to return pointer to the first DIE.
+  // But the first element of m_die_array after ExtractDIEsIfNeeded(true)
+  // may move in memory after later ExtractDIEsIfNeeded(false).

I haven't tried to reproduce it. DWARFDebugInfoEntry::collection m_die_array is std::vector, its data may move during its expansion.
I would not mind but I have found I cannot make the code thread-safe for D40470 when it looks already incorrect to me.

Diff Detail

Event Timeline

jankratochvil created this revision.May 13 2018, 12:14 PM

I think I understand what is going on here, but I can't say for certain. Could you elaborate on the implementation details of this somewhere. It would be good to keep a note of this for future maintainers. My understanding of this is:

  • if nothing has been parsed, m_die_array is empty and m_first_die is empty
  • if the cu die has been parsed, m_die_array us empty and m_first_die is full
  • if everything has been parsed m_first_die and m_die_array are full, and the first element of m_die_array contains a copy of m_first_die

Is that an accurate description of what you are doing here? (If that is true, then I think this is workable, but there are still some details which need to be ironed out; e.g., m_first_die.GetFirstChild())

  • if nothing has been parsed, m_die_array is empty and m_first_die is empty
  • if the cu die has been parsed, m_die_array us empty and m_first_die is full
  • if everything has been parsed m_first_die and m_die_array are full, and the first element of m_die_array contains a copy of m_first_die

Is that an accurate description of what you are doing here?

Yes. I will add a comment to the code.

(If that is true, then I think this is workable, but there are still some details which need to be ironed out; e.g., m_first_die.GetFirstChild())

The current code calling GetUnitDIEOnly() (returning DWARFDIE) or GetUnitDIEPtrOnly (returning DWARFDebugInfoEntry *) is never dealing with child DIEs of what it gets - it also makes sense as there is no guarantee they have been read in. Such code would already cause ASAN error accessing memory behind std::vector. So I find such assertion check orthogonal to this patch but I will prepare another patch for that.

Thanks for checking validity of this patch, I was not sure whether LLDB code is intended to be thread-safe in the first place.

(If that is true, then I think this is workable, but there are still some details which need to be ironed out; e.g., m_first_die.GetFirstChild())

The current code calling GetUnitDIEOnly() (returning DWARFDIE) or GetUnitDIEPtrOnly (returning DWARFDebugInfoEntry *) is never dealing with child DIEs of what it gets - it also makes sense as there is no guarantee they have been read in.

I am not sure it's that simple. I've found at least one case (SymbolFileDWARF::ParseImportedModuleswhere we call GetFirstChild() on the value returned by GetUnitDIEOnly` (there may be others which are not easily greppable). Previously that would work if one would call this only after he'd know that all DIEs have been parsed. Now this will never work because GetFirstChild will return whatever is in the memory after m_first_die. I am not sure if this would be caught by asan straight away, though it will most likely cause a crash very soon.

I was thinking of making this safer by changing the GetUnitDIEOnly so that the caller has to explicitly request (either with an argument, or by splitting it into two functions) whether it wants a CU die, which can be used to access other DIEs, or just a bare attribute-only DIE. In second case, we would return &m_first_die, in the first case &m_die_array[0] (after making sure it's properly initialized). Then m_first_die can have has_children property set to false to enforce that noone accesses children that way.

WDYT?

Thanks for checking validity of this patch, I was not sure whether LLDB code is intended to be thread-safe in the first place.

Yeah, thread safety is tricky. I think the DWARF parser was very single-threaded originally. Then, when we started building the index in a multi-threaded manner we needed to make a bunch of code thread-safe, which wasn't before. Now it looks like you are introducing even paralelization at an even deeper level (can't say I understand full what that means yet), so we'll need to make more code thread-safe.

I am not sure it's that simple. I've found at least one case (SymbolFileDWARF::ParseImportedModuleswhere we call GetFirstChild() on the value returned by GetUnitDIEOnly`

OK, that is clearly a bug there, thanks for finding it. I see some new assertion check would be good (I may try even a compile time check by some new class wrapper). SymbolFileDWARF::ParseImportedModules should just call DIE().

DWARFDIE GetUnitDIEOnly() -> DWARFDebugInfoEntry *GetUnitDIEPtrOnly() -> ExtractDIEsIfNeeded(true ) -> CU DIE only
DWARFDIE        DIE    () -> DWARFDebugInfoEntry *       DIEPtr    () -> ExtractDIEsIfNeeded(false) -> all DIEs

I was thinking of making this safer by changing the GetUnitDIEOnly so that the caller has to explicitly request (either with an argument, or by splitting it into two functions) whether it wants a CU die, which can be used to access other DIEs, or just a bare attribute-only DIE. In second case, we would return &m_first_die, in the first case &m_die_array[0] (after making sure it's properly initialized).

But there is already such function, it is called DIE(), we need no new parameter. Maybe we should rename those functions, their current names are cryptic, what about changing the second line to:

DWARFDIE GetUnitDIEOnly() -> DWARFDebugInfoEntry *GetUnitDIEPtrOnly() -> ExtractDIEsIfNeeded(true ) -> CU DIE only
DWARFDIE GetUnitAllDIEs() -> DWARFDebugInfoEntry *GetUnitAllDIEsPtr() -> ExtractDIEsIfNeeded(false) -> all DIEs

Then m_first_die can have has_children property set to false to enforce that noone accesses children that way.

I would prefer an assert (or even a compilation error by some wrapper class) in such case but I agree this may be enough.

clayborg requested changes to this revision.May 14 2018, 9:49 AM

So this problem exists both in the LLDB and LLVM DWARF parsers. I am not sure this fix is safe. I would rather fix this by fixing DWARFDIE class to "do the right thing". We should be able to teach the DWARFDIE class to replace its "m_die" with the updated "m_die" if a method ever causes DWARFDIE to need to expand all DIEs in a DWARFUnit. That seems like a much safer fix. Having m_first_die is not safe because it if you call DWARFDIE::GetFirstChild() it will just add 1 to the "m_die" and we will crash. All parent, sibling and child code just do pointer arithmetic to find their counterparts. So since DWARFDIE has the "DWARFUnit *m_cu;" and "DWARFDebugInfoEntry *m_die;" we should use DWARFDIE to abstract this from users. Anyone playing directly with DWARFDebugInfoEntry must know the rules and do the right thing or just use DWARFDIE.

This revision now requires changes to proceed.May 14 2018, 9:49 AM

Trying to be smart while being lazy and multithreaded is going to make the code complicated (possibility for bugs) and/or introduce a lot of locking overhead.

A lot simpler solution is to let the caller decide if it want's the full CU or just the root DIE, and then make it hard for him to get this wrong. This shouldn't be too much to ask, as that's pretty much what we have now, without the second part. The second part can be achieved by changing the GetUnitDIEOnly function to return a different type, say DWARFBasicDIE which does not even have child accessor functions. Then DWARFDIE can inherit from that and add the extra child/parent accessor methods. This way it will be impossible to get it wrong as you'll get a compile error if you try to access the children of a unit-only die.

jankratochvil retitled this revision from Fix DWARFUnit::GetUnitDIEPtrOnly stale pointer to 3/3: Fix DWARFUnit::GetUnitDIEPtrOnly stale pointer.

So this problem exists both in the LLDB and LLVM DWARF parsers. I am not sure this fix is safe. I would rather fix this by fixing DWARFDIE class to "do the right thing". We should be able to teach the DWARFDIE class to replace its "m_die" with the updated "m_die" if a method ever causes DWARFDIE to need to expand all DIEs in a DWARFUnit. That seems like a much safer fix. Having m_first_die is not safe because it if you call DWARFDIE::GetFirstChild() it will just add 1 to the "m_die" and we will crash. All parent, sibling and child code just do pointer arithmetic to find their counterparts. So since DWARFDIE has the "DWARFUnit *m_cu;" and "DWARFDebugInfoEntry *m_die;" we should use DWARFDIE to abstract this from users. Anyone playing directly with DWARFDebugInfoEntry must know the rules and do the right thing or just use DWARFDIE.

Is this statement still valid now with DWARFBaseDIE?

A better solution here would be to have two functions: one for parsing the Unit DIE only and one for parsing all DIEs:

class DWARFUnit {
  void ExtractUnitDIEIfNeeded();
  size_t ExtractDIEsIfNeeded();
}

Then the code becomes much simpler, we don't need the "m_die_array_size" function and logic is much cleaner,

DWARFUnit::ExtractUnitDIEIfNeeded() will extract into m_first_die only and won't touch m_die_array at all. DWARFUnit::ExtractDIEsIfNeeded() will extract all DIEs into m_die_array. Then there is no need for worrying about m_die_array.size().

So this problem exists both in the LLDB and LLVM DWARF parsers. I am not sure this fix is safe. I would rather fix this by fixing DWARFDIE class to "do the right thing". We should be able to teach the DWARFDIE class to replace its "m_die" with the updated "m_die" if a method ever causes DWARFDIE to need to expand all DIEs in a DWARFUnit. That seems like a much safer fix. Having m_first_die is not safe because it if you call DWARFDIE::GetFirstChild() it will just add 1 to the "m_die" and we will crash. All parent, sibling and child code just do pointer arithmetic to find their counterparts. So since DWARFDIE has the "DWARFUnit *m_cu;" and "DWARFDebugInfoEntry *m_die;" we should use DWARFDIE to abstract this from users. Anyone playing directly with DWARFDebugInfoEntry must know the rules and do the right thing or just use DWARFDIE.

Is this statement still valid now with DWARFBaseDIE?

Statement isn't valid, but we should cleanup the DIE parsing code so we have dedicated parsing for the unit DIE only and for all DIEs and remove the m_die_array_size() function.

+1 for the cleanup. The m_die_array_size function is just weird

source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
312

You are ignoring the keep_compile_unit_die argument here. Is that intentional? (I would be fine with not being able to clear the unit die, but if we got that way, then we should remove this argument altogether.)

jankratochvil added inline comments.May 25 2018, 1:12 AM
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
312

I did not notice, true. But there is no caller of ClearDIEs(false) anyway. And with m_first_die it has even no longer any benefit to clear even the first DIE.

Split out ExtractUnitDIEIfNeeded() out of ExtractDIEsIfNeeded().

clayborg accepted this revision.May 29 2018, 9:26 AM

Only question is if we remove the extra empty scope as noted in inline comments. Looks good.

source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
59

Do we need this empty scope? Remove?

This revision is now accepted and ready to land.May 29 2018, 9:26 AM
jankratochvil marked an inline comment as done.May 29 2018, 9:51 AM
jankratochvil added inline comments.
source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
59

There is now enclosed:

Log *log(LogChannelDWARF::GetLogIfAny(DWARF_LOG_DEBUG_INFO | DWARF_LOG_LOOKUPS));

While lower (currently moved into DWARFUnit::ExtractDIEsEndCheck()) there was:

Log *log(LogChannelDWARF::GetLogIfAll(DWARF_LOG_DEBUG_INFO));

So those two did conflict while being a different variable. I am not sure which way is right but I have removed the empty scope as it is no longer required. Then also I am not sure the log in DWARFUnit::ExtractDIEsEndCheck() should not also have | DWARF_LOG_LOOKUPS but that would be out of the scope of this patch topic.

This revision was automatically updated to reflect the committed changes.
jankratochvil marked an inline comment as done.
xiaobai added inline comments.
lldb/trunk/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
208 ↗(On Diff #148938)

This function introduces a warning because next_cu_offset is never used within this function. Did you intend to use it in the if conditional below? It looks like you call GetNextCompileUnitOffset there instead of using next_cu_offset.

jankratochvil marked an inline comment as done.May 29 2018, 11:59 AM
jankratochvil added inline comments.
lldb/trunk/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
208 ↗(On Diff #148938)

Fixed now in rL333449, sorry. Yes, you are right, it remained there from refactorizations.

GDB is using -Werror so I was not used to watch for compiler warnings.

jankratochvil marked an inline comment as done.May 30 2018, 2:00 AM

FYI I also checked in a regression (just looking at the source code) rL333517.

xbolva00 added inline comments.
lldb/trunk/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
179 ↗(On Diff #148938)

@jankratochvil is this correct assert? Our downstream lldb based on lldb 7 with our custom targets hits this assert. If we remove it, we see no obvious breakages.

xbolva00 added inline comments.Sep 27 2018, 1:03 AM
lldb/trunk/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
179 ↗(On Diff #148938)

@xbolva00 As you gave no reprodudcer it would be good to know how those two DWARFDebugInfoEntrys differ. Without that I can look at it more only next week - travelling.

labath added inline comments.Sep 30 2018, 10:22 AM
lldb/trunk/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
179 ↗(On Diff #148938)

I agree with Jan. The two dies should be the same. The fact that they aren't probably means there is a bug somewhere. It would be good to know how the two dies differ and what is the input dwarf they are generated from.

jankratochvil added inline comments.
lldb/trunk/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
179 ↗(On Diff #148938)

A fix of this assertion is now submitted as: D53255