This is an archive of the discontinued LLVM Phabricator instance.

[LLDB][NativePDB] Switch to use DWARFLocationList.
ClosedPublic

Authored by zequanwu on Jul 29 2022, 12:36 PM.

Details

Summary

Before, NativePDB uses scoped range as a workaround for value range, that causes
problems (e.g. a variable's value can only have one range, but usually a
variable's value is located at different address ranges, each at different
locations, in optimized build).
This patch let NativePDB switch to DWARFLocationList so a variable's value can
be described at multiple non-overlapped address ranges and each range maps to a
location.
Because overlapped address ranges sometimes exist in codeview and
DWARFLocationList doesn't allow overlapped ranges (otherwise internal binary
search may fail), here's implementation strategy:

  1. Always prefer whole value locations. Suppose a variable size is 8 bytes, one record is that for range [1, 5) first 4 bytes is at ecx, and another record is that for range [2, 8) the 8 bytes value is at rdx. This results: [1, 2) has first 4 bytes at ecx, [2, 8) has the whole value at rdx.
  2. Always prefer the locations parsed later. Suppose first record is that for range [1, 5) value is at ecx, second record is that for range [2, 6) value is at eax. This results: [1, 2) -> ecx, [2, 6) -> eax.

Diff Detail

Event Timeline

zequanwu created this revision.Jul 29 2022, 12:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2022, 12:36 PM
zequanwu requested review of this revision.Jul 29 2022, 12:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2022, 12:36 PM
labath added a comment.Aug 1 2022, 5:25 AM

The general idea makes sense to me, although I haven't tried to understand the pdb parsing code.

(otherwise internal binary search may fail)

The current search algorithm is buggy, but I think the RangeDataVector class has always intended to support overlapping ranges. If you find yourself needing to do extra work to work its limitations, we should fix that algorithm instead. (However, I know that the generic class will not be able to express the preference for expressions that describe the full value vs. those that describe only a part of it, so I'll believe you if you say that some outside filtering/deduplication is still needed.)

lldb/source/Expression/DWARFExpressionList.cpp
37–39 ↗(On Diff #448702)

What is this attempting to check? That the list does not contain an overlapping entry? That hardly seems like a correct algorithm...

If you find yourself needing to do extra work to work its limitations, we should fix that algorithm instead.

That makes sense. I'll work on fixing RangeVectorData.

lldb/source/Expression/DWARFExpressionList.cpp
37–39 ↗(On Diff #448702)

RangeDataVector::Append just append the new entry to the end of vector. It has no idea if overlaps happens or not.

zequanwu added a comment.EditedAug 1 2022, 3:06 PM

If you find yourself needing to do extra work to work its limitations, we should fix that algorithm instead.

That makes sense. I'll work on fixing RangeVectorData.

After thinking it again, not just RangeVectorData need to be written but also RangeVector, although they are similar. It requires rewrite the whole file RangeMap.h and all other places uses its APIs.
Can we just use RangeVectorData as it is right now and fix it later? So, this patch is not blocked. Otherwise, this local variables functionality is almost unusable for NativePDB plugin when multiple ranges exist.

labath added a comment.Aug 3 2022, 5:58 AM

If you find yourself needing to do extra work to work its limitations, we should fix that algorithm instead.

That makes sense. I'll work on fixing RangeVectorData.

After thinking it again, not just RangeVectorData need to be written but also RangeVector, although they are similar. It requires rewrite the whole file RangeMap.h and all other places uses its APIs.

Why is that? I don't think we need to change the API -- just the way it is implemented. And the RangeVectorData class already contains the upper_bound argument, which should make it possible to find the element containing a value quickly. That basically means taking the algorithm in FindEntryIndexesThatContain, but making it stop at the first element found.

And you can just ignore the RangeVector class. That problem (two separate classes) can be handled separately and holistically.

Alternatively, if we don't want to/need to support overlaps, we could switch to some completely different class, which implements searches correctly. llvm::IntervalMap, for example.

Can we just use RangeVectorData as it is right now and fix it later? So, this patch is not blocked. Otherwise, this local variables functionality is almost unusable for NativePDB plugin when multiple ranges exist.

Maybe.. However I don't want to end up with two broken overlap-detection algorithms (see inline comment) instead of one good one.

lldb/source/Expression/DWARFExpressionList.cpp
37–39 ↗(On Diff #448702)

I get that. My point is that this is not the right way to check for overlaps. It won't catch (1,10) and (4,6) for instance (depending on the order in which you insert them).

If you find yourself needing to do extra work to work its limitations, we should fix that algorithm instead.

That makes sense. I'll work on fixing RangeVectorData.

After thinking it again, not just RangeVectorData need to be written but also RangeVector, although they are similar. It requires rewrite the whole file RangeMap.h and all other places uses its APIs.

Why is that? I don't think we need to change the API -- just the way it is implemented. And the RangeVectorData class already contains the upper_bound argument, which should make it possible to find the element containing a value quickly. That basically means taking the algorithm in FindEntryIndexesThatContain, but making it stop at the first element found.

And you can just ignore the RangeVector class. That problem (two separate classes) can be handled separately and holistically.

Alternatively, if we don't want to/need to support overlaps, we could switch to some completely different class, which implements searches correctly. llvm::IntervalMap, for example.

Oh, I didn't notice that FindEntryIndexesThatContain exist until now. This function indicates that it allows multiple overlapping ranges exist. I originally thought that it doesn't make sense to have overlapping ranges in variable locations and it must be a bug in debug info generation. But now what you are saying at here, "it means all of them are valid", makes sense to me. FindEntryIndexThatContains does the exact thing you mentioned.

zequanwu updated this revision to Diff 450408.Aug 5 2022, 2:28 PM
  • Recurse into parent classes when filling the offset to size map.
  • Previously the range parsed earlier is preferred, now it's the range parsed later is preferred, since it makes the code cleaner and it doesn't really matter as long as it's consistent. Although overlapping ranges are allowed in RangeDataVector, I'm still trying to keep the RangeMap contains non-overlapping ranges, since only whole value ranges are allowed to overlap with each other, and subfield ranges should not overlap with each other or whole value ranges. It's easier just to keep all ranges non-overlapping.
zequanwu edited the summary of this revision. (Show Details)Aug 5 2022, 2:29 PM
labath accepted this revision.Aug 8 2022, 5:14 AM
labath added inline comments.
lldb/include/lldb/Utility/RangeMap.h
54

std::min(s, size)?

This revision is now accepted and ready to land.Aug 8 2022, 5:14 AM
rnk added inline comments.Aug 10 2022, 11:21 AM
lldb/source/Plugins/SymbolFile/NativePDB/PdbUtil.cpp
771–774

I'm a bit worried about performance. This code runs for every S_LOCAL record. So, every time we encounter a std::string local variable, we walk over the entire string class hierarchy field list to compute this map, which we may or may not need later.

This code is pretty thorough, but can we reduce the scope of this patch by ignoring subfield records stored in memory, since they lack size information? I think that would make it easier to review and test. Just focus on variables in registers, and subfields in registers, since those are easiest to test and understand.

zequanwu added inline comments.Aug 10 2022, 5:38 PM
lldb/source/Plugins/SymbolFile/NativePDB/PdbUtil.cpp
771–774

std::__1::basic_string<char,std::__1::char_traits<char>,std::__1::allocator<char> > (aka std::string) has a very shallow class hierarchy, only 1 parent ,std::__1::__basic_string_common<1>, which is the top in class hierarchy.
I agree that it's doing duplicate work by revisiting the parent class multiple times, but keeping the extra info could consume more memory, doesn't seems worth it. I think it shouldn't be much a performance problem most time.

but can we reduce the scope of this patch by ignoring subfield records stored in memory, since they lack size information?

Yes, we could, but it's more tedious. llvm/include/llvm/DebugInfo/CodeView/CodeViewRegisters.def has all the register names, then we need to map each register name to its the corresponding byte size.

zequanwu updated this revision to Diff 452031.Aug 11 2022, 4:29 PM
  • Add missing test case for simple type because subfield_register_simple_type.s is deleted, forgot to add it before.
  • Address rnk's comment by only accepting subfield locations that are on register, because we don't have the size information if the subfield is in memory. Previous algorithm was not correct, removed in the update.
zequanwu marked an inline comment as done.Aug 11 2022, 4:31 PM
zequanwu added inline comments.
lldb/source/Plugins/SymbolFile/NativePDB/PdbUtil.cpp
771–774

I updated to only accept subfield locations that are on register. I didn't realize it's that complicated until I tried to add a test case for it.

rnk accepted this revision.Aug 17 2022, 12:25 PM

Looks good to me!

This revision was automatically updated to reflect the committed changes.
zequanwu marked an inline comment as done.

FYI, this change caused a strange regression for reading PDB files in i386 mode, when LLDB is built in mingw mode (not when it is built in MSVC mode), see https://github.com/llvm/llvm-project/issues/57799 for details.

lldb/test/Shell/SymbolFile/NativePDB/local-variables-registers.s