This is an archive of the discontinued LLVM Phabricator instance.

[lldb/Dataformatters] Adapt C++ std::string dataformatter for D128285
ClosedPublic

Authored by mib on Jun 27 2022, 5:40 PM.

Details

Summary

[lldb/Dataformatters] Adapt C++ std::string dataformatter for D128285

This patch changes the C++ std::string dataformatter to reflect
internal layout changes following D128285.

Now, in short-mode strings, in order to access the __size_ and
__is_long_ attributes, we need to access a packed anonymous struct,
which introduces another indirection.

We need to do the same in order to access the __cap_ field for
long-mode strings.

This should fix the various test failures that are happening on
GreenDragon:

https://green.lab.llvm.org/green/job/lldb-cmake/44918/

rdar://96010248

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

Diff Detail

Event Timeline

mib created this revision.Jun 27 2022, 5:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2022, 5:40 PM
mib requested review of this revision.Jun 27 2022, 5:40 PM
aprantl added inline comments.Jun 27 2022, 5:50 PM
lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp
638

Let me know if I',m misunderstanding what the code is doing, but this looks like it is replacing the previous implementation? Ideally we would detect the layout and then parse it correctly depending on which version we're dealing with. Otherwise we risk breaking the matrix bot that checks that LLDB can debug C++ produced by older versions of LLVM (and by extension libcxx).

mib added inline comments.
lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp
638

I've look at D12828 and D123580, and I don't see any way of versioning these changes ... may be @ldionne have an idea on how we could do this properly ?

Also, in D124113, @labath mentions that this data formatter uses mostly indices to parse and access the various fields of the type data structure (because it uses some anonymous structs). This makes it very fragile on our end because our data formatter break every time they make a change in the layout ...

@aprantl, I'll update the line your pointed at to the the field identifier instead of using changing the index while waiting for a better way to version this.

mib added inline comments.Jun 27 2022, 6:10 PM
lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp
638

@aprantl, I'll update the line you pointed at to *use* the field identifier instead of using changing the index, while waiting for a better way to version this.

mib updated this revision to Diff 440452.Jun 27 2022, 6:31 PM
mib edited the summary of this revision. (Show Details)

Use field identifiers instead of indices and add long-mode support.

ldionne added inline comments.Jun 28 2022, 8:02 AM
lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp
638

I don't see a way to version this. You don't have access to the value of macros that were defined when the executable was compiled, right? If you did, you could check _LIBCPP_VERSION (1400 = old implementation, 1500 = current implementation). I'm very naive when it comes to debuggers but I assume that's not possible.

JDevlieghere accepted this revision.Jun 28 2022, 3:10 PM

I agree with Adrian that we should try to support both the old and new layout. Is there a way we can fall back? I think it's fine to land the patch as is to unblock the bots and add the fallback later.

This revision is now accepted and ready to land.Jun 28 2022, 3:10 PM

We're seeing a number of LLDB test failures from this change. If I'm understanding correctly, that's working by design because we're building the test code with a version of LLVM that's from a revision that's not quite as recent as the LLDB code we're testing.

This seems like a substantial problem. If I release this LLDB as my company's internal build, and one of my users wants to debug a failing application that happens to have been built before the corresponding internal LLVM release -- or they happen to build their application on a release branch pinned to an older LLVM build -- they won't have a working string data-formatter, right?

labath added inline comments.Jun 30 2022, 8:00 AM
lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp
638

There is no fundamental problem in versioning this. Using the macro is not the best idea (some compilers, in some build modes, may emit it, but I wouldn't want to rely on it), but pretty much anything else is fair game. The presence or absence of an anonymous struct is recorded in the debug info, so using that to decide on the layout is definitely doable (and my follow-up patch does that).

However, I would say that the real, and totally self-inflicted, problem here (besides the fact that we want to format all possible string layouts with a single piece of code) is that the API we're using here does not treat anonymous classes transparently. Ideally, this kind of change shouldn't even be noticed. If I use the user-facing "frame var" command to look inside the string object, then the expression __r_.__value_.__s.__is_long_ works regardless of how many anonymous classes it has to go through. The fact that we have to worry about _that_ here, when there's already so many other things (v1 vs v2, historical layouts, corrupt strings, corrupt debug info, ...) to worry about is just bad.