This is an archive of the discontinued LLVM Phabricator instance.

[lldb][DWARF] Infer no_unique_address attribute
Needs ReviewPublic

Authored by kpdev42 on Feb 5 2023, 11:53 AM.

Details

Summary

The no_unique_address attribute is currently a part of C++20 standard and is being used in both libstdc++ and libc++. This causes problems when debugging C++ code with lldb because the latter doesn't expect two non-empty base classes having the same offset and crashes with assertion. Patch attempts to infer this attribute by detecting two consecutive base classes with the same offset to derived class and marking first of those classes as empty and adding no_unique_address attribute to all of its fields.

~~

Huawei RRI, OS Lab

Diff Detail

Event Timeline

kpdev42 created this revision.Feb 5 2023, 11:53 AM
Herald added a reviewer: shafik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
kpdev42 requested review of this revision.Feb 5 2023, 11:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2023, 11:53 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Michael137 added a comment.EditedFeb 5 2023, 5:48 PM

From a first glance looks ok but ideally the clang changes would be separate reviews. E.g.,

  1. Adding markEmpty()
  2. ASTImporter change with unit-test
  3. LLDB changes
clayborg edited reviewers, added: JDevlieghere, jingham; removed: k8stone.Feb 6 2023, 2:37 PM

Changes look fine to me. I would like someone that specializes in the expression parser to give the final ok though.

lldb/source/Plugins/ExpressionParser/Clang/ClangASTImporter.cpp
551

revert whitespace only changes

balazske added inline comments.
clang/lib/AST/ASTImporter.cpp
3896

The import of attributes is handled in function ASTImporter::Import(Decl*). This new line will probably copy all attributes, that may not work in all cases dependent on the attribute types. This may interfere with the later import of attributes, probably these will be duplicated. What was the need for this line? (Simple attributes that do not have references to other nodes could be copied at this place.)

kpdev42 added inline comments.Feb 20 2023, 8:43 PM
clang/lib/AST/ASTImporter.cpp
3896

Unfortunately it is too late to copy attribute in ASTImporter::Import(Decl*), because field has already been added to record in a call to ImportImpl (VisitFieldDecl/addDeclInternal). I've reused the current way of cloning attributes in VisitFieldDecl.

kpdev42 updated this revision to Diff 507272.Mar 22 2023, 2:12 AM

Update patch after landing D145057

Michael137 added inline comments.Mar 30 2023, 10:08 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
1482

Should we add a comment describing why this block is necessary?

Perhaps even putting it into its own helper function

1487

The idea seem reasonable in general.

Though this wouldn't work for overlapping member offsets. E.g.,:

struct C
{
 long c,d;
};

struct D
{
};

struct B
{
  [[no_unique_address]] D x;
};

struct E
{
  [[no_unique_address]] D x;
};

struct Foo : B, E, C {

};

Here B and C have offset 0x0, but E can have offset 0x1. So we wouldn't attach the attribute for E and would still crash. Maybe we should check for overlapping offsets, not just equal

kpdev42 updated this revision to Diff 511362.Apr 6 2023, 4:12 AM

Address review comments

kpdev42 marked 2 inline comments as done.Apr 6 2023, 4:12 AM
Michael137 added inline comments.Apr 6 2023, 9:24 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
1485
1487

The main problem I still see with this is that if we have something like:

struct A : C, B {

};

then we mark C's fields as empty and leave B as is. This still leads to the same crash later on.

Perhaps we should mark we could check the size of the struct and decide based on that which one is the "empty" one

1502

I think it would still be nice to have this as a private helper function on DWARFASTParserClang. But don't feel very strongly about it so feel free to ignore

lldb/test/API/types/TestEmptyBase.py
3

Description needs fixing

23

Ideally this test would live in lldb/test/API/lang/cpp/no_unique_address/

The usual structure is that we have a Makefile in the same directory which consists of:

CXX_SOURCES := main.cpp
                       
include Makefile.rules

Then in the test(self) you just call self.build(). You don't need the setUp(self) and setTearDownCleanup(...) calls then.

Check lldb/test/API/commands/expression/inline-namespace/TestInlineNamespace.py for example.

lldb/test/API/types/empty_base_type.cpp
2

Can we add more test cases. E.g.,:

struct A : C, B {
 ...
};
4

Just for sanity checking in the test

24

same as above

Michael137 added inline comments.Apr 6 2023, 9:45 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
1487

Interestingly there was a discussion on the DWARF mailing list about this some time ago: https://www.mail-archive.com/dwarf-discuss@lists.dwarfstd.org/msg00880.html

There might be room to changing the emitted DWARF to make it easier to determine the empty structure. I will gauge opinions on this thread later today

Michael137 added inline comments.Apr 6 2023, 9:49 AM
lldb/test/API/types/TestEmptyBase.py
26–43

FYI, this bugfix was attempted back in 2021 here: https://reviews.llvm.org/D101237

kpdev42 updated this revision to Diff 511683.Apr 7 2023, 7:04 AM
kpdev42 edited the summary of this revision. (Show Details)

Address review comments

kpdev42 added inline comments.Apr 7 2023, 7:08 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
1487

Unfortunately we cannot analyze record size, because it is always 1 for empty records, whether or not [[no_unique_address]] is used. However we still can analyze field offsets, I think. This what an updated patch does and it seems to handle more different cases

kpdev42 added inline comments.Apr 7 2023, 7:11 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
1487

Yes, this will help a lot, however many people use older versions of clang compiler and also gcc. This fix might be useful for them so far

Michael137 added inline comments.
lldb/test/API/lang/cpp/no_unique_address/main.cpp
38 ↗(On Diff #511683)

I haven't checked the reworked logic yet, but it still crashes on this:

self.expect_expr("_f3")

I'm leaning towards not trying to support older compilers if it gets too complicated. Proposing a DW_AT_no_unique_address seems like the best option to me since that's pretty trivial to implement and requires almost no support on the LLDB side.

CC: @dblaikie @aprantl (since both were part of the original discussion)

kpdev42 updated this revision to Diff 512945.Apr 12 2023, 1:02 PM

Thanks for pointing out the bug @Michael137 . It seems that clang assigns arbitrary offsets for non_unique_address so analyzing them brings me nowhere. In this patch I tried assigning no_unique_address to every empty structure and this fixed issue you found, making the code changes much smaller and simpler. The lldb test suite pass for me as well

Michael137 added inline comments.Apr 13 2023, 5:38 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
2215

Generally I'm not sure if attaching a clang::NoUniqueAddressAttr to every empty field is the right approach. That goes slightly against our attempts to construct an AST that's faithful to the source to avoid unpredictable behaviour (which isn't always possible but for the most part we try). This approach was considered in https://reviews.llvm.org/D101237 but concern was raised about it affecting ABI, etc., leading to subtle issues down the line.

Based on the the discussion in https://reviews.llvm.org/D101237 it seemed to me like the only two viable solutions are:

  1. Add a DW_AT_byte_size of 0 to the empty field
  2. Add a DW_AT_no_unique_address

AFAICT Jan tried to implement (1) but never seemed to be able to fully add support for this in the ASTImporter/LLDB. Another issue I see with this is that sometimes the byte-size of said field is not 0, depending on the context in which the structure is used.

I'm still leaning towards proposing a DW_AT_no_unique_address. Which is pretty easy to implement and also reason about from LLDB's perspective. @dblaikie @aprantl does that sound reasonable to you?

2222

Typically the call to record_decl->fields() below would worry me, because if the decl !hasLoadedFieldsFromExternalStorage() then we'd start another ASTImport process, which could lead to some unpredictable behaviour if we are already in the middle of an import. But since CompleteTagDeclarationDefinition sets setHasLoadedFieldsFromExternalStorage(true) I *think* we'd be ok. Might warrant a comment.

2234

Why do we need to mark the parents empty here again? Wouldn't they have been marked in ParseStructureLikeDIE?

kpdev42 added inline comments.Apr 14 2023, 3:04 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
2215

I think that lldb jitter relies on value of DW_AT_member location when/if empty structure address is taken, so assigning no_unique_address attribute shouldn't, in my opinion, affect anything. Also, as I understand, AST obtained from DWARF will not (and cannot) be identical to the source (e.g. because of optimizations)

2222

We can probably use keys of DenseMap in layout_info.base_offsets to stay safe, can't we?

2234

ParseStructureLikeDIE marks only trivially empty records (with no children or with only children being template parameters). All non-trivially empty structs (with only children being other empty structs) are marked here.

So what are next steps? Are we going for implementation of DW_AT_no_unique_address (which is going to be a non-standard extension) ?
@dblaikie @aprantl @Michael137

So what are next steps? Are we going for implementation of DW_AT_no_unique_address (which is going to be a non-standard extension) ?
@dblaikie @aprantl @Michael137

in my opinion that would be the most convenient way forward. Perhaps something like DW_AT_LLVM_no_unique_address even?

Btw what does GCC/gdb do in this case?

If we can come up with a counterexample where the heuristic in this patch is doing the wrong thin then I think emitting a DW_AT_LLVM_no_unique_address attribute sounds reasonable to me. But otherwise I don't see a problem with using a heuristic.