This is an archive of the discontinued LLVM Phabricator instance.

[lldb][DWARFASTParserClang] Resolve nested types when parsing structures
AbandonedPublic

Authored by Endill on Jul 31 2023, 11:24 PM.

Details

Summary

Fixes #64291

Diff Detail

Event Timeline

Endill created this revision.Jul 31 2023, 11:24 PM
Herald added a project: Restricted Project. · View Herald Transcript
Endill requested review of this revision.Jul 31 2023, 11:24 PM
Endill added inline comments.Jul 31 2023, 11:28 PM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
3046

I've used https://github.com/llvm/llvm-project/blob/fa2b038cadf17d08014e5fb75c47b5024860953e/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp#L2689 as a reference, and I guess I'm missing CompileUnit counterpart of this. Surprisingly to me, even with CompileUnit part missing, I was able to debug a Clang crash successfully.

This patch is intentionally aimed squarely at enums.
But I'd like lldb to address the issue of accessing compile-time constants holistically. Or at least agree on a path forward for that.

Endill retitled this revision from [WIP][lldb] Parse enums while parsing a type to [lldb] Parse enums while parsing a type.Jul 31 2023, 11:38 PM

Thanks for addressing this.

Can you add a test? Possibly in lldb/test/API/lang/cpp/enum_types/. Or lldb/test/API/lang/cpp/const_static_integral_member/

lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
3043

At first glance this seems OK but I'll want to check why the type doesn't get resolved when clang::Sema asks LLDB for it by name

3043

Checking locally and will report back

Michael137 added inline comments.Aug 2 2023, 1:53 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
3043

Ok, so what happens for expr Outer::Enum::var is that LLDB will first resolve the Outer structure (and all its member fields, but not nested types). When clang looks for Enum, it wants to find it in a direct lookup into the Outer DeclContext. But that lookup will fail because we never created the decls for the nested type. There is no fallback to the external source in such a case unfortunately so LLDB never has the chance to correct this. See the LookupQualifiedName code path in Sema::BuildCXXNestedNameSpecifier.

So I think this proposed approach should be fine. But what I think could be better is if we just do the same for DW_TAG_enumeration_type and DW_TAG_structure_type as we do for DW_TAG_subprogram.

I.e., simply push back the type into member_function_dies (we should rename this) and then it will get resolved properly in CompleteRecordType.

Could you update the commit message with a description of the failure and summary of the fix? And change the title to something like [lldb][DWARFASTParserClang] Resolve nested types when parsing structures

lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
3043

Also DW_TAG_union_type while you're here :)

Endill added inline comments.Aug 2 2023, 2:32 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
3043

I can look into doing things more lazily like DW_TAG_subprogram, if you can point me to code paths for this on LLDB side:

When clang looks for Enum, it wants to find it in a direct lookup into the Outer DeclContext. But that lookup will fail because we never created the decls for the nested type. There is no fallback to the external source in such a case unfortunately so LLDB never has the chance to correct this.

Endill retitled this revision from [lldb] Parse enums while parsing a type to [lldb][DWARFASTParserClang] Resolve nested types when parsing structures.Aug 2 2023, 2:37 AM
Michael137 added inline comments.Aug 2 2023, 2:38 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
3043

The DW_TAG_subprogram path is just above this one in the switch statement (line 3030). Search for member_function_dies in this file.

if you can point me to code paths for this on LLDB side

Clang requests type completion by calling FindExternalVisibleDeclsByName (which LLDB implements). So if you break in that function you should be able to see what LLDB finds when it looks for the outer structure and the enum. But you don't necessarily need to do that. If we just fill up the member_function_dies vector with the nested enum/union/structure types like we do for DW_TAG_subprogram that is sufficient.

Are you still planning on moving this forward? Otherwise I could commandeer the revision to get this in. I do think it's a useful bug to address

Endill added a comment.Aug 9 2023, 6:48 AM

Are you still planning on moving this forward? Otherwise I could commandeer the revision to get this in. I do think it's a useful bug to address

I do. Locally I've been preparing additional changes on top of this patch that expose enums in SBType, so that I can do end-to-end test to ensure this sufficiently addresses my use case.
On top of that, I'm planning to redo this patch after the example of DW_TAG_subprogram per your suggestion.

Unfortunately, I didn't have time last week for this, and now I'm knee deep into triaging old Clang bugs. I'm not finished with that until number of open issues is brought under 20k :)

Are you still planning on moving this forward? Otherwise I could commandeer the revision to get this in. I do think it's a useful bug to address

I do. Locally I've been preparing additional changes on top of this patch that expose enums in SBType, so that I can do end-to-end test to ensure this sufficiently addresses my use case.
On top of that, I'm planning to redo this patch after the example of DW_TAG_subprogram per your suggestion.

Unfortunately, I didn't have time last week for this, and now I'm knee deep into triaging old Clang bugs. I'm not finished with that until number of open issues is brought under 20k :)

Sounds good, thanks!

I tested this patch together with the following new code:

uint32_t TypeSystemClang::GetNumMemberEnums(lldb::opaque_compiler_type_t type) {
  using EnumIt = clang::DeclContext::specific_decl_iterator<clang::EnumDecl>;
  if (!type)
    return 0;
  clang::QualType qual_type = RemoveWrappingTypes(GetCanonicalQualType(type));
  clang::DeclContext *ctx = qual_type->getAsRecordDecl()->getDeclContext();
  return std::distance(EnumIt(ctx->decls_begin()), EnumIt(ctx->decls_end()));
}

CompilerType
TypeSystemClang::GetMemberEnumAtIndex(lldb::opaque_compiler_type_t type,
                                      size_t index) {
  using EnumIt = clang::DeclContext::specific_decl_iterator<clang::EnumDecl>;
  if (!type)
    return CompilerType();

  clang::QualType qual_type = RemoveWrappingTypes(GetCanonicalQualType(type));
  clang::DeclContext *ctx = qual_type->getAsRecordDecl()->getDeclContext();
  size_t enum_index = 0;
  for (EnumIt enums_it(ctx->decls_begin()), enums_end(ctx->decls_end());
       enums_it != enums_end;
       ++enums_it, ++enum_index) {
      if (enum_index == index) {
        return CompilerType(weak_from_this(), *enums_it);
      }
  }
}

I created all the wrappers to make it available in Python. The result was unsatisfactory: this code doesn't even trigger DWARFASTParserClang::ParseChildMembers that this patch touches, and return 0 instead of 1. This doesn't change even if I trigger ParseChildMembers via other means before asking for a number of member enums in a type.

Code I tested this on:

using intptr_t = long;
using uintptr_t = unsigned long;

struct PointerIntPairInfo {
  enum MaskAndShiftConstants : uintptr_t {
    PointerBitMask =
        ~(uintptr_t)(((intptr_t)1 << 3) - 1),
  };

  int a{};
};

static uintptr_t dummy() {
  return PointerIntPairInfo::PointerBitMask;
}

int main()
{
    PointerIntPairInfo p;
    __builtin_debugtrap();
    return p.a + foo();
}

If you have any suggestions what I missed or did wrong, please let me know.

I'll continue with this patch nevertheless, but it's clear now that there's still a way to go until I can access that enum without going through slow expression evaluator.

I tested this patch together with the following new code:

uint32_t TypeSystemClang::GetNumMemberEnums(lldb::opaque_compiler_type_t type) {
  using EnumIt = clang::DeclContext::specific_decl_iterator<clang::EnumDecl>;
  if (!type)
    return 0;
  clang::QualType qual_type = RemoveWrappingTypes(GetCanonicalQualType(type));
  clang::DeclContext *ctx = qual_type->getAsRecordDecl()->getDeclContext();
  return std::distance(EnumIt(ctx->decls_begin()), EnumIt(ctx->decls_end()));
}

CompilerType
TypeSystemClang::GetMemberEnumAtIndex(lldb::opaque_compiler_type_t type,
                                      size_t index) {
  using EnumIt = clang::DeclContext::specific_decl_iterator<clang::EnumDecl>;
  if (!type)
    return CompilerType();

  clang::QualType qual_type = RemoveWrappingTypes(GetCanonicalQualType(type));
  clang::DeclContext *ctx = qual_type->getAsRecordDecl()->getDeclContext();
  size_t enum_index = 0;
  for (EnumIt enums_it(ctx->decls_begin()), enums_end(ctx->decls_end());
       enums_it != enums_end;
       ++enums_it, ++enum_index) {
      if (enum_index == index) {
        return CompilerType(weak_from_this(), *enums_it);
      }
  }
}

I created all the wrappers to make it available in Python. The result was unsatisfactory: this code doesn't even trigger DWARFASTParserClang::ParseChildMembers that this patch touches, and return 0 instead of 1. This doesn't change even if I trigger ParseChildMembers via other means before asking for a number of member enums in a type.

Code I tested this on:

using intptr_t = long;
using uintptr_t = unsigned long;

struct PointerIntPairInfo {
  enum MaskAndShiftConstants : uintptr_t {
    PointerBitMask =
        ~(uintptr_t)(((intptr_t)1 << 3) - 1),
  };

  int a{};
};

static uintptr_t dummy() {
  return PointerIntPairInfo::PointerBitMask;
}

int main()
{
    PointerIntPairInfo p;
    __builtin_debugtrap();
    return p.a + foo();
}

If you have any suggestions what I missed or did wrong, please let me know.

I'll continue with this patch nevertheless, but it's clear now that there's still a way to go until I can access that enum without going through slow expression evaluator.

What were your lldb commands when you tested this?

LLDB currently completes types lazily when it thinks it can. Does your new API still fail if you run expr p prior? (the idea is that that would trigger completion of the type and parse dwarf). If we dont ever call GetFullCompilerType on your type LLDB will never try to pull in the definition

Endill updated this revision to Diff 551811.Aug 20 2023, 3:23 AM

Follow the DW_TAG_subprogram approach for DW_TAG_enum_type, DW_TAG_structure_type, and DW_TAG_union_type: rely on ParseType() called afterwards instead of eagerly parsing them.

What were your lldb commands when you tested this?

script import lldb; frame = lldb.thread.GetFrameAtIndex(0); print(frame.variables[0].type.GetNumberOfMemberEnums())

LLDB currently completes types lazily when it thinks it can. Does your new API still fail if you run expr p prior? (the idea is that that would trigger completion of the type and parse dwarf). If we dont ever call GetFullCompilerType on your type LLDB will never try to pull in the definition

No amount of tinkering with expr makes script print(frame.variables[0].type.GetNumberOfMemberEnums()) output a non-zero value.
I tested this with the changes I uploaded here half an hour ago.

Michael137 added a comment.EditedAug 20 2023, 5:26 AM

What were your lldb commands when you tested this?

script import lldb; frame = lldb.thread.GetFrameAtIndex(0); print(frame.variables[0].type.GetNumberOfMemberEnums())

LLDB currently completes types lazily when it thinks it can. Does your new API still fail if you run expr p prior? (the idea is that that would trigger completion of the type and parse dwarf). If we dont ever call GetFullCompilerType on your type LLDB will never try to pull in the definition

No amount of tinkering with expr makes script print(frame.variables[0].type.GetNumberOfMemberEnums()) output a non-zero value.
I tested this with the changes I uploaded here half an hour ago.

What were your lldb commands when you tested this?

script import lldb; frame = lldb.thread.GetFrameAtIndex(0); print(frame.variables[0].type.GetNumberOfMemberEnums())

LLDB currently completes types lazily when it thinks it can. Does your new API still fail if you run expr p prior? (the idea is that that would trigger completion of the type and parse dwarf). If we dont ever call GetFullCompilerType on your type LLDB will never try to pull in the definition

No amount of tinkering with expr makes script print(frame.variables[0].type.GetNumberOfMemberEnums()) output a non-zero value.
I tested this with the changes I uploaded here half an hour ago.

I think you may want to use GetCompleteQualType before iterating the DeclContext; that's how some of the other TypeSystemClang APIs do it. That will make sure we complete the type by the time we look for the enums.

E.g.,:

clang::QualType qual_type = RemoveWrappingTypes(GetCanonicalQualType(type));                               
if (GetCompleteQualType(&getASTContext(), qual_type)) {                                                     
  const clang::RecordType *record_type =                                                                         
      llvm::cast<clang::RecordType>(qual_type.getTypePtr());                                                
  const clang::RecordDecl *record_decl = record_type->getDecl();                                                 
  assert(record_decl);                                                                                           
  return std::distance(EnumIt(record_decl->decls_begin()), EnumIt(record_decl->decls_end())); 
}

Will review it more thoroughly on tomorrow though

I think you may want to use GetCompleteQualType before iterating the DeclContext; that's how some of the other TypeSystemClang APIs do it. That will make sure we complete the type by the time we look for the enums.

I tried to incorporate this function into my version of GetNumberOfMemberEnums, but it didn't work out.

E.g.,:

clang::QualType qual_type = RemoveWrappingTypes(GetCanonicalQualType(type));                               
if (GetCompleteQualType(&getASTContext(), qual_type)) {                                                     
  const clang::RecordType *record_type =                                                                         
      llvm::cast<clang::RecordType>(qual_type.getTypePtr());                                                
  const clang::RecordDecl *record_decl = record_type->getDecl();                                                 
  assert(record_decl);                                                                                           
  return std::distance(EnumIt(record_decl->decls_begin()), EnumIt(record_decl->decls_end())); 
}

Will review it more thoroughly on tomorrow though

But your version actually works, returning 1! Thank you very much!

Sorry for the delay, just came back from vacation

The change itself LGTM. Can we add a test though? We do have DWARFASTParserClang unittests: https://github.com/llvm/llvm-project/blob/main/lldb/unittests/SymbolFile/DWARF/DWARFASTParserClangTests.cpp

If the yaml input file becomes tedious to write you could also just an API test. E.g., under https://github.com/llvm/llvm-project/tree/main/lldb/test/API/lang/cpp

Let me know if you need some pointers in writing the tests

Also, I assume the extra changes to make the PointerIntPair formatter work will be in a follow-up patch?

Also, I assume the extra changes to make the PointerIntPair formatter work will be in a follow-up patch?

Yes. That work is not finished yet.

@Michael137 I moved this over to GitHub, and I need your help there writing the unit test.