This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Store shared pointers in DieToTypePtr map instead of raw pointers
Needs ReviewPublic

Authored by augusto2112 on Jan 9 2023, 12:50 PM.

Details

Summary

Storing raw pointers in DieToTypePtr may cause use-after-frees to occur,
since there are no guarantees that the shared pointers that owns the
underlying pointer to the type are kept around as long as the map.
Change the map to store a shared pointer instead.

Diff Detail

Event Timeline

augusto2112 created this revision.Jan 9 2023, 12:50 PM
Herald added a project: Restricted Project. · View Herald Transcript
augusto2112 requested review of this revision.Jan 9 2023, 12:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 9 2023, 12:50 PM

I changed the value of the map to a shared pointer, but could change it to a weak pointer instead, not sure which is more appropriate in this case.

I changed the value of the map to a shared pointer, but could change it to a weak pointer instead, not sure which is more appropriate in this case.

I was about to ask the same thing. What currently determines the lifetime of the stored objects?

I changed the value of the map to a shared pointer, but could change it to a weak pointer instead, not sure which is more appropriate in this case.

I was about to ask the same thing. What currently determines the lifetime of the stored objects?

Currently, a bunch of functions in DWARFASTParserClang and SymbolFileDWARF will create a new Type SP, store the underlying pointer in the map, and return the SP. The lifetime of the stored objects varies by whatever the caller does with the returned SP. Looking at some of the callers, I don't see a very clear pattern, most seem to use it and immediately discard it (use after free when we look up the same type again and the underlying pointer is still in the map), but at least one caller (SymbolFileDWARF::ResolveType) returns the underlying pointer to its callers, which would still be UB even if we changed the map to store weak pointers. Based on that I'm thinking we should store them as shared pointers instead of weak ones.

I changed the value of the map to a shared pointer, but could change it to a weak pointer instead, not sure which is more appropriate in this case.

I was about to ask the same thing. What currently determines the lifetime of the stored objects?

Currently, a bunch of functions in DWARFASTParserClang and SymbolFileDWARF will create a new Type SP, store the underlying pointer in the map, and return the SP. The lifetime of the stored objects varies by whatever the caller does with the returned SP. Looking at some of the callers, I don't see a very clear pattern, most seem to use it and immediately discard it (use after free when we look up the same type again and the underlying pointer is still in the map), but at least one caller (SymbolFileDWARF::ResolveType) returns the underlying pointer to its callers, which would still be UB even if we changed the map to store weak pointers. Based on that I'm thinking we should store them as shared pointers instead of weak ones.

Then maybe the solution is to store the type as unique pointers in the map and hand out raw pointers, relying on the map to keep the object alive? I feel pretty uncomfortable with handing out shared pointers without clearly understanding its lifetime cycle.

I changed the value of the map to a shared pointer, but could change it to a weak pointer instead, not sure which is more appropriate in this case.

I was about to ask the same thing. What currently determines the lifetime of the stored objects?

Currently, a bunch of functions in DWARFASTParserClang and SymbolFileDWARF will create a new Type SP, store the underlying pointer in the map, and return the SP. The lifetime of the stored objects varies by whatever the caller does with the returned SP. Looking at some of the callers, I don't see a very clear pattern, most seem to use it and immediately discard it (use after free when we look up the same type again and the underlying pointer is still in the map), but at least one caller (SymbolFileDWARF::ResolveType) returns the underlying pointer to its callers, which would still be UB even if we changed the map to store weak pointers. Based on that I'm thinking we should store them as shared pointers instead of weak ones.

Then maybe the solution is to store the type as unique pointers in the map and hand out raw pointers, relying on the map to keep the object alive? I feel pretty uncomfortable with handing out shared pointers without clearly understanding its lifetime cycle.

Yeah that seems like a better solution. I'm checking how hard implementing that would be. TypeSP is used in a bunch of places in LLDB.

clayborg requested changes to this revision.Jan 11 2023, 3:09 PM

Each SymbolFile has its own type list that keeps the shared pointers to all types. So there should be no need for any of these changes here unless someone isn't correctly storing a newly created type in the SymbolFile's type list. You can see the types being stored in code like:

TypeSP type_sp(... /* create type here*/ ...);
dwarf->GetTypeList().Insert(type_sp); // Store the share reference in the symbol file's type list
lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
208–209

Above the "type_sp" is immediately stored in the SymbolFile's type list. So we don't need to do anything here and it _is_ safe to just store a "Type *".

This revision now requires changes to proceed.Jan 11 2023, 3:09 PM

Storing raw pointers in DieToTypePtr may cause use-after-frees to occur, since there are no guarantees that the shared pointers that owns the underlying pointer to the type are kept around as long as the map. Change the map to store a shared pointer instead.

This isn't true. SymbolFile has a m_type_list that is used to store all shared pointers to any types that are created. Since SymbolFileDWARF is an instance of SymbolFile, it is safe to store "Type *" anywhere inside this class because as along as the SymbolFileDWARF is around, so will the SymbolFile that owns all of the types. If there are any places where someone is creating a type and _not_ storing it in the SymbolFile::m_type_list, then this is the bug and that should be fixed.

Each SymbolFile has its own type list that keeps the shared pointers to all types. So there should be no need for any of these changes here unless someone isn't correctly storing a newly created type in the SymbolFile's type list. You can see the types being stored in code like:

TypeSP type_sp(... /* create type here*/ ...);
dwarf->GetTypeList().Insert(type_sp); // Store the share reference in the symbol file's type list

What triggered the use after free for me was this snippet in DWARFASTParserClang::ParseTypeModifier:

  ...
  TypeSP lldb_function_type_sp = ParseTypeFromDWARF(
      sc, function_type, &function_type_is_new_pointer);

  if (lldb_function_type_sp) {
    clang_type = m_ast.CreateBlockPointerType(
        lldb_function_type_sp->GetForwardCompilerType());
    encoding_data_type = Type::eEncodingIsUID;
    attrs.type.Clear();
    resolve_state = Type::ResolveState::Full;
  }
}

lldb_function_type_sp is created, the underlying pointer is placed in the map, the SP is destroyed at the end of the scope, and any lookups for that type will get back the dangling pointer.

I could update this particular instance, but this just seems like a really easy mistake to make... There's nothing to suggest to callers that they must update the m_type_list when they ask for a new type. Wouldn't it be better we prevent this somehow, by updating the raw pointers of the map by being either unique, shared or weak pointers?

Each SymbolFile has its own type list that keeps the shared pointers to all types. So there should be no need for any of these changes here unless someone isn't correctly storing a newly created type in the SymbolFile's type list. You can see the types being stored in code like:

TypeSP type_sp(... /* create type here*/ ...);
dwarf->GetTypeList().Insert(type_sp); // Store the share reference in the symbol file's type list

What triggered the use after free for me was this snippet in DWARFASTParserClang::ParseTypeModifier:

  ...
  TypeSP lldb_function_type_sp = ParseTypeFromDWARF(
      sc, function_type, &function_type_is_new_pointer);

  if (lldb_function_type_sp) {
    clang_type = m_ast.CreateBlockPointerType(
        lldb_function_type_sp->GetForwardCompilerType());
    encoding_data_type = Type::eEncodingIsUID;
    attrs.type.Clear();
    resolve_state = Type::ResolveState::Full;
  }
}

lldb_function_type_sp is created, the underlying pointer is placed in the map, the SP is destroyed at the end of the scope, and any lookups for that type will get back the dangling pointer.

I could update this particular instance, but this just seems like a really easy mistake to make... There's nothing to suggest to callers that they must update the m_type_list when they ask for a new type. Wouldn't it be better we prevent this somehow, by updating the raw pointers of the map by being either unique, shared or weak pointers?

We really need the SymbolFile to own (have a strong reference to) all types that each SymbolFile subclass creates. One idea would be to make the lldb_private::Type's constructor protected and then friend class the SymbolFile class only, and then add an accessor where all types must be created via a SymbolFile API. Right now anyone can create a type with:

TypeSP type_sp(new Type(...));

This could turn into:

TypeSP type_sp(symbol_file->CreateType(...));

The main issue is shared pointer and weak pointer objects are more expensive objects as they are two pointers each and we need to keep these extra maps, like the the dwarf->GetDIEToType() as small as they can be. We also need to the type to not disappear as it needs to be owned by the SymbolFile classes. Not sure if anyone is creating types outside of the SymbolFile or subclasses.

So my requirement is that all types created by SymbolFile objects must have a single strong reference to the type in SymbolFile::m_type_list. Right now this isn't enforced, but it would be great if it was.

Make Type constructor private

@clayborg I went with your suggestion and made Type's constructor private, and any new Type created is automatically added to the TypeList. I think later on we should make TypeList keep a vector of unique pointers instead of shared ones.

New changes looks great, thanks for tackling this. Just a few comments that need to be updated and this will be good to go as long as the test suite passes just fine!

@clayborg I went with your suggestion and made Type's constructor private, and any new Type created is automatically added to the TypeList. I think later on we should make TypeList keep a vector of unique pointers instead of shared ones.

If we did switch the unique pointers, then we would always need to hand out "Type *" and this could cause crashes and use after free errors. So I would hold off on that change.

lldb/include/lldb/Symbol/SymbolFile.h
414 ↗(On Diff #489137)

Just add a comment here saying something lie "This function is used to create types that belong to a SymbolFile. The symbol file will own a strong reference to the type in an internal type list."

lldb/include/lldb/Symbol/Type.h
229–230 ↗(On Diff #489137)

add a \see SymbolFileCommon::MakeType() reference in the header documentation here so users will know what function to use if the get a compile error.