This is an archive of the discontinued LLVM Phabricator instance.

[C++20] [Modules] Warn for the use of preferred_name in modules
AbandonedPublic

Authored by ChuanqiXu on Jul 14 2022, 3:11 AM.

Details

Summary

Currently, the use of preferred_name would block implementing std modules in libcxx. See https://github.com/llvm/llvm-project/issues/56490 for example.
The problem is pretty hard and it looks like we couldn't solve it in a short time. So we warn for the case to tell the users to not use preferred_name friendly.

Diff Detail

Event Timeline

ChuanqiXu created this revision.Jul 14 2022, 3:11 AM
Herald added a project: Restricted Project. · View Herald Transcript
ChuanqiXu requested review of this revision.Jul 14 2022, 3:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2022, 3:11 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Or is it better to add a special option -fno-preferred-name to ignore it?

I guess I don't have a good idea why this attribute would cause ODR issues? It would seem that if it appeared in 2 different TUs that we could just 'pick' whichever we wanted, right? The description in the bug report of the problem isn't clear to me what the actual issue is.

I guess I don't have a good idea why this attribute would cause ODR issues? It would seem that if it appeared in 2 different TUs that we could just 'pick' whichever we wanted, right?

If the compiler finds it appeared in 2 different TUs with different definition (although the judgement is wrong), the compiler would emit an error. So it would block the uses of C++20 Modules with preferred_name.

The description in the bug report of the problem isn't clear to me what the actual issue is.

Sorry. My bad. Let me try to clarify it. When we write the attribute preferred_name(foo) in ASTWriter, the compiler would try to write the type for the argument foo. Then when the compiler write the type for foo, the compiler find the type for foo is a TypeDef type. So the compiler would write the corresponding type foo_templ<char>. The key point here is that the AST for foo_templ<char> is complete now. Since the AST for foo_templ<char> is constructed in Sema.

But problem comes when we read it. When we read the attribute preferred_name(foo), we would read the type for the argument foo and then we would try to read the type foo_templ<char> later. However, the key problem here is that when we read foo_templ<char>, its AST is not constructed yet! So we get a different type with the writer writes. So here is the ODR violation.

The problem is fundamental and I've spent 2 weeks on it. But I don't find any fixes for it. Then I found that, once I disabled preferred_name, we could go much further. So I am wondering if it would be an option to skip preferred_name if C++ modules is enabled. The idea may not be good in general. But I feel it might be an option in this specific case given it is hard to solve and preferred_name is primarily for printers.

I guess I don't have a good idea why this attribute would cause ODR issues? It would seem that if it appeared in 2 different TUs that we could just 'pick' whichever we wanted, right?

If the compiler finds it appeared in 2 different TUs with different definition (although the judgement is wrong), the compiler would emit an error. So it would block the uses of C++20 Modules with preferred_name.

The description in the bug report of the problem isn't clear to me what the actual issue is.

Sorry. My bad. Let me try to clarify it. When we write the attribute preferred_name(foo) in ASTWriter, the compiler would try to write the type for the argument foo. Then when the compiler write the type for foo, the compiler find the type for foo is a TypeDef type. So the compiler would write the corresponding type foo_templ<char>. The key point here is that the AST for foo_templ<char> is complete now. Since the AST for foo_templ<char> is constructed in Sema.

But problem comes when we read it. When we read the attribute preferred_name(foo), we would read the type for the argument foo and then we would try to read the type foo_templ<char> later. However, the key problem here is that when we read foo_templ<char>, its AST is not constructed yet! So we get a different type with the writer writes. So here is the ODR violation.

The problem is fundamental and I've spent 2 weeks on it. But I don't find any fixes for it. Then I found that, once I disabled preferred_name, we could go much further. So I am wondering if it would be an option to skip preferred_name if C++ modules is enabled. The idea may not be good in general. But I feel it might be an option in this specific case given it is hard to solve and preferred_name is primarily for printers.

Hmm... interesting. I wish I had a better understanding of the ASTWriter to help with this, but given:

1- Getting modules compiled is important
2- this attribute is 'ignore-able', and is for diagnostics only
3- the ASTWriter/ASTReader seems to be messing this up.

I think there _IS_ perhaps an acceptability to ignoring this attribute when it would cause a problem. However, I think doing it as you're doing it now is wrong. IF we are going to solve the symptom here, I think we should use a much more precise cut, and make either ASTReader not emit the attribute, or ASTWriter just 'ignore' it when reading. WDYT? @aaron.ballman as well...

I think there _IS_ perhaps an acceptability to ignoring this attribute when it would cause a problem. However, I think doing it as you're doing it now is wrong. IF we are going to solve the symptom here, I think we should use a much more precise cut, and make either ASTReader not emit the attribute, or ASTWriter just 'ignore' it when reading. WDYT? @aaron.ballman as well...

I'm less convinced that's reasonable, because that root cause seems like it will impact several other attributes as well. For example, I would be IBOutletCollection, OwnerAttr, PointerAttr, and TypeTagForDatatypeAttr all behave the same way as they all take a type argument. I don't think we want to selectively disable so many attributes (for example, disallowing owner and pointer attributes means we lose out on C++ Core Guideline features that people will likely expect to be able to use with modules. However, I agree that being able to modularize the STL is an important use case.

I'd like to understand better what the root cause is. The attribute requires a resolved type name, which means we must have seen the type before the attribute when writing the AST out. When reading the AST back in, why is the type not visible before the attribute? That sounds like we're writing the AST out in the wrong order somehow, or something odd like that.

I'm not following what the root problem is. You stated:

When we write the attribute preferred_name(foo) in ASTWriter, the compiler would try to write the type for the argument foo. Then when the compiler write the type for foo, the compiler find the type for foo is a TypeDef type. So the compiler would write the corresponding type foo_templ<char>. The key point here is that the AST for foo_templ<char> is complete now. Since the AST for foo_templ<char> is constructed in Sema.

Are you saying that the act of writing the AST is triggering the instantiation/completion of foo_templ<char>? Serialization should certainly not cause such side effects.

I'm likewise not following the concern with reading the AST:

When we read the attribute preferred_name(foo), we would read the type for the argument foo and then we would try to read the type foo_templ<char> later. However, the key problem here is that when we read foo_templ<char>, its AST is not constructed yet! So we get a different type with the writer writes. So here is the ODR violation.

I wouldn't expect the same Sema instance to be used to both write and read the same AST.

The test case defines two modules (A and Use) that both have definitions of foo_templ, foo, and foo_templ<char> in their global module fragments. It sounds to me like the issue here might be that these definitions are not getting merged (perhaps because of the issue you are trying to describe and address).

Can you try again to explain the root problem? Perhaps with references to relevant code?

For example, I would be IBOutletCollection, OwnerAttr, PointerAttr, and TypeTagForDatatypeAttr all behave the same way as they all take a type argument. I don't think we want to selectively disable so many attributes (for example, disallowing owner and pointer attributes means we lose out on C++ Core Guideline features that people will likely expect to be able to use with modules.

I agree that there may be other problematic cases. But all of IBOutletCollection,OwnerAttr, PointerAttr and TypeTagForDatatypeAttr wouldn't meet the same problem. And from what I can see, now PreferredName is special indeed. Since PreferredName is the only attribute whose argument type is TypeArgument<"TypedefType">. The reason is stated below.

I'm not following what the root problem is. You stated:

When we write the attribute preferred_name(foo) in ASTWriter, the compiler would try to write the type for the argument foo. Then when the compiler write the type for foo, the compiler find the type for foo is a TypeDef type. So the compiler would write the corresponding type foo_templ<char>. The key point here is that the AST for foo_templ<char> is complete now. Since the AST for foo_templ<char> is constructed in Sema.

Are you saying that the act of writing the AST is triggering the instantiation/completion of foo_templ<char>? Serialization should certainly not cause such side effects.

No, I don't mention that.

I'm likewise not following the concern with reading the AST:

When we read the attribute preferred_name(foo), we would read the type for the argument foo and then we would try to read the type foo_templ<char> later. However, the key problem here is that when we read foo_templ<char>, its AST is not constructed yet! So we get a different type with the writer writes. So here is the ODR violation.

I wouldn't expect the same Sema instance to be used to both write and read the same AST.

Yeah, I wouldn't expect that too.

Can you try again to explain the root problem? Perhaps with references to relevant code?

Yeah, following of is my explanation to the root problem. It might be a little bit long. First, for the following code:

C++
template<class _CharT>
class basic_string_view;

typedef basic_string_view<char> string_view;

template<class _CharT>
class
__attribute__((__preferred_name__(string_view)))
basic_string_view {
public:
    basic_string_view() 
    {
    }
};

inline basic_string_view<char> foo()
{
  return basic_string_view<char>();
}

The dumped AST would be:

|-ClassTemplateDecl 0x7913ac8 <./string_view.h:1:1, line:2:7> col:7 in A.<global> hidden basic_string_view
| |-TemplateTypeParmDecl 0x7913950 <line:1:10, col:16> col:16 in A.<global> hidden class depth 0 index 0 _CharT
| |-CXXRecordDecl 0x7913a18 <line:2:1, col:7> col:7 in A.<global> hidden class basic_string_view
| `-ClassTemplateSpecializationDecl 0x7913cd0 <line:1:1, line:2:7> line:9:1 in A.<global> class basic_string_view definition
|   |-DefinitionData pass_in_registers empty standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
|   | |-DefaultConstructor exists non_trivial user_provided defaulted_is_constexpr
|   | |-CopyConstructor simple trivial has_const_param implicit_has_const_param
|   | |-MoveConstructor exists simple trivial
|   | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveAssignment exists simple trivial needs_implicit
|   | `-Destructor simple irrelevant trivial constexpr
|   |-TemplateArgument type 'char'
|   | `-BuiltinType 0x78c77b0 'char'
|   |-PreferredNameAttr 0x79143c8 <line:8:16, col:46> string_view
|   |-CXXRecordDecl 0x7914698 <line:7:1, line:9:1> col:1 in A.<global> implicit class basic_string_view
|   |-AccessSpecDecl 0x7914748 <line:10:1, col:7> col:1 in A.<global> public
|   |-CXXConstructorDecl 0x7935158 <line:11:5, line:13:5> line:11:5 in A.<global> used basic_string_view 'void ()'
|   | `-CompoundStmt 0x79143a8 <line:12:5, line:13:5>
|   |-CXXConstructorDecl 0x7935300 <line:9:1> col:1 in A.<global> implicit constexpr basic_string_view 'void (const string_view &)' inline default trivial noexcept-unevaluated 0x7935300
|   | `-ParmVarDecl 0x7935420 <col:1> col:1 in A.<global> 'const string_view &'
|   |-CXXConstructorDecl 0x79354d0 <col:1> col:1 in A.<global> implicit constexpr basic_string_view 'void (string_view &&)' inline default trivial noexcept-unevaluated 0x79354d0
|   | `-ParmVarDecl 0x79355f0 <col:1> col:1 in A.<global> 'string_view &&'
|   `-CXXDestructorDecl 0x79357d0 <col:1> col:1 in A.<global> implicit referenced constexpr ~basic_string_view 'void () noexcept' inline default trivial
|-TypedefDecl 0x7913e78 <line:4:1, col:33> col:33 in A.<global> hidden referenced string_view 'basic_string_view<char>':'string_view'
| `-TemplateSpecializationType 0x7913df0 'basic_string_view<char>' sugar basic_string_view
|   |-TemplateArgument type 'char'
|   | `-BuiltinType 0x78c77b0 'char'
|   `-RecordType 0x7913dd0 'string_view'
|     `-ClassTemplateSpecialization 0x7913cd0 'basic_string_view'
|-ClassTemplateDecl 0x7914040 prev 0x7913ac8 <line:6:1, line:14:1> line:9:1 in A.<global> hidden basic_string_view
| |-TemplateTypeParmDecl 0x7913ed8 <line:6:10, col:16> col:16 in A.<global> hidden class depth 0 index 0 _CharT
| |-CXXRecordDecl 0x7913fa8 prev 0x7913a18 <line:7:1, line:14:1> line:9:1 in A.<global> hidden class basic_string_view definition
| | |-DefinitionData empty standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
| | | |-DefaultConstructor exists non_trivial user_provided defaulted_is_constexpr
| | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveConstructor exists simple trivial needs_implicit
| | | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveAssignment exists simple trivial needs_implicit
| | | `-Destructor simple irrelevant trivial constexpr needs_implicit
| | |-PreferredNameAttr 0x7914100 <line:8:16, col:46> string_view
| | |-CXXRecordDecl 0x7914168 <line:7:1, line:9:1> col:1 in A.<global> hidden implicit referenced class basic_string_view
| | |-AccessSpecDecl 0x7914218 <line:10:1, col:7> col:1 in A.<global> public
| | `-CXXConstructorDecl 0x79142d0 <line:11:5, line:13:5> line:11:5 in A.<global> hidden basic_string_view<_CharT> 'void ()'
| |   `-CompoundStmt 0x79143a8 <line:12:5, line:13:5>
| `-ClassTemplateSpecialization 0x7913cd0 'basic_string_view'

Then when ASTReader reads ClassTemplateDecl at 0x7913ac8, the ASTReader would read ClassTemplateSpecializationDecl at 0x7913cd0. Then the ASTReader would try to read the PreferredNameAttr at 0x79143c8. This happens at https://github.com/llvm/llvm-project/blob/557a471ab353d2be067d12b8cc352706ab089fc5/clang/lib/Serialization/ASTReaderDecl.cpp#L594-L600. An important note here is that the ASTReader reads the attributes in the very earlier state so that the ASTReader haven't read the name! (The ASTReader would read the name of Decl in ASTReaderDecl::VisitNamedDecl, which would happen after ASTReaderDecl::VisitDecl.)

When the ASTReader reads PreferredNameAttr, it would try to read the TypedefType since it's the argument type of PreferredNameAttr. The reading code is generated from clang/include/clang/AST/TypeProperties.td:

C++
QualType readTypedefType() {
    auto &ctx = R.getASTContext();
    Decl* declaration = R.find("declaration").readDeclRef();
    llvm::Optional<QualType> canonicalType = R.find("canonicalType").template readOptional<QualType>();
    
    QualType finalCanonicalType =
      canonicalType ? ctx.getCanonicalType(*canonicalType)
                    : QualType();
    return ctx.getTypedefType(cast<TypedefNameDecl>(declaration),
                              finalCanonicalType);
  
}

The problem comes when we read the canonical type. The ASTReader would try to read RecordType at 0x7913dd0:

C++
QualType readRecordType() {
    auto &ctx = R.getASTContext();
    bool dependent = R.find("dependent").readBool();
    Decl* declaration = R.find("declaration").readDeclRef();
    
    auto record = cast<RecordDecl>(declaration);
    QualType result = ctx.getRecordType(record);
    if (dependent)
      const_cast<Type *>(result.getTypePtr())
          ->addDependence(TypeDependence::DependentInstantiation);
    return result;
  
  }

Note the third line of the function, it would try to read the declaration! And from the above AST, we know the ASTReader would read ClassTemplateSpecializationDecl at 0x7913cd0. And this is what we're reading! Note that we wouldn't fall in an infinite loop since we would do a quick return in https://github.com/llvm/llvm-project/blob/bd228a17727ec0f51f8606bbcfb4fe65aebbd0dd/clang/lib/Serialization/ASTReader.cpp#L7488-L7494. But we still haven't completed the reading of ClassTemplateSpecializationDecl at 0x7913cd0. We only started at the very early stage. So the type of that RecordType is broken.

Long story short, the key problem here is that ASTReader don't take care of the dependent loop. It looks like we can't fix the problem by a simple patch. And the problem itself is really small. Currently only PreferredNameAttr would trigger the problem. So it is an option to me to ignore the attribute in the specific case.

Thank you for the detailed explanation, @ChuanqiXu, that was very helpful.

It looks to me like the problem may be that the initial declaration of the basic_string_view class template is non-defining, but when serializing that declaration, we serialize a definition of the basic_string_view<char> specialization (note that we do not attempt to serialize a definition of the class template at this point); the AST serialization appears to be "jumping ahead". When the defining declaration is later serialized, we serialize the class template definition, but not the specialization definition (presumably because it was already serialized and/or dumped). This analysis assumes that the act of dumping the AST reflects the serialization order; that may not be a valid assumption. I didn't look into whether the serialization code does actually "jump ahead" as the dump appears to indicate. From the AST writing side, it seems to me that there should be a rule that an instantiated declaration should never be written prior to its point of instantiation. Thus, a corrected AST dump would look more like:

|-ClassTemplateDecl 0x7913ac8 <./string_view.h:1:1, line:2:7> col:7 in A.<global> hidden basic_string_view
| |-TemplateTypeParmDecl 0x7913950 <line:1:10, col:16> col:16 in A.<global> hidden class depth 0 index 0 _CharT
| |-CXXRecordDecl 0x7913a18 <line:2:1, col:7> col:7 in A.<global> hidden class basic_string_view
| `-ClassTemplateSpecializationDecl 0x7913cd0 <line:1:1, line:2:7> line:9:1 in A.<global> class basic_string_view definition
|   `-TemplateArgument type 'char'
|     `-BuiltinType 0x78c77b0 'char'
|-TypedefDecl 0x7913e78 <line:4:1, col:33> col:33 in A.<global> hidden referenced string_view 'basic_string_view<char>':'string_view'
| `-TemplateSpecializationType 0x7913df0 'basic_string_view<char>' sugar basic_string_view
|   |-TemplateArgument type 'char'
|   | `-BuiltinType 0x78c77b0 'char'
|   `-RecordType 0x7913dd0 'string_view'
|     `-ClassTemplateSpecialization 0x7913cd0 'basic_string_view'
|-ClassTemplateDecl 0x7914040 prev 0x7913ac8 <line:6:1, line:14:1> line:9:1 in A.<global> hidden basic_string_view
| |-TemplateTypeParmDecl 0x7913ed8 <line:6:10, col:16> col:16 in A.<global> hidden class depth 0 index 0 _CharT
| |-CXXRecordDecl 0x7913fa8 prev 0x7913a18 <line:7:1, line:14:1> line:9:1 in A.<global> hidden class basic_string_view definition
| | |-DefinitionData empty standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
| | | |-DefaultConstructor exists non_trivial user_provided defaulted_is_constexpr
| | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveConstructor exists simple trivial needs_implicit
| | | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveAssignment exists simple trivial needs_implicit
| | | `-Destructor simple irrelevant trivial constexpr needs_implicit
| | |-PreferredNameAttr 0x7914100 <line:8:16, col:46> string_view
| | |-CXXRecordDecl 0x7914168 <line:7:1, line:9:1> col:1 in A.<global> hidden implicit referenced class basic_string_view
| | |-AccessSpecDecl 0x7914218 <line:10:1, col:7> col:1 in A.<global> public
| | `-CXXConstructorDecl 0x79142d0 <line:11:5, line:13:5> line:11:5 in A.<global> hidden basic_string_view<_CharT> 'void ()'
| |   `-CompoundStmt 0x79143a8 <line:12:5, line:13:5>
| `-ClassTemplateSpecializationDecl 0x7913cd0 <line:1:1, line:2:7> line:9:1 in A.<global> class basic_string_view definition
|   |-DefinitionData pass_in_registers empty standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
|   | |-DefaultConstructor exists non_trivial user_provided defaulted_is_constexpr
|   | |-CopyConstructor simple trivial has_const_param implicit_has_const_param
|   | |-MoveConstructor exists simple trivial
|   | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveAssignment exists simple trivial needs_implicit
|   | `-Destructor simple irrelevant trivial constexpr
|   |-TemplateArgument type 'char'
|   | `-BuiltinType 0x78c77b0 'char'
|   |-PreferredNameAttr 0x79143c8 <line:8:16, col:46> string_view
|   |-CXXRecordDecl 0x7914698 <line:7:1, line:9:1> col:1 in A.<global> implicit class basic_string_view
|   |-AccessSpecDecl 0x7914748 <line:10:1, col:7> col:1 in A.<global> public
|   |-CXXConstructorDecl 0x7935158 <line:11:5, line:13:5> line:11:5 in A.<global> used basic_string_view 'void ()'
|   | `-CompoundStmt 0x79143a8 <line:12:5, line:13:5>
|   |-CXXConstructorDecl 0x7935300 <line:9:1> col:1 in A.<global> implicit constexpr basic_string_view 'void (const string_view &)' inline default trivial noexcept-unevaluated 0x7935300
|   | `-ParmVarDecl 0x7935420 <col:1> col:1 in A.<global> 'const string_view &'
|   |-CXXConstructorDecl 0x79354d0 <col:1> col:1 in A.<global> implicit constexpr basic_string_view 'void (string_view &&)' inline default trivial noexcept-unevaluated 0x79354d0
|   | `-ParmVarDecl 0x79355f0 <col:1> col:1 in A.<global> 'string_view &&'
|   `-CXXDestructorDecl 0x79357d0 <col:1> col:1 in A.<global> implicit referenced constexpr ~basic_string_view 'void () noexcept' inline default trivial

I wonder if it would be worth pursuing (unrelated to this issue) introducing an ImplicitInstantiationDecl or similar that would be inserted in the AST at the point of instantiation for an instantiated declaration/definition. That might be pretty noisy, but would (I think) simplify some aspects of the AST serialization.

I neglected to explicitly mention in conjunction with my last comment, but @ChuanqiXu, can you check to see if we are indeed serializing class template specializations "too early" and, if so, whether delaying such serialization until a defining point resolves the issue?

For example, I would be IBOutletCollection, OwnerAttr, PointerAttr, and TypeTagForDatatypeAttr all behave the same way as they all take a type argument. I don't think we want to selectively disable so many attributes (for example, disallowing owner and pointer attributes means we lose out on C++ Core Guideline features that people will likely expect to be able to use with modules.

I agree that there may be other problematic cases. But all of IBOutletCollection,OwnerAttr, PointerAttr and TypeTagForDatatypeAttr wouldn't meet the same problem. And from what I can see, now PreferredName is special indeed. Since PreferredName is the only attribute whose argument type is TypeArgument<"TypedefType">. The reason is stated below.

All of the attributes I pointed to take a TypeArgument, so I'm still not certain I understand why you think these wouldn't be similarly impacted...

Can you try again to explain the root problem? Perhaps with references to relevant code?

Yeah, following of is my explanation to the root problem. It might be a little bit long. First, for the following code:

C++
template<class _CharT>
class basic_string_view;

typedef basic_string_view<char> string_view;

template<class _CharT>
class
__attribute__((__preferred_name__(string_view)))
basic_string_view {
public:
    basic_string_view() 
    {
    }
};

inline basic_string_view<char> foo()
{
  return basic_string_view<char>();
}

The dumped AST would be:

|-ClassTemplateDecl 0x7913ac8 <./string_view.h:1:1, line:2:7> col:7 in A.<global> hidden basic_string_view
| |-TemplateTypeParmDecl 0x7913950 <line:1:10, col:16> col:16 in A.<global> hidden class depth 0 index 0 _CharT
| |-CXXRecordDecl 0x7913a18 <line:2:1, col:7> col:7 in A.<global> hidden class basic_string_view
| `-ClassTemplateSpecializationDecl 0x7913cd0 <line:1:1, line:2:7> line:9:1 in A.<global> class basic_string_view definition
|   |-DefinitionData pass_in_registers empty standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
|   | |-DefaultConstructor exists non_trivial user_provided defaulted_is_constexpr
|   | |-CopyConstructor simple trivial has_const_param implicit_has_const_param
|   | |-MoveConstructor exists simple trivial
|   | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveAssignment exists simple trivial needs_implicit
|   | `-Destructor simple irrelevant trivial constexpr
|   |-TemplateArgument type 'char'
|   | `-BuiltinType 0x78c77b0 'char'
|   |-PreferredNameAttr 0x79143c8 <line:8:16, col:46> string_view
|   |-CXXRecordDecl 0x7914698 <line:7:1, line:9:1> col:1 in A.<global> implicit class basic_string_view
|   |-AccessSpecDecl 0x7914748 <line:10:1, col:7> col:1 in A.<global> public
|   |-CXXConstructorDecl 0x7935158 <line:11:5, line:13:5> line:11:5 in A.<global> used basic_string_view 'void ()'
|   | `-CompoundStmt 0x79143a8 <line:12:5, line:13:5>
|   |-CXXConstructorDecl 0x7935300 <line:9:1> col:1 in A.<global> implicit constexpr basic_string_view 'void (const string_view &)' inline default trivial noexcept-unevaluated 0x7935300
|   | `-ParmVarDecl 0x7935420 <col:1> col:1 in A.<global> 'const string_view &'
|   |-CXXConstructorDecl 0x79354d0 <col:1> col:1 in A.<global> implicit constexpr basic_string_view 'void (string_view &&)' inline default trivial noexcept-unevaluated 0x79354d0
|   | `-ParmVarDecl 0x79355f0 <col:1> col:1 in A.<global> 'string_view &&'
|   `-CXXDestructorDecl 0x79357d0 <col:1> col:1 in A.<global> implicit referenced constexpr ~basic_string_view 'void () noexcept' inline default trivial
|-TypedefDecl 0x7913e78 <line:4:1, col:33> col:33 in A.<global> hidden referenced string_view 'basic_string_view<char>':'string_view'
| `-TemplateSpecializationType 0x7913df0 'basic_string_view<char>' sugar basic_string_view
|   |-TemplateArgument type 'char'
|   | `-BuiltinType 0x78c77b0 'char'
|   `-RecordType 0x7913dd0 'string_view'
|     `-ClassTemplateSpecialization 0x7913cd0 'basic_string_view'
|-ClassTemplateDecl 0x7914040 prev 0x7913ac8 <line:6:1, line:14:1> line:9:1 in A.<global> hidden basic_string_view
| |-TemplateTypeParmDecl 0x7913ed8 <line:6:10, col:16> col:16 in A.<global> hidden class depth 0 index 0 _CharT
| |-CXXRecordDecl 0x7913fa8 prev 0x7913a18 <line:7:1, line:14:1> line:9:1 in A.<global> hidden class basic_string_view definition
| | |-DefinitionData empty standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
| | | |-DefaultConstructor exists non_trivial user_provided defaulted_is_constexpr
| | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveConstructor exists simple trivial needs_implicit
| | | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveAssignment exists simple trivial needs_implicit
| | | `-Destructor simple irrelevant trivial constexpr needs_implicit
| | |-PreferredNameAttr 0x7914100 <line:8:16, col:46> string_view
| | |-CXXRecordDecl 0x7914168 <line:7:1, line:9:1> col:1 in A.<global> hidden implicit referenced class basic_string_view
| | |-AccessSpecDecl 0x7914218 <line:10:1, col:7> col:1 in A.<global> public
| | `-CXXConstructorDecl 0x79142d0 <line:11:5, line:13:5> line:11:5 in A.<global> hidden basic_string_view<_CharT> 'void ()'
| |   `-CompoundStmt 0x79143a8 <line:12:5, line:13:5>
| `-ClassTemplateSpecialization 0x7913cd0 'basic_string_view'

Then when ASTReader reads ClassTemplateDecl at 0x7913ac8, the ASTReader would read ClassTemplateSpecializationDecl at 0x7913cd0. Then the ASTReader would try to read the PreferredNameAttr at 0x79143c8. This happens at https://github.com/llvm/llvm-project/blob/557a471ab353d2be067d12b8cc352706ab089fc5/clang/lib/Serialization/ASTReaderDecl.cpp#L594-L600. An important note here is that the ASTReader reads the attributes in the very earlier state so that the ASTReader haven't read the name! (The ASTReader would read the name of Decl in ASTReaderDecl::VisitNamedDecl, which would happen after ASTReaderDecl::VisitDecl.)

When the ASTReader reads PreferredNameAttr, it would try to read the TypedefType since it's the argument type of PreferredNameAttr. The reading code is generated from clang/include/clang/AST/TypeProperties.td:

The argument type is TypeArgument, not TypdefType. The guts of the reading code for the preferred name attribute *should* be:

case attr::PreferredName: {
  bool isInherited = Record.readInt();
  bool isImplicit = Record.readInt();
  bool isPackExpansion = Record.readInt();
  TypeSourceInfo * typedefType = Record.readTypeSourceInfo();
  New = new (Context) PreferredNameAttr(Context, Info, typedefType);
  cast<InheritableAttr>(New)->setInherited(isInherited);
  New->setImplicit(isImplicit);
  New->setPackExpansion(isPackExpansion);
  break;
}

Long story short, the key problem here is that ASTReader don't take care of the dependent loop. It looks like we can't fix the problem by a simple patch. And the problem itself is really small. Currently only PreferredNameAttr would trigger the problem. So it is an option to me to ignore the attribute in the specific case.

I'm still not quite seeing that this only impacts one attribute. I *think* this would happen for the others listed if the attribute argument was similarly a typedef as with your example using preferred_name.

I neglected to explicitly mention in conjunction with my last comment, but @ChuanqiXu, can you check to see if we are indeed serializing class template specializations "too early" and, if so, whether delaying such serialization until a defining point resolves the issue?

Personally, the problem is not serializing class template specializations "too early". The problem is that we are deserializing class template specializations "too early". The key point here is that Modules would read declarations lazily for performance reasons. This is a feature of modules.

The argument type is TypeArgument, not TypdefType. The guts of the reading code for the preferred name attribute *should* be:

Oh, My bad. I didn't mention it in the above response. In https://github.com/llvm/llvm-project/blob/469044cfd355d34573643a57b5d2a78a9c341327/clang/lib/Sema/SemaDeclAttr.cpp#L1427-L1456, we could find that the type of PreferredName must be a type def type to a record type.

I'm still not quite seeing that this only impacts one attribute. I *think* this would happen for the others listed if the attribute argument was similarly a typedef as with your example using preferred_name.

After experimenting the all day, I can't say the similar problem won't definitely happen to other attributes if they using a typedef as the attribute argument.


Here is the story of my experiment. The readers could skip this if not interested. IBOutletCollection is an Objective-C attributes. And the type argument of TypeTagForDatatype is required to be integral type: https://releases.llvm.org/9.0.0/tools/clang/docs/AttributeReference.html#type-tag-for-datatype. So let's see about gsl::Owner and gsl::Pointer. They are symmetric so let's see gsl::Owner.

First, I tried to following example, which replace [[gsl::Owner]] with preferred_name:

template<class _CharT>
class basic_string_view;

typedef basic_string_view<char> string_view;

template<class _CharT>
class
[[gsl::Owner(string_view)]]
basic_string_view {
public:
    basic_string_view() 
    {
    }
};

inline basic_string_view<char> foo()
{
  return basic_string_view<char>();
}

Then it don't meet the ODR violation problem. The reason is in the following AST:

|-ClassTemplateDecl 0x7914038 <string_view.h:1:1, line:2:7> col:7 basic_string_view
| |-TemplateTypeParmDecl 0x7913ec0 <line:1:10, col:16> col:16 class depth 0 index 0 _CharT
| |-CXXRecordDecl 0x7913f88 <line:2:1, col:7> col:7 class basic_string_view
| | `-OwnerAttr 0x7914670 <line:9:8, col:25> string_view
| `-ClassTemplateSpecializationDecl 0x7914240 <line:1:1, line:2:7> line:10:1 class basic_string_view definition
|   |-DefinitionData pass_in_registers empty standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
|   | |-DefaultConstructor exists non_trivial user_provided defaulted_is_constexpr
|   | |-CopyConstructor simple trivial has_const_param implicit_has_const_param
|   | |-MoveConstructor exists simple trivial
|   | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveAssignment exists simple trivial needs_implicit
|   | `-Destructor simple irrelevant trivial constexpr
|   |-TemplateArgument type 'char'
|   | `-BuiltinType 0x78c7e50 'char'
|   |-OwnerAttr 0x7914b98 <line:9:8, col:25> string_view
|   |-CXXRecordDecl 0x7914c68 <line:7:1, line:10:1> col:1 implicit class basic_string_view
|   |-AccessSpecDecl 0x7914d18 <line:11:1, col:7> col:1 public
|   |-CXXConstructorDecl 0x7933d20 <line:12:5, line:14:5> line:12:5 used basic_string_view 'void ()'
|   | `-CompoundStmt 0x7914978 <line:13:5, line:14:5>
|   |-CXXConstructorDecl 0x7933ed0 <line:10:1> col:1 implicit constexpr basic_string_view 'void (const basic_string_view<char> &)' inline default trivial noexcept-unevaluated 0x7933ed0
|   | `-ParmVarDecl 0x7933ff0 <col:1> col:1 'const basic_string_view<char> &'
|   |-CXXConstructorDecl 0x79340a0 <col:1> col:1 implicit constexpr basic_string_view 'void (basic_string_view<char> &&)' inline default trivial noexcept-unevaluated 0x79340a0
|   | `-ParmVarDecl 0x79341c0 <col:1> col:1 'basic_string_view<char> &&'
|   `-CXXDestructorDecl 0x79343a0 <col:1> col:1 implicit referenced constexpr ~basic_string_view 'void () noexcept' inline default trivial
|-TypedefDecl 0x79143e8 <line:4:1, col:33> col:33 referenced string_view 'basic_string_view<char>':'basic_string_view<char>'
| `-TemplateSpecializationType 0x7914360 'basic_string_view<char>' sugar basic_string_view
|   |-TemplateArgument type 'char'
|   | `-BuiltinType 0x78c7e50 'char'
|   `-RecordType 0x7914340 'basic_string_view<char>'
|     `-ClassTemplateSpecialization 0x7914240 'basic_string_view'
|-ClassTemplateDecl 0x79145b0 prev 0x7914038 <line:6:1, line:15:1> line:10:1 basic_string_view
| |-TemplateTypeParmDecl 0x7914448 <line:6:10, col:16> col:16 class depth 0 index 0 _CharT
| |-CXXRecordDecl 0x7914518 prev 0x7913f88 <line:7:1, line:15:1> line:10:1 class basic_string_view definition
| | |-DefinitionData empty standard_layout trivially_copyable has_user_declared_ctor can_const_default_init
| | | |-DefaultConstructor exists non_trivial user_provided defaulted_is_constexpr
| | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveConstructor exists simple trivial needs_implicit
| | | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveAssignment exists simple trivial needs_implicit
| | | `-Destructor simple irrelevant trivial constexpr needs_implicit
| | |-OwnerAttr 0x79146d0 <line:9:8, col:25> string_view
| | |-CXXRecordDecl 0x7914738 <line:7:1, line:10:1> col:1 implicit referenced class basic_string_view
| | |-AccessSpecDecl 0x79147e8 <line:11:1, col:7> col:1 public
| | `-CXXConstructorDecl 0x79148a0 <line:12:5, line:14:5> line:12:5 basic_string_view<_CharT> 'void ()'
| |   `-CompoundStmt 0x7914978 <line:13:5, line:14:5>
| `-ClassTemplateSpecialization 0x7914240 'basic_string_view'

The most significant difference with preferred_name is: there is another OwnerAttr in CXXRecordDecl at 0x7913f88. So it would read the attributes before it starts to read ClassTemplateSpecializationDecl. This is done by https://github.com/llvm/llvm-project/blob/469044cfd355d34573643a57b5d2a78a9c341327/clang/lib/Sema/SemaDeclAttr.cpp#L5051-L5052. When we handle OwnerAttr, it would add the attribute for every redeclaration. But PreferredNameAttr would only add the attribute for the declaration itself. So here is the problem.

Then I tried to add PreferredNameAttr for every redeclaration. But I met two problems.

(1) It would meet an infinite instantiation problem for the following case:

C++
template<int A, int B, typename ...T> struct Foo;
template<typename ...T> using Bar = Foo<1, 2, T...>;
using Bar2 = Foo<1, 2, int, float>;
template<int A, int B, typename ...T> struct [[clang::preferred_name(::Bar<T...>)]] Foo {};
Foo<1, 2, int, float>::nosuch x; // expected-error {{no type named 'nosuch' in 'preferred_name::Bar<int, float>'}}

::Foo<1, 2, int, float>::nosuch x; // expected-error {{no type named 'nosuch' in 'preferred_name::Bar<int, float>'}}

But gsl::Owner wouldn't meet the problem. And I haven't found the reason.

(2) It would crash when compiling stdmodules. The crash stack is relatively long and I haven't located the reason. So I omit them here.


@tahonermann @aaron.ballman As a summary, I admit this revision is something like "I met a problem that I couldn't solve so I tried to skip it instead of solving it". I understand this is generally bad. I have fought with the problem for about 2 weeks. Then I found things goes pretty good after I tried to skip it. Now the stdmodules have these examples: https://github.com/ChuanqiXu9/stdmodules/tree/master/examples. I know this is completely incomplete. But it should be a good direction. And I am trying to use it in actual projects, too. And the blocking issue is the preferred_name, which is a printer hinter to me. So I just feel like it may be not too bad to skip it. And clang15 is going to be released recently. It would be good to have something that people could have fun with. I know the current status of C++20 Modules is not mature really. But it would be really helpful if some users are willing to eat dog foods to give feedbacks. It looks like I am the only one tester for C++20 Modules now.

It looks like @erichkeane likes the idea. How do you think about it @tahonermann @aaron.ballman?

I'll note that I don't "like" the idea, so much as I see this patch as an 'improvement' to unblock efforts with additional value, though not much of one. I suspect as Aaron does that there is a significant problem that needs fixing here with how our serialization works, and that we should be putting effort into fixing that.

As far as THIS patch, I would be OK with 'holding my nose' and accepting it short-term so we can get Module-ized versions of the STL into clang-15, but only if someone was still actively working ot figure out our serialization problem, and only if Tom/Aaron are OK holding their nose as well.

I'll note that I don't "like" the idea, so much as I see this patch as an 'improvement' to unblock efforts with additional value, though not much of one. I suspect as Aaron does that there is a significant problem that needs fixing here with how our serialization works, and that we should be putting effort into fixing that.

As far as THIS patch, I would be OK with 'holding my nose' and accepting it short-term so we can get Module-ized versions of the STL into clang-15, but only if someone was still actively working ot figure out our serialization problem, and only if Tom/Aaron are OK holding their nose as well.

Oh, got it. Sorry for misunderstanding. The conclusion here makes sense. I would try to look at the serialization problem in a long term.

Personally, the problem is not serializing class template specializations "too early". The problem is that we are deserializing class template specializations "too early".

Yes, I agree.

The key point here is that Modules would read declarations lazily for performance reasons. This is a feature of modules.

Indeed it is, and an important one. I don't think laziness poses a problem though; the issue is ensuring that dependencies are (de)serialized in a proper order. In this case, that implies that, at least for a class template declared with the __preferred_name__ attribute, a non-defining declaration must be deserialized before the defining declaration.

I can imagine several ways in which things may be going wrong. Perhaps (de)serialization is switching to a canonical declaration when it should not be (e.g., reading/writing a definition when only a declaration should be). Perhaps, for implicitly instantiated declarations (at least for ones for which __preferred_name__ is attached), separate non-defining and defining declarations should be (de)serialized (in the dumps above, there appears to only be a single address associated with the basic_string_view<char> specialization). Perhaps ClassTemplateSpecializationDecl should be (de)serialized such that a (valid) non-defining declaration is created before attempting to load anything else that might lead to recursion to the same declaration (the intent being that an early return when recursion is detected still yields a valid incomplete declaration).

I briefly looked at some code, but (unsurprisingly) did not see anything obvious. I am inclined towards finding a better fix for this before accepting it. If a good fix isn't identified for the initial Clang 15 release, I think an argument can be made for including it in a Clang 15.0.X release.

ChuanqiXu updated this revision to Diff 446404.Jul 21 2022, 2:53 AM
ChuanqiXu retitled this revision from [Modules] Disable preferred_name attribute in C++20 Modules to [C++20] [Modules] Warn for the use of preferred_name in modules.
ChuanqiXu edited the summary of this revision. (Show Details)

Don't try to skip the problem. Warns for it instead.

I can imagine several ways in which things may be going wrong. Perhaps (de)serialization is switching to a canonical declaration when it should not be (e.g., reading/writing a definition when only a declaration should be). Perhaps, for implicitly instantiated declarations (at least for ones for which __preferred_name__ is attached), separate non-defining and defining declarations should be (de)serialized (in the dumps above, there appears to only be a single address associated with the basic_string_view<char> specialization). Perhaps ClassTemplateSpecializationDecl should be (de)serialized such that a (valid) non-defining declaration is created before attempting to load anything else that might lead to recursion to the same declaration (the intent being that an early return when recursion is detected still yields a valid incomplete declaration).

I feel like the problem may come in the process of instantiation. For example, if we add the attribute for CXXRecordDecl at 0x7913a18 in this specific case, it would be fine. Although it may be OK to address the problem in serialization/deserialization side too, I just feel like it might not be worhy now..

@erichkeane @aaron.ballman @tahonermann I get another idea. We could tell the user honestly that we couldn't handle it (temporarily). We could still modularize the STL by a trick: https://github.com/ChuanqiXu9/stdmodules/blob/e81f4e9e74f96021f2e45c48f44da93e806c4524/Makefile#L3

In this way, we don't need to worry about the dirty implementation pollutes the sources. (I think it is not bad to emit a warning to tell the user we couldn't do something (now).)

@erichkeane @aaron.ballman @tahonermann I get another idea. We could tell the user honestly that we couldn't handle it (temporarily). We could still modularize the STL by a trick: https://github.com/ChuanqiXu9/stdmodules/blob/e81f4e9e74f96021f2e45c48f44da93e806c4524/Makefile#L3

In this way, we don't need to worry about the dirty implementation pollutes the sources. (I think it is not bad to emit a warning to tell the user we couldn't do something (now).)

Thats a heck of an "STL Trick". The diagnostic itself needs some rewording, and I don't know if we do diagnostics for our implementation deficiencies.

I still consider the 'disable the attribute' as perhaps the 'best idea' for now other than fixing the bug, but had suggested earlier doing it NOT at the SemaDeclAttr level, but to do it at the ASTWriter-when-in-modules level.

@erichkeane @aaron.ballman @tahonermann I get another idea. We could tell the user honestly that we couldn't handle it (temporarily). We could still modularize the STL by a trick: https://github.com/ChuanqiXu9/stdmodules/blob/e81f4e9e74f96021f2e45c48f44da93e806c4524/Makefile#L3

In this way, we don't need to worry about the dirty implementation pollutes the sources. (I think it is not bad to emit a warning to tell the user we couldn't do something (now).)

Thats a heck of an "STL Trick". The diagnostic itself needs some rewording, and I don't know if we do diagnostics for our implementation deficiencies.

I know there was an example. In clang14.x and before, when we write module partitions, the compiler would say "sorry, module partitions are not yet supported". We can find this in https://github.com/llvm/llvm-project/commit/69350e569dc47f871590243b5e46a68520640dcd.

I still consider the 'disable the attribute' as perhaps the 'best idea' for now other than fixing the bug, but had suggested earlier doing it NOT at the SemaDeclAttr level, but to do it at the ASTWriter-when-in-modules level.

It looks better to workaround it in ASTWriter in Sema to me. I sent another revision D130331 as an alternative since the current revision looks acceptable to me. I will be happy if either of them get accepted.

ChuanqiXu abandoned this revision.Jul 26 2022, 8:56 AM

Abandon this since we're prefer D130331, thanks for every one here!