Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[clang] template / auto deduction deduces common sugar
ClosedPublic

Authored by mizvekov on Oct 6 2021, 6:09 PM.

Details

Summary

After upgrading the type deduction machinery to retain type sugar in
D110216, we were left with a situation where there is no general
well behaved mechanism in Clang to unify the type sugar of multiple
deductions of the same type parameter.

So we ended up making an arbitrary choice: keep the sugar of the first
deduction, ignore subsequent ones.

In general, we already had this problem, but in a smaller scale.
The result of the conditional operator and many other binary ops
could benefit from such a mechanism.

This patch implements such a type sugar unification mechanism.

The basics:

This patch introduces a getCommonSugaredType(QualType X, QualType Y)
method to ASTContext which implements this functionality, and uses it
for unifying the results of type deduction and return type deduction.
This will return the most derived type sugar which occurs in both X and
Y.

Example:

Suppose we have these types:

using Animal = int;
using Cat = Animal;
using Dog = Animal;

using Tom = Cat;
using Spike = Dog;
using Tyke = Dog;

For X = Tom, Y = Spike, this will result in Animal.
For X = Spike, Y = Tyke, this will result in Dog.

How it works:

We take two types, X and Y, which we wish to unify as input.
These types must have the same (qualified or unqualified) canonical
type.

We dive down fast through top-level type sugar nodes, to the
underlying canonical node. If these canonical nodes differ, we
build a common one out of the two, unifying any sugar they had.
Note that this might involve a recursive call to unify any children
of those. We then return that canonical node, handling any qualifiers.

If they don't differ, we walk up the list of sugar type nodes we dived
through, finding the last identical pair, and returning that as the
result, again handling qualifiers.

Note that this patch will not unify sugar nodes if they are not
identical already. We will simply strip off top-level sugar nodes that
differ between X and Y. This sugar node unification will instead be
implemented in a subsequent patch.

This patch also implements a few users of this mechanism:

  • Template argument deduction.
  • Auto deduction, for functions returning auto / decltype(auto), with special handling for initializer_list as well.

Further users will be implemented in a subsequent patch.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
mizvekov added inline comments.Sep 1 2022, 7:30 PM
clang/include/clang/AST/ASTContext.h
2822–2826

Assertion failure, the function is supposed to handle only the case where hasSame[Unqualified]Type(X, Y) == true.

It will always return a type R for which hasSame[Unqualified]Type(R, Y) == true also holds, so there are no special return values.

davrec accepted this revision.Sep 5 2022, 5:03 PM

I hope I'm not stepping on any toes given the recent changes in code ownership, but I'm accepting this and D130308 because

  1. They are a nice improvement to the AST that naturally follows from the earlier work adding sugar to type deductions,
  2. I have given it a fairly close look and it LGTM,
  3. @rsmith has taken a step back and so may not be available to give it a final look,
  4. @mizvekov has verified that the performance impact is negligible, and
  5. Other reviewers seem to have at least given it a once over and have not identified major issues.

Anyone object/need more time to review?

Thanks for your patience, this was a sizable review that it took a while to be able to make time for. A handful of non-functional reviews, but the functionality looks fine.

clang/lib/AST/ASTContext.cpp
12146

As a nit, I'd prefer this assert above the 'if' above, that way it catches Nulls in that case. It seems that the 'contract' for this function is no null values, so we want to catch this ASAP.

12152

For class types, canonical decl is usually the definition if one exists. So I assume we do mean 'first' here.

12161

Do we REALLY need to tolerate 'null' here as well? It seems we should do a normal cast and have the cast assert.

12179

Please comment these functions, it isn't clear on read through what you mean by 'Array' here. You probably mean for this to be something like getCommonTypes.

12211

Please give me a message here, just anything reasonably descriptive so that when it shows up I have something to grep.

12248

I guess I don't see the po int of the next 3 functions? There is no sugaring or anything, AND they already assume they are the same expression/element/etc?

12330

something a little more unique in this (and others!) would be appreciated.

mizvekov updated this revision to Diff 458587.Sep 7 2022, 3:51 PM
mizvekov edited the summary of this revision. (Show Details)
mizvekov marked 5 inline comments as done.
mizvekov marked an inline comment as done.Sep 7 2022, 4:33 PM
mizvekov added inline comments.
clang/lib/AST/ASTContext.cpp
12146

Actually we want to handle the case they are both null, in which case we return null as well.
This case would trip the assert below, as declaresSameEntity considers that not the same entity.

For example, unconstrained AutoType will have null ConceptDecl.

In the D130308 patch, we add another variant, `getCommonDeclChecked, which requires non-null inputs, which can be used in places where non-null is required.

I went ahead and moved the introduction of getCommonDeclChecked to this patch, even though there is only one use of it.

12161

Yes, as above.

12179

Yeah that name works as well.

12248

Well the point is to not repeat the assert in all use sites.

Thanks for your patience, this was a sizable review that it took a while to be able to make time for. A handful of non-functional reviews, but the functionality looks fine.

Thanks, I appreciate the effort!

erichkeane accepted this revision.Sep 8 2022, 6:18 AM
This revision was not accepted when it landed; it landed in state Needs Review.Sep 8 2022, 10:18 AM
This revision was automatically updated to reflect the committed changes.
ychen added a subscriber: ychen.Sep 11 2022, 11:00 PM

Hi @mizvekov , I noticed that deduction for partial ordering also inherits this new behavior. Do you think partial ordering could opt out of this, or is it better for partial ordering to deal with the new sugared types?

Hi @mizvekov , I noticed that deduction for partial ordering also inherits this new behavior. Do you think partial ordering could opt out of this, or is it better for partial ordering to deal with the new sugared types?

I would expect partial ordering to be performed only using canonical types, that type sugar should make no difference there.

Note that previous to this patch, we would already produce sugar on deduction, but the behavior with regards to deducing the same template parameter from multiple places was pretty badly behaved: We would just keep the sugar from the first deduction, and ignore any subsequent ones. That meant that the deduction order had an effect on the result. With this patch, deduction becomes order agnostic.

What kind of difference are you seeing?

alexfh added a subscriber: alexfh.Sep 12 2022, 6:29 AM

Hi Matheus, an early heads up: this commit is causing clang crashes on some of our code. I'll post more details a bit later.

Hi @mizvekov , I noticed that deduction for partial ordering also inherits this new behavior. Do you think partial ordering could opt out of this, or is it better for partial ordering to deal with the new sugared types?

I would expect partial ordering to be performed only using canonical types, that type sugar should make no difference there.

Note that previous to this patch, we would already produce sugar on deduction, but the behavior with regards to deducing the same template parameter from multiple places was pretty badly behaved: We would just keep the sugar from the first deduction, and ignore any subsequent ones. That meant that the deduction order had an effect on the result. With this patch, deduction becomes order agnostic.

I see. I think it is definitely a good thing. I'm still learning what the expected AST should look like during the partial ordering.

What kind of difference are you seeing?

For

template <typename...> struct A {};

template <class T>
bool foo(A<T>);

template <class T, class... Args>
bool foo(A<T, Args...>);

template <class Tuple> bool bar()
{
    return foo(Tuple{});
}

A<T> is currently modeled as ElaboratedType. It was TemplateSpecializationType before. Reading comments for ElaboratedType, it looks like sugar type might not be needed here?

ElaboratedType 0xd79c8f0 'A<T>' sugar dependent
`-TemplateSpecializationType 0xd79c8b0 'A<T>' dependent A
  `-TemplateArgument type 'T'
    `-TemplateTypeParmType 0xd79c7f0 'T' dependent depth 0 index 0
      `-TemplateTypeParm 0xd79c768 'T'

A<T> is currently modeled as ElaboratedType. It was TemplateSpecializationType before. Reading comments for ElaboratedType, it looks like sugar type might not be needed here?

Ah you might be seeing an ElaboratedTYpe here, where there was none before, perhaps because of https://reviews.llvm.org/D112374, and not because of this patch.

Yeah ElaboratedType is just sugar that should have no effect on partial ordering. But then I think no sugar should have effect on partial ordering. What is stopping you from just looking at the canonical type instead? On a canonical type, you would never see an ElaboratedType node, or a TemplateSpecializationType which is not dependent.

Is this related to the AutoType canonicalization issues we were discussing in the other patch of yours?

A<T> is currently modeled as ElaboratedType. It was TemplateSpecializationType before. Reading comments for ElaboratedType, it looks like sugar type might not be needed here?

Ah you might be seeing an ElaboratedTYpe here, where there was none before, perhaps because of https://reviews.llvm.org/D112374, and not because of this patch.

Yeah ElaboratedType is just sugar that should have no effect on partial ordering. But then I think no sugar should have effect on partial ordering. What is stopping you from just looking at the canonical type instead? On a canonical type, you would never see an ElaboratedType node, or a TemplateSpecializationType which is not dependent.

Thanks for the link. I'm not blocked by any of these patches, instead just trying to have a mental model of when to expect the sugared type :-). For partial ordering, the TemplateSpecializationType could be dependent, since the injected template arguments are dependent, I guess that's the reason there is the ElaboratedType?

Is this related to the AutoType canonicalization issues we were discussing in the other patch of yours?

Nope. I found this AST difference while investigating D133683.

Thanks for the link. I'm not blocked by any of these patches, instead just trying to have a mental model of when to expect the sugared type :-). For partial ordering, the TemplateSpecializationType could be dependent, since the injected template arguments are dependent, I guess that's the reason there is the ElaboratedType?

The ElaboratedType is a sort of a companion node to other nodes that represent things in the language which can have (non-dependent) nested name qualifiers (a NNS for short) and / or an elaborated type specifier (such as the struct in struct A).

It's only purpose is to carry that extra bit of data, like some external storage really, and it shouldn't affect the semantics of the program once the source code is parsed into an AST.

Here, in your example, the ElaboratedType is there, as a companion to that TemplateSpecializationType, just to say that this template specialization was written without any name qualifiers nor elaboration.

ychen added a comment.Sep 12 2022, 2:10 PM

Thanks for the link. I'm not blocked by any of these patches, instead just trying to have a mental model of when to expect the sugared type :-). For partial ordering, the TemplateSpecializationType could be dependent, since the injected template arguments are dependent, I guess that's the reason there is the ElaboratedType?

The ElaboratedType is a sort of a companion node to other nodes that represent things in the language which can have (non-dependent) nested name qualifiers (a NNS for short) and / or an elaborated type specifier (such as the struct in struct A).

It's only purpose is to carry that extra bit of data, like some external storage really, and it shouldn't affect the semantics of the program once the source code is parsed into an AST.

Here, in your example, the ElaboratedType is there, as a companion to that TemplateSpecializationType, just to say that this template specialization was written without any name qualifiers nor elaboration.

Very helpful explanation :-).

Hi Matheus, an early heads up: this commit is causing clang crashes on some of our code. I'll post more details a bit later.

The stack trace of the failure looks like this:

 #0 0x0000564e08f78aca llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./clang-after+0x7778aca)
 #1 0x0000564e08f76b78 llvm::sys::RunSignalHandlers() (./clang-after+0x7776b78)                                                                                                                                                                       
 #2 0x0000564e08f10027 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) (./clang-after+0x7710027)
 #3 0x0000564e08f102bd CrashRecoverySignalHandler(int) (./clang-after+0x77102bd)
 #4 0x00007f4773e731c0 __restore_rt
 #5 0x0000564e06591b6b (./clang-after+0x4d91b6b)
 #6 0x0000564e06374ef8 checkDeducedTemplateArguments(clang::ASTContext&, clang::DeducedTemplateArgument const&, clang::DeducedTemplateArgument const&) (./clang-after+0x4b74ef8)
 #7 0x0000564e0636b7b1 DeduceTemplateArgumentsByTypeMatch(clang::Sema&, clang::TemplateParameterList*, clang::QualType, clang::QualType, clang::sema::TemplateDeductionInfo&, llvm::SmallVectorImpl<clang::DeducedTemplateArgument>&, unsigned int, bool, bool) (./clang-after+0x4b6b7b1)
 #8 0x0000564e0636c896 DeduceTemplateArgumentsByTypeMatch(clang::Sema&, clang::TemplateParameterList*, clang::QualType, clang::QualType, clang::sema::TemplateDeductionInfo&, llvm::SmallVectorImpl<clang::DeducedTemplateArgument>&, unsigned int, bool, bool) (./clang-after+0x4b6c896)
 #9 0x0000564e06376430 DeduceTemplateArguments(clang::Sema&, clang::TemplateParameterList*, clang::QualType const*, unsigned int, clang::QualType const*, unsigned int, clang::sema::TemplateDeductionInfo&, llvm::SmallVectorImpl<clang::DeducedTemplateArgument>&, unsigned int, bool) (./clang-after+0x4b76430)
#10 0x0000564e06370895 isAtLeastAsSpecializedAs(clang::Sema&, clang::SourceLocation, clang::FunctionTemplateDecl*, clang::FunctionTemplateDecl*, clang::TemplatePartialOrderingContext, unsigned int, bool) (./clang-after+0x4b70895)
#11 0x0000564e063702c1 clang::Sema::getMoreSpecializedTemplate(clang::FunctionTemplateDecl*, clang::FunctionTemplateDecl*, clang::SourceLocation, clang::TemplatePartialOrderingContext, unsigned int, unsigned int, bool, bool) (./clang-after+0x4b702c1)
#12 0x0000564e0626a07c clang::isBetterOverloadCandidate(clang::Sema&, clang::OverloadCandidate const&, clang::OverloadCandidate const&, clang::SourceLocation, clang::OverloadCandidateSet::CandidateSetKind) (./clang-after+0x4a6a07c)
#13 0x0000564e0625e199 clang::OverloadCandidateSet::BestViableFunction(clang::Sema&, clang::SourceLocation, clang::OverloadCandidate*&) (./clang-after+0x4a5e199)
#14 0x0000564e06272aec clang::Sema::BuildOverloadedCallExpr(clang::Scope*, clang::Expr*, clang::UnresolvedLookupExpr*, clang::SourceLocation, llvm::MutableArrayRef<clang::Expr*>, clang::SourceLocation, clang::Expr*, bool, bool) (./clang-after+0x4a72aec)
#15 0x0000564e05fd3157 clang::Sema::BuildCallExpr(clang::Scope*, clang::Expr*, clang::SourceLocation, llvm::MutableArrayRef<clang::Expr*>, clang::SourceLocation, clang::Expr*, bool, bool) (./clang-after+0x47d3157)
#16 0x0000564e05feb5b6 clang::Sema::ActOnCallExpr(clang::Scope*, clang::Expr*, clang::SourceLocation, llvm::MutableArrayRef<clang::Expr*>, clang::SourceLocation, clang::Expr*) (./clang-after+0x47eb5b6)
#17 0x0000564e063d0bc6 clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformCallExpr(clang::CallExpr*) (./clang-after+0x4bd0bc6)
#18 0x0000564e063c80f9 clang::Sema::SubstInitializer(clang::Expr*, clang::MultiLevelTemplateArgumentList const&, bool) (./clang-after+0x4bc80f9)
#19 0x0000564e064057c8 clang::Sema::InstantiateVariableInitializer(clang::VarDecl*, clang::VarDecl*, clang::MultiLevelTemplateArgumentList const&) (./clang-after+0x4c057c8)
#20 0x0000564e063fa1be clang::Sema::BuildVariableInstantiation(clang::VarDecl*, clang::VarDecl*, clang::MultiLevelTemplateArgumentList const&, llvm::SmallVector<clang::Sema::LateInstantiatedAttribute, 16u>*, clang::DeclContext*, clang::LocalInstantiationScope*, bool, clang::VarTemplateSpecializationDecl*) (./clang-after+0x4bfa1be)
#21 0x0000564e063f99db clang::TemplateDeclInstantiator::VisitVarDecl(clang::VarDecl*, bool, llvm::ArrayRef<clang::BindingDecl*>*) (./clang-after+0x4bf99db)
#22 0x0000564e0642f599 void llvm::function_ref<void ()>::callback_fn<clang::Sema::SubstDecl(clang::Decl*, clang::DeclContext*, clang::MultiLevelTemplateArgumentList const&)::$_0>(long) (./clang-after+0x4c2f599)
#23 0x0000564e05cc55ee clang::Sema::runWithSufficientStackSpace(clang::SourceLocation, llvm::function_ref<void ()>) (./clang-after+0x44c55ee)
#24 0x0000564e06406ceb clang::Sema::SubstDecl(clang::Decl*, clang::DeclContext*, clang::MultiLevelTemplateArgumentList const&) (./clang-after+0x4c06ceb)
#25 0x0000564e063e91ac clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformDeclStmt(clang::DeclStmt*) (./clang-after+0x4be91ac)
#26 0x0000564e063e08c7 clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformCompoundStmt(clang::CompoundStmt*, bool) (./clang-after+0x4be08c7)
#27 0x0000564e063c96be clang::Sema::SubstStmt(clang::Stmt*, clang::MultiLevelTemplateArgumentList const&) (./clang-after+0x4bc96be)
#28 0x0000564e06408b57 clang::Sema::InstantiateFunctionDefinition(clang::SourceLocation, clang::FunctionDecl*, bool, bool, bool) (./clang-after+0x4c08b57)
#29 0x0000564e0640ad95 clang::Sema::PerformPendingInstantiations(bool) (./clang-after+0x4c0ad95)
#30 0x0000564e06408c54 clang::Sema::InstantiateFunctionDefinition(clang::SourceLocation, clang::FunctionDecl*, bool, bool, bool) (./clang-after+0x4c08c54)
#31 0x0000564e0640ad95 clang::Sema::PerformPendingInstantiations(bool) (./clang-after+0x4c0ad95)
#32 0x0000564e05cc7b40 clang::Sema::ActOnEndOfTranslationUnitFragment(clang::Sema::TUFragmentKind) (./clang-after+0x44c7b40)
#33 0x0000564e05cc8877 clang::Sema::ActOnEndOfTranslationUnit() (./clang-after+0x44c8877)
#34 0x0000564e05a344ec clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) (./clang-after+0x42344ec)
#35 0x0000564e05a2effe clang::ParseAST(clang::Sema&, bool, bool) (./clang-after+0x422effe)
#36 0x0000564e057af31a clang::FrontendAction::Execute() (./clang-after+0x3faf31a)
#37 0x0000564e057224e6 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (./clang-after+0x3f224e6)
#38 0x0000564e049d9446 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (./clang-after+0x31d9446)
#39 0x0000564e049cdb40 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (./clang-after+0x31cdb40)
#40 0x0000564e049cbda4 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (./clang-after+0x31cbda4)
#41 0x0000564e058afcf7 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef>>, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>*, bool*) const::$_1>(long) (./clang-after+0x40afcf7)
#42 0x0000564e08f0fef1 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (./clang-after+0x770fef1)
#43 0x0000564e058af64c clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef>>, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>*, bool*) const (./clang-after+0x40af64c)
#44 0x0000564e05874f77 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (./clang-after+0x4074f77)
#45 0x0000564e05875230 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__u::pair<int, clang::driver::Command const*>>&, bool) const (./clang-after+0x4075230)
#46 0x0000564e058911af clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__u::pair<int, clang::driver::Command const*>>&) (./clang-after+0x40911af)
#47 0x0000564e049cb50b clang_main(int, char**) (./clang-after+0x31cb50b)
#48 0x00007f4773d04633 __libc_start_main
#49 0x0000564e049c846a _start (./clang-after+0x31c846a)
clang-after: error: clang frontend command failed with exit code 133 (use -v to see invocation)
clang version google3-trunk (1819d5999c7091781a6441dad4bcede9c554ca90)
Target: x86_64-unknown-linux-gnu

I'm reducing the test case.

Hi Matheus, an early heads up: this commit is causing clang crashes on some of our code. I'll post more details a bit later.

The stack trace of the failure looks like this:

And the assertions-enabled build of clang prints this:

Unexpected sugar-free: TemplateTypeParm
UNREACHABLE executed at llvm-project/clang/lib/AST/ASTContext.cpp:12383!

Hi Matheus, an early heads up: this commit is causing clang crashes on some of our code. I'll post more details a bit later.

The stack trace of the failure looks like this:

And the assertions-enabled build of clang prints this:

Unexpected sugar-free: TemplateTypeParm
UNREACHABLE executed at llvm-project/clang/lib/AST/ASTContext.cpp:12383!

Reduced test case:

$ cat q.cc
struct a {
  int b;
};
template <typename c, typename d> using e = c d::*;
struct f {
  long b() const;
  template <typename c> c g(e<c, a>) const;
  a h;
};
template <typename c, typename d, typename... i>
void j(e<c, d>, const d *, i...);
template <typename c, typename d> c j(e<c, d>, const d *);
template <typename c> c f::g(e<c, a> k) const { return j(k, &h); }
long f::b() const { return g(&a::b); }
$ ./clang-after q.cc
...
1.      <eof> parser at end of file
2.      q.cc:13:28: instantiating function definition 'f::g<int>'
...
clang-after: error: unable to execute command: Trace/breakpoint trap
clang-after: error: clang frontend command failed due to signal (use -v to see invocation)

Please fix or revert soon.

Thanks!

I've reverted d200db38637884fd0b421802c6094b2a03ceb29e in 637da9de4c6619c0e179c2c2f0dbfebd08ac2a0f.

And a bit more compact test case:

template <typename T>
using P = int T::*;

template <typename T, typename... A>
void j(P<T>, T, A...);

template <typename T>
void j(P<T>, T);

struct S {
  int b;
};
void g(P<S> k, S s) { j(k, s); }
mizvekov reopened this revision.Sep 13 2022, 3:38 AM

Even simpler:

template <class T> using P = T;

template <class T, class... A> void j(P<T>, T, A...);
template <class T> void j(P<T>, T);

void g() { j(P<int>{}, int{}); }
davrec added inline comments.Sep 13 2022, 5:41 AM
clang/lib/AST/ASTContext.cpp
12370

Could we just return X here? Would that just default to the old behavior instead of crashing whenever unforeseen cases arise?

mizvekov added inline comments.Sep 13 2022, 5:52 AM
clang/lib/AST/ASTContext.cpp
12370

No, I think we should enforce the invariants and make sure we are handling everything that can be handled.

Classing TemplateTypeParm as sugar-free was what was wrong and we missed this on the review.

mizvekov updated this revision to Diff 459725.Sep 13 2022, 5:58 AM
mizvekov edited the summary of this revision. (Show Details)
mizvekov updated this revision to Diff 459726.Sep 13 2022, 6:05 AM
davrec added inline comments.Sep 13 2022, 6:05 AM
clang/lib/AST/ASTContext.cpp
12370

There might always going to be a few rare corner cases vulnerable to this though, particularly as more types are added and the people adding them don't pay strict attention to how to incorporate them here, and don't write the requisite tests (which seem very difficult to foresee and produce). When those cases arise we will be crashing even though we could produce a perfectly good program with the intended semantics; the only thing that would suffer for most users is slightly less clear diagnostic messages for those rare cases. I think it would be better to let those cases gradually percolate to our attention via bug reports concerning those diagnostics, rather than inconveniencing the user and demanding immediate attention via crashes. Or am I missing something?

mizvekov added inline comments.Sep 13 2022, 6:13 AM
clang/lib/AST/ASTContext.cpp
12370

I could on the same argument remove all asserts here and just let the program not crash on unforeseen circumstances.

On the other hand, having these asserts here helps us catch bugs not only here, but in other parts of the code. For example uniquing / canonicalization bugs.

If someone changes the properties of a type so that the assumptions here are not valid anymore, it's helpful to have that pointed out soon.

Going for as an example this specific bug, if there werent those asserts / unreachables in place and we had weakened the validation here, it would take a very long time for us to figure out we were making the wrong assumption with regards to TemplateTypeParmType.

I'd rather figure that out sooner on CI / internal testing rather than have a bug created by a user two years from now.

davrec added inline comments.Sep 13 2022, 6:40 AM
clang/lib/AST/ASTContext.cpp
12370

Yes its nicer to developers to get stack traces and reproductions whenever something goes wrong, and crashing is a good way to get those, but users probably won't be so thrilled about it. Especially given that the main selling point of this patch is that it makes diagnostics nicer for users: isn't it a bit absurd to crash whenever we can't guarantee our diagnostics will be perfect?

And again the real problem is future types not being properly incorporated here and properly tested: i.e. the worry is that this will be a continuing source of crashes, even if we handle all the present types properly.

How about we leave it as an unreachable for now, to help ensure all the present types are handled, but if months or years from now there continue to be crashes due to this, then just return X (while maybe write something to llvm::errs() to encourage users to report it), so we don't make the perfect the enemy of the good.

For reference, another small reproducer of the crash, but with a different stack trace than the first example posted here:

// Must compile with -std=c++03 to crash
#include <algorithm>

int main(int, char**)
{
  int i[3] = {1, 2, 3};
  int j[3] = {4, 5, 6};
  std::swap(i, j);

  return 0;
}

Compile with -std=c++03 to reproduce the assertion failure.
We found it by running the libcxx tests.

mizvekov added inline comments.Sep 13 2022, 8:48 AM
clang/lib/AST/ASTContext.cpp
12370

It's not about crashing when it won't be perfect. We already do these kinds of things, see the FIXMEs around the TemplateArgument and NestedNameSpecifier, where we just return something worse than we wish to, just because we don't have time to implement it now.

These unreachables and asserts here are about testing / documenting our knowledge of the system and making it easier to find problems. These should be impossible to happen in correct code, and if they do happen because of mistakes, fixing them sooner is just going to save us resources.

llvm::errs suggestion I perceive as out of line with current practice in LLVM, we don't and have a system for producing diagnostics for results possibly affected by FIXME and TODO situations and the like, as far as I know, and I am not aware of any discussions in that direction. I expect a lot of complexity and noise if we went this way.
And I think this would have even less chance of working out if we started to incorporate the reporting of violation of invariants into that.

I think even just using raw llvm::errs on clang would be incorrect per design, and could likely break users that parse our output.

I think if months and years from now, if someone stumbled upon an assert firing here and, instead of understanding what is happening and then fixing the code, just removed / weakened the assert, that would simply not be good and I hope a reviewer would stop that from happening :)

@alexfh This new revision that I just pushed should be good.

Do you want to give it a look / test, or should we go ahead and merge it?

mizvekov added a comment.EditedSep 13 2022, 8:58 AM

For reference, another small reproducer of the crash, but with a different stack trace than the first example posted here:

// Must compile with -std=c++03 to crash
#include <algorithm>

int main(int, char**)
{
  int i[3] = {1, 2, 3};
  int j[3] = {4, 5, 6};
  std::swap(i, j);

  return 0;
}

Compile with -std=c++03 to reproduce the assertion failure.
We found it by running the libcxx tests.

Thanks. Is this a different crash, or is It the same unreachable firing as the one @alexfh reported?

EDIT: NVM, I see it's a different problem, thanks for reporting.

aaronpuchert added inline comments.Sep 13 2022, 5:03 PM
clang/lib/AST/ASTContext.cpp
12370

I tend to agree that an assertion is appropriate for this. But you could turn this into

assert(false && "...");
return X;

which would still assert, but fall back to something "reasonable" in builds without assertions. The issue with llvm_unreachable is that it translates to __builtin_unreachable() in builds without assertions, and the optimizer takes advantage of that quite heavily. That's why llvm_unreachable is better left to places where we're pretty sure they're unreachable unless something went horribly wrong, such as after switches that handle all enumeration values.

mizvekov added inline comments.Sep 13 2022, 5:49 PM
clang/lib/AST/ASTContext.cpp
12370

I see, that is reasonable.

But grepping for assert(false in the code base, I see that they are pretty rare, accounting for a bit less than 0.5% in clang/ and 0.3% in llvm/, compared to the total number of asserts. They occur fifty times less than plain llvm_unreachables. I don't remember ever seeing them in practice, so I was not aware this was something we would do.

I feel like in these cases where we are using llvm_unreachable in this patch, I am 100% sure they are really unreachable, unless something went horribly wrong. But I mean I could be completely wrong as well, I must admit, so maybe in those cases that is what went horribly wrong? (Me being wrong I mean)

mizvekov updated this revision to Diff 459944.Sep 13 2022, 6:07 PM
mizvekov updated this revision to Diff 459946.Sep 13 2022, 6:17 PM

@vhscampos that issue should be fixed now as well, thanks for the report!

mizvekov updated this revision to Diff 459959.Sep 13 2022, 7:24 PM
mizvekov updated this revision to Diff 460010.Sep 14 2022, 1:45 AM
mizvekov updated this revision to Diff 460021.Sep 14 2022, 3:33 AM
aaronpuchert added inline comments.Sep 14 2022, 1:47 PM
clang/lib/AST/ASTContext.cpp
12370

It depends on how likely you think it is that someone inadvertently violates the invariant without noticing (even after running the tests). For internal invariants or things unlikely to miss in a test, llvm_unreachable should be fine. The less certain you are that mistakes would be hard to make or easy to find, the more preferable would be a fallback. If you think this was a one-off blunder and is unlikely to reoccur, you can also stick to llvm_unreachable. So this was just a suggestion, and since I merely skimmed over the code I'm not the best to judge.

davrec added inline comments.Sep 15 2022, 7:06 AM
clang/lib/AST/ASTContext.cpp
12370

The assert(false) was a great suggestion. I see the argument for unreachable too.

It seems there are two ways to proceed: unreachable for now, but downgrade to assert(false) if the issue re-arises, or assert(false) for now, upgraded to unreachable later if we find that assert is never tripped.

I'm inclined to prefer the latter, assert(false) for now, assuming there is not a noticeable performance impact on optimized builds with no assertions.

Reason: IIUC, it is not just a) new types, but also b) changes to Sema to support new features that do not construct type sugar properly but are not caught during review, could introduce bugs here that are not detected until a corner case (involving combining instances of such types) is discovered much later. While these bugs do need to be caught and fixed, it would be nice if builds without assertions could continue to work while they were.

But either course is reasonable and @mizekov has the best understanding of how likely such bugs are to find their way in.

@alexfh This new revision that I just pushed should be good.

Do you want to give it a look / test, or should we go ahead and merge it?

Thanks for the fix! If it fixes the test case I provided, that should be enough for now. If it breaks anything else, we'll find this out in a few days after you land the patch.

mizvekov updated this revision to Diff 460586.Sep 15 2022, 6:09 PM
This revision was not accepted when it landed; it landed in state Needs Review.Sep 16 2022, 2:21 AM
This revision was automatically updated to reflect the committed changes.