This is an archive of the discontinued LLVM Phabricator instance.

[clang] Implement divergence for TypedefType and UsingType
ClosedPublic

Authored by mizvekov on Sep 7 2022, 7:43 PM.

Details

Summary

With this patch, TypedefTypes and UsingTypes can have an
underlying type which diverges from their corresponding
declarations.

For the TypedefType case, this can be seen when getting
the common sugared type between two redeclarations with
different sugar.

For both cases, this will become important as resugaring
is implemented, as this will allow us to resugar these
when they were dependent before instantiation.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Diff Detail

Event Timeline

mizvekov created this revision.Sep 7 2022, 7:43 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: martong. · View Herald Transcript
mizvekov requested review of this revision.Sep 7 2022, 7:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2022, 7:43 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
mizvekov edited the summary of this revision. (Show Details)Sep 7 2022, 7:45 PM
mizvekov updated this revision to Diff 459368.Sep 11 2022, 7:47 AM
mizvekov updated this revision to Diff 460133.Sep 14 2022, 9:24 AM
mizvekov updated this revision to Diff 460593.Sep 15 2022, 6:10 PM
martong removed a subscriber: martong.Sep 16 2022, 12:19 AM
mizvekov updated this revision to Diff 460910.Sep 16 2022, 2:34 PM
mizvekov updated this revision to Diff 461015.Sep 17 2022, 10:43 AM
mizvekov edited the summary of this revision. (Show Details)

I'm still having trouble from the description and test figuring out what this means. Can you better clarify the intent/effect of this?

Is the point that the type-aliases could also store a non-canonicalized type? if so, perhaps the patch should better reflect that (AND, perhaps do some assertions to make sure we don't unintentionally store types that don't canonicalize identically).

clang/include/clang/AST/Type.h
1808

It isn't clear to me what you mean by 'diverges' and 'divergent here.

clang/lib/AST/Type.cpp
3443

this bit doesn't seem right to me. Why isn't this using the normal setter here? Or even the trailing objects setter?

I'm still having trouble from the description and test figuring out what this means. Can you better clarify the intent/effect of this?

The effect is that you can effectively have a TypedefType/UsingType that has an underlying type that is not identical to the one which is held in the TypedefDecl/UsingDecl.

Another way we could achieve this, though I am saying it just for exposition as it wouldn't be good, would be by emitting a redeclaration with the type we want. The reason that is not good is because redeclarations are expensive to create and they are not uniqued, and they would waste a lot of memory.

Want we really want this for is resugaring: A TypedefType can have a dependent type, and then when an instantiation of that is created, it will have a Type substitution node that only references the canonical type of the template argument used to instantiate it.
With resugaring, we can replace that underlying type with one referencing the original argument, and would thus necessitate to "change" the underlying type. We don't want to emit redeclarations during resugaring as that is too expensive, as I explained.

Since resugaring is not implemented before this patch, and in order to test this functionality, we use a nifty trick that became possible after we merged D130308: We can unify two different TypedefTypes/UsingTypes that refer to different TypedefDecl/UsingDecl redeclarations (as written in source), each redeclaration having a non-identical type, which after unification becomes yet a different type, which we will use as the underlying type of our unified TypedefType/UsingType.

Is the point that the type-aliases could also store a non-canonicalized type? if so, perhaps the patch should better reflect that (AND, perhaps do some assertions to make sure we don't unintentionally store types that don't canonicalize identically).

They can store a separate QualType which must be the SameType as their canonical type, yes.

The assertion is already in place for both TypedefType and UsingType constructors.

clang/include/clang/AST/Type.h
1808

I ate the type, will fix that, but what I mean here is that a TypedefType/UsingType, without this patch, can only have the same type as is declared in the declaration it references.

With this patch, if the UnderlyingType passed in the constructor is different from the one that is declared, then the node will be able to hold this different type, and we will say that the node is 'divergent', which is the term I came up with, synonymous with different.

clang/lib/AST/Type.cpp
3443

Well there isn't supposed to be a normal setter, but right, I forgot to update this to use the trailing objects setter.

Thanks for the explanation, I think that makes more sense to me. A summarized version in the commit message would be appreciated.

As far as the 'asserts', i see them now in the ASTContext functions.

as far as Divergent, I wonder if we should call it something more descriptive, since it isn't just that it 'differs', it is that it is a non-canonical version, right?

as far as Divergent, I wonder if we should call it something more descriptive, since it isn't just that it 'differs', it is that it is a non-canonical version, right?

Well, currently without this patch TypedefTypes and UsingTypes can already have a non-canonical underlying type, it's just that we only store a reference to the declaration and we access the non-canonical type from there.

The canonical type is still stored in the type itself, like any other type of course.

"Divergent" is a term I came up with, but I would be all ears for a better alternative.

I considered that calling it just Different but that didn't sound right, even though its synonymous.

I considered calling it hasDifferentUnderlyingTypefromDeclaration but felt that was too verbose.

as far as Divergent, I wonder if we should call it something more descriptive, since it isn't just that it 'differs', it is that it is a non-canonical version, right?

Well, currently without this patch TypedefTypes and UsingTypes can already have a non-canonical underlying type, it's just that we only store a reference to the declaration and we access the non-canonical type from there.

Ah, right, hrmph.

The canonical type is still stored in the type itself, like any other type of course.

"Divergent" is a term I came up with, but I would be all ears for a better alternative.

I considered that calling it just Different but that didn't sound right, even though its synonymous.

I considered calling it hasDifferentUnderlyingTypefromDeclaration but felt that was too verbose.

I agree with all of that, but still am not thrilled at 'Divergent', it isn't particularly descriptive... Divergent has some additional implications that I'm not sure we mean as well (that is, it isn't a perfect synonym for different).

Perhaps something more like hasLessCanonicalizedType or hasMoreSpecificType or something like that? I'm grasping a little, but I think I would like it to be more clear that we're storing the SAME type, just with additional sugar.

I agree with all of that, but still am not thrilled at 'Divergent', it isn't particularly descriptive... Divergent has some additional implications that I'm not sure we mean as well (that is, it isn't a perfect synonym for different).

Yeah, it gives the sense of growing further apart, though that distinction would be meaningless for type nodes as they are not supposed to be mutable.

Divergent is a term that is already used within llvm subproject, but not in Clang.

Perhaps something more like hasLessCanonicalizedType or hasMoreSpecificType or something like that? I'm grasping a little, but I think I would like it to be more clear that we're storing the SAME type, just with additional sugar.

Something like hasDifferentSugaredType, though that is still a bit on the long side.

I also considered resugared, that would be my preferred alternative I think, as it's short and specific.

One important consideration is that we will print this in the AST node dumper, so having a short name helps and we won't be compelled to use two different terms.

erichkeane accepted this revision.Sep 19 2022, 7:51 AM

I agree with all of that, but still am not thrilled at 'Divergent', it isn't particularly descriptive... Divergent has some additional implications that I'm not sure we mean as well (that is, it isn't a perfect synonym for different).

Yeah, it gives the sense of growing further apart, though that distinction would be meaningless for type nodes as they are not supposed to be mutable.

Divergent is a term that is already used within llvm subproject, but not in Clang.

Perhaps something more like hasLessCanonicalizedType or hasMoreSpecificType or something like that? I'm grasping a little, but I think I would like it to be more clear that we're storing the SAME type, just with additional sugar.

Something like hasDifferentSugaredType, though that is still a bit on the long side.

I also considered resugared, that would be my preferred alternative I think, as it's short and specific.

One important consideration is that we will print this in the AST node dumper, so having a short name helps and we won't be compelled to use two different terms.

hasDifferentSugaredType is the best option I've seen so far. I'm not in love with it, but it is definitely 'better'. I don't think it is horrifyingly long, and we have similar length things throughout the codebase. If you come up with something better, feel free to commit with it, else this patch LGTM.

This revision is now accepted and ready to land.Sep 19 2022, 7:51 AM

If I understand this correctly: whenever isDivergent() is true, getDecl() or getFoundDecl() is an arbitrary choice between multiple candidate (re)declarations of a typedef or using decl. In particular it will be the first redeclaration, even though each different redeclaration can contain different sugar information.

E.g. considering your examples:

using X1 = int;
using Y1 = int;

using RPB1 = X1*;
using RPX1 = RPB1;
using RPB1 = Y1*; // redeclared
using RPY1 = RPB1;

namespace A { using type1 = X1*; };
namespace C { using A::type1; };
using UPX1 = C::type1;
namespace A { using type1 = Y1*; };  // redeclared
namespace C { using A::type1; };     // redeclared
using UPY1 = C::type1;

So far so good: isDivergent() would be false for all the above TypedefTypes and UsingTypes, since the UnderlyingTypes and Decls/FoundDecls are consistent. But these types are a different matter:

auto t29 = 0 ? (RPX1){} : (RPY1){};
auto t32 = 0 ? (UPX1){} : (UPY1){};

In the process of unifying the sugar to get the deduced type of t29/t32, we will compare their sugar, which has the same basic structure for each, but different decls/found decls; so unifying these types will simply involve unifying their decls; and getCommonDecl(X,Y) has been defined to will simply get the older declaration between them. So, the TypedefType/UsingType sugar in t29/t32 will reference the X1 decls and friends, *not* the Y1s, as its Decls/FoundDecls. (Right?)

The UnderlyingType however will not be as arbitrary, as it will use getCommonType to skip over everything the two don't have in common.

If I have this right, my question is: since the Decl/FoundDecl is an arbitrary choice, should we solve this by either

  1. constructing these with a null Decl/FoundDecl for such types, or if that is problematic,
  2. renaming isDivergent to declIsArbitrary() or something like that, to suggest that not only do the underlying type and decl diverge, it is the underlying type which is more meaningful, not the decl.
clang/include/clang/AST/Type.h
4510

The name of this might be less important than just documenting it extremely well here, with an example.

If I understand this correctly: whenever isDivergent() is true, getDecl() or getFoundDecl() is an arbitrary choice between multiple candidate (re)declarations of a typedef or using decl. In particular it will be the first redeclaration, even though each different redeclaration can contain different sugar information.

Well that is only true for this particular way, as used in the tests for this patch, of creating a divergent TypedefType.

The general idea is just that the TypedefType will have an underlying type sugar which is decoupled from the one in the declaration.

In general, forgetting about just the above particular way of creating them, there might still exist only one TypedefDecl in the whole program, which every TypedefType is referencing, and they can still be divergent, ie have different sugared underlying type.

But these types are a different matter:

auto t29 = 0 ? (RPX1){} : (RPY1){};
auto t32 = 0 ? (UPX1){} : (UPY1){};

In the process of unifying the sugar to get the deduced type of t29/t32, we will compare their sugar, which has the same basic structure for each, but different decls/found decls; so unifying these types will simply involve unifying their decls; and getCommonDecl(X,Y) has been defined to will simply get the older declaration between them. So, the TypedefType/UsingType sugar in t29/t32 will reference the X1 decls and friends, *not* the Y1s, as its Decls/FoundDecls. (Right?)

But again, we are discussing one particular mechanism of creating these divergent types.
How that mechanism works was defined by a previous patch.

But I don't think the choice between declarations is arbitrary. That type unification mechanism has one basic principle, when two properties of a type are different, we pick the one that is closest to the canonical property, (except in some very special circumstances where the standard requires something different).
So the choice for the older decl is principled, the canonical decl will be the oldest one. If that wasn't the case, we would need some other choice.

The UnderlyingType however will not be as arbitrary, as it will use getCommonType to skip over everything the two don't have in common.

Yes, except that neither is just arbitrary.

If I have this right, my question is: since the Decl/FoundDecl is an arbitrary choice, should we solve this by either

  1. constructing these with a null Decl/FoundDecl for such types, or if that is problematic,
  2. renaming isDivergent to declIsArbitrary() or something like that, to suggest that not only do the underlying type and decl diverge, it is the underlying type which is more meaningful, not the decl.

I don't think the declaration picked is arbitrary, and again they are the "Same" declaration, so picking null would be throwing out more information than is necessary for no principled reason.

In general, forgetting about just the above particular way of creating them, there might still exist only one TypedefDecl in the whole program, which every TypedefType is referencing, and they can still be divergent, ie have different sugared underlying type.

Can you give a code example demonstrating such a case, and any other distinctive cases where isDivergent() will be true?

But these types are a different matter:

auto t29 = 0 ? (RPX1){} : (RPY1){};
auto t32 = 0 ? (UPX1){} : (UPY1){};

In the process of unifying the sugar to get the deduced type of t29/t32, we will compare their sugar, which has the same basic structure for each, but different decls/found decls; so unifying these types will simply involve unifying their decls; and getCommonDecl(X,Y) has been defined to will simply get the older declaration between them. So, the TypedefType/UsingType sugar in t29/t32 will reference the X1 decls and friends, *not* the Y1s, as its Decls/FoundDecls. (Right?)

But again, we are discussing one particular mechanism of creating these divergent types.
How that mechanism works was defined by a previous patch.

But I don't think the choice between declarations is arbitrary. That type unification mechanism has one basic principle, when two properties of a type are different, we pick the one that is closest to the canonical property, (except in some very special circumstances where the standard requires something different).
So the choice for the older decl is principled, the canonical decl will be the oldest one. If that wasn't the case, we would need some other choice.

I don't think the declaration picked is arbitrary, and again they are the "Same" declaration, so picking null would be throwing out more information than is necessary for no principled reason.

Hmm. Redeclarations never change semantic information, and in general don't change sugar information, so the older one is indeed the common one from any principled standpoint - but as your test cases demonstrate, in the case of typedef and using-decl redeclarations, the sugar information can be changed. Does this result in some arbitrariness in choosing the older decl as the common decl? My first instinct is that it does. But as I think about it further...I think I agree with you, it is still principled to choose the earlier using-decl/typedef decl (though an argument can be made for choosing the very first decl, not just the earlier one, for typedefs and using-decls).

But, if we agree the decl choice is principled, the next question: what is the different principle behind choosing a different underlying type? Why can't the underlying type for t29/t32 be sugar for X1 and friends too?

To state the issue more practically: when users of the AST see isDivergent()==true, what do they do? Which path should they take to fetch information from this node?

If we absolutely have to allow these types to diverge, we need very good documentation that helps them answer this.

Wrangling in @sammccall here since he developed UsingType and so might be affected/need clarity on how to handle isDivergent().

what is the different principle behind choosing a different underlying type? Why can't the underlying type for t29/t32 be sugar for X1 and friends too?

The underlying type is taken from the mechanism implemented in D130308, but it's not a different principle. We are still trying to keep the information they have in common, and go towards the canonical state when there is a difference.

If we absolutely have to allow these types to diverge, we need very good documentation that helps them answer this.

We are adding the minimum amount of new information to the AST to get what we want: A TypedefType which can have an underlying different from the one in the declaration it references.
This is the minimum we need in order to be able to represent a resugared typedef, without losing the information that the type came from a typedef in the first place.

Otherwise, just knowing that they can diverge doesn't help much, they would still be missing the reason of why they diverged, and that will depend on what mechanisms were at play.

There is an engineering compromise in that we don't want to pack new AST nodes that help fully expose that some divergence was created either by type combination, or by template specialization resugaring (when that is implemented), to avoid ballooning complexity with information that might not be useful.

But if that information were important, that could justify the cost in memory and complexity. Do you foresee any need for it?

Otherwise, the isDivergent() flag is there just for exposition / debugging purposes when looking at the AST dump, as I find that helpful.
But we could not add the method and not print a flag in the AST dumper, implementing just the behavior change.

mizvekov updated this revision to Diff 461627.Sep 20 2022, 11:11 AM

The descriptions here are pretty general and opaque nad it's hard to follow what this means in practice.
It sounds like the motivating cases are to do with template instantiation - can we see them? e.g. code + proposed diagnostics before/after, or some other description of how this will be used?
The ternary-expression stuff doesn't seem compelling enough to justify the complexity here without more context. (Probably others have this context, but future readers of this code won't).

My intuition is that if a type desugars to something other than what the UsingDecl points to, then UsingType isn't the right model - it seems we're trying to shim some extra indirection that doesn't have much to do with using. Maybe we can isolate the complexity better in a new sugar typenode, rather than weakening the semantics of Using/TypedefType?
But I can easily imagine I'm missing the point and concrete examples might clear it up.

mizvekov added a comment.EditedSep 20 2022, 3:23 PM

The descriptions here are pretty general and opaque nad it's hard to follow what this means in practice.
It sounds like the motivating cases are to do with template instantiation - can we see them? e.g. code + proposed diagnostics before/after, or some other description of how this will be used?
The ternary-expression stuff doesn't seem compelling enough to justify the complexity here without more context. (Probably others have this context, but future readers of this code won't).

Yeah, the ternary-expression stuff is just a test case for this change, it's not a justification per se.
What we need this is for the template specialization resugaring work: https://reviews.llvm.org/D127695
I will give an example using a Typedef for exposition, but the same motivation applies with UsingType.

Say we have this code:

template <class T> struct A { using type1 = T; };
using Int = int;

using type2 = A<Int>::type1;

See this example live (with that resugaring patch) at: https://godbolt.org/z/jP64Pern8

We want the underlying type of type2 to have that Int sugar. It would also be good if we could keep the TypedefType sugar referencing the instantiation of type1 from the A<int> template.

The problem here is that the TypedefType is currently limited to having the declaration's underlying type, which is just from an instantiation of A<int>, and knows nothing about camel case Int.
This is similar to the infamous problem with std::string turning into std::basic_string in templates.

We could emit a new TypedefDecl with the underlying type that we want, but creating declarations is expensive, so we want to avoid that.
We could create a new AST node that represented more accurately what was happening. But the question is, is this worth the cost?
Do you see use cases for this yourself?

We want resugaring to be cheap and introduce as little changes in the AST as possible, to get a better chance of affording to have it to be on all the time.
If we introduce new nodes with more information, that might increase the cost and complexity, and it's hard to justify without an use case.

My intuition is that if a type desugars to something other than what the UsingDecl points to, then UsingType isn't the right model - it seems we're trying to shim some extra indirection that doesn't have much to do with using. Maybe we can isolate the complexity better in a new sugar typenode, rather than weakening the semantics of Using/TypedefType?
But I can easily imagine I'm missing the point and concrete examples might clear it up.

It does not seem too far fetched that the UsingType could point to a resugared version of the declaration's underlying type, without having to create a new declaration.

The other options I can think of are:

  • Introduce new AST nodes.
  • Simply strip the TypedefType/UsingType.
mizvekov updated this revision to Diff 461850.Sep 21 2022, 4:45 AM
mizvekov marked an inline comment as done.Sep 21 2022, 4:50 AM
mizvekov updated this revision to Diff 461966.Sep 21 2022, 11:44 AM

I will give an example using a Typedef for exposition, but the same motivation applies with UsingType.

Say we have this code:

template <class T> struct A { using type1 = T; };
using Int = int;

using type2 = A<Int>::type1;

See this example live (with that resugaring patch) at: https://godbolt.org/z/jP64Pern8

We want the underlying type of type2 to have that Int sugar. It would also be good if we could keep the TypedefType sugar referencing the instantiation of type1 from the A<int> template.

The problem here is that the TypedefType is currently limited to having the declaration's underlying type, which is just from an instantiation of A<int>, and knows nothing about camel case Int.
This is similar to the infamous problem with std::string turning into std::basic_string in templates.

We could emit a new TypedefDecl with the underlying type that we want, but creating declarations is expensive, so we want to avoid that.
We could create a new AST node that represented more accurately what was happening. But the question is, is this worth the cost?
Do you see use cases for this yourself?

We want resugaring to be cheap and introduce as little changes in the AST as possible, to get a better chance of affording to have it to be on all the time.
If we introduce new nodes with more information, that might increase the cost and complexity, and it's hard to justify without an use case.

This explanation and example are very helpful. Parts of them probably should make it into the documentation for isDivergent() or whatever it is called.

More importantly the documentation needs to emphasize that when isDivergent() is true, the underlying type really is more meaningful than the decl, and should be relied on wherever possible. After all the reason they differ *for now* is that the underlying type may not have an associated decl, simply due to the cost concerns of introducing such a decl. But it is possible in the future we could change course and decide to introduce the implicit TypedefDecls or UsingDecls. Then they would no longer be divergent, and it is the underlying decl that would change, not the underlying type.

Whether isDivergent() is the right name, vs. resugared(), or hasLessCanonicalizedType() or hasMoreSpecificType() as @erichkeane suggested (all seem reasonable), is less important to me than just documenting that method well.

@sammccall ping, is the explanation above reasonable?

sammccall accepted this revision.Sep 29 2022, 2:00 AM

Sorry about the delay, I've been juggling too many things :-(

TL;DR: yes, I think it's reasonable. I think the implementation complexity is justified by what we get. Thanks for explaining!

I think to minimize interface complexity, we should basically consider this to just weaken the invariants of TypedefType etc, and have comments reflecting that. Accordingly like to see inDivergent() inverted and named something like typeMatchesDecl() if it needs to be exposed at all.

But since this is just naming nits and I'm having trouble staying responsive, LGTM from my side once you feel it's clear enough.


template <class T> struct A { using type1 = T; };
using Int = int;

using type2 = A<Int>::type1;

[...] We want the underlying type of type2 to have that Int sugar. [...] The problem here is that the TypedefType is currently limited to having the declaration's underlying type, which is just from an instantiation of A<int>, and knows nothing about camel case Int.

Thanks, this is compelling! Couple of observations, not objections:

  • applications often print the fully-sugared type (type2) and/or the canonical type (int) with nothing in between. This might limit the benefits - I've often wished we had better & sharable heuristics for how much desugaring users want.
  • in this simple example we could naively solve the problem by having the class instantiation that it was instantiated with Int sugar, and making the original underlying type of A<int>::type1 as Int. (There are downsides here, you can obviously get the "wrong" sugar if there are multiple instantiation sites).

Anyway, I don't really want to argue scope/design, you've clearly thought way more about this, just want to give you more to think about. Question for this patch is just cost vs benefits.

This is similar to the infamous problem with std::string turning into std::basic_string in templates.

Yes! Though half of my basic_string sugar problems are things like substr's return type. This seems harder to solve in a principled way, though maybe heuristically doable when the written return type uses the injected-class-name? Anyway, again offtopic for this patch...

We could emit a new TypedefDecl with the underlying type that we want, but creating declarations is expensive, so we want to avoid that.
We could create a new AST node that represented more accurately what was happening. But the question is, is this worth the cost?
Do you see use cases for this yourself?

From the API here, "divergent" exposed as a property etc, I assumed you expected some clients to handle it. And I also expected that it would end up being in more nodes than TypedefType and UsingType - it's hard to see why it stops there.
In that case an additional node could improve code complexity: isolates+centralizes the code handling this case, makes it harder to miss one kind of divergent node, keeps the rest of the AST simpler for code that doesn't care about resugaring.

But it sounds more like we expect ~no programmatic clients to care, and that this change is mostly:

  • drop the assumption that decl & type have the same sugar, because of resugaring
  • add isDivergent() just to enable the text dump to provide hints to human analysis

But the question is, is this worth the cost?

If you're talking about *runtime* cost - it's not obvious that it's significantly more expensive? IIUC wherever types are resugared we're having to duplicate the chain of sugar (e.g. a function that returns T** is going to be instantiated as functiontype(pointertype(pointertype(typedeftype(int)))) and then resugared as functiontype(pointertype(pointertype(/*divergent*/typedeftype(typedeftype(int))))). The difference between enlarging typedeftype with trailing objects vs adding an extra node feels like a drop in the bucket. That said, I can't think of a way to do this that's both compact *and* much simpler, and simplicity would be the point.

(I'm not really asking you to measure this: I know it's a bunch of work. Just want to make sure that if we're deciding about cheap vs expensive we're probing our intuition a bit)

We want resugaring to be cheap and introduce as little changes in the AST as possible, to get a better chance of affording to have it to be on all the time.

Yes. In fact I've been scared by the discussion of "maybe we should have flags to ramp up/down sugar" as it's hard enough to write AST-wrangling code correctly against a single configuration :-)

It does not seem too far fetched that the UsingType could point to a resugared version of the declaration's underlying type, without having to create a new declaration.

OK, I think I'm sold on this.
I think it's conceptually simpler to say "this may be the case" (weaken invariants) than say "there are two flavors of TypedefType: strong-invarant and weak-invariant". Divergence is then just an implementation detail (IOW non-divergence is a storage optimization).
I don't have a strong opinion on whether this *only* means change the emphasis of the documentation, or we should also remove the isDivergent() accessor. If there's another way to achieve the debugging goals I'd lean toward the latter.

clang/include/clang/AST/Type.h
4510

I think getUnderlyingType() is the right place to document this new wrinkle in the AST.

Something like:

The returned type might have different sugar than the `getFoundDecl()`.
This occurs e.g. when we "resugar" types accessed through template instantiations.
4510

IIUC clients are *not* expected to care about this.
This isn't obvious, I think we should do one of:

  • remove the accessor
  • make it private and friend the things that need it
  • add a comment that makes it clear this is a detail (e.g. describe it as a storage optimization for simple cases)

I think it's worth inverting the boolean sense, e.g. call it UsesDeclaredType: if clients should assume in general that these have different sugar, it's when they *match* that's the special case.

clang/lib/AST/ASTContext.cpp
4650

Probably worth a comment somewhere in this function: we can have different underlying sugar types for the same decl, TypeForDecl points to the one whose sugar matches the decl.

Sorry about the delay, I've been juggling too many things :-(

No worries, thanks!

TL;DR: yes, I think it's reasonable. I think the implementation complexity is justified by what we get. Thanks for explaining!

I think to minimize interface complexity, we should basically consider this to just weaken the invariants of TypedefType etc, and have comments reflecting that. Accordingly like to see inDivergent() inverted and named something like typeMatchesDecl() if it needs to be exposed at all.

Yeah, I think it's better to just not expose it in general.
One minor problem with inverting it is that we would make the most common case the positive case. Though if we want to avoid the churn and noise, on the TextNode dumper we will probably want to add a flag only in the negative case.

Thanks, this is compelling! Couple of observations, not objections:

  • applications often print the fully-sugared type (type2) and/or the canonical type (int) with nothing in between. This might limit the benefits - I've often wished we had better & sharable heuristics for how much desugaring users want.

Yeah I am aware, the aka handling will step over typedefs, we will have to work on a better solution for that separate problem.

  • in this simple example we could naively solve the problem by having the class instantiation that it was instantiated with Int sugar, and making the original underlying type of A<int>::type1 as Int. (There are downsides here, you can obviously get the "wrong" sugar if there are multiple instantiation sites).

Yeah, I think it could cause too many surprises if we instantiated only one time per canonical type, but then used the sugar for the first instantiation:

  • It would cause surprising behavior when instantiating with attributes which have effects beyond meaning for the user. Think over-aligned typedefs and such.
  • Unless resugaring is on and always able to correct it when printing a diagnostic, it could produce references to types which are distant from the context the problem occurred.

Yes! Though half of my basic_string sugar problems are things like substr's return type. This seems harder to solve in a principled way, though maybe heuristically doable when the written return type uses the injected-class-name? Anyway, again offtopic for this patch...

Yeah, that for now is out of scope of the whole project even, except that maybe we could make this work for templated explicit object parameter.

The injected class name approach feels like it would even come short of solving it, even ignoring false positives. I think this would be a problem we would want to solve even for methods of regular (non-template) classes.

But it does not seem like it would justify the cost of templating a parameter just for the return type sugar.

And I also expected that it would end up being in more nodes than TypedefType and UsingType - it's hard to see why it stops there.

It won't stop here, we can do this for any type node which gets its sugared type from some other object which is expensive to rebuild (declarations, expressions, etc)

TypedefType is the most critical here, and UsingType is it's secret twin, so might as well do both in the same patch.

But it sounds more like we expect ~no programmatic clients to care, and that this change is mostly:

  • drop the assumption that decl & type have the same sugar, because of resugaring
  • add isDivergent() just to enable the text dump to provide hints to human analysis

Yeah that was the point of my question, I don't see why any user would care, and just rather not add complexity here. Glad that you agree :)

If you're talking about *runtime* cost - it's not obvious that it's significantly more expensive? IIUC wherever types are resugared we're having to duplicate the chain of sugar (e.g. a function that returns T** is going to be instantiated as functiontype(pointertype(pointertype(typedeftype(int)))) and then resugared as functiontype(pointertype(pointertype(/*divergent*/typedeftype(typedeftype(int))))). The difference between enlarging typedeftype with trailing objects vs adding an extra node feels like a drop in the bucket. That said, I can't think of a way to do this that's both compact *and* much simpler, and simplicity would be the point.

No, I am not worried about the difference in reusing TypedefType + extra bit, versus making it a derived class or some such.
I was more worried about the cost of of adding AST nodes which explain how a type was resugared, for example providing a pointer to the naming context used to resugar. That seems too expensive, for no utility.

Yes. In fact I've been scared by the discussion of "maybe we should have flags to ramp up/down sugar" as it's hard enough to write AST-wrangling code correctly against a single configuration :-)

Yeah, I see the point, I would like to avoid flags as well. Even a flag for resugaring on-off seems problematic, as resugaring introduces a bunch of novel sugaring configurations.

OK, I think I'm sold on this.
I think it's conceptually simpler to say "this may be the case" (weaken invariants) than say "there are two flavors of TypedefType: strong-invarant and weak-invariant". Divergence is then just an implementation detail (IOW non-divergence is a storage optimization).

Yeah exactly, I don't see a point in exposing the flag to the user except as a debugging aid.
The difference here could potentially break weird/dubious code, ie code that wants to analyze the type but traverses TypedefTypes through the declaration, instead of the more straightforward desugar route.

I don't have a strong opinion on whether this *only* means change the emphasis of the documentation, or we should also remove the isDivergent() accessor. If there's another way to achieve the debugging goals I'd lean toward the latter.

The debugging goal can be easily achieved by comparing the underlying type against the declaration's type, so it's not critical at all, there is just the advantage of having a single implementation to query this.

mizvekov updated this revision to Diff 464774.Oct 3 2022, 12:36 PM