This is an archive of the discontinued LLVM Phabricator instance.

[Sema] a[x] has type T when a has type T* or T[], even when T is dependent
ClosedPublic

Authored by sammccall on Aug 2 2021, 6:56 AM.

Diff Detail

Event Timeline

sammccall created this revision.Aug 2 2021, 6:56 AM
sammccall requested review of this revision.Aug 2 2021, 6:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2021, 6:56 AM

@rsmith: we have two open questions here...


1: Expressions whose types are no longer dependent.

In the rare case where RHS is type-dependent and LHS is a known pointer, e.g.

template <typename Idx>
int access(int *arr, Idx i) {
  return arr[i];
}

we're now changing the ArraySubscriptExpr's type from DependentTy to int, while keeping the expr type-dependent.

Is this OK, or should we avoid it by artificially requiring LHS specifically to be type-dependent to do the refinement?


2: LHS vs RHS symmetry.

We only bother to check if LHS is a pointer, so type_dependent_pointer[integer] gets a specific dependent type, while the obscure integer[type_dependent_pointer] remains DependentTy.

Is this OK, or must we handle the rare case in the same way?


Functionally, doing the "safe" thing in both cases seems fine. But I don't want to spray unnecessary defensive code around, for maintenance reasons.

nridge added a subscriber: nridge.Aug 2 2021, 10:31 PM

Do either of you have thoughts on this one?

Sorry for leaving this without any replies, I think the summary you have is already of our offline discussion. Let me raise my final concerns, if you think we're covered for those and others don't chime in this week I suppose we can consider this as good to go.


I think 2nd case is fine, as we're unlikely to regress anything by handling just LHS. The rare case just won't get the benefits.


I am more worried about creating "incorrect" nodes in some cases. I think it isn't valid in C/C++ for both LHS && RHS to be pointers/arrays in a subscript expression, but I've got no idea about when it's diagnosed in clang and what is put into the AST.

If that detection happens before creating any SubscriptExprs (i.e. hitting changes in this patch), I guess we're all fine (i.e. we won't generate a node with an incorrect ResultType).
If it happens after creating these exprs though, now we'll have a bunch of expressions with erroneous ResultType info which might trip over some things. (Worst case scenario, detection completely fails and some code that should've been diagnosed as being broken will miscompile instead.)

hokein added a subscriber: hokein.Dec 13 2021, 6:29 AM

I am more worried about creating "incorrect" nodes in some cases. I think it isn't valid in C/C++ for both LHS && RHS to be pointers/arrays in a subscript expression, but I've got no idea about when it's diagnosed in clang and what is put into the AST.

... we'll have a bunch of expressions with erroneous ResultType info which might trip over some things.

Yeah, I think this is a problem. To put it another way:

  • if X is Foo* and y is dependent, x[y] has type Foo *or* it is invalid
  • if we claim it has type Foo then either template-instantiation must check it's valid, or we may accept invalid code
  • accepting invalid code can turn into miscompiles via SFINAE

Probably rebuilding during template instantiation does verify enough but I'm not 100% sure.

In the motivating case, the subscript is a known integer type and the LHS is an array or pointer. In this case we don't have the above concern, and we also don't have my #1 above. So I'll restrict the code to cover that case.

In the motivating case, the subscript is a known integer type and the LHS is an array or pointer. In this case we don't have the above concern, and we also don't have my #1 above. So I'll restrict the code to cover that case.

Actually I like the restriction a lot, but this *doesn't* take care of #1.
Even with the restriction, we can still end up with a non-dependent type for our type-dependent ArraySubscriptExpr:

Case 1: base and index do not have dependent types. One example of this is that base can be a reference to a static member of the current instantiation that is an array of unknown bound.

template <int>
struct foo {
  static int arr[];
  static constexpr int first = arr[0]; // arr is type-dependent but has non-dependent type int[].
};

Case 2: base is an array with dependent size.

template <int N>
struct foo {
  static int arr[N];
  static constexpr int first = arr[0];
};

Case 3: index is a dependent type that is nevertheless known to be a good index.

static int arr[];

template <int>
struct foo {
  enum E { Zero = 0; }
  static constexpr int first = arr[Zero];
};

So I see two options:

  • arbitrarily force the type to be dependent - if we end up with a non-dependent type, use DependentTy instead. This "forgetting" the type is consistent with other situations, like this->member inside a template, which the standard says is type-dependent and clang assigns DependentTy.
  • accept that we have type-dependent expressions of non-dependent types in some cases. This is consistent with the idea that such exceptions exist today (the DeclRefExpr to unknown-bound-array static members mentioned above).

I feel a little out of my depth, so I'm going to go with the "safe option" of bailing out to DependentTy.
@rsmith or other experts, It would be great to get guidance on whether it's safe to create type-dependent expressions without dependent types.

sammccall updated this revision to Diff 396730.Dec 30 2021, 4:24 PM

Restrict to array/pointer + index, insist on a dependent type.

I'm going to land this now in its conservative version based on:

if you think we're covered for those and others don't chime in this week I suppose we can consider this as good to go.

Happy to address any more comments and/or make it less conservative.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 30 2021, 4:30 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

@kadircet What we forgot to think about here is that this allows more semantic checks to happen at template parsing time, which affects diagnostics.

This is OK (in fact good) as if those checks fail the template cannot be instantiated, and the code is IFNDR.
But I should have had a test: https://github.com/llvm/llvm-project/commit/4777eb2954080864bcf9dfca0e828c637268eb13