This is an archive of the discontinued LLVM Phabricator instance.

Differential D107275

[Sema] a[x] has type T when a has type T* or T[], even when T is dependent
ClosedPublic

Authored by sammccall on Aug 2 2021, 6:56 AM.

Details

Reviewers

kadircet
rsmith

Commits

rG09f8315bba39: [Sema] a[x] has type T when a has type T* or T[], even when T is dependent

Summary

This more precise type is useful for tools, e.g.
fixes https://github.com/clangd/clangd/issues/831

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sammccall created this revision. Aug 2 2021, 6:56 AM

Herald added a subscriber: usaxena95. Aug 2 2021, 6:56 AM

sammccall requested review of this revision. Aug 2 2021, 6:56 AM

Herald added a project: Restricted Project. Aug 2 2021, 6:56 AM

Herald added subscribers: cfe-commits, ilya-biryukov.

@rsmith: we have two open questions here...

1: Expressions whose types are no longer dependent.

In the rare case where RHS is type-dependent and LHS is a known pointer, e.g.

template <typename Idx>
int access(int *arr, Idx i) {
  return arr[i];
}

we're now changing the ArraySubscriptExpr's type from DependentTy to int, while keeping the expr type-dependent.

Is this OK, or should we avoid it by artificially requiring LHS specifically to be type-dependent to do the refinement?

2: LHS vs RHS symmetry.

We only bother to check if LHS is a pointer, so type_dependent_pointer[integer] gets a specific dependent type, while the obscure integer[type_dependent_pointer] remains DependentTy.

Is this OK, or must we handle the rare case in the same way?

Functionally, doing the "safe" thing in both cases seems fine. But I don't want to spray unnecessary defensive code around, for maintenance reasons.
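As background for question 2: for built-in operands, E1[E2] means *((E1) + (E2)), so the pointer or array may legally sit on either side of the brackets. A minimal sketch of that symmetry follows; the names table, viaLhsPointer and viaRhsPointer are made up for illustration and do not appear in the patch.

```cpp
#include <cstddef>

// Illustrative data only; not taken from the patch or its tests.
constexpr int table[] = {10, 20, 30};

// The usual spelling: array (decayed to a pointer) on the left, index on the right.
constexpr int viaLhsPointer(std::size_t i) { return table[i]; }

// The obscure spelling: index on the left, array on the right.
// Both forms mean *(table + i).
constexpr int viaRhsPointer(std::size_t i) { return i[table]; }

static_assert(viaLhsPointer(1) == 20, "pointer on the left");
static_assert(viaRhsPointer(1) == 20, "pointer on the right");
```

Both spellings produce the same value, which is why the reverse form has to be considered at all, even though it is rare in practice.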

Harbormaster completed remote builds in B117444: Diff 363469. Aug 2 2021, 7:36 AM

nridge added a subscriber: nridge. Aug 2 2021, 10:31 PM

Do either of you have thoughts on this one?

Sorry for leaving this without any replies; I think your summary already covers our offline discussion. Let me raise my final concerns: if you think we're covered for those and others don't chime in this week I suppose we can consider this as good to go.

I think the 2nd case is fine, as we're unlikely to regress anything by handling just the LHS. The rare case just won't get the benefits.

I am more worried about creating "incorrect" nodes in some cases. I think it isn't valid in C/C++ for both LHS && RHS to be pointers/arrays in a subscript expression, but I've got no idea about when it's diagnosed in clang and what is put into the AST.

If that detection happens before creating any SubscriptExprs (i.e. hitting changes in this patch), I guess we're all fine (i.e. we won't generate a node with an incorrect ResultType).
If it happens after creating these exprs though, now we'll have a bunch of expressions with erroneous ResultType info which might trip over some things. (Worst case scenario, detection completely fails and some code that should've been diagnosed as being broken will miscompile instead.)

hokein added a subscriber: hokein. Dec 13 2021, 6:29 AM

In D107275#3182906, @kadircet wrote:

I am more worried about creating "incorrect" nodes in some cases. I think it isn't valid in C/C++ for both LHS && RHS to be pointers/arrays in a subscript expression, but I've got no idea about when it's diagnosed in clang and what is put into the AST.

... we'll have a bunch of expressions with erroneous ResultType info which might trip over some things.

Yeah, I think this is a problem. To put it another way:

If x is Foo* and y is dependent, x[y] either has type Foo *or* it is invalid.
If we claim it has type Foo, then either template instantiation must check that it's valid, or we may accept invalid code.
Accepting invalid code can turn into miscompiles via SFINAE.

Probably rebuilding during template instantiation does verify enough but I'm not 100% sure.
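To make the SFINAE concern concrete, here is a small sketch showing that whether p[i] is well-formed is observable at compile time and depends on the index type. The trait is_subscriptable and the struct Foo are hypothetical names used only for this illustration; they are not part of Clang or of this patch.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical detection trait: true when an expression of type P can be
// subscripted with an expression of type I.
template <typename P, typename I, typename = void>
struct is_subscriptable : std::false_type {};

// Chosen only when the subscript expression is well-formed; otherwise
// substitution fails silently and the primary template above is used.
template <typename P, typename I>
struct is_subscriptable<
    P, I, decltype(void(std::declval<P>()[std::declval<I>()]))>
    : std::true_type {};

struct Foo {};

// Pointer base with an integer index: valid, and the result is a Foo lvalue.
static_assert(is_subscriptable<Foo *, int>::value, "Foo*[int] is valid");
static_assert(
    std::is_same<decltype(std::declval<Foo *>()[0]), Foo &>::value,
    "element type is Foo");

// Pointer base with a pointer or floating-point index: ill-formed.
static_assert(!is_subscriptable<Foo *, Foo *>::value, "Foo*[Foo*] is invalid");
static_assert(!is_subscriptable<Foo *, double>::value, "Foo*[double] is invalid");
```

Because overload resolution can branch on exactly this kind of validity check, wrongly treating an invalid subscript as valid could select a different overload rather than merely accept bad code.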

In the motivating case, the subscript is a known integer type and the LHS is an array or pointer. In this case we don't have the above concern, and we also don't have my #1 above. So I'll restrict the code to cover that case.

In D107275#3214272, @sammccall wrote:

In the motivating case, the subscript is a known integer type and the LHS is an array or pointer. In this case we don't have the above concern, and we also don't have my #1 above. So I'll restrict the code to cover that case.

Actually I like the restriction a lot, but this *doesn't* take care of #1.
Even with the restriction, we can still end up with a non-dependent type for our type-dependent ArraySubscriptExpr:

Case 1: base and index do not have dependent types. One example of this is that base can be a reference to a static member of the current instantiation that is an array of unknown bound.

template <int>
struct foo {
  static int arr[];
  static constexpr int first = arr[0]; // arr is type-dependent but has non-dependent type int[].
};

Case 2: base is an array with dependent size.

template <int N>
struct foo {
  static int arr[N];
  static constexpr int first = arr[0];
};

Case 3: index is a dependent type that is nevertheless known to be a good index.

static int arr[];

template <int>
struct foo {
  enum E { Zero = 0 };
  static constexpr int first = arr[Zero];
};

So I see two options:

Arbitrarily force the type to be dependent: if we end up with a non-dependent type, use DependentTy instead. "Forgetting" the type in this way is consistent with other situations, like this->member inside a template, which the standard says is type-dependent and to which clang assigns DependentTy.
Accept that we have type-dependent expressions of non-dependent types in some cases. This is consistent with the fact that such exceptions exist today (the DeclRefExpr to unknown-bound-array static members mentioned above).

I feel a little out of my depth, so I'm going to go with the "safe option" of bailing out to DependentTy.
@rsmith or other experts, it would be great to get guidance on whether it's safe to create type-dependent expressions without dependent types.

Restrict to array/pointer + index, insist on a dependent type.
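The conservative rule can be sketched with a toy type model. This is only an illustration of the decision logic described in this thread, not Clang code: Kind, Ty, indexedElement and subscriptType are invented names, and the real implementation works on QualType.

```cpp
// Toy model; none of these names are Clang APIs.
enum class Kind { Integer, Pointer, Array, Other };

struct Ty {
  Kind kind;
  bool dependent;    // does the type depend on a template parameter?
  const Ty *element; // pointee or element type for Pointer and Array
};

// Stand-in for the "unknown" dependent type.
constexpr Ty DependentTy{Kind::Other, true, nullptr};

// Element type of base[index] if base is a pointer/array and index is an
// integer or unscoped enum; otherwise nullptr.
constexpr const Ty *indexedElement(const Ty &base, const Ty &index) {
  if (index.kind != Kind::Integer)
    return nullptr;
  if (base.kind == Kind::Pointer || base.kind == Kind::Array)
    return base.element;
  return nullptr;
}

// Try both operand orders, then insist on a dependent result.
constexpr const Ty &subscriptType(const Ty &lhs, const Ty &rhs) {
  const Ty *elem = indexedElement(lhs, rhs);
  if (!elem)
    elem = indexedElement(rhs, lhs);
  return (elem && elem->dependent) ? *elem : DependentTy;
}

// Sample types mirroring the scenarios discussed above.
constexpr Ty Int{Kind::Integer, false, nullptr};
constexpr Ty U{Kind::Other, true, nullptr};       // template parameter U
constexpr Ty V{Kind::Other, false, nullptr};      // ordinary struct V
constexpr Ty Idx{Kind::Other, true, nullptr};     // template parameter Idx
constexpr Ty DepEnum{Kind::Integer, true, nullptr}; // enum in current instantiation
constexpr Ty PtrToU{Kind::Pointer, true, &U};
constexpr Ty PtrToV{Kind::Pointer, false, &V};
constexpr Ty ArrNOfV{Kind::Array, true, &V};      // V[N], dependent bound

static_assert(&subscriptType(PtrToU, Int) == &U, "a[1] has type U");
static_assert(&subscriptType(Int, PtrToU) == &U, "1[a] has type U");
static_assert(&subscriptType(PtrToU, Idx) == &DependentTy, "a[i]");
static_assert(&subscriptType(PtrToV, DepEnum) == &DependentTy, "a2[One]");
static_assert(&subscriptType(ArrNOfV, Int) == &DependentTy, "b3[0]");
```

The last two assertions are the "insist on a dependent type" part: the element type V is known, but because it is not dependent, the model falls back to DependentTy.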

I'm going to land this now in its conservative version based on:

if you think we're covered for those and others don't chime in this week I suppose we can consider this as good to go.

Happy to address any more comments and/or make it less conservative.

This revision was not accepted when it landed; it landed in state Needs Review. Dec 30 2021, 4:30 PM

This revision was landed with ongoing or failed builds.

Closed by commit rG09f8315bba39: [Sema] a[x] has type T when a has type T* or T[], even when T is dependent (authored by sammccall).

This revision was automatically updated to reflect the committed changes.

sammccall added a commit: rG09f8315bba39: [Sema] a[x] has type T when a has type T* or T[], even when T is dependent.

Harbormaster completed remote builds in B141062: Diff 396730. Dec 30 2021, 4:55 PM

@kadircet What we forgot to think about here is that this allows more semantic checks to happen at template parsing time, which affects diagnostics.

This is OK (in fact good): if those checks fail, the template cannot be instantiated, and the code is IFNDR.
But I should have had a test: https://github.com/llvm/llvm-project/commit/4777eb2954080864bcf9dfca0e828c637268eb13

Revision Contents

Path                                 Size
clang/lib/Sema/SemaExpr.cpp          40 lines
clang/test/AST/ast-dump-array.cpp    55 lines

Diff 396731

clang/lib/Sema/SemaExpr.cpp

[... 4,639 lines not shown ...]

 static bool isMSPropertySubscriptExpr(Sema &S, Expr *Base) {
   auto *BaseNoParens = Base->IgnoreParens();
   if (auto *MSProp = dyn_cast<MSPropertyRefExpr>(BaseNoParens))
     return MSProp->getPropertyDecl()->getType()->isArrayType();
   return isa<MSPropertySubscriptExpr>(BaseNoParens);
 }

+// Returns the type used for LHS[RHS], given one of LHS, RHS is type-dependent.
+// Typically this is DependentTy, but can sometimes be more precise.
+//
+// There are cases when we could determine a non-dependent type:
+//  - LHS and RHS may have non-dependent types despite being type-dependent
+//    (e.g. unbounded array static members of the current instantiation)
+//  - one may be a dependent-sized array with known element type
+//  - one may be a dependent-typed valid index (enum in current instantiation)
+//
+// We always return a dependent type, in such cases it is DependentTy.
+// This avoids creating type-dependent expressions with non-dependent types.
+// FIXME: is this important to avoid? See https://reviews.llvm.org/D107275
+static QualType getDependentArraySubscriptType(Expr *LHS, Expr *RHS,
+                                               const ASTContext &Ctx) {
+  assert(LHS->isTypeDependent() || RHS->isTypeDependent());
+  QualType LTy = LHS->getType(), RTy = RHS->getType();
+  QualType Result = Ctx.DependentTy;
+  if (RTy->isIntegralOrUnscopedEnumerationType()) {
+    if (const PointerType *PT = LTy->getAs<PointerType>())
+      Result = PT->getPointeeType();
+    else if (const ArrayType *AT = LTy->getAsArrayTypeUnsafe())
+      Result = AT->getElementType();
+  } else if (LTy->isIntegralOrUnscopedEnumerationType()) {
+    if (const PointerType *PT = RTy->getAs<PointerType>())
+      Result = PT->getPointeeType();
+    else if (const ArrayType *AT = RTy->getAsArrayTypeUnsafe())
+      Result = AT->getElementType();
+  }
+  // Ensure we return a dependent type.
+  return Result->isDependentType() ? Result : Ctx.DependentTy;
+}

 ExprResult
 Sema::ActOnArraySubscriptExpr(Scope *S, Expr *base, SourceLocation lbLoc,
                               Expr *idx, SourceLocation rbLoc) {
   if (base && !base->getType().isNull() &&
       base->getType()->isSpecificPlaceholderType(BuiltinType::OMPArraySection))
     return ActOnOMPArraySectionExpr(base, lbLoc, idx, SourceLocation(),
                                     SourceLocation(), /*Length*/ nullptr,
                                     /*Stride=*/nullptr, rbLoc);
 [... 76 lines not shown ...]
   if (idx->getType()->isNonOverloadPlaceholderType()) {
     ExprResult result = CheckPlaceholderExpr(idx);
     if (result.isInvalid()) return ExprError();
     idx = result.get();
   }

   // Build an unanalyzed expression if either operand is type-dependent.
   if (getLangOpts().CPlusPlus &&
       (base->isTypeDependent() || idx->isTypeDependent())) {
-    return new (Context) ArraySubscriptExpr(base, idx, Context.DependentTy,
-                                            VK_LValue, OK_Ordinary, rbLoc);
+    return new (Context) ArraySubscriptExpr(
+        base, idx, getDependentArraySubscriptType(base, idx, getASTContext()),
+        VK_LValue, OK_Ordinary, rbLoc);
   }

   // MSDN, property (C++)
   // https://msdn.microsoft.com/en-us/library/yhfk0thd(v=vs.120).aspx
   // This attribute can also be used in the declaration of an empty array in a
   // class or structure definition. For example:
   // __declspec(property(get=GetX, put=PutX)) int x[];
   // The above statement indicates that x[] can be used with one or more array
 [... 737 lines not shown; context: Sema::CreateBuiltinArraySubscriptExpr(Expr *Base, SourceLocation LLoc, ...]
   // to the expression *((e1)+(e2)). This means the array "Base" may actually be
   // in the subscript position. As a result, we need to derive the array base
   // and index from the expression types.
   Expr *BaseExpr, *IndexExpr;
   QualType ResultType;
   if (LHSTy->isDependentType() || RHSTy->isDependentType()) {
     BaseExpr = LHSExp;
     IndexExpr = RHSExp;
-    ResultType = Context.DependentTy;
+    ResultType =
+        getDependentArraySubscriptType(LHSExp, RHSExp, getASTContext());
   } else if (const PointerType *PTy = LHSTy->getAs<PointerType>()) {
     BaseExpr = LHSExp;
     IndexExpr = RHSExp;
     ResultType = PTy->getPointeeType();
   } else if (const ObjCObjectPointerType *PTy =
                  LHSTy->getAs<ObjCObjectPointerType>()) {
     BaseExpr = LHSExp;
     IndexExpr = RHSExp;
 [... 14,435 lines not shown ...]

clang/test/AST/ast-dump-array.cpp

 [... 20 lines not shown ...]
 class array {
   T data[Size];

   using array_T_size = T[Size];
   // CHECK: `-DependentSizedArrayType 0x{{[^ ]*}} 'T[Size]' dependent <col:25, col:30>
   using const_array_T_size = const T[Size];
   // CHECK: `-DependentSizedArrayType 0x{{[^ ]*}} 'const T[Size]' dependent <col:37, col:42>
 };

+struct V {};
+template <typename U, typename Idx, int N>
+void testDependentSubscript() {
+  U* a;
+  U b[5];
+  Idx i{};
+  enum E { One = 1 };
+
+  // Can types of subscript expressions can be determined?
+  // LHS is a type-dependent array, RHS is a known integer type.
+  a[1];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} 'U'
+  b[1];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} 'U'
+
+  // Reverse case: RHS is a type-dependent array, LHS is an integer.
+  1[a];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} 'U'
+  1[b];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} 'U'
+
+  // LHS is a type-dependent array, RHS is type-dependent.
+  a[i];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} '<dependent type>'
+  b[i];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} '<dependent type>'
+
+  V *a2;
+  V b2[5];
+
+  // LHS is a known array, RHS is type-dependent.
+  a2[i];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} '<dependent type>'
+  b2[i];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} '<dependent type>'
+
+  // LHS is a known array, RHS is a type-dependent index.
+  // We know the element type is V, but insist on some dependent type.
+  a2[One];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} '<dependent type>'
+  b2[One];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} '<dependent type>'
+
+  V b3[N];
+  // LHS is an array with dependent bounds but known elements.
+  // We insist on a dependent type.
+  b3[0];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} '<dependent type>'
+
+  U b4[N];
+  // LHS is an array with dependent bounds and dependent elements.
+  b4[0];
+  // CHECK: ArraySubscriptExpr {{.*}}line:[[@LINE-1]]{{.*}} 'U'
+}