This is an archive of the discontinued LLVM Phabricator instance.

[Sema] When dereferencing a pointer of dependent type, infer the result type.
Needs ReviewPublic

Authored by courbet on Oct 25 2021, 7:24 AM.

Details

Summary

Example:

template <typename T> auto f(T* t) {
  return *t;
}

Before that change, the UnaryOperator for *t has type <dependent type>.
After the change, its type is a T lvalue.

I've added simple tests to verify that giving knowledge of the type to clang
does not yields false positives in contexts where c++ does not allow it to use
that knowledge.

Diff Detail

Event Timeline

courbet requested review of this revision.Oct 25 2021, 7:24 AM
courbet created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2021, 7:24 AM

As per the comment in BuiltinTypes.def (see below), Dependent is
allowed in context where the type is deducible, but is there any reason
not to deduce the type if we can do it cheaply in some cases ?

// This represents the type of an expression whose type is
// totally unknown, e.g. 'T::foo'.  It is permitted for this to
// appear in situations where the structure of the type is
// theoretically deducible.
BUILTIN_TYPE(Dependent, DependentTy)

As per the comment in BuiltinTypes.def (see below), Dependent is
allowed in context where the type is deducible, but is there any reason
not to deduce the type if we can do it cheaply in some cases ?

// This represents the type of an expression whose type is
// totally unknown, e.g. 'T::foo'.  It is permitted for this to
// appear in situations where the structure of the type is
// theoretically deducible.
BUILTIN_TYPE(Dependent, DependentTy)

I've been trying to think if this will cause problems or not, and I'm not convinced one way or the other. I was thinking that if we resolve the type to a non-dependent type, but it is used within another type (making the second type also dependent), won't we change the point of instantiation for that second dependent type?

I think the idea of this change is OK. The key is that the dereference expression will still be type-dependent, even if we happen to actually know its type. (For an Expr that isTypeDependent(), the type produced by getType() isn't something the standard knows or cares about and is only used for our own type-checking-of-templates purposes.) The property we need to guarantee is that, if the Expr is valid, then the type is correct, and I think that holds here. However, I wouldn't want to guarantee that all parts of Clang get this right, and we might find some lurking bugs are exposed by this change.

I think this change is being made in an imperfect place, though. Instead of putting a special case in here, how would we feel about making Type::isOverloadableType() smarter, to return false for dependent types that can't possibly be class or enum types (eg, dependent pointer types, array types, and maybe more exotic things like function types, atomic types, and complex types)? This would then apply to all operators, not only unary *. Eg, we know the type of &p where p is T*, and we know the type of p + n where p is T* and n is integral. That change might have a lot more fallout, but it'd certainly be interesting to see what it breaks.

Is this change observable in some way? We should include a test with this change that demonstrates what effect it has. (The existing test in this patch passes with or without the change.) If nothing else I think we should be able to see the effect of this change in the output of -ast-dump. (Making the more general change I mentioned above may help here, by giving earlier diagnostics for some cases for which instantiation could never succeed, such as rejecting p + q where p and q are both pointer types.)

I think this change is being made in an imperfect place, though. Instead of putting a special case in here, how would we feel about making Type::isOverloadableType() smarter, to return false for dependent types that can't possibly be class or enum types (eg, dependent pointer types, array types, and maybe more exotic things like function types, atomic types, and complex types)? This would then apply to all operators, not only unary *. Eg, we know the type of &p where p is T*, and we know the type of p + n where p is T* and n is integral. That change might have a lot more fallout, but it'd certainly be interesting to see what it breaks.

Thanks for the suggestion ! I've improved test coverage for unary operators in rG1427742750ed. There was a minor breakage with this change in Sema::DefaultLvalueConversion which did not handle dependant pointers.

Binary operators look like they break more things. Working on it now.

Is this change observable in some way? We should include a test with this change that demonstrates what effect it has. (The existing test in this patch passes with or without the change.)

Yes, I agree. The idea of this test was just to check that adding this did not have side effects on how clang was handling what the standard mandates, but you've covered this in the first part of your answer, thanks.
I don't think it's observable directly (I'm observing the type using getType() directly in my case), so I'll add an AST dump test, thanks for the suggestion.

courbet updated this revision to Diff 384771.Nov 4 2021, 8:59 AM

Implement the proposed changes.

Is this change observable in some way?

With the new changes, we are now catching more typing errors before instantiation. I've added more tests to show that.

courbet updated this revision to Diff 385768.Nov 9 2021, 4:06 AM

one more spaceship operator fix.

aaron.ballman added inline comments.Dec 16 2021, 10:54 AM
clang/lib/Sema/SemaExpr.cpp
10547–10549
13827–13828

One thing that makes me a bit uncomfortable is that the logic for these used to be far more understandable when it was just checking for a dependent type. Now we need to sprinkle "and not a pointer type" in places, but it's not particularly clear as to why for a naïve reader of the code.

I wonder if we want some sort of type predicate isDeferrablyDependentType() or something along those lines, or whether that's not really plausible? Basically, I'm hoping to find a way that, as a code reviewer, I can more easily spot places where isTypeDependent() should really be caring about type dependent pointers as a special case.

courbet updated this revision to Diff 395410.Dec 20 2021, 4:37 AM

Try to get rid of "dependent and not a pointer" checks checks.

clang/lib/Sema/SemaExpr.cpp
13827–13828

Right, so in this function Op->isTypeDependent() is always used to bail out from type checking. The structure is typically:

if (E->isTypeDependent()) {
  // bail out
}
// Actual type checking on provable types.
if (E->hasSomeTypeProperty1()) {
  // OK case, do some property1-specific checking
} else if (E->hasSomeTypeProperty2()) {
  // OK case, do some property2-specific checking
} else {
  // Emit some error
}

My original approach was to do:

if (E->isTypeDependent() && !E->isPointerType()) {
  // bail out
}
// Actual type checking on provable types or pointer types.
if (E->hasSomeTypeProperty1()) {
  // OK case, do some property1-specific checking
} else if (E->hasSomeTypeProperty2()) {
  // OK case, do some property2-specific checking
} else {
  // Emit some error
}

It turns out that in all cases, hasSomeTypeProperty1 is actually isPointerType (or stuff like isScalarType, which includes pointers), so the current code is actually already checking for pointerness, so in a sense my new pointer type cheking is already included in subsequent checks, I'm not adding anything new.

But maybe another change like this one will be able to prove more stuff about dependent types (e.g. a dependent type could have propery2 too ), so what about we only bail out on dependent types *after* we are done with the type checking:

if (E->isTypeDependent() && !E->isPointerType()) {
  // bail out
}
// Actual type checking on provable types or pointer types.
if (E->hasSomeTypeProperty1()) {
  // OK case, do some property1-specific checking
} else if (E->hasSomeTypeProperty2()) {
  // OK case, do some property2-specific checking
} else if (if (E->isTypeDependent()) {
  // bail out
} else {
  // Emit some error
}

This essentially goes from "If the type is not dependent, try to prove stuff about the type, else return error" to "try to prove stuff about the type, else if not dependent, return error".

I'm modified the code here to use this approach, what do you think ?