This is an archive of the discontinued LLVM Phabricator instance.

[C++] Ignore top-level qualifiers in casts
ClosedPublic

Authored by olestrohm on May 18 2021, 7:00 AM.

Details

Summary

Ignore top-level qualifiers in casts, which fixes issues in reinterpret_cast.

Original aim:
Allow converting between types with different address spaces.

This allows converting a __private int to a __private int, which is currently not possible.
It also allows converting a value to any other address space, which can make sense since
the actual bit representation of the values doesn't depend on the address space.

This is the solution I chose currently allows converting between any address space
because the address space of SrcExpr is erased before entering the function,
so allowing only converting when the address spaces is the same would require larger changes.

I'm not sure if this conversion should be allowed, though converting to the exact same type definitely should.

Fixes the first issue in PR49221

Diff Detail

Event Timeline

olestrohm created this revision.May 18 2021, 7:00 AM
olestrohm requested review of this revision.May 18 2021, 7:00 AM

Just to move the discussion from https://reviews.llvm.org/D101519 here. First of all I think this change should not be OpenCL specific i.e. there is not reason to do something different from C++ in general.

Embedded C doesn't regulate the conversions between non-pointer types with different address spaces but in OpenCL C we have allowed them apparently https://godbolt.org/z/9f75zxj7v. I am not sure whether this was done intentionally or not.

However, in C there are no references and other features that might create problems.

At the same time it seems C++ allows casting away const https://godbolt.org/z/fo3axbq3e

But it contradicts the comment in the code:

C++ 5.2.10p2 has a note that mentions that, subject to all other
restrictions, a cast to the same type is allowed so long as it does not
cast away constness. In C++98, the intent was not entirely clear here,
since all other paragraphs explicitly forbid casts to the same type.
// C++11 clarifies this case with p2.

I would like to see if @rjmccall has any opinion on whether reinterpreting between non-pointer types with different address spaces is reasonable?

In general it makes sense to follow logic for other type qualifier here but I was thinking of whether we can allow something weird to be written by relaxing this for address spaces

local int i;
const global int & ii = reinterpret_cast<global int>(i);

I think this code is completely meaningless as it requires creating a temporary in global address space? I am not sure what we will even emit in IR?

Top-level qualifiers aren't normally meaningful on pr-values.

The C standard says that casts are to the unqualified type:

N2454 6.5.4p5: Preceding an expression by a parenthesized type name converts the value of the expression to the unqualified version of the named type.

The C++ standard does not appear to have similar wording. On the other hand, the C++ standard says that e.g. "The result of the expression (T) cast-expression is of type T", and similarly for the other casts, which is clearly just wrong if T is a reference type; the wording clarifies that the expression is an l-value or x-value if the type is a reference but doesn't remove the reference-ness of the expression type as it must, unless that's done by some other clause at a distance. CC'ing Richard.

It makes sense to me that qualifiers on the cast type ought to be uniformly and silently ignored, including address space qualifiers. C++ has a very strange feature of allowing qualifiers on class types in particular circumstances, so we might need to be permissive about that, although that's tricky for address spaces because (unlike with const or volatile) we cannot in fact allocate a temporary in an arbitrary address space.

I agree that this should be done uniformly, not just in OpenCL, unless OpenCL wants rules that differ from what we think they ought to be in general.

The C++ standard does not appear to have similar wording. On the other hand, the C++ standard says that e.g. "The result of the expression (T) cast-expression is of type T", and similarly for the other casts, which is clearly just wrong if T is a reference type; the wording clarifies that the expression is an l-value or x-value if the type is a reference but doesn't remove the reference-ness of the expression type as it must, unless that's done by some other clause at a distance.

It is indeed done by a different clause at a distance:

[expr.type]
  1. If an expression initially has the type “reference to T”, the type is adjusted to T prior to any further analysis. The expression designates the object or function denoted by the reference, and the expression is an lvalue or an xvalue, depending on the expression.
  1. If a prvalue initially has the type “cv T”, where T is a cv-unqualified non-class, non-array type, the type of the expression is adjusted to T prior to any further analysis.

C++ has a very strange feature of allowing qualifiers on class types in particular circumstances, so we might need to be permissive about that, although that's tricky for address spaces because (unlike with const or volatile) we cannot in fact allocate a temporary in an arbitrary address space.

Right, specifically C++ preserves top-level const and volatile qualifications on prvalues only if they're of class or array type. If an address-space-qualified temporary makes sense, then I think extending this behavior to address spaces makes sense too:

  • For array prvalues, an AS1 int[3] should decay to an AS1 int*, not to a plain int*, because it seems clear the intent is for the array to be created in the given address space.
  • For class prvalues, I think it's important that we preserve the address space, because (for example) a function AS1 MyClass f(); might take a return value slot as an AS1 pointer, so a prvalue f() should retain the address space qualification.

While we can't allocate an address-space-qualified temporary right now, would it be a good idea to prepare for the possibility that we will one day be able to, in some cases? For example, in a system with separate control and data stacks, we might want to put some data on the control stack; in an exotic heterogeneous compute system we might be able to allocate memory in the host and device stack from a single function. If so, following the C++ rule for all qualifiers might be a good direction. (We'll presumably need to look at the other qualifiers and see if any of them would have problems with this treatment.)

The C++ standard does not appear to have similar wording. On the other hand, the C++ standard says that e.g. "The result of the expression (T) cast-expression is of type T", and similarly for the other casts, which is clearly just wrong if T is a reference type; the wording clarifies that the expression is an l-value or x-value if the type is a reference but doesn't remove the reference-ness of the expression type as it must, unless that's done by some other clause at a distance.

It is indeed done by a different clause at a distance:

[expr.type]
  1. If an expression initially has the type “reference to T”, the type is adjusted to T prior to any further analysis. The expression designates the object or function denoted by the reference, and the expression is an lvalue or an xvalue, depending on the expression.
  1. If a prvalue initially has the type “cv T”, where T is a cv-unqualified non-class, non-array type, the type of the expression is adjusted to T prior to any further analysis.

Ah, thank you.

C++ has a very strange feature of allowing qualifiers on class types in particular circumstances, so we might need to be permissive about that, although that's tricky for address spaces because (unlike with const or volatile) we cannot in fact allocate a temporary in an arbitrary address space.

Right, specifically C++ preserves top-level const and volatile qualifications on prvalues only if they're of class or array type. If an address-space-qualified temporary makes sense, then I think extending this behavior to address spaces makes sense too:

  • For array prvalues, an AS1 int[3] should decay to an AS1 int*, not to a plain int*, because it seems clear the intent is for the array to be created in the given address space.
  • For class prvalues, I think it's important that we preserve the address space, because (for example) a function AS1 MyClass f(); might take a return value slot as an AS1 pointer, so a prvalue f() should retain the address space qualification.

Hmm. That's an interesting question, because while I can see how it's an interesting thing to be able to express, it might also lead to otherwise-unnecessary template errors, e.g. if a template type argument is inferred to be __global MyClass. On the other hand, that kind of inference can easily lead to other problems with e.g. locals or fields; making it work in slightly more cases isn't necessarily worthwhile.

While we can't allocate an address-space-qualified temporary right now, would it be a good idea to prepare for the possibility that we will one day be able to, in some cases? For example, in a system with separate control and data stacks, we might want to put some data on the control stack; in an exotic heterogeneous compute system we might be able to allocate memory in the host and device stack from a single function. If so, following the C++ rule for all qualifiers might be a good direction. (We'll presumably need to look at the other qualifiers and see if any of them would have problems with this treatment.)

Hmm, okay, I cede the point. I have some concerns here (C compatibility and addressability), but we don't have to design AS-qualified return values right now. I agree that we should be stripping qualifiers except on class and array types.

I believe array types are actually ruled out for casts because there are no conversions to array type.

olestrohm updated this revision to Diff 347922.May 26 2021, 4:57 AM

I've cleaned up the check. The qualifiers of SrcType are removed prior to this function, so only DestType needs to have the address space removed.

reinterpret_cast only allows integral types, so structs are not an issue here, though pointers are.
However this change only allows converting between pointer types as long as they're the exact same, save for the outermost address space.

rjmccall added inline comments.May 28 2021, 10:03 AM
clang/lib/Sema/SemaCast.cpp
2370

I think the upshot of the conversation Richard and I just had is that there's a bug here for all qualifiers; that is, reinterpret_cast<const int>(5) should be allowed. Probably the right thing to do is to strip qualifiers from DestType in the CastOperation constructor when the type is not a class or an array. Richard, do you agree? This isn't strictly implied by the C++ wording since the semantic analysis of the cast is logically prior to the introduction of a qualifier pr-value expression.

olestrohm added inline comments.Jun 1 2021, 2:43 AM
clang/lib/Sema/SemaCast.cpp
2370

Ah, I see, I did indeed misunderstand the conversation above. I see now that with what Richard mentioned with If a prvalue initially has the type “cv T”, where T is a cv-unqualified non-class, non-array type, the type of the expression is adjusted to T prior to any further analysis. it makes sense to strip qualifiers for those cases. I also checked out gcc to compare to another implementation, and it does work there, so this behaviour doesn't seem unreasonable.

I'll rework this to strip all qualifiers somewhere in the CastOperation constructor as you suggested then.

olestrohm updated this revision to Diff 350560.Jun 8 2021, 3:07 AM
olestrohm retitled this revision from [C++4OpenCL] Allow address space conversion in reinterpret_cast to [C++] Ignore top-level qualifiers in casts.
olestrohm edited the summary of this revision. (Show Details)

I've added a check in the constructor for CastOperator that removes qualifiers for non-class, non-array types.
I had to add a check for ObjectiveC, since they seem to use qualifiers for a lot of language features,
but maybe a better solution would be to remove specific qualifiers instead?

This actually didn't cause that many changes in behaviour or error messages, maybe because the other casts
already have checks for this, but there was one change in a C++ test that looks a little strange after this patch,
but maybe that's alright.

Hopefully this is makes sense, but do suggest changes to the approach if you think there is a better one.

Anastasia added inline comments.Jun 9 2021, 4:33 AM
clang/test/SemaCXX/warn-reinterpret-base-class.cpp
301

I wonder if there is a way to track the original type for the diagnostic? Or maybe it is intuitive enough considering the quoted spec wording...

olestrohm added inline comments.Jun 9 2021, 4:38 AM
clang/test/SemaCXX/warn-reinterpret-base-class.cpp
301

This actually already happens with the source type too.
const short s = 0;
reinterpret_cast<int>(s);

gives the error "reinterpret_cast from 'short' to 'int' is not allowed", so while this is definitely worse, it's a general problem with reinterpret_cast.

Anastasia accepted this revision.Jun 17 2021, 4:26 AM

LGTM! Thanks!

I think the diagnostic wording can be improved separately if it's considered important as it seems this issue already existed.

I suggest adding [Sema] tag in your final commit title too and it would be good to explain generic C++ rules in the commit message.

This revision is now accepted and ready to land.Jun 17 2021, 4:26 AM
This revision was landed with ongoing or failed builds.Jul 5 2021, 4:28 AM
This revision was automatically updated to reflect the committed changes.