Page MenuHomePhabricator

[Sema] Change addr space diagnostics in casts to follow C++ style
ClosedPublic

Authored by Anastasia on Feb 18 2019, 5:13 AM.

Details

Summary

This patch adds a new diagnostic for mismatching address spaces to be used for C++ casts (TODO: only enabled in C style cast for now, the rest will follow!).

This change extends C-style cast rules to account for address spaces. It also adds a separate function for address space cast checking that can be used to map from a separate address space cast operator addrspace_cast (to be added as a follow up patch).

Note, that after this change clang will no longer allows arbitrary address space conversions in reinterpret_casts because they can lead to accidental errors. The implicit safe conversions would still be allowed.

See RFC for more details: http://lists.llvm.org/pipermail/cfe-dev/2018-December/060546.html

Diff Detail

Repository
rL LLVM

Event Timeline

Anastasia created this revision.Feb 18 2019, 5:13 AM
ebevhan added inline comments.Feb 18 2019, 7:44 AM
lib/Sema/SemaCast.cpp
2224 ↗(On Diff #187227)

This might not be applicable to this patch, but just something I noticed.

So reinterpret_cast only operates on pointers when dealing with address spaces. What about something like

T a;
T& a_ref = reinterpret_cast<T&>(a);

reinterpret_cast on an lvalue like this is equivalent to *reinterpret_cast<T*>(&a). So for AS code:

__private T x;
__generic T& ref = reinterpret_cast<__generic T&>(x);

This should work, since *reinterpret_cast<__generic T*>(&x) is valid, correct?

What if we have the reference cast case with a different address space like this? Doesn't the IsLValueCast check need to be first?

2309 ↗(On Diff #187227)

Maybe I'm mistaken, but won't getting the canonical type here drop qualifiers (like cv) in nested pointers? If so, an addrspace_cast might strip qualifiers by mistake.

Anastasia marked 2 inline comments as done.Feb 18 2019, 11:04 AM
Anastasia added inline comments.
lib/Sema/SemaCast.cpp
2224 ↗(On Diff #187227)

What if we have the reference cast case with a different address space like this? Doesn't the IsLValueCast check need to be first?

Ok, let me see if I understand you. I changed __generic -> __global since it's invalid and the error is produced as follows:

test.cl:7:21: error: reinterpret_cast from 'int' to '__global int &' is not allowed
__global int& ref = reinterpret_cast<__global int&>(x);

Is this not what we are expecting here?

2309 ↗(On Diff #187227)

Yes, indeed I will need to extend this to nested pointers when we are ready. But for now I can try to change this bit... however I am not sure it will work w/o canonical types when we have typedef. I will try to create an example and see.

ebevhan added inline comments.Feb 19 2019, 12:45 AM
lib/Sema/SemaCast.cpp
2224 ↗(On Diff #187227)

My last sentence could have been a bit clearer. Yes, for the __private->__global case, it should error. But I was thinking more of the case where the AS conversion is valid, like __private->__generic. Then we will do the AS conversion, but we should have done both an AS conversion and an LValueBitCast, because we need to both convert the 'pointer' and also dereference it.

Anastasia marked an inline comment as done.Feb 19 2019, 2:50 AM
Anastasia added inline comments.
lib/Sema/SemaCast.cpp
2224 ↗(On Diff #187227)

Hmm... it seems like here we can only have one cast kind... I guess CK_LValueBitCast leads to the generation of bitcast in the IR? But addrspacecast can perform both bit reinterpretation and address translation, so perhaps it makes sense to only have an address space conversion in this case? Unless some other logic is needed elsewhere before CodeGen... I will try to construct a test case in plain C++.

Anastasia marked an inline comment as done.Feb 19 2019, 8:18 AM
Anastasia added inline comments.
lib/Sema/SemaCast.cpp
2309 ↗(On Diff #187227)

I checked the canonical type does preserve the qualifiers correctly.

Here is the AST dump of the following C type mytype const __generic* where typedef __generic int* mytype;.

PointerType 0x204d3b0 'const __generic mytype *'
`-QualType 0x204d369 'const __generic mytype' const __generic
  `-TypedefType 0x204d320 'mytype' sugar
    |-Typedef 0x204d1b0 'mytype'
    `-PointerType 0x204d170 '__generic int *'
      `-QualType 0x204d158 '__generic int' __generic
        `-BuiltinType 0x2009750 'int'

and it's canonical representation in AST is:

PointerType 0x204d380 '__generic int *const __generic *'
`-QualType 0x204d349 '__generic int *const __generic' const __generic
  `-PointerType 0x204d170 '__generic int *'
    `-QualType 0x204d158 '__generic int' __generic
      `-BuiltinType 0x2009750 'int'

So using canonical type will just simply handling of nested pointer chain by avoiding special casing typedefs. We won't loose any qualifiers.

ebevhan added inline comments.Feb 20 2019, 5:02 AM
lib/Sema/SemaCast.cpp
2224 ↗(On Diff #187227)

Hmm... it seems like here we can only have one cast kind... I guess CK_LValueBitCast leads to the generation of bitcast in the IR?

That's likely, yes. The dereference in the example disappears if we assign to a T& and it becomes implied if we assign to a T through lvalue-to-rvalue conversion, so in both cases we're taking the address of something and then converting the address to a different type.

But addrspacecast can perform both bit reinterpretation and address translation, so perhaps it makes sense to only have an address space conversion in this case? Unless some other logic is needed elsewhere before CodeGen... I will try to construct a test case in plain C++.

Oh, you're right! For some reason I was thinking that addrspacecast could only change the addrspace of the argument, but apparently it could change the pointee type too. At least according to the langref. So long as nothing in Clang assumes that a CK_AddressSpaceConversion can't change the pointee type as well as the address space I guess it should be safe.

When I try a 'simultaneous' conversion in C, I don't get a single addrspacecast, though: https://godbolt.org/z/Q818yW So I wonder if existing code handles this.

A test case would be good.

2309 ↗(On Diff #187227)

Okay, seems good then!

Anastasia marked an inline comment as done.Feb 20 2019, 5:53 AM
Anastasia added inline comments.
lib/Sema/SemaCast.cpp
2224 ↗(On Diff #187227)

I am not sure I understood the problem you are describing, may be I need an example...

But the valid address space conversion example in OpenCL

__private T x = ...;
__generic T& ref = reinterpret_cast<__generic T&>(x);

produces correctly addrspacecast

%0 = load i32*, i32** %x.addr, align 4
%1 = addrspacecast i32* %0 to i32 addrspace(4)*
store i32 addrspace(4)* %1, i32 addrspace(4)** %ref

However, if we keep checking CK_LValueBitCast first clang attempts to generate bitcast but fails because pointers are in different address spaces.

Anastasia marked an inline comment as done.Feb 20 2019, 6:00 AM
Anastasia added inline comments.
lib/Sema/SemaCast.cpp
2224 ↗(On Diff #187227)

Sorry, I haven't seen your previous comment before sending mine. :)

When I try a 'simultaneous' conversion in C, I don't get a single addrspacecast

I think 2 conversions in IR are not necessary, not sure how we ended up doing this though... but we had related discussion on that while implementing some OpenCL features and the conclusion was that we should just have an addrspacecast. Do you think we should correct this in C?

When I try a 'simultaneous' conversion in C, I don't get a single addrspacecast, though: https://godbolt.org/z/Q818yW So I wonder if existing code handles this.

Ok, I will check whether we cover this with any CodeGen test and if not I will add one.

ebevhan added inline comments.Feb 20 2019, 6:25 AM
lib/Sema/SemaCast.cpp
2224 ↗(On Diff #187227)

I think 2 conversions in IR are not necessary, not sure how we ended up doing this though... but we had related discussion on that while implementing some OpenCL features and the conclusion was that we should just have an addrspacecast. Do you think we should correct this in C?

Yes, I don't think it's necessary either, the addrspacecast should be enough.

Looking closer at the emission in the C case, Clang actually emits a single addrspacecast. Something else in LLVM is splitting it up into an addrspacecast and a bitcast. So it seems to be working just fine.

I was just concerned that we might hit edge cases where one conversion wasn't enough to convert both the address space and the pointee type, but it seems like CK_AddressSpaceConversion acts as both if it needs to. So it all seems good!

Anastasia updated this revision to Diff 187573.Feb 20 2019, 6:51 AM

Added a CodeGen test to cover address space of reference in reinterpret_cast.

Anastasia marked 2 inline comments as done.Feb 20 2019, 10:11 AM

One comment, but otherwise LGTM.

lib/Sema/SemaCast.cpp
2309 ↗(On Diff #187227)

It seems to me that the rules in this function are the reasonable cross-language rules. If you want to check for OpenCL at the top as a fast-path check (taking advantage of that fact that no other language modes currently have overlapping address spaces), that might be reasonable, but the comment and indentation should reflect that.

  • Added a comment to explain OpenCL check.
Anastasia marked an inline comment as done.Feb 27 2019, 10:16 AM
Anastasia added inline comments.
lib/Sema/SemaCast.cpp
2309 ↗(On Diff #187227)

Btw, I attempted to remove the OpenCL check and unfortunately found failure in the following test: test/CodeGenCXX/address-space-cast.cpp

error: C-style cast from 'char *' to '__attribute__((address_space(5))) char *' converts between mismatching address spaces
  __private__ char *priv_char_ptr = (__private__ char *)gen_char_ptr;

I am not sure what such conversion does but I feel if I want to update it I might need to get wider agreement. I think sending an RFC might be a good way forward.

ebevhan added inline comments.Feb 28 2019, 1:09 AM
lib/Sema/SemaCast.cpp
2309 ↗(On Diff #187227)

Seems like that would happen if the address spaces don't overlap, which none of the target address spaces do. That sort of cast works in C because the ASes are only tested for compatibility when you're building for OpenCL (in SemaCast:checkAddressSpaceCast). Without a guard on OpenCL, any target AS compatibility check will fail.

A possible fix could be to omit the isAddressSpaceOverlapping check when !OpenCL, but properly formalizing the AS conversion behavior would most likely be the best fix for this.

Anastasia marked an inline comment as done.Feb 28 2019, 2:43 AM
Anastasia added inline comments.
lib/Sema/SemaCast.cpp
2309 ↗(On Diff #187227)

Ok yes, I would like to formalize this indeed, but I will definitely need some help. I think the best way to proceed is to gather all the ideas we have discussed so far and send an RFC.

Hmm. Yeah, Embedded C allows these casts, so contra my previous comment, I think we can't make them ill-formed outside of OpenCL mode.

Hmm. Yeah, Embedded C allows these casts, so contra my previous comment, I think we can't make them ill-formed outside of OpenCL mode.

Ok, would these rules apply in regular C99 mode then? Do you think I should change anything in the comment about the addr space cast logic here?

In the abstract, it would be nice to warn about casts that always have UB because the address spaces don't overlap; however, I'm worried about how people might be using address spaces in strange ways today, knowing that they *do* overlap, only in ways that the compiler isn't smart enough for. I think we should just be permissive in non-OpenCL mode.

Anastasia updated this revision to Diff 189361.EditedMar 5 2019, 10:46 AM

Updated FiXME explaining why C++ mode is more permissive.

In the abstract, it would be nice to warn about casts that always have UB because the address spaces don't overlap; however, I'm worried about how people might be using address spaces in strange ways today, knowing that they *do* overlap, only in ways that the compiler isn't smart enough for. I think we should just be permissive in non-OpenCL mode.

Ok, I have explained this in FIXME.

rjmccall added inline comments.Mar 5 2019, 10:49 AM
lib/Sema/SemaCast.cpp
2293 ↗(On Diff #189361)

Please structure this as an early exit.

Anastasia updated this revision to Diff 189474.Mar 6 2019, 3:25 AM

Restructure code with early return.

Anastasia marked an inline comment as done.Mar 6 2019, 3:25 AM
rjmccall accepted this revision.Mar 6 2019, 7:14 AM

Thanks, LGTM.

This revision is now accepted and ready to land.Mar 6 2019, 7:14 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2019, 9:06 AM

This change has caused a regression for generic C++. Should this affect OpenCL only?

The following simple test causes the diagnostic to be emitted:

#define ATTR_GLOBAL attribute((address_space(1)))

int calc(int ATTR_GLOBAL *ip) {

int i = *ip;
return i+1;

}

int main() {

int i = 99;
int *ip = &i;

auto *g_i_ptr = reinterpret_cast<int  ATTR_GLOBAL*>(ip);
return calc(g_i_ptr);

}