Page MenuHomePhabricator

Make address space conversions a bit stricter.
ClosedPublic

Authored by ebevhan on Feb 14 2019, 7:41 AM.

Details

Summary

The semantics for converting nested pointers between address
spaces are not very well defined. Some conversions which do not
really carry any meaning only produce warnings, and in some cases
warnings hide invalid conversions, such as 'global int*' to
'local float*'!

This patch changes the logic in checkPointerTypesForAssignment
and checkAddressSpaceCast to fail properly on implicit conversions
that should definitely not be permitted. We also dig deeper into the
pointer types and warn on explicit conversions where the address
space in a nested pointer changes, regardless of whether the address
space is compatible with the corresponding pointer nesting level
on the destination type.

See https://bugs.llvm.org/show_bug.cgi?id=39674

Diff Detail

Event Timeline

ebevhan created this revision.Feb 14 2019, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2019, 7:41 AM
ebevhan marked an inline comment as done.Feb 14 2019, 11:08 AM
ebevhan added inline comments.
test/SemaOpenCL/address-spaces.cl
161

This doesn't seem entirely correct still, but I'm not sure what to do about it.

rjmccall added inline comments.Feb 15 2019, 3:21 PM
lib/Sema/SemaCast.cpp
2347

This should if (Nested ? DestPPointee.getAddressSpace() != SrcPPointee.getAddressSpace() : !DestPPtr->isAddressSpaceOverlapping(*SrcPtr)), I think.

lib/Sema/SemaExpr.cpp
7722

The TODO here is fixed as much as anything else is.

7797

Please extract variables for cast<PointerType>(lhptee)->getPointeeType() (and for rhptee).

14228

Yeah, I think that would be more straightforward.

ebevhan updated this revision to Diff 187230.Feb 18 2019, 5:24 AM
ebevhan marked 6 inline comments as done.Feb 18 2019, 5:31 AM
ebevhan added inline comments.
lib/Sema/SemaCast.cpp
2334–2335

I'd also like to petition to remove the guard on OpenCL here. Maybe an RFC for formalizing the support for address space conversion semantics is in order?

lib/Sema/SemaExpr.cpp
14228

I ended up not making it that different from the original one. Perhaps it's not different enough?

Anastasia added inline comments.Feb 18 2019, 7:29 AM
include/clang/Basic/DiagnosticSemaKinds.td
7013

I am wondering if we could just unify with the diagnostic above?

We could add another select at the end:

" changes address space of %select{nested|}3 pointer"
lib/Sema/SemaCast.cpp
2334–2335

Yes, I'd support that! It would be good to generalize the rules as much as we can rather than adding extra checks for languages (the semantic is very similar).

As a matter of fact I tried to do the same in https://reviews.llvm.org/D58346 in TryAddressSpaceCast and some C++ tests fail. So I had to add OpenCL check for now. :( Perhaps, they could just be changed but needs agreement with the community first.

2335

I think my patch is divorcing C++ from here https://reviews.llvm.org/D58346.

Although, I might need to update it to add the same checks. I can do it once your patch is finalized. :)

lib/Sema/SemaExpr.cpp
14231

May be, since if we are using a typedef that is a pointer type isPointerType() might not return true? I would just extend a test case with typedef to a pointer type as one of the nested levels.

test/CodeGenOpenCL/numbered-address-space.cl
17

Does this not compile any more?

test/SemaOpenCL/address-spaces.cl
161

Is it because Sema::IncompatiblePointer has priority? We might want to change that. I think it was ok before because qualifier's mismatch was only a warning but now with the address spaces we are giving an error. I wonder if adding a separate enum item for address spaces (something like Sema::IncompatibleNestedPointerAddressSpace) would simplify things.

ebevhan marked 5 inline comments as done.Feb 18 2019, 8:26 AM
ebevhan added inline comments.
include/clang/Basic/DiagnosticSemaKinds.td
7013

That is doable, but all of the 'typecheck' errors/warnings are made to be regular. If I add another parameter, there needs to be a special case in DiagnoseAssignmentResult for that error in particular.

lib/Sema/SemaCast.cpp
2335

Yes, I noticed that you weren't calling checkAddressSpaceCast any longer in one of the code paths.

I suspect that your patch might already be doing the same thing... though not explicitly. TryAddressSpaceCast will only check compatibility on the top level pointee, and then it goes on to check type similarity, and if the types are not similar, it bails. Types with different nested pointer address spaces are not (should not be?) considered compatible, so we should get the same outcome. We won't get the 'nested' error, though.

lib/Sema/SemaExpr.cpp
14231

It sort of depends on what checkPointerTypesForAssignment considers invalid.

I think maybe I should throw this logic away and instead simply add a new enum member to AssignmentResult.

test/CodeGenOpenCL/numbered-address-space.cl
17

No, these tests were a bit shaky. I'm not even sure what they're supposed to be testing. It's trying to pass an arbitrary AS int * pointer to a function that takes __local float *. That AS conversion is illegal (implicitly), but the illegal conversion was 'shadowed' by the 'incompatible pointer' warning, so we didn't get an error. This is one of the things this patch fixes.

Since it's a codegen test, it should be producing something, but I'm not really sure what is interesting to produce here, so I just removed it.

test/SemaOpenCL/address-spaces.cl
161

Is it because Sema::IncompatiblePointer has priority?

Sort of. The problem is that the AS pointee qualifiers match up until the 'end' of the RHS pointer chain (LHS: private->global->local, RHS: private->global), so we never get an 'incompatible address space' to begin with. We only get that if 1) the bottommost type is equal after unwrapping pointers (as far as both sides go), or 2) any of the 'shared' AS qualifiers (as far as both sides go) were different.

The idea is that stopping when either side is no longer a pointer will produce 'incompatible pointers' when you have different pointer depths, but it doesn't consider anything below the 'shallowest' side of the pointer chain, so we miss out on any AS mismatches further down.

(Not that there's anything to mismatch, really. There is no matching pointer on the other side, so what is really the error?)

What should the criteria be for when the pointer types 'run out'? I could have it keep digging through the other pointer until it hits a different AS? This would mean that this:

int **** a;
int ** b = a;

could give a different warning than it does today, though (incompatible nested qualifiers instead of incompatible pointers, which doesn't make sense...) . We would have to skip the lhptee == rhptee check if we 'kept going' despite one side not being a pointer type. So I don't know if that's the right approach in general.

Or should we be searching 'backwards' instead, starting from the innermost pointee? I don't know.

It really feels like the whole checkPointerTypesForAssignment routine and everything surrounding it is a bit messy. It relies on an implicit result from another function (typesAreCompatible) and then tries to deduce why that function thought the types weren't compatible. Then another function later on (DiagnoseAssignmentResult) tries to deduce why THIS function thought something was wrong.

I wonder if adding a separate enum item for address spaces (something like Sema::IncompatibleNestedPointerAddressSpace) would simplify things.

This would simplify the logic on the error emission side, since we don't need to duplicate the logic for determining what went wrong, but doesn't help with diagnosing the actual problem. Probably a good idea to add it anyway, I just wanted to avoid adding a new enum member since that means updating a lot of code elsewhere.

ebevhan marked an inline comment as done.Feb 18 2019, 8:30 AM
ebevhan added inline comments.
test/SemaOpenCL/address-spaces.cl
161

We only get that if 1) the bottommost type is equal after unwrapping pointers (as far as both sides go), or 2) any of the 'shared' AS qualifiers (as far as both sides go) were different.

Sorry, should only be 2) here. Was focused on the whole 'incompatible nested qualifiers' result.

Anastasia added inline comments.
include/clang/Basic/DiagnosticSemaKinds.td
7013

Oh I see... might not worth it?

test/CodeGenOpenCL/numbered-address-space.cl
17

Ok, I will just loop in @arsenm to confirm. OpenCL doesn't regulate arbitrary address spaces. And C doesn't regulate OpenCL ones. So interplay between those two has undefined behavior in my opinion. However, OpenCL code can make use of arbitrary address spaces since it's a valid Clang extension... But I am not sure what happens with this undefined behaviors.

For this specific case I would rather expect an error... but not sure it's worth testing this anyway.

May be Matt can provide us some more insights!

test/SemaOpenCL/address-spaces.cl
161

What should the criteria be for when the pointer types 'run out'? I could have it keep digging through the other pointer until it hits a different AS?

Hmm, good point! C99 spec seems to be helpless. C++ seems to imply that it checks pointers left to right as far as I interpret that from [conv.qual]. Not sure what we should do... Would it make sense to align with C++ or otherwise whatever is simpler? At least there is a diagnostic generated. So perhaps after all it's good enough for now!

I wonder if adding a separate enum item for address spaces (something like Sema::IncompatibleNestedPointerAddressSpace) would simplify things.

This would simplify the logic on the error emission side, since we don't need to duplicate the logic for determining what went wrong, but doesn't help with diagnosing the actual problem. Probably a good idea to add it anyway, I just wanted to avoid adding a new enum member since that means updating a lot of code elsewhere.

Ok, common helper function could be another solution to avoid duplication but it seems the logic is not entirely identical.

ebevhan updated this revision to Diff 187531.Feb 20 2019, 1:33 AM
ebevhan marked 2 inline comments as done.Feb 20 2019, 1:45 AM
ebevhan added inline comments.
include/clang/Basic/DiagnosticSemaKinds.td
7013

I think keeping the generality makes it a bit simpler. Technically many of the errors here could be folded into one or two instead, but that hasn't been done, so...

test/SemaOpenCL/address-spaces.cl
161

Would it make sense to align with C++ or otherwise whatever is simpler?

I think C++ is a bit stricter in general, it doesn't permit this, but C is more lenient. There is a diagnostic, so I left a FIXME but it should probably be revisited.

Ok, common helper function could be another solution to avoid duplication but it seems the logic is not entirely identical.

I added a new enum member and removed the logic on the error emission side.

Anastasia added inline comments.Feb 20 2019, 2:53 AM
lib/Sema/SemaExpr.cpp
7811

I am wondering since the behavior is so specialized for the address spaces whether we should name this something like IncompatibleNestedPointerAddressSpaceMismatch.

test/SemaOpenCL/address-spaces.cl
159

Are we sure it has to be an error? May be we can change this wording to something like - unknown behavior that needs clarifying? Some similar text to your comment in the code.

ebevhan updated this revision to Diff 188504.Feb 27 2019, 2:18 AM
ebevhan marked an inline comment as done.
ebevhan marked an inline comment as done.
ebevhan added inline comments.
test/SemaOpenCL/address-spaces.cl
159

Well... If __private int ** to __generic int ** is an error but __private int *** to __generic int ** isn't, that seems a bit odd to me...

Anastasia accepted this revision.Feb 27 2019, 9:54 AM

LGTM! Thanks a lot for fixing this old bug! Btw, do you plan to look at generalizing this to C++ as well?

I don't feel we need anything for mixing OpenCL with GNU style address spaces at this point. We can always extend this in the future (in case it turns out to be useful to someone!).

test/SemaOpenCL/address-spaces.cl
159

Ok, I am still confused about this case because it's converting number of pointers. It's not exactly the same to me as converting address spaces of pointers. Since __generic is default addr space in OpenCL, in your example following 2 inner address spaces seem to match:
__private int * __generic* __generic* -> __generic int * __generic*

But in any case keeping FIXME is definitely the right thing to do here until we come up with some sort of rules.

This revision is now accepted and ready to land.Feb 27 2019, 9:54 AM

Generally, with an explicit cast, C allows any pointer cast with a reasonable interpretation, even if the underlying operation is suspicious. For example, you can cast an "long*" to a "int*" (as in "(int*)(long*)p") without any complaint, even though dereferencing the result is likely undefined behavior. (C doesn't allow explicitly writing a reinterpret_cast, but that's basically the operation in question.)

Along those lines, in general, the normal C rules should allow casting foo* to bar* for any object types foo and bar, even if foo and bar are pointers with address spaces, like __local int * and __global int *. I don't see anything in the OpenCL standard that would contradict this. It looks like this patch changes that?

I agree we shouldn't allow implicit conversions, though...

rjmccall added inline comments.Feb 27 2019, 10:28 AM
test/SemaOpenCL/address-spaces.cl
159

For reasons that have always been obscure to me, Clang is very permissive about pointer-type changes in C, far more than the C standard actually requires. It's hard to change that in general, but I think it's fine to harden specific new cases like nested address-space conversions.

LGTM! Thanks a lot for fixing this old bug! Btw, do you plan to look at generalizing this to C++ as well?

That does sound like a good idea and I will probably look into it when I have more time. I'd also like to look into making an RFC for the improved address space specification support. Again, when I have time :)

Generally, with an explicit cast, C allows any pointer cast with a reasonable interpretation, even if the underlying operation is suspicious. For example, you can cast an "long*" to a "int*" (as in "(int*)(long*)p") without any complaint, even though dereferencing the result is likely undefined behavior. (C doesn't allow explicitly writing a reinterpret_cast, but that's basically the operation in question.)

That is true... The current rules are very permissive about pointer casts.

Along those lines, in general, the normal C rules should allow casting foo* to bar* for any object types foo and bar, even if foo and bar are pointers with address spaces, like __local int * and __global int *. I don't see anything in the OpenCL standard that would contradict this. It looks like this patch changes that?

I vaguely recall that when I was looking into fixing this in our downstream a while back I found that OpenCL did not have any explicit provisions for nested address spaces, only direct ones.

So should we drop the extra error on nested space mismatches, since the rules should be as permissive as possible unless the spec prohibits it? Technically the patch could be changed to only fix the incorrect warning on direct address space mismatches instead.

Along those lines, in general, the normal C rules should allow casting foo* to bar* for any object types foo and bar, even if foo and bar are pointers with address spaces, like __local int * and __global int *. I don't see anything in the OpenCL standard that would contradict this. It looks like this patch changes that?

s6.5 of OpenCL spec is very explicit about this case: "A pointer to address space A can only be assigned to a pointer to the same address space A or a pointer to the generic address space. Casting a pointer to address space A to a pointer to address space B is illegal if A and B are named address spaces and A is not the same as B". Although there is a typo with missing first occurrence of B.

The behavior with generic is explained in s6.5.5.

Btw, this patch doesn't change anything in the address space conversions of pointers in OpenCL in general but just changes the behavior of nested pointers for which spec doesn't say anything. I think trying to reject code that is doing something dangerous is a good thing!

I think trying to reject code that is doing something dangerous is a good thing!

Refusing to compile code which is suspicious, but not forbidden by the specification, will likely cause compatibility issues; there are legitimate reasons to use casts which look weird.

I'm not against adding some sort of address space suspicious cast warning to catch cases where we think the user meant to do something else. But that's a separate issue, and it needs a proper cost-benefit analysis, including an analysis of the false-positive rate on existing code.

I think trying to reject code that is doing something dangerous is a good thing!

Refusing to compile code which is suspicious, but not forbidden by the specification, will likely cause compatibility issues; there are legitimate reasons to use casts which look weird.

The spec dioesn't allow these conversions either, it just simply doesn't cover this corner case at all. I don't think we are changing anything in terms of compatibility. If you have any examples of such casts that can be legitimate I would like to understand them better. What I have seen so far were the examples where addrspacecast was lost in IR for the memory segments translation and therefore wrong memory areas were accessed.

I'm not against adding some sort of address space suspicious cast warning to catch cases where we think the user meant to do something else.

I simply don't see how these conversions can be useful and some are definitely indirectly forbidden (there is no precise wording however). There are other ways to perform such conversions differently (by being more explicit) where correct IR can be then generated with addrspacecast. I don't think we are loosing anything in terms of functionality.

But that's a separate issue, and it needs a proper cost-benefit analysis, including an analysis of the false-positive rate on existing code.

Do you have any suggestions how to do this in practice with such rare corner case?

I think trying to reject code that is doing something dangerous is a good thing!

Refusing to compile code which is suspicious, but not forbidden by the specification, will likely cause compatibility issues; there are legitimate reasons to use casts which look weird.

The spec dioesn't allow these conversions either, it just simply doesn't cover this corner case at all. I don't think we are changing anything in terms of compatibility. If you have any examples of such casts that can be legitimate I would like to understand them better. What I have seen so far were the examples where addrspacecast was lost in IR for the memory segments translation and therefore wrong memory areas were accessed.

The spec just says that the casts follow C rules... and C says you can cast a pointer to an object type to a pointer to another object type (subject to alignment restrictions). By default, a pointer to a pointer isn't special.

In practice, unusual casts tend to show up in code building a datastructure using union-like constructs. In plain C, for example, sometimes you have a pointer to a float, and sometimes you have a pointer to an int, determined dynamically. I expect similar cases show up where a pointer points to memory which contains either a pointer in the global address-space, or a pointer in the local address-space, determined dynamically. In some cases, it might be clearer to use void* in more places, but that's mostly style issue.

I'm not against adding some sort of address space suspicious cast warning to catch cases where we think the user meant to do something else.

I simply don't see how these conversions can be useful and some are definitely indirectly forbidden (there is no precise wording however). There are other ways to perform such conversions differently (by being more explicit) where correct IR can be then generated with addrspacecast. I don't think we are loosing anything in terms of functionality.

In some cases, the correct code may not involve an addrspacecast at all; the pointer could just have the "wrong" pointee type, and the code expects to explicitly fix it. In that case, how do you fix it? Write (foo*)(void*)p instead of (foo*)p?

But that's a separate issue, and it needs a proper cost-benefit analysis, including an analysis of the false-positive rate on existing code.

Do you have any suggestions how to do this in practice with such rare corner case?

If the warning never triggers on any code you have access to, that would still be a useful datapoint.

I think trying to reject code that is doing something dangerous is a good thing!

Refusing to compile code which is suspicious, but not forbidden by the specification, will likely cause compatibility issues; there are legitimate reasons to use casts which look weird.

The spec dioesn't allow these conversions either, it just simply doesn't cover this corner case at all. I don't think we are changing anything in terms of compatibility. If you have any examples of such casts that can be legitimate I would like to understand them better. What I have seen so far were the examples where addrspacecast was lost in IR for the memory segments translation and therefore wrong memory areas were accessed.

The spec just says that the casts follow C rules... and C says you can cast a pointer to an object type to a pointer to another object type (subject to alignment restrictions). By default, a pointer to a pointer isn't special.

In practice, unusual casts tend to show up in code building a datastructure using union-like constructs. In plain C, for example, sometimes you have a pointer to a float, and sometimes you have a pointer to an int, determined dynamically. I expect similar cases show up where a pointer points to memory which contains either a pointer in the global address-space, or a pointer in the local address-space, determined dynamically. In some cases, it might be clearer to use void* in more places, but that's mostly style issue.

Ok, do you have any example for pointers with different types by some chance? It would be very helpful because I can try to see what would be the behavior with different address spaces...

One fundamental difference between C and OpenCL C is that we wanted to give more clear semantic to address spaces. It is not possible to casts between pointers of arbitrary address spaces. We wanted to make such conversions very explicit using generic address space. If the code is written using unions (or some other way) that requires such "meaningless" conversions it can still be cast but the cast has to be written using pointer indirection with generic address spaces. That was done deliberately to prevent accidental erroneous patterns to be compiled.

But that's a separate issue, and it needs a proper cost-benefit analysis, including an analysis of the false-positive rate on existing code.

Do you have any suggestions how to do this in practice with such rare corner case?

If the warning never triggers on any code you have access to, that would still be a useful datapoint.

To my knowledge these corner cases hasn't occurred yet in any code pattern known to me up. I created this upstream bug myself while sorting out some other address space related aspects. However, it's not impossible that it might happen in the future.

I still feel this might be the right direction for OpenCL, do you think this might not be right for C? If that's the case potentially we could add OpenCL specific checks for now and then try to clarify what should be the right strategy for C... it feels in C we just converts between addr spaces freely. The only danger of this conversion with nested pointers is that it produces bitcast instead of addrspacecast, which might result in accessing the wrong memory. Therefore, we wanted to restrict this in the first place.

Any more input on this?

I could redo the patch to simply fix the bug and not make the conversions stricter, if that's preferable.

Any more input on this?

I could redo the patch to simply fix the bug and not make the conversions stricter, if that's preferable.

I was playing a bit with some examples of enum with pointer field of various address spaces and I couldn't find any case where successfully converting nested pointers was useful. Instead it could easily result in execution of incorrect code. So I am sticking to the opinion to be more strict.

Similarly, C++ is more strict with similar conversions i.e. nested pointers between Derived and Base. I still think since we are implementing new rules it is good to be more helpful rather than inheriting older logic of permitting everything like in C.

Also of course if we find later any issue we could fix them and modify the behavior.

If nobody else agrees with my position on this, I'm not going to continue arguing on the explicit cast behavior. But please add a testcase showing that at least (global int**)(void*)(local int**)p works without an error.

I think C probably requires us to allow this under an explicit cast, but we can at least warn about it. It does not require us to allow this as an implicit conversion, and that should be an error.

I think C probably requires us to allow this under an explicit cast, but we can at least warn about it. It does not require us to allow this as an implicit conversion, and that should be an error.

Are you referring to casts to and from void*? I think the standard says those have to be allowed, and I don't see anything about whether or not the conversion has to be explicit.

I think C probably requires us to allow this under an explicit cast, but we can at least warn about it. It does not require us to allow this as an implicit conversion, and that should be an error.

Are you referring to casts to and from void*? I think the standard says those have to be allowed, and I don't see anything about whether or not the conversion has to be explicit.

No, I mean casts between pointer types that change the pointee address space.

Well, it doesn't seem to me like there is consensus on prohibiting nested address space conversion like this.

I can simply redo the patch to only include the bugfix on implicit conversions and drop the nesting level checks.

Well, it doesn't seem to me like there is consensus on prohibiting nested address space conversion like this.

I can simply redo the patch to only include the bugfix on implicit conversions and drop the nesting level checks.

I thought the conclusion is:

  • Explicit conversions allowed for nested pointers with a warning.
  • Implicit conversions are disallowed for nested pointers with an error.

Do you not see it this way?

Well, it doesn't seem to me like there is consensus on prohibiting nested address space conversion like this.

I can simply redo the patch to only include the bugfix on implicit conversions and drop the nesting level checks.

I thought the conclusion is:

  • Explicit conversions allowed for nested pointers with a warning.
  • Implicit conversions are disallowed for nested pointers with an error.

    Do you not see it this way?

Alright, that seems reasonable to me. Some of the patch goes away, then. I'll update it.

ebevhan updated this revision to Diff 195520.Apr 17 2019, 1:40 AM
ebevhan edited the summary of this revision. (Show Details)
rjmccall accepted this revision.Apr 17 2019, 8:11 AM

Alright, this seems reasonable to me.

This was accepted a while ago, but never landed. I don't have commit access; could someone commit it?

This was accepted a while ago, but never landed. I don't have commit access; could someone commit it?

Sure!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 8 2019, 7:24 AM