This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu] Enhance AMDGPU AA.
ClosedPublic

Authored by hliao on Oct 15 2020, 7:58 PM.

Details

Summary
  • In general, a generic point may alias to pointers in all other address spaces. However, for certain cases enforced by the programming model, we may found a generic point won't alias to pointers to local objects.
    • When a generic pointer is loaded from the constant address space, it could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it won't alias to pointers to the PRIVATE or LOCAL address space.
    • When a generic pointer is passed as a kernel argument, it also could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it also won't alias to pointers to the PRIVATE or LOCAL address space.

Diff Detail

Event Timeline

hliao created this revision.Oct 15 2020, 7:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 15 2020, 7:58 PM
hliao requested review of this revision.Oct 15 2020, 7:58 PM
hliao added a comment.Oct 15 2020, 8:00 PM

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

arsenm added inline comments.Oct 15 2020, 8:01 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
104–105

Is it true that we couldn't have a relocation for an LDS value?

106

Typo CONSATNT

hliao updated this revision to Diff 298602.Oct 16 2020, 5:38 AM

Fix typos and revise the coding style following clang-tidy.

hliao marked an inline comment as done.Oct 16 2020, 5:41 AM
hliao added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
104–105

Do you mean GOT-like relocation for LDS? That's possible if we don't want to clone sub-function per kernel-function. But, as GOT is created in the backend, it needs a PseudoValue for IR instead of a real Value. We could easily tell them from the load from constant memory in user code.

foad added a subscriber: foad.Oct 16 2020, 8:31 AM

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

FYI I've done some Vulkan CTS testing using LLPC with this patch and didn't notice any problems.

rampitec added inline comments.Oct 16 2020, 11:30 AM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114

What is we pass a pointer to LDS allocated on the host side?

hliao added inline comments.Oct 16 2020, 12:41 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114

No LDS pointer is passed through constant variables in user code. So far, LDS is allocated and assigned during codegen. Late, as LDS could be assigned per kernel and from 0, we could choose clone sub-function (called by more than one kernel functions) so that assignment could be done within linker, or add GOT like relocation if sub-function is not cloned. The late needs to prepare GOT entries on the host side but that load from GOT could be easily distinguished in the backend as it needs to use PseuodSource in MMO.

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

FYI I've done some Vulkan CTS testing using LLPC with this patch and didn't notice any problems.

thanks

rampitec added inline comments.Oct 16 2020, 12:48 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114

But that's not constant, it is just Argument?

hliao added inline comments.Oct 16 2020, 1:18 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114

In HIP, LDS pointer cannot be passed as an argument to a kernel.

yaxunl added a subscriber: tra.Oct 16 2020, 1:49 PM

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114

In OpenCL it is legal.

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

But not in OpenCL.

hliao added a comment.Oct 16 2020, 8:59 PM

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

I don't think that should be valid regard to the lifetime definition of shared variables, which have a maximal lifetime of the kernel execution. Instead, global CONSTANT and GLOBAL variables have longer lifetime than kernel execution. It's not valid to assign a pointer of LOCAL variables to a CONSTANT or GLOBAL in the static initializer (when no kernel execution is started). Within kernel/function body, CONSTANT cannot be assigned anymore. That's not a valid case.

hliao added a comment.EditedOct 16 2020, 9:00 PM

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

But not in OpenCL.

For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.

hliao added inline comments.Oct 16 2020, 9:03 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114

But, OpenCL doesn't allow generic pointer as kernel function arguments. We won't have generic pointer arguments for OpenCL

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

But not in OpenCL.

For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.

OK, that makes sense. Now we only need to make sure we never compile anything but HIP and OpenCL <= 2.0. For example that we will not support fortran maybe? And never extend any of the languages to allow different address space arguments.

My point here is that it is language specific, but there is nothing language specific on the AA implementation. A possible solution is to define some attributes or metadata saying that a certain situation may not ever happen and then check it in AA. That property has to ve set by the language though as AA may not know the source semantics.

hliao added a comment.Oct 17 2020, 9:25 AM

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

But not in OpenCL.

For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.

OK, that makes sense. Now we only need to make sure we never compile anything but HIP and OpenCL <= 2.0. For example that we will not support fortran maybe? And never extend any of the languages to allow different address space arguments.

My point here is that it is language specific, but there is nothing language specific on the AA implementation. A possible solution is to define some attributes or metadata saying that a certain situation may not ever happen and then check it in AA. That property has to ve set by the language though as AA may not know the source semantics.

Unless we change how LDS is used in the future hardware, I don't see any change of the lifetime of LDS variables. Similar to PRIVATE variables, they are only valid once the kernel is executed. Any reference to them outside the kernel execution doesn't make any sense. All languages and models so far have the similar usage of LDS. Any reference to LDS variables outside kernel execution needs persistent LDS support. That should be a significant change that requires more component support, especially the hardware.

t-tye added a comment.Oct 17 2020, 9:59 AM

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

rampitec added a comment.EditedOct 17 2020, 1:09 PM

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

But not in OpenCL.

For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.

OK, that makes sense. Now we only need to make sure we never compile anything but HIP and OpenCL <= 2.0. For example that we will not support fortran maybe? And never extend any of the languages to allow different address space arguments.

My point here is that it is language specific, but there is nothing language specific on the AA implementation. A possible solution is to define some attributes or metadata saying that a certain situation may not ever happen and then check it in AA. That property has to ve set by the language though as AA may not know the source semantics.

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

rampitec accepted this revision.Oct 17 2020, 1:38 PM

LGTM, I think we have resolved my doubts.

This revision is now accepted and ready to land.Oct 17 2020, 1:38 PM
t-tye added a comment.Oct 17 2020, 2:51 PM

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

No I am not quite saying that as some languages are not clears. Having said that, some compiler implementations are assuming that for some languages. Basically the rule is language specific, so AA would need to ask the language if it is permissible to assume that or not. Also bear in mind the OpenCL case for LDS where the kernel argument is not really being passed in from externally, but created independently for each thread/group-of-threads.

Generic pointers are another issue. They are pointers that may point to multiple address spaces. But the rules of dereferncing when they reference the non-global address space are the same. There can be rules that allow a generic pointers to be known to only point to one address space, in which case they can be treated the same as if they were a pointer to that address space. At the hardware level, FLAT instructions can be used to implement language generic pointers. But FLAT instructions can also be used when the address space is fixed, in which case the semantics are the same as the single address space case.

Unlike OpenCL, the CUDA language does not have the address space of pointers as part of the type system. But it still allows allocation of objects to specific address spaces. For CUDA all addressing is conceptually generic, but the allocation address space can be propagated to know the fixed address space of the FLAT operations.

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

No I am not quite saying that as some languages are not clears. Having said that, some compiler implementations are assuming that for some languages. Basically the rule is language specific, so AA would need to ask the language if it is permissible to assume that or not. Also bear in mind the OpenCL case for LDS where the kernel argument is not really being passed in from externally, but created independently for each thread/group-of-threads.

Generic pointers are another issue. They are pointers that may point to multiple address spaces. But the rules of dereferncing when they reference the non-global address space are the same. There can be rules that allow a generic pointers to be known to only point to one address space, in which case they can be treated the same as if they were a pointer to that address space. At the hardware level, FLAT instructions can be used to implement language generic pointers. But FLAT instructions can also be used when the address space is fixed, in which case the semantics are the same as the single address space case.

Unlike OpenCL, the CUDA language does not have the address space of pointers as part of the type system. But it still allows allocation of objects to specific address spaces. For CUDA all addressing is conceptually generic, but the allocation address space can be propagated to know the fixed address space of the FLAT operations.

To me deciding point here was that LDS is not actually allocated on host, but instead requested to be allocated at dispatch. If so then host cannot get an actual pointer to it and thus cannot convert it to a generic pointer and pass to a kernel.

Theoretically one can forge a generic pointer which will point to a specific LDS location after it is allocated, but I believe taking a pointer to an unallocated memory is a UB by any language standards.

t-tye added a comment.Oct 17 2020, 4:00 PM

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

No I am not quite saying that as some languages are not clears. Having said that, some compiler implementations are assuming that for some languages. Basically the rule is language specific, so AA would need to ask the language if it is permissible to assume that or not. Also bear in mind the OpenCL case for LDS where the kernel argument is not really being passed in from externally, but created independently for each thread/group-of-threads.

Generic pointers are another issue. They are pointers that may point to multiple address spaces. But the rules of dereferncing when they reference the non-global address space are the same. There can be rules that allow a generic pointers to be known to only point to one address space, in which case they can be treated the same as if they were a pointer to that address space. At the hardware level, FLAT instructions can be used to implement language generic pointers. But FLAT instructions can also be used when the address space is fixed, in which case the semantics are the same as the single address space case.

Unlike OpenCL, the CUDA language does not have the address space of pointers as part of the type system. But it still allows allocation of objects to specific address spaces. For CUDA all addressing is conceptually generic, but the allocation address space can be propagated to know the fixed address space of the FLAT operations.

To me deciding point here was that LDS is not actually allocated on host, but instead requested to be allocated at dispatch. If so then host cannot get an actual pointer to it and thus cannot convert it to a generic pointer and pass to a kernel.

The LDS/SCRATCH is actually allocated on individual wave/group-of-wave creation which is even smaller granularity than dispatch. The language defines that even if a thread has a valid LDS/SCRATCH address for some other wave, it is UB to access it. So the host (or some other wave) can get a pointer to LDS/SCRATCH, and can pass it to another wave, but only the wave that "owns" the allocation can access it. What the language may say is that passing a generic into a kernel is not allowed if it points to LDS. I do not believe CUDA explicitly states this, but some compilers appear to implement this.

Also note that an LDS pointer passed into a kernel dispatch A cannot be a legal LDS for that dispatch since until the waves of that dispatch are created, they have no LDS. So if a generic pointer is passed in as a kernel argument then best it can be is the LDS for some other already created wave of another dispatch B. The waves of dispatch A cannot access that LDS pointer as then do not "own" it. The best they can do is pass it to the "owning" waves of dispatch B. That cannot be done via kernel arguments as the waves are already executing. So even if the language allows kernel arguments to be generic pointers pointing to LDS, it is safe for single-thread-AA to assume they cannot alias this waves LDS since such pointers are UB to be accessed by this wave.

Theoretically one can forge a generic pointer which will point to a specific LDS location after it is allocated, but I believe taking a pointer to an unallocated memory is a UB by any language standards.

The language models state that addresses must reference allocated objects (or one past the end) that are created, and it is UB to access any pointer that does not reference an allocated object. Forging or type punning (except for char in C) is UB.

This is fine for graphics.

arsenm added inline comments.Oct 19 2020, 10:31 AM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
108

The discussion so far has been about loads from kernel arguments, but this check is far weaker. Would it be possible to construct a relocation in non-kernel memory that refers to LDS?

This is fine for graphics.

Right, graphics shaders manage LDS such that it is used to pass arguments between certain shader kinds. But that is not the case for compute.

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114

It is not legal in OpenCL. In OpenCL you actually pass in a byte size that causes the driver to cause that much LDS to be allocated and the pointer to it passed to the kernel. So the host is not passing an LDS pointer, it is passing a dynamic size for and LDS allocation.

As mentioned above, passing a generic pointer to OpenCL also cannot pass a pointer to LDS that the kernel can access. LDS is allocated by waves as they are created, so it is not possible to pass in a pointer to something that does not yet exist.

t-tye added inline comments.Oct 19 2020, 11:54 PM
llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
104–105

Is it true that we couldn't have a relocation for an LDS value?

Well it depends how they are defined. Graphics is using LDS relocations to stitch together pipeline parts to pass arguments through LDS.

Compute does not allow this. If LDS allocation is managed in a TLS like manner (which is not currently done) then maybe there is a place for LDS relocations. But it is not like a global memory relocation.

This revision was landed with ongoing or failed builds.Oct 20 2020, 7:06 AM
This revision was automatically updated to reflect the committed changes.