This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Target/AMDGPU/
-
Target/
-
AMDGPU/
5/12
AMDGPUAliasAnalysis.cpp
-
test/CodeGen/AMDGPU/
-
CodeGen/
-
AMDGPU/
-
amdgpu-alias-analysis.ll

Differential D89525

[amdgpu] Enhance AMDGPU AA.
ClosedPublic

Authored by hliao on Oct 15 2020, 7:58 PM.

Download Raw Diff

Details

Reviewers

rampitec
arsenm
yaxunl
nhaehnle

Commits

rG2a0e4d1c01c9: [amdgpu] Enhance AMDGPU AA.

Summary

In general, a generic point may alias to pointers in all other address spaces. However, for certain cases enforced by the programming model, we may found a generic point won't alias to pointers to local objects.
- When a generic pointer is loaded from the constant address space, it could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it won't alias to pointers to the PRIVATE or LOCAL address space.
- When a generic pointer is passed as a kernel argument, it also could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it also won't alias to pointers to the PRIVATE or LOCAL address space.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hliao created this revision.Oct 15 2020, 7:58 PM

Herald added a project: Restricted Project. · View Herald TranscriptOct 15 2020, 7:58 PM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 5 others. · View Herald Transcript

hliao requested review of this revision.Oct 15 2020, 7:58 PM

Herald added a subscriber: wdng. · View Herald TranscriptOct 15 2020, 7:58 PM

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

arsenm added inline comments.Oct 15 2020, 8:01 PM

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
104–105	Is it true that we couldn't have a relocation for an LDS value?
106	Typo CONSATNT

Harbormaster completed remote builds in B75260: Diff 298539.Oct 15 2020, 8:30 PM

Fix typos and revise the coding style following clang-tidy.

hliao marked an inline comment as done.Oct 16 2020, 5:41 AM

hliao added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
104–105	Do you mean GOT-like relocation for LDS? That's possible if we don't want to clone sub-function per kernel-function. But, as GOT is created in the backend, it needs a PseudoValue for IR instead of a real Value. We could easily tell them from the load from constant memory in user code.

Harbormaster completed remote builds in B75297: Diff 298602.Oct 16 2020, 6:07 AM

In D89525#2333924, @hliao wrote:

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

FYI I've done some Vulkan CTS testing using LLPC with this patch and didn't notice any problems.

rampitec added inline comments.Oct 16 2020, 11:30 AM

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114	What is we pass a pointer to LDS allocated on the host side?

hliao added inline comments.Oct 16 2020, 12:41 PM

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114	No LDS pointer is passed through constant variables in user code. So far, LDS is allocated and assigned during codegen. Late, as LDS could be assigned per kernel and from 0, we could choose clone sub-function (called by more than one kernel functions) so that assignment could be done within linker, or add GOT like relocation if sub-function is not cloned. The late needs to prepare GOT entries on the host side but that load from GOT could be easily distinguished in the backend as it needs to use PseuodSource in MMO.

In D89525#2334916, @foad wrote:

In D89525#2333924, @hliao wrote:

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

FYI I've done some Vulkan CTS testing using LLPC with this patch and didn't notice any problems.

thanks

rampitec added inline comments.Oct 16 2020, 12:48 PM

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114	But that's not constant, it is just Argument?

hliao added inline comments.Oct 16 2020, 1:18 PM

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114	In HIP, LDS pointer cannot be passed as an argument to a kernel.

In D89525#2333924, @hliao wrote:

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

In D89525#2335949, @yaxunl wrote:
I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114	In OpenCL it is legal.

In D89525#2335972, @rampitec wrote:
In D89525#2335949, @yaxunl wrote:
I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice
But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

In D89525#2335989, @yaxunl wrote:
In D89525#2335972, @rampitec wrote:
In D89525#2335949, @yaxunl wrote:
I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice
But you are not checking for a constant pointer here!
In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

In D89525#2335991, @rampitec wrote:
In D89525#2335989, @yaxunl wrote:
In D89525#2335972, @rampitec wrote:
In D89525#2335949, @yaxunl wrote:
I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice
But you are not checking for a constant pointer here!
In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.
Where do you check for this specifically in this block:
} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

In D89525#2336002, @yaxunl wrote:
In D89525#2335991, @rampitec wrote:
In D89525#2335989, @yaxunl wrote:
In D89525#2335972, @rampitec wrote:
In D89525#2335949, @yaxunl wrote:
I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice
But you are not checking for a constant pointer here!
In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.
Where do you check for this specifically in this block:
} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;
I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

But not in OpenCL.

In D89525#2335949, @yaxunl wrote:
In D89525#2333924, @hliao wrote:

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

I don't think that should be valid regard to the lifetime definition of shared variables, which have a maximal lifetime of the kernel execution. Instead, global CONSTANT and GLOBAL variables have longer lifetime than kernel execution. It's not valid to assign a pointer of LOCAL variables to a CONSTANT or GLOBAL in the static initializer (when no kernel execution is started). Within kernel/function body, CONSTANT cannot be assigned anymore. That's not a valid case.

In D89525#2336008, @rampitec wrote:
In D89525#2336002, @yaxunl wrote:
In D89525#2335991, @rampitec wrote:
In D89525#2335989, @yaxunl wrote:
In D89525#2335972, @rampitec wrote:
In D89525#2335949, @yaxunl wrote:
I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice
But you are not checking for a constant pointer here!
In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.
Where do you check for this specifically in this block:
} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;
I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.
But not in OpenCL.

For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.

hliao added inline comments.Oct 16 2020, 9:03 PM

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114	But, OpenCL doesn't allow generic pointer as kernel function arguments. We won't have generic pointer arguments for OpenCL

In D89525#2336479, @hliao wrote:
In D89525#2336008, @rampitec wrote:
In D89525#2336002, @yaxunl wrote:
In D89525#2335991, @rampitec wrote:
In D89525#2335989, @yaxunl wrote:
In D89525#2335972, @rampitec wrote:
In D89525#2335949, @yaxunl wrote:
I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice
But you are not checking for a constant pointer here!
In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.
Where do you check for this specifically in this block:
} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;
I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.
But not in OpenCL.
For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.

OK, that makes sense. Now we only need to make sure we never compile anything but HIP and OpenCL <= 2.0. For example that we will not support fortran maybe? And never extend any of the languages to allow different address space arguments.

My point here is that it is language specific, but there is nothing language specific on the AA implementation. A possible solution is to define some attributes or metadata saying that a certain situation may not ever happen and then check it in AA. That property has to ve set by the language though as AA may not know the source semantics.

In D89525#2336485, @rampitec wrote:
In D89525#2336479, @hliao wrote:
In D89525#2336008, @rampitec wrote:
In D89525#2336002, @yaxunl wrote:
In D89525#2335991, @rampitec wrote:
In D89525#2335989, @yaxunl wrote:
In D89525#2335972, @rampitec wrote:
In D89525#2335949, @yaxunl wrote:
I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice
But you are not checking for a constant pointer here!
In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.
Where do you check for this specifically in this block:
} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;
I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.
But not in OpenCL.
For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.
OK, that makes sense. Now we only need to make sure we never compile anything but HIP and OpenCL <= 2.0. For example that we will not support fortran maybe? And never extend any of the languages to allow different address space arguments.

My point here is that it is language specific, but there is nothing language specific on the AA implementation. A possible solution is to define some attributes or metadata saying that a certain situation may not ever happen and then check it in AA. That property has to ve set by the language though as AA may not know the source semantics.

Unless we change how LDS is used in the future hardware, I don't see any change of the lifetime of LDS variables. Similar to PRIVATE variables, they are only valid once the kernel is executed. Any reference to them outside the kernel execution doesn't make any sense. All languages and models so far have the similar usage of LDS. Any reference to LDS variables outside kernel execution needs persistent LDS support. That should be a significant change that requires more component support, especially the hardware.

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

In D89525#2336479, @hliao wrote:
In D89525#2336008, @rampitec wrote:
In D89525#2336002, @yaxunl wrote:
In D89525#2335991, @rampitec wrote:
In D89525#2335989, @yaxunl wrote:
In D89525#2335972, @rampitec wrote:
In D89525#2335949, @yaxunl wrote:
I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:
__shared__ int a;

__constant__ int *b = &a;
Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice
But you are not checking for a constant pointer here!
In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.
Where do you check for this specifically in this block:
} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;
I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.
But not in OpenCL.
For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.

In D89525#2336864, @t-tye wrote:

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

LGTM, I think we have resolved my doubts.

This revision is now accepted and ready to land.Oct 17 2020, 1:38 PM

In D89525#2336864, @t-tye wrote:

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

No I am not quite saying that as some languages are not clears. Having said that, some compiler implementations are assuming that for some languages. Basically the rule is language specific, so AA would need to ask the language if it is permissible to assume that or not. Also bear in mind the OpenCL case for LDS where the kernel argument is not really being passed in from externally, but created independently for each thread/group-of-threads.

Generic pointers are another issue. They are pointers that may point to multiple address spaces. But the rules of dereferncing when they reference the non-global address space are the same. There can be rules that allow a generic pointers to be known to only point to one address space, in which case they can be treated the same as if they were a pointer to that address space. At the hardware level, FLAT instructions can be used to implement language generic pointers. But FLAT instructions can also be used when the address space is fixed, in which case the semantics are the same as the single address space case.

Unlike OpenCL, the CUDA language does not have the address space of pointers as part of the type system. But it still allows allocation of objects to specific address spaces. For CUDA all addressing is conceptually generic, but the allocation address space can be propagated to know the fixed address space of the FLAT operations.

In D89525#2337058, @t-tye wrote:

In D89525#2336864, @t-tye wrote:

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

No I am not quite saying that as some languages are not clears. Having said that, some compiler implementations are assuming that for some languages. Basically the rule is language specific, so AA would need to ask the language if it is permissible to assume that or not. Also bear in mind the OpenCL case for LDS where the kernel argument is not really being passed in from externally, but created independently for each thread/group-of-threads.

Generic pointers are another issue. They are pointers that may point to multiple address spaces. But the rules of dereferncing when they reference the non-global address space are the same. There can be rules that allow a generic pointers to be known to only point to one address space, in which case they can be treated the same as if they were a pointer to that address space. At the hardware level, FLAT instructions can be used to implement language generic pointers. But FLAT instructions can also be used when the address space is fixed, in which case the semantics are the same as the single address space case.

Unlike OpenCL, the CUDA language does not have the address space of pointers as part of the type system. But it still allows allocation of objects to specific address spaces. For CUDA all addressing is conceptually generic, but the allocation address space can be propagated to know the fixed address space of the FLAT operations.

To me deciding point here was that LDS is not actually allocated on host, but instead requested to be allocated at dispatch. If so then host cannot get an actual pointer to it and thus cannot convert it to a generic pointer and pass to a kernel.

Theoretically one can forge a generic pointer which will point to a specific LDS location after it is allocated, but I believe taking a pointer to an unallocated memory is a UB by any language standards.

In D89525#2337059, @rampitec wrote:

In D89525#2337058, @t-tye wrote:

In D89525#2336864, @t-tye wrote:

LDS and SCRATCH both behave more like TLS. The allocations come into existence when when a thread (or group of threads) get created, and the lifetime ends when those thread(s) terminate. It is UB to reference that memory outside that lifetime. Furthermore, it is UB to dereference the address of LDS and SCRATCH in any thread other than the one that created the address. These rules are defined by the languages although not well explained.

Passing an LDS or SCRATCH address between threads is meaningful provided only the thread(s) that "own" the address dereference it. So storing the address in a global "place" to be read later by an "owning" thread is meaningful. However, some languages may restrict what they allow. So passing as a kernel argument in CUDA appears to not be allowed even though it is meaningful provided the above restricts are met. In OpenCL, there are special rules for passing LDS/Local to a kernel. In OpenCL you actually pass in a byte size, and the kernel dispatch allocates dynamic LDS automatically and passes the address of that to the created thread(s). CUDA has a different syntax for dynamic LDS/Local that is more like TLS.

So how is TLS handled? It seems a TLS address cannot be compile/link time value since it is a runtime concept. So using relocations to initialize global memory program scope variables seems invalid. Initializing a pointer object that is allocated in LDS/SCRATCH to be the address of another LDS/SCRATCH allocated in the same "owning" thread is meaningful and could be implemented using relocations. However, I suspect the languages do not allow this. I am unclear if TLS allows this either.

So you are saying that is always OK to assume no aliasing between a flat pointer which is a kernel argument and a pointer to LDS? OK, thanks!

No I am not quite saying that as some languages are not clears. Having said that, some compiler implementations are assuming that for some languages. Basically the rule is language specific, so AA would need to ask the language if it is permissible to assume that or not. Also bear in mind the OpenCL case for LDS where the kernel argument is not really being passed in from externally, but created independently for each thread/group-of-threads.

Generic pointers are another issue. They are pointers that may point to multiple address spaces. But the rules of dereferncing when they reference the non-global address space are the same. There can be rules that allow a generic pointers to be known to only point to one address space, in which case they can be treated the same as if they were a pointer to that address space. At the hardware level, FLAT instructions can be used to implement language generic pointers. But FLAT instructions can also be used when the address space is fixed, in which case the semantics are the same as the single address space case.

Unlike OpenCL, the CUDA language does not have the address space of pointers as part of the type system. But it still allows allocation of objects to specific address spaces. For CUDA all addressing is conceptually generic, but the allocation address space can be propagated to know the fixed address space of the FLAT operations.

To me deciding point here was that LDS is not actually allocated on host, but instead requested to be allocated at dispatch. If so then host cannot get an actual pointer to it and thus cannot convert it to a generic pointer and pass to a kernel.

The LDS/SCRATCH is actually allocated on individual wave/group-of-wave creation which is even smaller granularity than dispatch. The language defines that even if a thread has a valid LDS/SCRATCH address for some other wave, it is UB to access it. So the host (or some other wave) can get a pointer to LDS/SCRATCH, and can pass it to another wave, but only the wave that "owns" the allocation can access it. What the language may say is that passing a generic into a kernel is not allowed if it points to LDS. I do not believe CUDA explicitly states this, but some compilers appear to implement this.

Also note that an LDS pointer passed into a kernel dispatch A cannot be a legal LDS for that dispatch since until the waves of that dispatch are created, they have no LDS. So if a generic pointer is passed in as a kernel argument then best it can be is the LDS for some other already created wave of another dispatch B. The waves of dispatch A cannot access that LDS pointer as then do not "own" it. The best they can do is pass it to the "owning" waves of dispatch B. That cannot be done via kernel arguments as the waves are already executing. So even if the language allows kernel arguments to be generic pointers pointing to LDS, it is safe for single-thread-AA to assume they cannot alias this waves LDS since such pointers are UB to be accessed by this wave.

Theoretically one can forge a generic pointer which will point to a specific LDS location after it is allocated, but I believe taking a pointer to an unallocated memory is a UB by any language standards.

The language models state that addresses must reference allocated objects (or one past the end) that are created, and it is UB to access any pointer that does not reference an allocated object. Forging or type punning (except for char in C) is UB.

This is fine for graphics.

arsenm added inline comments.Oct 19 2020, 10:31 AM

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
108	The discussion so far has been about loads from kernel arguments, but this check is far weaker. Would it be possible to construct a relocation in non-kernel memory that refers to LDS?

In D89525#2339241, @nhaehnle wrote:

This is fine for graphics.

Right, graphics shaders manage LDS such that it is used to pass arguments between certain shader kinds. But that is not the case for compute.

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
114	It is not legal in OpenCL. In OpenCL you actually pass in a byte size that causes the driver to cause that much LDS to be allocated and the pointer to it passed to the kernel. So the host is not passing an LDS pointer, it is passing a dynamic size for and LDS allocation. As mentioned above, passing a generic pointer to OpenCL also cannot pass a pointer to LDS that the kernel can access. LDS is allocated by waves as they are created, so it is not possible to pass in a pointer to something that does not yet exist.

t-tye added inline comments.Oct 19 2020, 11:54 PM

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
104–105	Is it true that we couldn't have a relocation for an LDS value? Well it depends how they are defined. Graphics is using LDS relocations to stitch together pipeline parts to pass arguments through LDS. Compute does not allow this. If LDS allocation is managed in a TLS like manner (which is not currently done) then maybe there is a place for LDS relocations. But it is not like a global memory relocation.

This revision was landed with ongoing or failed builds.Oct 20 2020, 7:06 AM

Closed by commit rG2a0e4d1c01c9: [amdgpu] Enhance AMDGPU AA. (authored by hliao). · Explain Why

This revision was automatically updated to reflect the committed changes.

hliao added a commit: rG2a0e4d1c01c9: [amdgpu] Enhance AMDGPU AA..

Revision Contents

Path

Size

llvm/

lib/

Target/

AMDGPU/

AMDGPUAliasAnalysis.cpp

38 lines

test/

CodeGen/

AMDGPU/

amdgpu-alias-analysis.ll

62 lines

Diff 299353

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp

Show First 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	AliasResult AMDGPUAAResult::alias(const MemoryLocation &LocA,
AAQueryInfo &AAQI) {		AAQueryInfo &AAQI) {
unsigned asA = LocA.Ptr->getType()->getPointerAddressSpace();		unsigned asA = LocA.Ptr->getType()->getPointerAddressSpace();
unsigned asB = LocB.Ptr->getType()->getPointerAddressSpace();		unsigned asB = LocB.Ptr->getType()->getPointerAddressSpace();

AliasResult Result = getAliasResult(asA, asB);		AliasResult Result = getAliasResult(asA, asB);
if (Result == NoAlias)		if (Result == NoAlias)
return Result;		return Result;

		// In general, FLAT (generic) pointers could be aliased to LOCAL or PRIVATE
		// pointers. However, as LOCAL or PRIVATE pointers point to local objects, in
		// certain cases, it's still viable to check whether a FLAT pointer won't
		// alias to a LOCAL or PRIVATE pointer.
		MemoryLocation A = LocA;
		MemoryLocation B = LocB;
		// Canonicalize the location order to simplify the following alias check.
		if (asA != AMDGPUAS::FLAT_ADDRESS) {
		std::swap(asA, asB);
		std::swap(A, B);
		}
		if (asA == AMDGPUAS::FLAT_ADDRESS &&
		(asB == AMDGPUAS::LOCAL_ADDRESS \|\| asB == AMDGPUAS::PRIVATE_ADDRESS)) {
		const auto *ObjA =
		getUnderlyingObject(A.Ptr->stripPointerCastsAndInvariantGroups());
		if (const LoadInst *LI = dyn_cast<LoadInst>(ObjA)) {
		// If a generic pointer is loaded from the constant address space, it
		// could only be a GLOBAL or CONSTANT one as that address space is soley
		arsenmUnsubmitted Not Done Reply Inline Actions Is it true that we couldn't have a relocation for an LDS value? arsenm: Is it true that we couldn't have a relocation for an LDS value?
		hliaoAuthorUnsubmitted Done Reply Inline Actions Do you mean GOT-like relocation for LDS? That's possible if we don't want to clone sub-function per kernel-function. But, as GOT is created in the backend, it needs a PseudoValue for IR instead of a real Value. We could easily tell them from the load from constant memory in user code. hliao: Do you mean GOT-like relocation for LDS? That's possible if we don't want to clone sub-function…
		t-tyeUnsubmitted Not Done Reply Inline Actions Is it true that we couldn't have a relocation for an LDS value? Well it depends how they are defined. Graphics is using LDS relocations to stitch together pipeline parts to pass arguments through LDS. Compute does not allow this. If LDS allocation is managed in a TLS like manner (which is not currently done) then maybe there is a place for LDS relocations. But it is not like a global memory relocation. t-tye: > Is it true that we couldn't have a relocation for an LDS value? Well it depends how they are…
		// prepared on the host side, where only GLOBAL or CONSTANT variables are
		arsenmUnsubmitted Done Reply Inline Actions Typo CONSATNT arsenm: Typo CONSATNT
		// visible. Note that this even holds for regular functions.
		if (LI->getPointerAddressSpace() == AMDGPUAS::CONSTANT_ADDRESS)
		arsenmUnsubmitted Not Done Reply Inline Actions The discussion so far has been about loads from kernel arguments, but this check is far weaker. Would it be possible to construct a relocation in non-kernel memory that refers to LDS? arsenm: The discussion so far has been about loads from kernel arguments, but this check is far weaker.
		return NoAlias;
		} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
		const Function *F = Arg->getParent();
		switch (F->getCallingConv()) {
		case CallingConv::AMDGPU_KERNEL:
		// In the kernel function, kernel arguments won't alias to (local)
		rampitecUnsubmitted Not Done Reply Inline Actions What is we pass a pointer to LDS allocated on the host side? rampitec: What is we pass a pointer to LDS allocated on the host side?
		hliaoAuthorUnsubmitted Done Reply Inline Actions No LDS pointer is passed through constant variables in user code. So far, LDS is allocated and assigned during codegen. Late, as LDS could be assigned per kernel and from 0, we could choose clone sub-function (called by more than one kernel functions) so that assignment could be done within linker, or add GOT like relocation if sub-function is not cloned. The late needs to prepare GOT entries on the host side but that load from GOT could be easily distinguished in the backend as it needs to use PseuodSource in MMO. hliao: No LDS pointer is passed through constant variables in user code. So far, LDS is allocated and…
		rampitecUnsubmitted Not Done Reply Inline Actions But that's not constant, it is just Argument? rampitec: But that's not constant, it is just Argument?
		hliaoAuthorUnsubmitted Done Reply Inline Actions In HIP, LDS pointer cannot be passed as an argument to a kernel. hliao: In HIP, LDS pointer cannot be passed as an argument to a kernel.
		rampitecUnsubmitted Not Done Reply Inline Actions In OpenCL it is legal. rampitec: In OpenCL it is legal.
		hliaoAuthorUnsubmitted Done Reply Inline Actions But, OpenCL doesn't allow generic pointer as kernel function arguments. We won't have generic pointer arguments for OpenCL hliao: But, OpenCL doesn't allow generic pointer as kernel function arguments. We won't have generic…
		t-tyeUnsubmitted Not Done Reply Inline Actions It is not legal in OpenCL. In OpenCL you actually pass in a byte size that causes the driver to cause that much LDS to be allocated and the pointer to it passed to the kernel. So the host is not passing an LDS pointer, it is passing a dynamic size for and LDS allocation. As mentioned above, passing a generic pointer to OpenCL also cannot pass a pointer to LDS that the kernel can access. LDS is allocated by waves as they are created, so it is not possible to pass in a pointer to something that does not yet exist. t-tye: It is not legal in OpenCL. In OpenCL you actually pass in a byte size that causes the driver to…
		// variables in shared or private address space.
		return NoAlias;
		default:
		// TODO: In the regular function, if that local variable in the
		// location B is not captured, that argument pointer won't alias to it
		// as well.
		break;
		}
		}
		}

// Forward the query to the next alias analysis.		// Forward the query to the next alias analysis.
return AAResultBase::alias(LocA, LocB, AAQI);		return AAResultBase::alias(LocA, LocB, AAQI);
}		}

bool AMDGPUAAResult::pointsToConstantMemory(const MemoryLocation &Loc,		bool AMDGPUAAResult::pointsToConstantMemory(const MemoryLocation &Loc,
AAQueryInfo &AAQI, bool OrLocal) {		AAQueryInfo &AAQI, bool OrLocal) {
unsigned AS = Loc.Ptr->getType()->getPointerAddressSpace();		unsigned AS = Loc.Ptr->getType()->getPointerAddressSpace();
if (AS == AMDGPUAS::CONSTANT_ADDRESS \|\|		if (AS == AMDGPUAS::CONSTANT_ADDRESS \|\|
▲ Show 20 Lines • Show All 47 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll

	; RUN: opt -mtriple=amdgcn-- -aa-eval -amdgpu-aa -amdgpu-aa-wrapper -disable-basic-aa -print-all-alias-modref-info -disable-output < %s 2>&1 \| FileCheck %s			; RUN: opt -mtriple=amdgcn-- -data-layout=A5 -aa-eval -amdgpu-aa -amdgpu-aa-wrapper -disable-basic-aa -print-all-alias-modref-info -disable-output < %s 2>&1 \| FileCheck %s
	; RUN: opt -mtriple=r600-- -aa-eval -amdgpu-aa -amdgpu-aa-wrapper -disable-basic-aa -print-all-alias-modref-info -disable-output < %s 2>&1 \| FileCheck %s			; RUN: opt -mtriple=r600-- -data-layout=A5 -aa-eval -amdgpu-aa -amdgpu-aa-wrapper -disable-basic-aa -print-all-alias-modref-info -disable-output < %s 2>&1 \| FileCheck %s

	; CHECK: NoAlias: i8 addrspace(1)* %p1, i8 addrspace(5)* %p			; CHECK: NoAlias: i8 addrspace(1)* %p1, i8 addrspace(5)* %p

	define void @test(i8 addrspace(5)* %p, i8 addrspace(1)* %p1) {			define void @test(i8 addrspace(5)* %p, i8 addrspace(1)* %p1) {
	ret void			ret void
	}			}

	; CHECK: MayAlias: i8 addrspace(1)* %p1, i8 addrspace(4)* %p			; CHECK: MayAlias: i8 addrspace(1)* %p1, i8 addrspace(4)* %p
	▲ Show 20 Lines • Show All 109 Lines • ▼ Show 20 Lines
	define void @test_7_6(i8 addrspace(7)* %p, i8 addrspace(6)* %p1) {			define void @test_7_6(i8 addrspace(7)* %p, i8 addrspace(6)* %p1) {
	ret void			ret void
	}			}

	; CHECK: MayAlias: i8 addrspace(7)* %p, i8 addrspace(7)* %p1			; CHECK: MayAlias: i8 addrspace(7)* %p, i8 addrspace(7)* %p1
	define void @test_7_7(i8 addrspace(7)* %p, i8 addrspace(7)* %p1) {			define void @test_7_7(i8 addrspace(7)* %p, i8 addrspace(7)* %p1) {
	ret void			ret void
	}			}

				@cst = internal addrspace(4) global i8* undef, align 4

				; CHECK-LABEL: Function: test_8_0
				; CHECK: NoAlias: i8 addrspace(3)* %p, i8* %p1
				; CHECK: NoAlias: i8 addrspace(3)* %p, i8* addrspace(4)* @cst
				; CHECK: MayAlias: i8* %p1, i8* addrspace(4)* @cst
				define void @test_8_0(i8 addrspace(3)* %p) {
				%p1 = load i8, i8 addrspace(4)* @cst
				ret void
				}

				; CHECK-LABEL: Function: test_8_1
				; CHECK: NoAlias: i8 addrspace(5)* %p, i8* %p1
				; CHECK: NoAlias: i8 addrspace(5)* %p, i8* addrspace(4)* @cst
				; CHECK: MayAlias: i8* %p1, i8* addrspace(4)* @cst
				define void @test_8_1(i8 addrspace(5)* %p) {
				%p1 = load i8, i8 addrspace(4)* @cst
				ret void
				}

				; CHECK-LABEL: Function: test_8_2
				; CHECK: NoAlias: i8 addrspace(5)* %p1, i8* %p
				define amdgpu_kernel void @test_8_2(i8* %p) {
				%p1 = alloca i8, align 1, addrspace(5)
				ret void
				}

				; CHECK-LABEL: Function: test_8_3
				; CHECK: MayAlias: i8 addrspace(5)* %p1, i8* %p
				; TODO: So far, %p1 may still alias to %p. As it's not captured at all, it
				; should be NoAlias.
				define void @test_8_3(i8* %p) {
				%p1 = alloca i8, align 1, addrspace(5)
				ret void
				}

				@shm = internal addrspace(3) global i8 undef, align 4

				; CHECK-LABEL: Function: test_8_4
				; CHECK: NoAlias: i8 addrspace(3)* %p1, i8* %p
				; CHECK: NoAlias: i8 addrspace(3)* @shm, i8* %p
				; CHECK: MayAlias: i8 addrspace(3)* %p1, i8 addrspace(3)* @shm
				define amdgpu_kernel void @test_8_4(i8* %p) {
				%p1 = getelementptr i8, i8 addrspace(3)* @shm, i32 0
				ret void
				}

				; CHECK-LABEL: Function: test_8_5
				; CHECK: MayAlias: i8 addrspace(3)* %p1, i8* %p
				; CHECK: MayAlias: i8 addrspace(3)* @shm, i8* %p
				; CHECK: MayAlias: i8 addrspace(3)* %p1, i8 addrspace(3)* @shm
				; TODO: So far, %p1 may still alias to %p. As it's not captured at all, it
				; should be NoAlias.
				define void @test_8_5(i8* %p) {
				%p1 = getelementptr i8, i8 addrspace(3)* @shm, i32 0
				ret void
				}