Page MenuHomePhabricator

AMDGPU: Add intrinsics for address space identification
ClosedPublic

Authored by arsenm on Aug 13 2019, 11:13 PM.

Details

Reviewers
rampitec
b-sumner
Summary

The library currently uses ptrtoint and directly checks the queue ptr
for this, which counts as a pointer capture.

Diff Detail

Event Timeline

arsenm created this revision.Aug 13 2019, 11:13 PM

Looks fine to me. Thanks!

Looks fine to me. Thanks!

I wonder if is.local should be is.shared because that's what getreg calls this

Looks fine to me. Thanks!

I wonder if is.local should be is.shared because that's what getreg calls this

That would probably be better.

Looks fine to me.

This revision is now accepted and ready to land.Aug 14 2019, 11:03 AM

Do we really need these to be "amdgpu" specific?

Do we really need these to be "amdgpu" specific?

Are you envisioning these would be used for OpenCL implementations? OpenCL doesn't exactly have these. It instead has to_<addrspacename> functions which return NULL if the generic pointer isn't actually pointing at a object in <addrrspacename>.

Do we really need these to be "amdgpu" specific?

I don't know what a generic intrinsic that does this would look like. This is pretty specific to these two cases, and in general I don't think pointer address spaces can be identified

Do we really need these to be "amdgpu" specific?

I don't know what a generic intrinsic that does this would look like. This is pretty specific to these two cases, and in general I don't think pointer address spaces can be identified

I would have thought it could be pessimistically resolved in the IR or precise at runtime if the hardware supports that.
E.g.,

if (__builtin_is_private(ptr)) { ... } else { ... }

and we can fold

%cast = addrspacecast i8 addrspace(1)* %gptr to i8*
%is.private = call i1 @llvm.is.private(i8* %cast)

to %is.private = true

Do we really need these to be "amdgpu" specific?

Are you envisioning these would be used for OpenCL implementations? OpenCL doesn't exactly have these. It instead has to_<addrspacename> functions which return NULL if the generic pointer isn't actually pointing at a object in <addrrspacename>.

We (will) have various languages/targets that have corresponding address spaces and we already reserve some numbers for specific address spaces (afaik), why not make this a generic functionality.

Do we really need these to be "amdgpu" specific?

Are you envisioning these would be used for OpenCL implementations? OpenCL doesn't exactly have these. It instead has to_<addrspacename> functions which return NULL if the generic pointer isn't actually pointing at a object in <addrrspacename>.

We (will) have various languages/targets that have corresponding address spaces and we already reserve some numbers for specific address spaces (afaik), why not make this a generic functionality.

The IR doesn't reserve any numbers for specific usage (except 0 has some special properties, which do not include being a flat/generic pointer as defined in OpenCL). It might make more sense to add a clang builtin for this, which could then be implemented with the target specific intrinsic. I don't want to add a generic target intrinsic while guessing at how this might work on other targets. Something truly generic, like llvm.is.address.space(ptr, address_space_id) I don't think really works generally enough to add. There isn't necessarily a 1:1 mapping between the language address space and IR address space. There could possibly be multiple IR address spaces to handle, and not all targets might be able to do this test at all

Do we really need these to be "amdgpu" specific?

Are you envisioning these would be used for OpenCL implementations? OpenCL doesn't exactly have these. It instead has to_<addrspacename> functions which return NULL if the generic pointer isn't actually pointing at a object in <addrrspacename>.

We (will) have various languages/targets that have corresponding address spaces and we already reserve some numbers for specific address spaces (afaik), why not make this a generic functionality.

The IR doesn't reserve any numbers for specific usage (except 0 has some special properties, which do not include being a flat/generic pointer as defined in OpenCL). It might make more sense to add a clang builtin for this, which could then be implemented with the target specific intrinsic. I don't want to add a generic target intrinsic while guessing at how this might work on other targets. Something truly generic, like llvm.is.address.space(ptr, address_space_id) I don't think really works generally enough to add. There isn't necessarily a 1:1 mapping between the language address space and IR address space. There could possibly be multiple IR address spaces to handle, and not all targets might be able to do this test at all

I see, still, we have llvm.nvvm.isspacep.const and friends already. Now we get llvm.amdgcn.is.private, which seems to be the same thing for amdgpu. As long as people only use this in the backend, great, but if we want middle-end passes that deal with address spaces and optimize accordingly, e.g., introduce data movement, we should have generic intrinsics or helper functions. I would prefer the former and I was curious if there is a problem with that. If the folding (I described earlier) is triple/target specific, sure, but if we do not have multiple llvm.XYZ.is.private we would simplify things.

Do we really need these to be "amdgpu" specific?

Are you envisioning these would be used for OpenCL implementations? OpenCL doesn't exactly have these. It instead has to_<addrspacename> functions which return NULL if the generic pointer isn't actually pointing at a object in <addrrspacename>.

We (will) have various languages/targets that have corresponding address spaces and we already reserve some numbers for specific address spaces (afaik), why not make this a generic functionality.

The IR doesn't reserve any numbers for specific usage (except 0 has some special properties, which do not include being a flat/generic pointer as defined in OpenCL). It might make more sense to add a clang builtin for this, which could then be implemented with the target specific intrinsic. I don't want to add a generic target intrinsic while guessing at how this might work on other targets. Something truly generic, like llvm.is.address.space(ptr, address_space_id) I don't think really works generally enough to add. There isn't necessarily a 1:1 mapping between the language address space and IR address space. There could possibly be multiple IR address spaces to handle, and not all targets might be able to do this test at all

I see, still, we have llvm.nvvm.isspacep.const and friends already. Now we get llvm.amdgcn.is.private, which seems to be the same thing for amdgpu. As long as people only use this in the backend, great, but if we want middle-end passes that deal with address spaces and optimize accordingly, e.g., introduce data movement, we should have generic intrinsics or helper functions. I would prefer the former and I was curious if there is a problem with that. If the folding (I described earlier) is triple/target specific, sure, but if we do not have multiple llvm.XYZ.is.private we would simplify things.

I'm not sure what you mean by data movement. InferAddressSpaces handles this in this patch when the source is inferred. I don't think there are any other passes that would ever need to care about this.

Do we really need these to be "amdgpu" specific?

Are you envisioning these would be used for OpenCL implementations? OpenCL doesn't exactly have these. It instead has to_<addrspacename> functions which return NULL if the generic pointer isn't actually pointing at a object in <addrrspacename>.

We (will) have various languages/targets that have corresponding address spaces and we already reserve some numbers for specific address spaces (afaik), why not make this a generic functionality.

The IR doesn't reserve any numbers for specific usage (except 0 has some special properties, which do not include being a flat/generic pointer as defined in OpenCL). It might make more sense to add a clang builtin for this, which could then be implemented with the target specific intrinsic. I don't want to add a generic target intrinsic while guessing at how this might work on other targets. Something truly generic, like llvm.is.address.space(ptr, address_space_id) I don't think really works generally enough to add. There isn't necessarily a 1:1 mapping between the language address space and IR address space. There could possibly be multiple IR address spaces to handle, and not all targets might be able to do this test at all

I see, still, we have llvm.nvvm.isspacep.const and friends already. Now we get llvm.amdgcn.is.private, which seems to be the same thing for amdgpu. As long as people only use this in the backend, great, but if we want middle-end passes that deal with address spaces and optimize accordingly, e.g., introduce data movement, we should have generic intrinsics or helper functions. I would prefer the former and I was curious if there is a problem with that. If the folding (I described earlier) is triple/target specific, sure, but if we do not have multiple llvm.XYZ.is.private we would simplify things.

I'm not sure what you mean by data movement. InferAddressSpaces handles this in this patch when the source is inferred. I don't think there are any other passes that would ever need to care about this.

If I want to localize data in a loop inside a, let's say, OpenMP target offloading region, I could check if the pointer I have is already shared/private and if not move the data to a shared/private allocation. Maybe I misunderstand what this intrinsic is supposed to do though.

If you want to go ahead with this, we can add the generic stuff later.

arsenm closed this revision.Wed, Sep 4, 7:20 PM

r371009