This is an archive of the discontinued LLVM Phabricator instance.

Not loving the magic constants here, but I don't think we have an enum or similar right now.
I also have to question the people who chose size_t here... we will end up with int2ptr(ptr2int(...)) IR everywhere if this is actually used (outside the asm uses in CUDA).
Anyway, LGTM.
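As a hypothetical illustration of the missing enum: the literals in the patch are the NVPTX address-space numbers (1 global, 3 shared, 4 constant, 5 local, with 0 as the generic space). The enum and enumerator names below are invented for this sketch; they are not an existing clang or LLVM API. A minimal C++ version could name the values and pin them at compile time:

```cpp
#include <cassert>

// Hypothetical named constants for the NVPTX address spaces that appear as
// bare literals (1, 3, 4, 5) in the patch. Illustration only; this is not
// an existing clang or LLVM API.
enum NvptxAddrSpace : unsigned {
  kNvptxGeneric = 0,
  kNvptxGlobal = 1,
  kNvptxShared = 3,
  kNvptxConstant = 4,
  kNvptxLocal = 5,
};

// Under clang, an unscoped enumerator is an integer constant expression, so
// in CUDA device code it should be usable in place of a literal, e.g.:
//   (size_t)(void __attribute__((address_space(kNvptxShared))) *)__ptr;
// That device-side line is only a comment; this file is plain C++.

// Compile-time checks that the names match the literals used in the patch.
static_assert(kNvptxGlobal == 1, "global memory is address space 1");
static_assert(kNvptxShared == 3, "shared memory is address space 3");
static_assert(kNvptxConstant == 4, "constant memory is address space 4");
static_assert(kNvptxLocal == 5, "local memory is address space 5");
```

Whether clang accepts an enumerator inside `address_space(...)` is stated here as an expectation based on its integer-constant requirement; the patch itself uses plain literals.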

This revision is now accepted and ready to land. Oct 12 2021, 12:27 PM

Harbormaster completed remote builds in B128426: Diff 379120. Oct 12 2021, 12:49 PM

In D111665#3059427, @jdoerfert wrote:

> Not loving the magic constants here, but I don't think we have an enum or similar right now.

Yup.

> I also have to question the people who chose size_t here... we will end up with int2ptr(ptr2int(...)) IR everywhere if this is actually used (outside the asm uses in CUDA).

I guess size_t was 'good enough' to accommodate all pointer sizes (though it should've been uintptr_t).

I think this chain of conversions gets instcombined away quickly, even at -O1.
E.g.: https://godbolt.org/z/4vd94cEsj
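On the size_t versus uintptr_t point: the C and C++ standards guarantee that a `void *` converted to `uintptr_t` (where that optional type exists) converts back to an equal pointer, while `size_t` is only required to hold object sizes. A small host-side C++ sketch of that round trip follows; `round_trip` is a hypothetical helper, and the width check reflects common 64-bit hosts rather than a language guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round-trips a generic pointer through an integer. std::uintptr_t is the
// type the standard guarantees can hold any void* and convert back to an
// equal pointer; std::size_t carries no such guarantee.
// round_trip is a hypothetical helper for illustration.
void *round_trip(void *p) {
  std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
  return reinterpret_cast<void *>(bits);
}

// On common 64-bit hosts the two types have the same width, which is why
// using size_t works in practice; the standard does not require this.
static_assert(sizeof(std::size_t) == sizeof(std::uintptr_t),
              "size_t and uintptr_t differ in width on this platform");
```

Using `uintptr_t` would have documented the intent; in practice the generated code is the same on the targets discussed here.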

Closed by commit rGf526ee5b8517: [CUDA] Provide address space conversion builtins. (authored by tra). Oct 12 2021, 2:57 PM

This revision was automatically updated to reflect the committed changes.

tra added a commit: rGf526ee5b8517: [CUDA] Provide address space conversion builtins.

In D111665#3059690, @tra wrote:

> I think this chain of conversions gets quickly instcombined away even at -O1:
> E.g.: https://godbolt.org/z/4vd94cEsj

Except when it doesn't get instcombined away: https://godbolt.org/z/YE4EfEPde

In D111665#3059989, @jdoerfert wrote:

> Except when it doesn't get instcombined away: https://godbolt.org/z/YE4EfEPde

Well, it does get translated into sensible PTX, so while it's not ideal, it's not too big of a deal.
Using an integer is a sensible approach to prevent accidental loads/stores through the wrong address space.

An alternative would be to make the conversion functions return a pointer with a specific address space (AS) attribute, but that is clang-specific and would not work for something that needs to plug into CUDA headers written for NVCC.

So, yeah. It could be better, but it's tolerable. At least we didn't have to resort to using inline asm. :-)
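To make the argument for returning an integer concrete, here is a host-only C++ model; it is not CUDA and not how the hardware defines shared addresses. A hypothetical "shared window" is an ordinary array, and converting a generic pointer into it yields a plain integer offset. Since an integer cannot be dereferenced, a load or store through the wrong space fails to compile until the value is explicitly converted back. The window, its size, and both function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a block's shared memory: a fixed host array.
// Real shared-memory addressing is defined by the PTX ISA, not by this model.
unsigned char g_shared_window[1024];

// Generic -> "shared": returns an offset into the window as an integer.
// The result cannot be dereferenced, so misuse is a compile-time error.
std::uint32_t model_generic_to_shared(const void *p) {
  auto base = reinterpret_cast<std::uintptr_t>(g_shared_window);
  auto addr = reinterpret_cast<std::uintptr_t>(p);
  assert(addr >= base && addr < base + sizeof(g_shared_window));
  return static_cast<std::uint32_t>(addr - base);
}

// "Shared" -> generic: the inverse mapping back to a dereferenceable pointer.
void *model_shared_to_generic(std::uint32_t offset) {
  assert(offset < sizeof(g_shared_window));
  return g_shared_window + offset;
}
```

Returning a 32-bit offset loosely mirrors `__nvvm_get_smem_pointer` in the diff, which narrows the shared-space address to `uint32_t`.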

Revision Contents

clang/lib/Headers/__clang_cuda_intrinsics.h (32 lines)

Diff 379120

clang/lib/Headers/__clang_cuda_intrinsics.h

inline __device__ unsigned __funnelshift_rc(unsigned low32, unsigned high32,
...
  asm("shf.r.clamp.b32 %0, %1, %2, %3;"
      : "=r"(ret)
      : "r"(low32), "r"(high32), "r"(shiftWidth));
  return ret;
}

#endif // !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 320

#if CUDA_VERSION >= 11000
extern "C" {
__device__ inline size_t __nv_cvta_generic_to_global_impl(const void *__ptr) {
  return (size_t)(void __attribute__((address_space(1))) *)__ptr;
}
__device__ inline size_t __nv_cvta_generic_to_shared_impl(const void *__ptr) {
  return (size_t)(void __attribute__((address_space(3))) *)__ptr;
}
__device__ inline size_t __nv_cvta_generic_to_constant_impl(const void *__ptr) {
  return (size_t)(void __attribute__((address_space(4))) *)__ptr;
}
__device__ inline size_t __nv_cvta_generic_to_local_impl(const void *__ptr) {
  return (size_t)(void __attribute__((address_space(5))) *)__ptr;
}
__device__ inline void *__nv_cvta_global_to_generic_impl(size_t __ptr) {
  return (void *)(void __attribute__((address_space(1))) *)__ptr;
}
__device__ inline void *__nv_cvta_shared_to_generic_impl(size_t __ptr) {
  return (void *)(void __attribute__((address_space(3))) *)__ptr;
}
__device__ inline void *__nv_cvta_constant_to_generic_impl(size_t __ptr) {
  return (void *)(void __attribute__((address_space(4))) *)__ptr;
}
__device__ inline void *__nv_cvta_local_to_generic_impl(size_t __ptr) {
  return (void *)(void __attribute__((address_space(5))) *)__ptr;
}
__device__ inline uint32_t __nvvm_get_smem_pointer(void *__ptr) {
  return __nv_cvta_generic_to_shared_impl(__ptr);
}
} // extern "C"
#endif // CUDA_VERSION >= 11000

#endif // defined(__CLANG_CUDA_INTRINSICS_H__)

[CUDA] Provide address space conversion builtins. (Closed, Public)
