This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Provide address space conversion builtins.
ClosedPublic

Authored by tra on Oct 12 2021, 11:24 AM.

Details

Summary

CUDA-11 headers rely on these NVCC builtins.
Despite having __nv previx, those are *not* provided by libdevice.

Diff Detail

Event Timeline

tra created this revision.Oct 12 2021, 11:24 AM
tra requested review of this revision.Oct 12 2021, 11:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2021, 11:24 AM
tra added a comment.Oct 12 2021, 11:36 AM

Tested by verifying that generated code (both PTX and SASS) matches that of NVCC: https://godbolt.org/z/rWYYx63bT

jdoerfert accepted this revision.Oct 12 2021, 12:27 PM

Not loving the magic constants here but I don't think we have a enum or similar right now.
I also have to question the people that choose size_t here... we will end up with int2ptr(ptr2int(...)) IR everywhere if this is actually used (outside the asm uses in cuda).
Anyway, LGTM.

This revision is now accepted and ready to land.Oct 12 2021, 12:27 PM
tra added a comment.Oct 12 2021, 1:59 PM

Not loving the magic constants here but I don't think we have a enum or similar right now.

Yup.

I also have to question the people that choose size_t here... we will end up with int2ptr(ptr2int(...)) IR everywhere if this is actually used (outside the asm uses in cuda).

I guess size_t was 'good enough' to accommodate all pointer sizes (though it should've been uintptr_t).

I think this chain of conversions gets quickly instcombined away even at -O1:
E.g: https://godbolt.org/z/4vd94cEsj

This revision was automatically updated to reflect the committed changes.

Not loving the magic constants here but I don't think we have a enum or similar right now.

Yup.

I also have to question the people that choose size_t here... we will end up with int2ptr(ptr2int(...)) IR everywhere if this is actually used (outside the asm uses in cuda).

I guess size_t was 'good enough' to accommodate all pointer sizes (though it should've been uintptr_t).

I think this chain of conversions gets quickly instcombined away even at -O1:
E.g: https://godbolt.org/z/4vd94cEsj

Except when it doesn't get instcombined away: https://godbolt.org/z/YE4EfEPde

tra added a comment.Oct 12 2021, 5:01 PM

Except when it doesn't get instcombined away: https://godbolt.org/z/YE4EfEPde

Well, it does get translated into sensible PTX, so, while not ideal, it's not too big of a deal.
Using an integer is a sensible approach to prevent accidental load/store using a wrong address space.

An alternative would be to make conversion functions return a pointer with specific AS attribute, but that's clang-specific and it would not work for something that needs to plug in into CUDA headers that were written for NVCC.

So, yeah. It could be better, but it's tolerable. At least we didn't have to resort to using inline asm. :-)