Implicit functions are treated as if they were __host__ __device__, and Clang does not allow
overloading those with __host__ or __device__ variants.
To let users provide their own standard allocators, we must create separate __host__ and __device__ variants of these implicit declarations during CUDA compilation.
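As a rough illustration of the user code this is meant to permit, here is a minimal sketch. It assumes Clang's target-based overloading, so a __host__ and a __device__ definition with the same signature can coexist. The counter g_host_allocs is hypothetical and exists only to make the host path observable. The macro fallback is an assumption added so the host half also builds as plain C++.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Outside CUDA compilation the target attributes do not exist; define them
// away so the host half of this sketch still builds as ordinary C++.
#if !defined(__CUDA__) && !defined(__CUDACC__)
#define __host__
#define __device__
#endif

// Hypothetical counter, for illustration only.
static std::size_t g_host_allocs = 0;

// Host variant: a user-provided replacement for the implicitly declared
// host-side allocator.
__host__ void *operator new(std::size_t size) {
  ++g_host_allocs;
  if (void *p = std::malloc(size ? size : 1))
    return p;
  throw std::bad_alloc();
}

__host__ void operator delete(void *p) noexcept { std::free(p); }

#if defined(__clang__) && defined(__CUDA__)
// Device variant: same signature, different target. This only works if the
// implicit declaration is not a single __host__ __device__ function.
__device__ void *operator new(std::size_t size) { return malloc(size); }

__device__ void operator delete(void *p) noexcept { free(p); }
#endif
```

When the file is compiled as CUDA with Clang, host code that evaluates a new-expression should resolve to the __host__ definition and device code to the __device__ one. When compiled as plain C++, only the host replacement is present.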
Oh, I like this way better than a bool arg.