nvcc allows using std::malloc and std::free in device code.
When std::malloc or std::free is passed as a template
function argument with template argument deduction,
there is no diagnostic. For example:
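
#include <cstdlib>
#include <memory>

// Device code: nvcc accepts std::malloc/std::free here.
__global__ void kern() {
  void *p = std::malloc(1);
  std::free(p);
}

int main() {
  std::shared_ptr<float> a;
  // std::free is passed as the deleter; its type is a deduced template argument.
  a = std::shared_ptr<float>(
      (float *)std::malloc(sizeof(float) * 100),
      std::free);
  return 0;
}

However, the same code fails to compile with clang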
(https://godbolt.org/z/1roGvo6YY). The reason is
that clang does not have logic to choose a function
argument from an overloaded set of candidates
based on host/device attributes for template argument
deduction.
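
As a rough host-only analogy (not the CUDA case itself), the sketch below shows why an overloaded name is a problem for deduction in plain C++. When more than one overload is viable, the deleter type cannot be deduced and the caller has to pick one explicitly. The names `release` and `make_buffer` are hypothetical, and the two overloads differ by parameter type here, whereas in the CUDA case they differ only by host/device attributes.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// Two overloads with the same name: the bare name `release` is an overload set.
void release(void *p) { std::free(p); }
void release(int *p) { std::free(p); }

std::shared_ptr<float> make_buffer() {
  float *raw = static_cast<float *>(std::malloc(sizeof(float) * 100));
  assert(raw && "allocation failed");
  // std::shared_ptr<float> bad(raw, release);
  //   would not compile: the deleter type cannot be deduced from an
  //   overload set in which more than one candidate matches.
  // Selecting one overload explicitly makes deduction possible.
  return std::shared_ptr<float>(raw, static_cast<void (*)(void *)>(release));
}
```

In the CUDA case no such cast is expected from the user; the patch instead lets the compiler prefer the candidate whose host/device attributes fit the calling context.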
Currently, clang does have logic to choose a candidate
based on the constraints of the candidates. This patch
extends that logic to account for the CUDA host/device-based
preference.

Maybe CheckCUDAPreference should return -1/0/1 or an enum. std::optional does not seem very readable here.
E.g. if (MorePreferableByCUDA) sounds like it will be satisfied when FD is a better choice than Result, but that is not the case.
I think this would be easier to follow:
if (CheckCUDAPreference(FD, Result) <= 0) // or `!= CP_BETTER`
  continue;
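
To make the suggestion concrete, here is a minimal standalone sketch of what an enum-returning comparison might look like. It is only a toy model: `Candidate`, `Priority`, `PickBest`, and the enumerators `CP_WORSE`/`CP_SAME` are hypothetical stand-ins (only `CP_BETTER` is mentioned above), and the real function would compare clang function declarations rather than integers. Ties simply keep the earlier candidate here, which may not match what the patch should do.

```cpp
#include <cassert>
#include <vector>

// Hypothetical three-way result: how FD compares against the current Result.
enum CUDAPreference { CP_WORSE = -1, CP_SAME = 0, CP_BETTER = 1 };

// Toy stand-in for a function declaration; a higher Priority models a
// candidate that is more preferable for the current host/device context.
struct Candidate {
  int Priority;
};

CUDAPreference CheckCUDAPreference(const Candidate *FD,
                                   const Candidate *Result) {
  assert(FD && "candidate must not be null");
  if (!Result)
    return CP_BETTER; // Nothing chosen yet, so FD wins by default.
  if (FD->Priority > Result->Priority)
    return CP_BETTER;
  if (FD->Priority < Result->Priority)
    return CP_WORSE;
  return CP_SAME;
}

const Candidate *PickBest(const std::vector<Candidate> &Cands) {
  const Candidate *Result = nullptr;
  for (const Candidate &FD : Cands) {
    // Reads as stated: skip FD unless it is strictly better than Result.
    if (CheckCUDAPreference(&FD, Result) != CP_BETTER)
      continue;
    Result = &FD;
  }
  return Result;
}
```

With this shape the condition at the call site says directly which outcome is being skipped, so the reader does not have to decode an empty versus true versus false std::optional.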