Changeset View
Standalone View
clang/test/CodeGenCUDA/lambda.cu
- This file was added.
// RUN: %clang_cc1 -x hip -emit-llvm -std=c++11 %s -o - \ | |||||
// RUN: -triple x86_64-linux-gnu \ | |||||
// RUN: | FileCheck -check-prefix=HOST %s | |||||
// RUN: %clang_cc1 -x hip -emit-llvm -std=c++11 %s -o - \ | |||||
// RUN: -triple amdgcn-amd-amdhsa -fcuda-is-device \ | |||||
// RUN: | FileCheck -check-prefix=DEV %s | |||||
#include "Inputs/cuda.h" | |||||
// Device side kernel name. | |||||
// HOST: @[[KERN_CAPTURE:[0-9]+]] = {{.*}} c"_Z1gIZ12test_capturevEUlvE_EvT_\00" | |||||
// HOST: @[[KERN_RESOLVE:[0-9]+]] = {{.*}} c"_Z1gIZ12test_resolvevEUlvE_EvT_\00" | |||||
// Check functions emitted for test_capture in host compilation. | |||||
// Check lambda is not emitted in host compilation. | |||||
// HOST-LABEL: define void @_Z12test_capturev | |||||
// HOST: call void @_Z19test_capture_helperIZ12test_capturevEUlvE_EvT_ | |||||
// HOST-LABEL: define internal void @_Z19test_capture_helperIZ12test_capturevEUlvE_EvT_ | |||||
// HOST: call void @_Z16__device_stub__gIZ12test_capturevEUlvE_EvT_ | |||||
// HOST-NOT: define{{.*}}@_ZZ4mainENKUlvE_clEv | |||||
// Check functions emitted for test_resolve in host compilation. | |||||
// Check host version of template function 'overloaded' is emitted and called | |||||
// by the lambda function. | |||||
// HOST-LABEL: define void @_Z12test_resolvev | |||||
// HOST: call void @_Z19test_resolve_helperIZ12test_resolvevEUlvE_EvT_() | |||||
// HOST-LABEL: define internal void @_Z19test_resolve_helperIZ12test_resolvevEUlvE_EvT_ | |||||
tra: The test example may not be doing what it's seemingly supposed to be doing:
https://cppinsights. | |||||
It works. We need to think about this in device compilation. In device compilation, global variable is a device variable, the lambda is a device host function, therefore the lambda is accessing the real a, not the shadow. In the host compilation, the lambda is not really called, therefore it is not emitted. I will update the lit test with these checks. yaxunl: It works.
We need to think about this in device compilation. In device compilation, global… | |||||
Clang manages to see through to the initializer of a, but I'm not sure how much we can rely on this. Anyways, it's an issue orthogonal to this patch. My concern is that tests are often used as an example of things that are OK to do, and capturing lambdas are a pretty big foot-shooting gun when used with CUDA. It's very easy to do wrong thing without compiler complaining about them. I'm fairly confident that I can hide the initializer with sufficiently complicated code, force clang to access a via this and make everything fail at runtime. IMO, what we have here is a 'happens to work' situation. I do not want to call it 'guaranteed to work' without making sure that it always does. In order to demonstrate that lambda is host/device, you do not need it to be a capturing lambda. You can make it call an overloaded function with host and device variants and verify that the lambda works on host and device sides. tra: Clang manages to see through to the initializer of `a`, but I'm not sure how much we can rely… | |||||
I added one more test, where a lambda function calls a template function which is overloaded with a host version and a device version. The lambda is called in both host function and in kernel. Test shows correct version of template function are emitted in host and device compilation. I think it is not a surprise that the lambda function is able to resolve the host/device-ness of the callee correctly. We are doing resolution in a host device function and the two candidates are same-side vs wrong-side. yaxunl: I added one more test, where a lambda function calls a template function which is overloaded… | |||||
// HOST: call void @_Z16__device_stub__gIZ12test_resolvevEUlvE_EvT_ | |||||
// HOST: call void @_ZZ12test_resolvevENKUlvE_clEv | |||||
// HOST-LABEL: define internal void @_ZZ12test_resolvevENKUlvE_clEv | |||||
Typo rjmccall: Typo | |||||
// HOST: call i32 @_Z10overloadedIiET_v | |||||
// HOST-LABEL: define linkonce_odr i32 @_Z10overloadedIiET_v | |||||
// HOST: ret i32 2 | |||||
// Check kernel is registered with correct device side kernel name. | |||||
// HOST: @__hipRegisterFunction({{.*}}@[[KERN_CAPTURE]] | |||||
// HOST: @__hipRegisterFunction({{.*}}@[[KERN_RESOLVE]] | |||||
// DEV: @a = addrspace(1) externally_initialized global i32 0 | |||||
// Check functions emitted for test_capture in device compilation. | |||||
// Check lambda is emitted in device compilation and accessing device variable. | |||||
// DEV-LABEL: define amdgpu_kernel void @_Z1gIZ12test_capturevEUlvE_EvT_ | |||||
// DEV: call void @_ZZ12test_capturevENKUlvE_clEv | |||||
// DEV-LABEL: define internal void @_ZZ12test_capturevENKUlvE_clEv | |||||
// DEV: store i32 1, i32* addrspacecast (i32 addrspace(1)* @a to i32*) | |||||
// Check functions emitted for test_resolve in device compilation. | |||||
// Check device version of template function 'overloaded' is emitted and called | |||||
// by the lambda function. | |||||
// DEV-LABEL: define amdgpu_kernel void @_Z1gIZ12test_resolvevEUlvE_EvT_ | |||||
// DEV: call void @_ZZ12test_resolvevENKUlvE_clEv | |||||
// DEV-LABEL: define internal void @_ZZ12test_resolvevENKUlvE_clEv | |||||
// DEV: call i32 @_Z10overloadedIiET_v | |||||
There is a typo here, DEV-LABEL ashi1: There is a typo here, DEV-LABEL | |||||
fixed. thanks yaxunl: fixed. thanks | |||||
// DEV-LABEL: define linkonce_odr i32 @_Z10overloadedIiET_v | |||||
// DEV: ret i32 1 | |||||
__device__ int a; | |||||
template<class T> | |||||
__device__ T overloaded() { return 1; } | |||||
template<class T> | |||||
__host__ T overloaded() { return 2; } | |||||
template<class F> | |||||
__global__ void g(F f) { f(); } | |||||
template<class F> | |||||
void test_capture_helper(F f) { g<<<1,1>>>(f); } | |||||
template<class F> | |||||
void test_resolve_helper(F f) { g<<<1,1>>>(f); f(); } | |||||
// Test capture of device variable in lambda function. | |||||
void test_capture(void) { | |||||
test_capture_helper([](){ a = 1;}); | |||||
} | |||||
// Test resolving host/device function in lambda function. | |||||
// Callee should resolve to correct host/device function based on where | |||||
// the lambda function is called, not where it is defined. | |||||
void test_resolve(void) { | |||||
test_resolve_helper([](){ overloaded<int>();}); | |||||
We are allowing regular lambda to be used in the device functions. That should be explicitly marked by making that lambda __device__ or __host__ __device__. Even though we may not have static checks for capture so far, that should be easily extended with those attributes. hliao: We are allowing regular lambda to be used in the device functions. That should be explicitly… | |||||
} |
The test example may not be doing what it's seemingly supposed to be doing:
https://cppinsights.io/s/3a5c42ff
h() gets a temporary host-side object which keeps the reference to a and that reference will actually point to the host-side shadow of the actual device-side a. When you get to execute g it's this may not be very usable on device side and thus f.operator() will probably not work.
Alas, we currently have no diagnostics for that kind of error.
Change it to a non-capturing lambda, perhaps?