Page MenuHomePhabricator

[WIP][CUDA] Use shared MangleContext for CUDA and CXX CG
Needs ReviewPublic

Authored by psalz on Jul 1 2019, 8:36 AM.


NOTE: This is a work in progress and mainly intended to highlight the issue - i.e., I'm not certain the provided solution is appropriate.

Given this CUDA program

template<typename Lambda>
__global__ void run_this(Lambda lambda) {

template<typename T>
struct remove_reference {
    using type = T;

template<typename T>
struct remove_reference<T&> {
    using type = T;

template<typename T>
constexpr typename remove_reference<T>::type&& move(T&& t) {
    return static_cast<typename remove_reference<T>::type&&>(t);

int main() {
    auto foo = move([](){});
    run_this<<<1, 1, 1>>>([]() __device__ { printf("Hello World\n"); }); 
    return 0;

the assertion at the top of CGNVCUDARuntime::emitDeviceStub will fail. For release builds the effect is simply a cudaErrorInvalidDeviceFunction error at run time. The reason for this is that the mangled names of the device stub and the actual device side function differ: The stub is called _Z8run_thisIZ4mainE3$_1EvT_, while the device function is _Z8run_thisIZ4mainE3$_0EvT_. The difference comes down to the anonymous struct ID that is maintained and assigned by the ManglerContext. It appears that for the latter getAnonymousStructId is never called for the moved no-op lambda, resulting in an ID of 0 for the kernel.

My proposed solution would be to simply share the ManglerContext used by the CGNVCUDARuntime and CGCXXABI code generators. For this I've added a new ASTContext::getSharedMangleContext function that memoizes created manglers for the given target ABI. From looking at ManglerContext to me at least it doesn't look like that could cause any issues, but then again, I really don't know much about Clang's internals.

Of course an alternative solution could be to make sure that getAnonymousStructId is always called for both lambdas (and in the correct order), but again I don't really know why that is not happening in the first place.

Diff Detail

rC Clang

Event Timeline

psalz created this revision.Jul 1 2019, 8:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2019, 8:36 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I don't know of any problems that the shared mangle context would cause, though I'm not sure about using the shared_ptr. It seems to me that SOMEONE should own this, and the other should use a reference. IMO, the SharedMangleContexts should be a unique_ptr and CGCXXABI/CGCUDANV should have a reference to the proper implementation.

psalz updated this revision to Diff 207488.Jul 2 2019, 2:12 AM

Move ownership of shared MangleContexts to ASTContext, return references from getSharedMangleContext.

aheejin resigned from this revision.Jul 2 2019, 8:20 PM

Sorry, I don't think I know enough about this code to review this.

hliao resigned from this revision.Nov 1 2019, 11:13 AM