This is an archive of the discontinued LLVM Phabricator instance.

Differential D90311

[CUDA][HIP] Fix linkage for -fgpu-rdc
ClosedPublic

Authored by yaxunl on Oct 28 2020, 7:58 AM.


Details

Reviewers

tra

Commits

rGabd8cd9199d1: [CUDA][HIP] Fix linkage for -fgpu-rdc

Summary

Currently, for explicit template function instantiations in CUDA/HIP device
compilation, clang emits instantiated kernels with external linkage
and instantiated device functions with internal linkage.

This is fine for -fno-gpu-rdc, since device code is then limited to one TU.

However, for -fgpu-rdc this causes duplicate kernel symbols if the same
instantiation happens in multiple TUs, or missing symbols if a device
function calls an explicitly instantiated template function in a
different TU.

To make explicit template function instantiation work for
-fgpu-rdc, we need to follow the normal C++ linkage paradigm, i.e.
use weak_odr linkage for both kernels and device functions.
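To make the two failure modes concrete, here is a toy C++ model of cross-TU symbol resolution. It is a sketch under simplifying assumptions, not a description of how nvlink or lld actually behave: External definitions must be unique, weak_odr definitions may repeat and are merged, and Internal definitions are invisible outside their own TU. The names `Linkage`, `Def`, `LinkResult` and `resolveSymbols`, and the file names `a.cu`/`b.cu`, are hypothetical; the mangled names are the ones used in the patch's test file.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of symbol resolution at device link time.
enum class Linkage { External, WeakODR, Internal };

struct Def {
  std::string tu;      // translation unit that emits the definition
  std::string symbol;  // mangled name
  Linkage linkage;
};

struct LinkResult {
  std::vector<std::string> duplicates;  // symbols with more than one External definition
  std::vector<std::string> unresolved;  // "tu -> symbol" references with no visible definition
};

// refs lists (referencing TU, referenced symbol) pairs.
LinkResult resolveSymbols(const std::vector<Def>& defs,
                          const std::vector<std::pair<std::string, std::string>>& refs) {
  std::map<std::string, int> externalCount;
  std::map<std::string, bool> exported;
  for (const Def& d : defs) {
    if (d.linkage == Linkage::Internal) continue;  // never visible to other TUs
    exported[d.symbol] = true;
    // WeakODR copies are assumed identical and merged, so only External ones clash.
    if (d.linkage == Linkage::External) ++externalCount[d.symbol];
  }

  LinkResult result;
  for (const auto& entry : externalCount)
    if (entry.second > 1) result.duplicates.push_back(entry.first);

  for (const auto& ref : refs) {
    bool found = exported.count(ref.second) > 0;
    // An Internal definition only satisfies references from its own TU.
    for (const Def& d : defs)
      if (d.linkage == Linkage::Internal && d.tu == ref.first && d.symbol == ref.second)
        found = true;
    if (!found) result.unresolved.push_back(ref.first + " -> " + ref.second);
  }
  return result;
}
```

Feeding it the pre-patch -fgpu-rdc scenario (`kernel<int>` instantiated with external linkage in both TUs, `func<int>` instantiated with internal linkage in `a.cu` and called from `b.cu`) reports one duplicate and one unresolved reference; giving all three definitions weak_odr linkage reports neither.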

Diff Detail

Event Timeline

yaxunl requested review of this revision. Oct 28 2020, 7:58 AM

yaxunl created this revision.

ashi1 added a subscriber: ashi1. Oct 28 2020, 8:09 AM

LGTM.

This revision is now accepted and ready to land. Nov 2 2020, 2:25 PM

Closed by commit rGabd8cd9199d1: [CUDA][HIP] Fix linkage for -fgpu-rdc (authored by yaxunl). Nov 3 2020, 5:08 AM

This revision was automatically updated to reflect the committed changes.

yaxunl added a commit: rGabd8cd9199d1: [CUDA][HIP] Fix linkage for -fgpu-rdc.

Herald added a project: Restricted Project. Nov 3 2020, 5:08 AM

Revision Contents

Path                                            Size
clang/lib/CodeGen/CodeGenModule.cpp             11 lines
clang/test/CodeGenCUDA/device-fun-linkage.cu    19 lines

Diff 301285

clang/lib/CodeGen/CodeGenModule.cpp

   if (Linkage == GVA_DiscardableODR)
     return !Context.getLangOpts().AppleKext ? llvm::Function::LinkOnceODRLinkage
                                             : llvm::Function::InternalLinkage;

   // An explicit instantiation of a template has weak linkage, since
   // explicit instantiations can occur in multiple translation units
   // and must all be equivalent. However, we are not allowed to
   // throw away these explicit instantiations.
   //
-  // We don't currently support CUDA device code spread out across multiple TUs,
+  // CUDA/HIP: For -fno-gpu-rdc case, device code is limited to one TU,
   // so say that CUDA templates are either external (for kernels) or internal.
-  // This lets llvm perform aggressive inter-procedural optimizations.
+  // This lets llvm perform aggressive inter-procedural optimizations. For
+  // -fgpu-rdc case, device function calls across multiple TU's are allowed,
+  // therefore we need to follow the normal linkage paradigm.
   if (Linkage == GVA_StrongODR) {
-    if (Context.getLangOpts().AppleKext)
+    if (getLangOpts().AppleKext)
       return llvm::Function::ExternalLinkage;
-    if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice)
+    if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice &&
+        !getLangOpts().GPURelocatableDeviceCode)
       return D->hasAttr<CUDAGlobalAttr>() ? llvm::Function::ExternalLinkage
                                           : llvm::Function::InternalLinkage;
     return llvm::Function::WeakODRLinkage;
   }

   // C++ doesn't have tentative definitions and thus cannot have common
   // linkage.
   if (!getLangOpts().CPlusPlus && isa<VarDecl>(D) &&
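The patched `GVA_StrongODR` branch can be restated as a standalone function, which makes the resulting linkage table easy to check. This is a sketch that only mirrors the diff: `FnLinkage`, `LangOpts` and `explicitInstantiationLinkage` are hypothetical stand-ins for the `llvm::Function` linkage types, clang's language options and the surrounding CodeGenModule code, and the `isKernel` parameter stands in for `D->hasAttr<CUDAGlobalAttr>()`.

```cpp
// Hypothetical stand-ins for the clang/LLVM types used in the diff.
enum class FnLinkage { External, WeakODR, Internal };

struct LangOpts {
  bool AppleKext = false;
  bool CUDA = false;
  bool CUDAIsDevice = false;
  bool GPURelocatableDeviceCode = false;  // enabled by -fgpu-rdc
};

// Linkage for an explicit template instantiation (the GVA_StrongODR case)
// after the patch. isKernel models D->hasAttr<CUDAGlobalAttr>().
FnLinkage explicitInstantiationLinkage(const LangOpts& opts, bool isKernel) {
  if (opts.AppleKext)
    return FnLinkage::External;
  // Single-TU device compilation (-fno-gpu-rdc): keep kernels external and
  // internalize device functions so LLVM can optimize aggressively.
  if (opts.CUDA && opts.CUDAIsDevice && !opts.GPURelocatableDeviceCode)
    return isKernel ? FnLinkage::External : FnLinkage::Internal;
  // Host code and -fgpu-rdc device code: normal C++ rule, weak_odr.
  return FnLinkage::WeakODR;
}
```

With CUDA device compilation enabled, it returns Internal for a device function and External for a kernel without RDC, and WeakODR for both once `GPURelocatableDeviceCode` is set, matching the NORDC and RDC check lines in device-fun-linkage.cu.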

clang/test/CodeGenCUDA/device-fun-linkage.cu

This file was added.

// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \
// RUN:   -emit-llvm -o - %s \
// RUN:   | FileCheck -check-prefix=NORDC %s
// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \
// RUN:   -fgpu-rdc -emit-llvm -o - %s \
// RUN:   | FileCheck -check-prefix=RDC %s

#include "Inputs/cuda.h"

// NORDC: define internal void @_Z4funcIiEvv()
// NORDC: define void @_Z6kernelIiEvv()
// RDC: define weak_odr void @_Z4funcIiEvv()
// RDC: define weak_odr void @_Z6kernelIiEvv()

template <typename T> __device__ void func() {}
template <typename T> __global__ void kernel() {}

template __device__ void func<int>();
template __global__ void kernel<int>();