This is an archive of the discontinued LLVM Phabricator instance.

One issue with this patch is that demanglers will no longer be able to deal with the name. While they do know to ignore the .stub suffix, they can't deal with the __device_stub__ prefix.
E.g.:

% c++filt __device_stub___Z10kernelfuncIiEvv
__device_stub___Z10kernelfuncIiEvv
% c++filt _Z10kernelfuncIiEvv.stub
void kernelfunc<int>() [clone .stub]
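
The same comparison can be reproduced programmatically. The sketch below assumes a GCC or Clang toolchain whose C++ runtime provides `abi::__cxa_demangle` in `<cxxabi.h>`; `tryDemangle` is a hypothetical helper name, not part of the patch. The exact text produced for the suffixed name differs between demangler implementations, so only success or failure is meaningful here.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: returns true if `mangled` demangles under the Itanium
// C++ ABI and stores the readable form in `out`.
bool tryDemangle(const char *mangled, std::string &out) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with
  // malloc, so it has to be released with free.
  char *demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr) {
    std::free(demangled);
    return false;
  }
  out = demangled;
  std::free(demangled);
  return true;
}
```

Calling `tryDemangle` on `_Z10kernelfuncIiEvv` and on `_Z10kernelfuncIiEvv.stub` is expected to succeed, while `__device_stub___Z10kernelfuncIiEvv` is expected to be rejected, matching the c++filt session above.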

clang/lib/CodeGen/CGCUDANV.cpp
222–236	I'm not sure I understand what exactly this assertion checks. The condition appears to be true if host/device ABIs are different OR the name of the current function is the same as the (possibly mangled) device-side name + __device_stub__ prefix. While the first part makes sense, I'm not sure I understand the name comparison part. Could you tell me more and, maybe, add a comment explaining what's going on here?
798	I suspect `return "__device_stub__" + Name;` would do. StringRef will convert to std::string and copy elision should avoid unnecessary copy.

It's requested by the debugger people. They don't want the host-side stub to match the device-side kernel function name. The previous scheme cannot prevent that.

clang/lib/CodeGen/CGCUDANV.cpp
222–236	The second part ensures that, under the same ABI, the kernel stub name derived from the device-side name mangling is the same as the stub name generated on the host side. CGF.CurFn->getName() is the mangled name from host compilation.

hliao marked an inline comment as done. Jun 14 2019, 10:08 AM

hliao added inline comments.

clang/lib/CodeGen/CGCUDANV.cpp
222–236	The previous assertion expression had the same goal: unless the ABIs differ, the stub name derived from the device side should match the stub name from the host-side compilation. As we add a dedicated interface to derive the stub name, we can simplify the comparison to a single one. Also, we put the simple condition check first (a common practice) to reduce the overhead of string comparison.
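
The condition described in this comment can be sketched without any Clang types. Everything below is a hypothetical stand-in: `CXXABIKind`, `deviceStubName`, and `stubNameIsConsistent` are illustrative names, and the real assertion additionally checks that an auxiliary target exists before comparing ABIs.

```cpp
#include <string>

// Hypothetical stand-in for the host and device C++ ABI kinds.
enum class CXXABIKind { Itanium, Microsoft };

// Stand-in for getDeviceStubName as it behaves in Diff 204856:
// HIP appends ".stub", CUDA keeps the name unchanged.
std::string deviceStubName(const std::string &deviceSideName, bool isHIP) {
  return isHIP ? deviceSideName + ".stub" : deviceSideName;
}

// True when the assertion would hold: either the ABIs differ, so the names
// cannot be compared, or the stub name derived from the device-side mangling
// equals the name of the function emitted on the host side.
bool stubNameIsConsistent(CXXABIKind hostABI, CXXABIKind deviceABI,
                          const std::string &deviceSideName,
                          const std::string &hostFunctionName, bool isHIP) {
  // Cheap enum comparison first; the string comparison only runs when needed.
  if (hostABI != deviceABI)
    return true;
  return deviceStubName(deviceSideName, isHIP) == hostFunctionName;
}
```

Putting the ABI comparison first means the short-circuit skips building and comparing the stub name whenever the ABIs differ.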

In D63335#1543845, @hliao wrote:

It's requested by the debugger people. They don't want the host-side stub to match the device-side kernel function name. The previous scheme cannot prevent that.

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how having a prefix instead helps. Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

clang/lib/CodeGen/CGCUDANV.cpp
222–236	This definitely needs a comment.

hliao marked an inline comment as done. Jun 14 2019, 11:28 AM

hliao added inline comments.

clang/lib/CodeGen/CGCUDANV.cpp
798	"__device__stub__" + Name results in a Twine, where no copy is generated. Only the final str() converts the Twine into a std::string, which involves copying. Otherwise, there's one copy from Name to std::string and another copy by std::string operator+, right?

In D63335#1543854, @tra wrote:

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how having a prefix instead helps. Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

It's based on what the debugger people told me: with ".stub", the debugger could still match the stub against the original device kernel name, even though it could find both of them. But they want to match the original one only and leave the stub intentionally unmatched.

In D63335#1543854, @tra wrote:

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how having a prefix instead helps. Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

Yeah, I understand that an un-demangleable name causes a lot of frustration. But, based on what I learned, CUDA generates a similar thing, e.g. __device_stub__Z15transformKernelPfiif is the stub function from CUDA 10.1.

In D63335#1543854, @tra wrote:

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how having a prefix instead helps. Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

Is it OK for us to mangle __device_stub__ as a nested name into the original one? Say we prepend _ZN15__device_stub__E, so that we have _ZN15__device_stub__E10kernelfuncIiEvv

and

$ c++filt _ZN15__device_stub__E10kernelfuncIiEvv
__device_stub__(kernelfunc<int>, void, void)

In D63335#1544019, @hliao wrote:

It's based on what the debugger people told me: with ".stub", the debugger could still match the stub against the original device kernel name, even though it could find both of them. But they want to match the original one only and leave the stub intentionally unmatched.

Sorry, I still don't think I understand the reasons for this change. The stub and the kernel do have a different name now. I don't quite get it why the debugger can differentiate the names when they differ by prefix, but can't when they differ by suffix. It sounds like an attempt to work around a problem somewhere else.

Could you talk to the folks requesting the change and get more details on what exactly we need to do here and, more importantly, why.

In D63335#1544026, @hliao wrote:
Is it OK for us to mangle __device_stub__ as a nested name into the original one? Say we prepend _ZN15__device_stub__E, so that we have _ZN15__device_stub__E10kernelfuncIiEvv

and
$ c++filt _ZN15__device_stub__E10kernelfuncIiEvv
__device_stub__(kernelfunc<int>, void, void)

I don't think it's a good idea. While it demangles to something, it's not what the demangled name should be. Stub's signature should match that of the kernel.
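
To see why the proposed name demangles that way, it helps to build it step by step. In the Itanium C++ ABI mangling, an identifier is encoded as its decimal length followed by its characters, and `N ... E` brackets a nested name. The helper names below (`sourceName`, `proposedStubName`) are hypothetical and only reproduce the string from the proposal.

```cpp
#include <string>

// Itanium <source-name>: decimal length followed by the identifier itself.
std::string sourceName(const std::string &identifier) {
  return std::to_string(identifier.size()) + identifier;
}

// Hypothetical helper reproducing the proposal: prepend a one-component
// nested name to whatever follows the "_Z" of the kernel's mangled name.
// The caller is assumed to pass a name that starts with "_Z".
std::string proposedStubName(const std::string &kernelMangled) {
  return "_ZN" + sourceName("__device_stub__") + "E" + kernelMangled.substr(2);
}
```

Because the nested name is closed by `E` right after `15__device_stub__`, a demangler takes `__device_stub__` as the complete function name and reads the remainder, `10kernelfuncIiE`, `v`, `v`, as its parameter types. That is how the kernel name ends up inside the parameter list in the c++filt output.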

In D63335#1544021, @hliao wrote:

Yeah, I understand that an un-demangleable name causes a lot of frustration. But, based on what I learned, CUDA generates a similar thing, e.g. __device_stub__Z15transformKernelPfiif is the stub function from CUDA 10.1.

NVCC does a lot of things differently. That is not a good reason for us to copy *all* of their choices.
Let's figure out the underlying reasons for this change, and then we can figure out what the right thing to do here is.

In D63335#1544311, @tra wrote:

Sorry, I still don't think I understand the reasons for this change. The stub and the kernel do have a different name now. I don't quite get it why the debugger can differentiate the names when they differ by prefix, but can't when they differ by suffix. It sounds like an attempt to work around a problem somewhere else.

Could you talk to the folks requesting the change and get more details on what exactly we need to do here and, more importantly, why.

But after demangling, the debugger could still match both, as they are almost identical except for the final variant, like clone. The debugger will set all locations matching the specified kernel name.

In D63335#1544315, @hliao wrote:

But after demangling, the debugger could still match both, as they are almost identical except for the final variant, like clone. The debugger will set all locations matching the specified kernel name.

OK, so the real issue is that the demangled names look identical to the debugger.
One way to deal with that is to, essentially, break mangling in the compiler.
Another would be to teach the debugger how to distinguish the stub from the kernel using additional information likely available to it (e.g. the mangled name or the location of the symbol: is it in the host binary or in the GPU binary?).

I would argue that breaking mangling is not the best choice here.
I think the debugger does have sufficient information to deal with this, and that would be the right place to deal with the issue.
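
As a rough illustration of the debugger-side alternative, the sketch below filters breakpoint candidates using the mangled name and the binary a symbol came from. The `Symbol` record and all function names are hypothetical and do not correspond to any real debugger API.

```cpp
#include <string>
#include <vector>

// Hypothetical symbol record; a real debugger tracks far more than this.
struct Symbol {
  std::string mangledName;
  bool inDeviceBinary;
};

// True if `s` ends with `suffix`.
bool hasSuffix(const std::string &s, const std::string &suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// A host-side kernel stub keeps its ".stub" suffix in the mangled name even
// though the demangled text looks almost the same as the kernel's.
bool isHostStub(const Symbol &sym) {
  return !sym.inDeviceBinary && hasSuffix(sym.mangledName, ".stub");
}

// Keep only the locations that are not host stubs.
std::vector<Symbol> kernelLocations(const std::vector<Symbol> &matches) {
  std::vector<Symbol> result;
  for (const Symbol &sym : matches)
    if (!isHostStub(sym))
      result.push_back(sym);
  return result;
}
```

The point of the sketch is only that the distinguishing information survives in the mangled name and in the symbol's origin, so the compiler does not have to emit an un-demangleable name for the two to be told apart.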

This revision now requires changes to proceed. Jun 14 2019, 2:28 PM

In D63335#1544320, @tra wrote:

OK, so the real issue is that the demangled names look identical to the debugger.
One way to deal with that is to, essentially, break mangling in the compiler.
Another would be to teach the debugger how to distinguish the stub from the kernel using additional information likely available to it (e.g. the mangled name or the location of the symbol: is it in the host binary or in the GPU binary?).

I would argue that breaking mangling is not the best choice here.
I think the debugger does have sufficient information to deal with this, and that would be the right place to deal with the issue.

Hmm, I did push for the latter as well :(. OK, I will simplify the patch to not change any functionality but move the calculation of the device name into a common interface, so that a vendor could adjust it internally with minimal change. OK?

Just revise the interface for device kernel stubbing.
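
The shape of the revised interface can be sketched with standard-library types only. `StubNamer`, `SuffixStubNamer`, `PrefixStubNamer`, and `hostSymbolFor` are hypothetical names for illustration; in the patch the virtual method lives on CGCUDARuntime and takes an llvm::StringRef.

```cpp
#include <string>

// Hypothetical, simplified analogue of the runtime interface.
class StubNamer {
public:
  virtual ~StubNamer() = default;
  // Construct and return the stub name of a kernel.
  virtual std::string getDeviceStubName(const std::string &name) const = 0;
};

// Mirrors the behavior in Diff 204856: HIP appends ".stub", CUDA keeps the
// mangled name unchanged.
class SuffixStubNamer : public StubNamer {
public:
  explicit SuffixStubNamer(bool isHIP) : hip_(isHIP) {}
  std::string getDeviceStubName(const std::string &name) const override {
    return hip_ ? name + ".stub" : name;
  }

private:
  bool hip_;
};

// Hypothetical downstream variant: a vendor changes only this override.
class PrefixStubNamer : public StubNamer {
public:
  std::string getDeviceStubName(const std::string &name) const override {
    return "__device_stub__" + name;
  }
};

// Single call site, analogous to the getMangledName change: only host-side
// compilation of a kernel goes through the naming hook.
std::string hostSymbolFor(const StubNamer &namer, const std::string &mangled,
                          bool isKernel, bool isDeviceCompilation) {
  if (!isDeviceCompilation && isKernel)
    return namer.getDeviceStubName(mangled);
  return mangled;
}
```

With the naming decision behind one virtual call, switching schemes touches a single override rather than every place that compares or constructs stub names.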

Harbormaster completed remote builds in B33428: Diff 204856. Jun 14 2019, 2:52 PM

hliao retitled this revision from [HIP] Change kernel stub name again to [HIP] Add the interface deriving the stub name of device kernels. Jun 14 2019, 2:53 PM

hliao edited the summary of this revision.

In D63335#1544324, @hliao wrote:

Hmm, I did push for the latter as well :(. OK, I will simplify the patch to not change any functionality but move the calculation of the device name into a common interface, so that a vendor could adjust it internally with minimal change. OK?

:-( Sorry about that. I realize how frustrating that can be.

Perhaps it's worth trying once more. You can argue that this change will have trouble being upstreamed without a good technical explanation why it must be done in the compiler. Perhaps they do have compelling reasons why it's hard to do in the debugger, but without specific details from their end it appears indistinguishable from a (possibly misguided) quick fix. It may help if you could get the debugger folks to chime in directly on the review.

In D63335#1544428, @tra wrote:

:-( Sorry about that. I realize how frustrating that can be.

Perhaps it's worth trying once more. You can argue that this change will have trouble being upstreamed without a good technical explanation why it must be done in the compiler. Perhaps they do have compelling reasons why it's hard to do in the debugger, but without specific details from their end it appears indistinguishable from a (possibly misguided) quick fix. It may help if you could get the debugger folks to chime in directly on the review.

Shall we review the code refactoring first, so that the later change could be just a single-line change? Yes, I could post that later and bring in the necessary stakeholders.

LGTM. This is a cleaner way to provide stub name tweaks.

clang/lib/CodeGen/CGCUDANV.cpp
223	says .... -> (e.g. ....)
225	... should be `the same` ... or, perhaps, `identical`
clang/lib/CodeGen/CodeGenModule.cpp
1091	Adjust kernel stub mangling as we may need to be able to differentiate them from the kernel itself (e.g. for HIP).

This revision is now accepted and ready to land. Jun 14 2019, 3:18 PM

LGTM. Thanks.

Closed by commit rL363553: [HIP] Add the interface deriving the stub name of device kernels. (authored by hliao). Jun 17 2019, 5:48 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Jun 17 2019, 5:48 AM

Herald added a subscriber: llvm-commits.

tra mentioned this in D68578: [HIP] Fix device stub name. Oct 7 2019, 9:12 AM

Revision Contents

Path                                   Size
clang/lib/CodeGen/CGCUDANV.cpp         26 lines
clang/lib/CodeGen/CGCUDARuntime.h      5 lines
clang/lib/CodeGen/CodeGenModule.cpp    9 lines

Diff 204856

clang/lib/CodeGen/CGCUDANV.cpp

@@ ... @@ void registerDeviceVar(const VarDecl *VD, llvm::GlobalVariable &Var,
                          unsigned Flags) override {
     DeviceVars.push_back({&Var, VD, Flags});
   }

   /// Creates module constructor function
   llvm::Function *makeModuleCtorFunction() override;
   /// Creates module destructor function
   llvm::Function *makeModuleDtorFunction() override;
+  /// Construct and return the stub name of a kernel.
+  std::string getDeviceStubName(llvm::StringRef Name) const override;
 };

 }

 std::string CGNVCUDARuntime::addPrefixToName(StringRef FuncName) const {
   if (CGM.getLangOpts().HIP)
     return ((Twine("hip") + Twine(FuncName)).str());
   return ((Twine("cuda") + Twine(FuncName)).str());
@@ ... @@ if (DeviceMC->shouldMangleDeclName(ND)) {
     DeviceSideName = Out.str();
   } else
     DeviceSideName = ND->getIdentifier()->getName();
   return DeviceSideName;
 }

 void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
                                      FunctionArgList &Args) {
-  assert(getDeviceSideName(CGF.CurFuncDecl) == CGF.CurFn->getName() ||
-         getDeviceSideName(CGF.CurFuncDecl) + ".stub" == CGF.CurFn->getName() ||
-         CGF.CGM.getContext().getTargetInfo().getCXXABI() !=
-             CGF.CGM.getContext().getAuxTargetInfo()->getCXXABI());
+  // Ensure either we have different ABIs between host and device compilations,
+  // says host compilation following MSVC ABI but device compilation follows
+  // Itanium C++ ABI or, if they follow the same ABI, kernel names after
+  // mangling should be same after name stubbing. The later checking is very
+  // important as the device kernel name being mangled in host-compilation is
+  // used to resolve the device binaries to be executed. Inconsistent naming
+  // result in undefined behavior. Even though we cannot check that naming
+  // directly between host- and device-compilations, the host- and
+  // device-mangling in host compilation could help catch certain ones.
+  assert((CGF.CGM.getContext().getAuxTargetInfo() &&
+          (CGF.CGM.getContext().getAuxTargetInfo()->getCXXABI() !=
+           CGF.CGM.getContext().getTargetInfo().getCXXABI())) ||
+         getDeviceStubName(getDeviceSideName(CGF.CurFuncDecl)) ==
+             CGF.CurFn->getName());

   EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
   if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),
                          CudaFeature::CUDA_USES_NEW_LAUNCH))
     emitDeviceStubBodyNew(CGF, Args);
   else
     emitDeviceStubBodyLegacy(CGF, Args);
 }

@@ ... @@ if (CGM.getLangOpts().HIP) {
     DtorBuilder.SetInsertPoint(ExitBlock);
   } else {
     DtorBuilder.CreateCall(UnregisterFatbinFunc, HandleValue);
   }
   DtorBuilder.CreateRetVoid();
   return ModuleDtorFunc;
 }

+std::string CGNVCUDARuntime::getDeviceStubName(llvm::StringRef Name) const {
+  if (!CGM.getLangOpts().HIP)
+    return Name;
+  return std::move((Name + ".stub").str());
+}
+
 CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {
   return new CGNVCUDARuntime(CGM);
 }

clang/lib/CodeGen/CGCUDARuntime.h

@@ ... @@
 // subclasses of this implement code generation for specific CUDA
 // runtime libraries.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H
 #define LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H

+#include "llvm/ADT/StringRef.h"
+
 namespace llvm {
 class Function;
 class GlobalVariable;
 }

 namespace clang {

 class CUDAKernelCallExpr;
@@ ... @@ public:

   /// Constructs and returns a module initialization function or nullptr if it's
   /// not needed. Must be called after all kernels have been emitted.
   virtual llvm::Function *makeModuleCtorFunction() = 0;

   /// Returns a module cleanup function or nullptr if it's not needed.
   /// Must be called after ModuleCtorFunction
   virtual llvm::Function *makeModuleDtorFunction() = 0;

+  /// Construct and return the stub name of a kernel.
+  virtual std::string getDeviceStubName(llvm::StringRef Name) const = 0;
 };

 /// Creates an instance of a CUDA runtime class.
 CGCUDARuntime *CreateNVCUDARuntime(CodeGenModule &CGM);

 }
 }

 #endif

clang/lib/CodeGen/CodeGenModule.cpp

@@ ... @@ StringRef CodeGenModule::getMangledName(GlobalDecl GD) {
   auto FoundName = MangledDeclNames.find(CanonicalGD);
   if (FoundName != MangledDeclNames.end())
     return FoundName->second;

   // Keep the first result in the case of a mangling collision.
   const auto *ND = cast<NamedDecl>(GD.getDecl());
   std::string MangledName = getMangledNameImpl(*this, GD, ND);

-  // Postfix kernel stub names with .stub to differentiate them from kernel
-  // names in device binaries. This is to facilitate the debugger to find
-  // the correct symbols for kernels in the device binary.
+  // Derive the kernel stub from CUDA runtime.
   if (auto *FD = dyn_cast<FunctionDecl>(GD.getDecl()))
-    if (getLangOpts().HIP && !getLangOpts().CUDAIsDevice &&
-        FD->hasAttr<CUDAGlobalAttr>())
-      MangledName = MangledName + ".stub";
+    if (!getLangOpts().CUDAIsDevice && FD->hasAttr<CUDAGlobalAttr>())
+      MangledName = getCUDARuntime().getDeviceStubName(MangledName);

   auto Result = Manglings.insert(std::make_pair(MangledName, GD));
   return MangledDeclNames[CanonicalGD] = Result.first->first();
 }

 StringRef CodeGenModule::getBlockMangledName(GlobalDecl GD,
                                              const BlockDecl *BD) {
   MangleContext &MangleCtx = getCXXABI().getMangleContext();

[HIP] Add the interface deriving the stub name of device kernels. (Closed, Public)