This is an archive of the discontinued LLVM Phabricator instance.

Differential D79866

[HIP] Do not emit debug info for stub function
ClosedPublic

Authored by yaxunl on May 13 2020, 8:09 AM.

Details

Reviewers

tra
rjmccall
aaron.ballman

Commits

rG1b7bf1bd75dc: [HIP] Do not emit debug info for stub function

Summary

The stub function is generated by the compiler, and its instructions have nothing
to do with the kernel source code.

Currently clang generates debug info for the stub function, which confuses
the debugger. For example, when a user sets a breakpoint on a line of a kernel,
the debugger should break on that line when the kernel executes and reaches it,
but instead the debugger breaks in the stub function.

This patch disables debug info for the stub function.
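
As a rough source-level analogy (not part of this patch), Clang's nodebug function attribute suppresses debug info for a single function, which is what the patch arranges implicitly for the stub. The sketch below assumes a hand-written forwarding function in place of the compiler-generated stub; kernel_body, launch_stub, run_example and the NO_DEBUG_INFO macro are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Clang spells the attribute `nodebug`. Other compilers may not recognize
// it, so the (hypothetical) macro expands to nothing there.
#if defined(__clang__)
#define NO_DEBUG_INFO __attribute__((nodebug))
#else
#define NO_DEBUG_INFO
#endif

// Hypothetical stand-in for the kernel's source code. It keeps its debug
// info, so a file:line breakpoint on the assignment can resolve here.
void kernel_body(int *a) {
  *a = 1;
}

// Hypothetical stand-in for the stub. Its instructions only forward the
// call, so it is marked to carry no debug info of its own.
NO_DEBUG_INFO void launch_stub(int *a) {
  kernel_body(a);
}

// Usage sketch: call through the stub and confirm the body ran.
int run_example() {
  int value = 0;
  launch_stub(&value);
  assert(value == 1);
  return value;
}
```

When built with clang and -g, a breakpoint set by file name and line number on the assignment should have no location inside launch_stub, because that function contributes no line information.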

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yaxunl created this revision. May 13 2020, 8:09 AM

Herald added a reviewer: aaron.ballman. May 13 2020, 8:10 AM

Herald added a subscriber: aprantl.

tra added a comment.

I do not see the behavior this patch is supposed to fix in CUDA.
If I compile a simple program, the host-side debugger does not see kernel but does see __device_stub__kernel. If a breakpoint is set on kernel, the debugger treats it as pending (not yet loaded) and does end up breaking on entry into the kernel on the GPU side.

E.g.:

(cuda-gdb) info symbol kernel
No symbol "kernel" in current context.
(cuda-gdb) info symbol __device_stub__kernel
__device_stub__kernel() in section .text
(cuda-gdb) b kernel
Function "kernel" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (kernel) pending.
(cuda-gdb) r
Starting program: /usr/local/google/home/tra/work/llvm/build/debug/print
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Hello from host
[New Thread 0x7fffdffff700 (LWP 227347)]
[New Thread 0x7fffdf7fe700 (LWP 227348)]
[New Thread 0x7fffdeffd700 (LWP 227349)]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

Thread 1 "print" hit Breakpoint 1, kernel<<<(1,1,1),(1,1,1)>>> () at print.cu:3
3         printf("Hello\n");

Perhaps it's HIP-specific behavior that needs this tweak.
For CUDA, I would rather that we continue to emit debug info for the stub (in its __device_stub__ form). It is useful for debugging some issues.

yaxunl added a comment.

Can you try setting a breakpoint on the kernel by file name and line number?

tra added a comment.

In regular gdb, the breakpoint is set on the stub.
In cuda-gdb the behavior is interesting: the breakpoint initially gets set and breaks on the stub, but it also breaks on entry into the kernel. Once the kernel address is known, cuda-gdb no longer breaks on the stub. If I set the breakpoint after the kernel is known, it is set on the kernel only and never stops on the stub.

Overall the behavior is reasonable, IMO.

yaxunl added a comment.

According to our debugger developers, cuda-gdb's behavior is not upstreamable. I can limit this patch to HIP if it is not needed for CUDA.

limit change to HIP

tra accepted this revision. May 13 2020, 1:42 PM

This revision is now accepted and ready to land. May 13 2020, 1:42 PM

Closed by commit rG1b7bf1bd75dc: [HIP] Do not emit debug info for stub function (authored by yaxunl). May 13 2020, 3:17 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. May 13 2020, 3:17 PM

Revision Contents

Path                                        Size
clang/lib/Sema/SemaDeclAttr.cpp             6 lines
clang/test/CodeGenCUDA/kernel-dbg-info.cu   33 lines

Diff 263869

clang/lib/Sema/SemaDeclAttr.cpp

@@ ... @@ if (const auto *Method = dyn_cast<CXXMethodDecl>(FD)) {
     }
     S.Diag(Method->getBeginLoc(), diag::warn_kern_is_method) << Method;
   }
   // Only warn for "inline" when compiling for host, to cut down on noise.
   if (FD->isInlineSpecified() && !S.getLangOpts().CUDAIsDevice)
     S.Diag(FD->getBeginLoc(), diag::warn_kern_is_inline) << FD;
 
   D->addAttr(::new (S.Context) CUDAGlobalAttr(S.Context, AL));
+  // In host compilation the kernel is emitted as a stub function, which is
+  // a helper function for launching the kernel. The instructions in the helper
+  // function has nothing to do with the source code of the kernel. Do not emit
+  // debug info for the stub function to avoid confusing the debugger.
+  if (S.LangOpts.HIP && !S.LangOpts.CUDAIsDevice)
+    D->addAttr(NoDebugAttr::CreateImplicit(S.Context));
 }
 
 static void handleGNUInlineAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   const auto *Fn = cast<FunctionDecl>(D);
   if (!Fn->isInlineSpecified()) {
     S.Diag(AL.getLoc(), diag::warn_gnu_inline_attribute_requires_inline);
     return;
   }

clang/test/CodeGenCUDA/kernel-dbg-info.cu

This file was added.

// RUN: echo "GPU binary would be here" > %t

// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -O0 \
// RUN:   -fcuda-include-gpubinary %t -debug-info-kind=limited \
// RUN:   -o - -x hip | FileCheck %s
// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s -O0 \
// RUN:   -fcuda-include-gpubinary %t -debug-info-kind=limited \
// RUN:   -o - -x hip -fcuda-is-device | FileCheck -check-prefix=DEV %s

#include "Inputs/cuda.h"

extern "C" __global__ void ckernel(int *a) {
  *a = 1;
}

// Device side kernel names
// CHECK: @[[CKERN:[0-9]*]] = {{.*}} c"ckernel\00"

// DEV: define {{.*}}@ckernel{{.*}}!dbg
// DEV: store {{.*}}!dbg
// DEV: ret {{.*}}!dbg

// CHECK-NOT: define {{.*}}@__device_stub__ckernel{{.*}}!dbg
// CHECK: define {{.*}}@[[CSTUB:__device_stub__ckernel]]
// CHECK-NOT: call {{.*}}@hipLaunchByPtr{{.*}}!dbg
// CHECK: call {{.*}}@hipLaunchByPtr{{.*}}@[[CSTUB]]
// CHECK-NOT: ret {{.*}}!dbg

// CHECK-LABEL: define {{.*}}@_Z8hostfuncPi{{.*}}!dbg
// CHECK: call void @[[CSTUB]]{{.*}}!dbg
void hostfunc(int *a) {
  ckernel<<<1, 1>>>(a);
}
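
The two RUN configurations above exercise the host and device sides of the condition added in SemaDeclAttr.cpp. As a minimal sketch of that gating logic, assuming a simplified stand-in for the language options (LangOptsSketch and stubGetsNoDebug are hypothetical names, not Clang APIs), the cases work out as follows:

```cpp
#include <cassert>

// Hypothetical, simplified mirror of the two language options the patch reads.
struct LangOptsSketch {
  bool HIP;
  bool CUDAIsDevice;
};

// Returns true when the kernel declaration would receive the implicit
// no-debug attribute: only for HIP, and only in host compilation, where
// the kernel is emitted as a stub.
constexpr bool stubGetsNoDebug(LangOptsSketch Opts) {
  return Opts.HIP && !Opts.CUDAIsDevice;
}

// HIP host compilation: the stub gets no debug info (the CHECK lines).
static_assert(stubGetsNoDebug({true, false}), "HIP host");
// HIP device compilation: the kernel keeps its debug info (the DEV lines).
static_assert(!stubGetsNoDebug({true, true}), "HIP device");
// CUDA, host or device: behavior is unchanged by this patch.
static_assert(!stubGetsNoDebug({false, false}), "CUDA host");
static_assert(!stubGetsNoDebug({false, true}), "CUDA device");
```

Only the first case changes behavior, which matches the decision in the review to limit the change to HIP.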