Added support for the -gline-directives-only option and fixed the logic
that selects the debug info for CUDA devices. If the optimization level
is O0, the --[no-]cuda-noopt-device-debug options do not affect the debug
info level. If the optimization level is > O0, debug info is requested,
and either --no-cuda-noopt-device-debug is used or
--cuda-noopt-device-debug is not given, the optimization level for the
device code is kept and only debug directives are emitted.
If the optimization level is > O0, debug info is requested, and the
--cuda-noopt-device-debug option is used, optimization is disabled for
the device code and the required debug info is emitted.
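The rules above can be read as a small decision function. The following is only a sketch of that reading, not the code from this patch: the names `DeviceDebugInfo`, `DeviceSettings` and `selectDeviceSettings` are hypothetical, and the three inputs are a simplification of what the driver actually parses from the command line.

```cpp
// Illustrative sketch only. All names here are hypothetical and do not
// come from the patch; the inputs are a simplified view of the driver flags.
enum class DeviceDebugInfo { None, DirectivesOnly, SameAsHost };

struct DeviceSettings {
  unsigned OptLevel;      // optimization level used for the device code
  DeviceDebugInfo Debug;  // debug info emitted for the device code
};

// HostOptLevel:     0 for -O0, 1..3 for -O1..-O3.
// DebugRequested:   some debug info option (-g...) was passed.
// NoOptDeviceDebug: --cuda-noopt-device-debug is in effect; false when the
//                   option is absent or --no-cuda-noopt-device-debug wins.
constexpr DeviceSettings selectDeviceSettings(unsigned HostOptLevel,
                                              bool DebugRequested,
                                              bool NoOptDeviceDebug) {
  if (!DebugRequested)
    return {HostOptLevel, DeviceDebugInfo::None};
  // At O0 the --[no-]cuda-noopt-device-debug options make no difference.
  if (HostOptLevel == 0)
    return {0, DeviceDebugInfo::SameAsHost};
  // Optimized build with the option set: drop device optimization and
  // emit the requested debug info.
  if (NoOptDeviceDebug)
    return {0, DeviceDebugInfo::SameAsHost};
  // Optimized build without the option: keep optimization and emit only
  // debug directives.
  return {HostOptLevel, DeviceDebugInfo::DirectivesOnly};
}

static_assert(selectDeviceSettings(2, true, false).Debug ==
                  DeviceDebugInfo::DirectivesOnly,
              "optimized build keeps only debug directives");
static_assert(selectDeviceSettings(2, true, true).OptLevel == 0,
              "--cuda-noopt-device-debug disables device optimization");
```

The host optimization level is never changed by this function; only the device settings are derived from it.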
Repository: rL LLVM
Event Timeline
Nice. So, in effect, for optimized builds we'll generate pre-DWARF line info only, unless --cuda-noopt-device-debug is specified.
Will this deal with the warnings about back-end being unable to handle particular debug options?
On a side note, when DWARF is functional in NVPTX we need to seriously consider per-GPU control for it. Enabling debug info blows up cubin size (ptxas apparently packs compressed PTX inside *cubin*) and we run into ELF reloc overflows in some tensorflow builds if all GPU variants carry it.
The change in name here from "line tables" to "directives only" feels a bit confusing. "Limited" seems to be a bit more clear, or even remaining line tables only. Can you explain where you were going with this particular set of changes in a bit more detail please?
Thanks!
-eric
Can't say I have much of an informed opinion about the parts that are only in the CUDA code. The "line directives only" terminology did come from a suggestion I made in one of the other reviews I can't seem to find right now.. ah, here: https://reviews.llvm.org/D51177 - whether or not that matches up with the use in the CUDA ToolChain code, I'm not sure.
CUDA/NVPTX supports only three kinds of debug info: limited/full, debug directives only, and no debug info at all. It does not support debug tables, so I just convert that request into debug directives only.
The main idea is to mimic what nvcc does. It behaves absolutely the same way. If the opt level is O0, we can use full debug info. If the opt level is > O0, we can use only lineinfo (debug directives) or no debug info. If we enable debug info for the device code using --cuda-noopt-device-debug, the opt level for the device code is lowered to O0 and we enable full debug info. The host code will still be optimized.
Updated the processing of the debug options. -g1 (line-tables) is now treated as regular debug directives, which may emit some useful debug info.
The llvm backend patch here has discussion around debug info kinds that we should iron out first.
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
292 ↗ | (On Diff #172003) | Is this an nvcc compatibility flag? |
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
292 ↗ | (On Diff #172003) | No, nvcc uses a different set of flags. It uses -g for the debug info for the host code and -G for the device code. I'm not the original author of this option. clang uses it to control emission of the debug info for the device. |
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
282–285 ↗ | (On Diff #172003) | This enum doesn't appear to be complete? Either way can you make it match the other and document what each thing means a bit more? |
289 ↗ | (On Diff #172003) | Please document this routine in prose. |
292 ↗ | (On Diff #172003) | OK. |
706–708 ↗ | (On Diff #172003) | Is this really doing anything? |
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
282–285 ↗ | (On Diff #172003) | No, it is complete, but probably has some wrong names. I reworked it. Actually, this enum is intended to track the debug info emitted for the device. It may be disabled, debug directives only or same debug info as for the host. |
289 ↗ | (On Diff #172003) | Added description. |
706–708 ↗ | (On Diff #172003) | Yes, actually it does. Currently, when we emit code for the device, we use the same debug info level as for the host. But in some situations we need to disable it or emit only debug directives for the device, while keeping the original debug info for the host. This function allows us to change the debug info level for the device and force clang to emit the required debug info during codegen for the NVPTX devices. |
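As a rough illustration of the adjustment described in the last reply, the sketch below maps a host debug info kind to the kind used for device codegen. It is an assumption-laden model, not the code under review: `DebugInfoKind`, `DeviceDebugInfoLevel` and `adjustForDevice` are hypothetical stand-ins, and the three device levels simply follow the description given in the thread (disabled, debug directives only, or the same debug info as the host).

```cpp
// Illustrative sketch only. These types and this function are hypothetical
// stand-ins and are not taken from Cuda.cpp or from this diff.
enum class DebugInfoKind {
  NoDebugInfo,
  DebugDirectivesOnly,
  LimitedDebugInfo,
  FullDebugInfo
};

enum class DeviceDebugInfoLevel {
  DisableDebugInfo,        // no debug info for the device
  DebugDirectivesOnly,     // only debug directives for the device
  EmitSameDebugInfoAsHost  // device follows the host debug info kind
};

// Returns the debug info kind to use for device codegen. The host kind is
// passed by value, so the host setting itself is left untouched.
constexpr DebugInfoKind adjustForDevice(DebugInfoKind HostKind,
                                        DeviceDebugInfoLevel Level) {
  switch (Level) {
  case DeviceDebugInfoLevel::DisableDebugInfo:
    return DebugInfoKind::NoDebugInfo;
  case DeviceDebugInfoLevel::DebugDirectivesOnly:
    return DebugInfoKind::DebugDirectivesOnly;
  case DeviceDebugInfoLevel::EmitSameDebugInfoAsHost:
    return HostKind;
  }
  return HostKind; // not reached for valid enumerators
}

static_assert(adjustForDevice(DebugInfoKind::FullDebugInfo,
                              DeviceDebugInfoLevel::DebugDirectivesOnly) ==
                  DebugInfoKind::DebugDirectivesOnly,
              "device is downgraded while the host keeps full debug info");
```

Keeping the device level as a separate enum, rather than reusing the host kind directly, makes the "same as host" case explicit and leaves the host compilation unaffected.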
LGTM. I'm quite a bit happier with this now. Thanks for going through the back and forth.