This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][OPENMP][NVPTX]Improve logic of the debug info support.
Closed, Public

Authored by ABataev on Aug 31 2018, 12:48 PM.

Details

Summary

Added support for the -gline-directives-only option and fixed the logic of
the debug info for CUDA devices. If the optimization level is O0, the
--[no-]cuda-noopt-device-debug options do not affect the debug info level.
If the optimization level is >O0, debug info is requested, and either
--no-cuda-noopt-device-debug is used or --cuda-noopt-device-debug is not
used, the optimization level for the device code is kept and only the
debug directives are emitted.
If the optimization level is >O0, debug info is requested, and the
--cuda-noopt-device-debug option is used, optimization is disabled for
the device code and the required debug info is emitted.
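The rules in this summary can be sketched as a small decision function. This is only an illustration, not the code from the patch: the names DeviceDebug, DeviceSettings and chooseDeviceSettings are hypothetical, and the case where no debug info is requested at all is an assumption (the summary does not spell it out).

```cpp
// Hypothetical sketch of the decision rules from the summary.
// None of these names come from clang; they are illustrative only.
enum class DeviceDebug { None, DirectivesOnly, SameAsHost };

struct DeviceSettings {
  int optLevel;      // optimization level used for the device code
  DeviceDebug debug; // debug info emitted for the device code
};

// optLevel:         0 for -O0, a positive value for any higher level.
// debugRequested:   a debug info option (e.g. -g) was given.
// nooptDeviceDebug: --cuda-noopt-device-debug is in effect.
inline DeviceSettings chooseDeviceSettings(int optLevel, bool debugRequested,
                                           bool nooptDeviceDebug) {
  // Assumption: without any debug option nothing is emitted for the device.
  if (!debugRequested)
    return {optLevel, DeviceDebug::None};
  // At O0 the --[no-]cuda-noopt-device-debug options have no effect.
  if (optLevel == 0)
    return {0, DeviceDebug::SameAsHost};
  // >O0 with --cuda-noopt-device-debug: drop device optimization,
  // emit the requested debug info.
  if (nooptDeviceDebug)
    return {0, DeviceDebug::SameAsHost};
  // >O0 otherwise: keep the optimization level, emit debug directives only.
  return {optLevel, DeviceDebug::DirectivesOnly};
}
```

For example, chooseDeviceSettings(2, true, false) keeps level 2 and yields DirectivesOnly, while chooseDeviceSettings(2, true, true) lowers the device level to 0 and yields SameAsHost.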

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev created this revision. Aug 31 2018, 12:48 PM
tra accepted this revision. Aug 31 2018, 2:07 PM

Nice. So, in effect, for optimized builds we'll generate pre-DWARF line info only, unless --cuda-noopt-device-debug is specified.
Will this deal with the warnings about back-end being unable to handle particular debug options?

On a side note, when DWARF is functional in NVPTX we need to seriously consider per-GPU control for it. Enabling debug info blows up cubin size (ptxas apparently packs compressed PTX inside *cubin*) and we run into ELF reloc overflows in some tensorflow builds if all GPU variants carry it.

This revision is now accepted and ready to land. Aug 31 2018, 2:07 PM

The change in name here from "line tables" to "directives only" feels a bit confusing. "Limited" seems to be a bit more clear, or even remaining line tables only. Can you explain where you were going with this particular set of changes in a bit more detail please?

Thanks!

-eric

Can't say I have much of an informed opinion about the parts that are only in the CUDA code. The "line directives only" terminology did come from a suggestion I made in one of the other reviews I can't seem to find right now.. ah, here: https://reviews.llvm.org/D51177 - whether or not that matches up with the use in the CUDA ToolChain code, I'm not sure.

CUDA/NVPTX supports only 3 kinds of debug info: limited/full, debug directives, and no debug info at all. It does not support debug tables, so I just convert this into debug directives only.
The main idea is to mimic what nvcc does; it behaves exactly the same way. If the opt level is O0, we can use full debug info. If the opt level is >O0, we can use only lineinfo (debug directives) or no debug info. If we enable debug info for the device code using --cuda-noopt-device-debug, the opt level for the device code is lowered to O0 and we enable full debug info. The host code is still optimized.

ABataev updated this revision to Diff 172003. Oct 31 2018, 1:21 PM

Updated the processing of the debug options. -g1 (line tables) is treated as a regular debug directive, which may emit some useful debug info.

The llvm backend patch here has discussion around debug info kinds that we should iron out first.

lib/Driver/ToolChains/Cuda.cpp
292 ↗(On Diff #172003)

Is this an nvcc compatibility flag?

ABataev added inline comments. Nov 9 2018, 8:58 AM
lib/Driver/ToolChains/Cuda.cpp
292 ↗(On Diff #172003)

No, nvcc uses a different set of flags: -g for the debug info for the host code and -G for the device code. I'm not the original author of this option; clang uses it to control the emission of the debug info for the device.
The bad thing about nvcc is that it disables optimizations when -G is used. With this option we can still use the LLVM optimizations and disable optimizations only when we call the ptxas tool.

echristo added inline comments. Dec 5 2018, 10:39 PM
lib/Driver/ToolChains/Cuda.cpp
282–285 ↗(On Diff #172003)

This enum doesn't appear to be complete? Either way can you make it match the other and document what each thing means a bit more?

289 ↗(On Diff #172003)

Please document this routine in prose.

292 ↗(On Diff #172003)

OK.

706–708 ↗(On Diff #172003)

Is this really doing anything?

ABataev marked 4 inline comments as done. Dec 6 2018, 9:02 AM
ABataev added inline comments.
lib/Driver/ToolChains/Cuda.cpp
282–285 ↗(On Diff #172003)

No, it is complete, but some of the names were probably wrong. I reworked it. This enum is intended to track the debug info emitted for the device: it may be disabled, debug directives only, or the same debug info as for the host.

289 ↗(On Diff #172003)

Added description.

706–708 ↗(On Diff #172003)

Yes, actually it does. Currently, when we emit the code for the device, we use the same debug info level as for the host. But in some situations we need to disable it or emit only debug directives for the device, while keeping the original debug info for the host. This function allows us to change the debug info level for the device and force clang to emit the required debug info data during codegen for the NVPTX devices.
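As a rough illustration of the adjustment described in this comment, the sketch below maps the host debug info kind to the kind used for the device. It is not the function under review: DeviceDebugLevel, DebugKind and adjustForDevice are hypothetical names, and the DebugKind values are simplified stand-ins rather than clang's actual debug info kinds.

```cpp
// Hypothetical sketch; not the code from lib/Driver/ToolChains/Cuda.cpp.
// Three device states, as described for the reworked enum.
enum class DeviceDebugLevel { Disabled, DirectivesOnly, SameAsHost };

// Simplified stand-in for the host debug info kinds (an assumption).
enum class DebugKind { NoDebugInfo, DirectivesOnly, Limited, Full };

// Returns the debug info kind to use when generating device code,
// leaving the host kind itself untouched.
inline DebugKind adjustForDevice(DebugKind host, DeviceDebugLevel level) {
  switch (level) {
  case DeviceDebugLevel::Disabled:
    return DebugKind::NoDebugInfo;
  case DeviceDebugLevel::DirectivesOnly:
    // Assumption: never emit more for the device than the host requested.
    return host == DebugKind::NoDebugInfo ? DebugKind::NoDebugInfo
                                          : DebugKind::DirectivesOnly;
  case DeviceDebugLevel::SameAsHost:
    return host;
  }
  return host; // unreachable for valid enum values
}
```

With this shape, a host build using full debug info can still have the device side reduced to directives only, e.g. adjustForDevice(DebugKind::Full, DeviceDebugLevel::DirectivesOnly).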

ABataev updated this revision to Diff 176993. Dec 6 2018, 9:10 AM
ABataev marked an inline comment as done.

Reworked according to the latest comments from Eric.

echristo accepted this revision. Dec 11 2018, 11:19 PM

LGTM. I'm quite a bit happier with this now. Thanks for going through the back and forth.

This revision was automatically updated to reflect the committed changes.