Patch adds initial emission of the debug info for NVPTX target.
Currently, only .file and .loc directives are emitted, everything else is
commented out to not break the compilation of Cuda.
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 16981 Build 16981: arc lint + arc unit
Event Timeline
lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp | ||
---|---|---|
272 ↗ | (On Diff #128956) | There are 2 substatements - 1 is single line and another one is 2-line. They both must be enclosed in braces. |
Looks OK to me. That said, I have little clue about DWARF, so I'll defer to echristo@ as it's his domain.
In general, it would be good if we could generate debug info in a somewhat more compact way (more than one byte per .b8 directive, for example) -- debug info has tendency to grow and we do carry PTX around in compiled binaries.
lib/CodeGen/AsmPrinter/DwarfDebug.cpp | ||
---|---|---|
2097 | CUNode->getMacros() -> Macros | |
lib/CodeGen/AsmPrinter/DwarfDebug.h | ||
263 ↗ | (On Diff #129123) | Nit: Should it be section (singular) labels (plural) here? E.g. we have license plates and book covers. Perhaps it should be just UseSectionsAsReferences. |
lib/Target/NVPTX/NVPTXAsmPrinter.h | ||
347 | Do I understand it correctly that printout of the } was moved here out of EmitFunctionBodyEnd() so that we can emit additional debug labels after the last basic block? It would be good to add few comments describing why we emit (or not) braces in particular places. |
In general, I think this patch needs a lot more description and possibly breaking up. For now a few questions:
a) nvptx doesn't allow a debug_str section?
b) relocations in nvptx should be against the section rather than the label? (To be fair, they can be this way on elf platforms as well which would be an improvement)
c) There are a bunch of directives for nvptx that you want to use for emission?
d) It looks like you've got this turned off by default by commenting out everything? I'd rather you just conditionalize it and have it all work and be testable before turning on.
I think I'd probably start splitting up the NVPTX asm printer patch into separate and testable patches, perhaps by rewriting the existing line table handling and going from there?
I've also made some inline comments of things that were confusing or otherwise not sure about if you also want to tackle those.
Thanks!
-eric
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp | ||
---|---|---|
37 | This isn't really a condition? :) | |
40 | This needs more elaboration. | |
44 | I don't understand what's going on here? | |
lib/Target/NVPTX/NVPTXAsmPrinter.cpp | ||
878–880 | I'm not sure what this fixme means? |
lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp | ||
---|---|---|
416 ↗ | (On Diff #130473) | I don't believe that this is a necessary or sufficient check here. What's the point of it? |
760 ↗ | (On Diff #130473) | So there's a difference between a begin symbol for a section and the section itself. You're doing the former and not the latter it looks like? And also I think you're breaking the possibility of multiple cus in a single module. |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h | ||
28 | A description of how this works would be good. |
Hi Eric, thanks for the review.
Ok, I'll add some description. As to breaking it up, it will be very hard to do. This patch adds commented out debug info, that allows successful compiling of resulting PTX file. If we will remove at least some small change from this patch, it will produce incorrect PTX file. So, I rather doubt that it is possible to break this patch into several small patches.
a) nvptx doesn't allow a debug_str section?
Here is an excerpt from CUDA Toolkit documentation (http://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#debug-information): The PTX producer is responsible for emitting binary DWARF into the PTX file, using the .section and .b8-.b16-.b32-and-.b64 directives in PTX. This should contain the .debug_info and .debug_abbrev sections, and possibly optional sections .debug_pubnames and .debug_aranges. So, yes, .debug_str is not allowed.
b) relocations in nvptx should be against the section rather than the label? (To be fair, they can be this way on elf platforms as well which would be an improvement)
According to PTX ISA, 11.5.2. Debugging Directives: .section(http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#debugging-directives) labels are not allowed inside the debugging directives, so we cannot use labels.
c) There are a bunch of directives for nvptx that you want to use for emission?
Just .debug_info and .debug_abbrevs sections, all other sections should be empty, if emitted.
d) It looks like you've got this turned off by default by commenting out everything? I'd rather you just conditionalize it and have it all work and be testable before turning on.
It is just an initial patch. It adds some default debug info generation, but this info is still incorrect. This is why I commented it out in the output PTX. After this patch I need to commit several patches to LLVM and couple patches to clang to be able to produce correct debug info, that is compilable and does not break cuda-gdb and ptxas.
We still can test it, using LLVM lit tests, but it definitely won't work with the cuda-gdb and ptxas.
I think I'd probably start splitting up the NVPTX asm printer patch into separate and testable patches, perhaps by rewriting the existing line table handling and going from there?
I think the only thing I can do separately is to remove an existing line table handling completely. If this is ok, I can prepare the patch for it.
I've also made some inline comments of things that were confusing or otherwise not sure about if you also want to tackle those.
Thanks!
-eric
Thanks
lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp | ||
---|---|---|
416 ↗ | (On Diff #130473) | PTX format does not allow use of .debug_ranges section, we should not emit it. |
760 ↗ | (On Diff #130473) | I checked it, multiple CUs are handled correctly. But the reference to CU is generated as .b64 .debug_info+offset, which is supported by PTX. Don't forget, that PTX format is very limited in support of DWARF sections and we will genarte only .debug_abbrevs and .debug_info sections. |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp | ||
37 | I mean remove comments from the emitted DWARF sections once they are generated correctly. | |
44 | The body of the DWARF sections in PTX file must be enclosed into braces, like this: .section .debug_info { ... } This code checks that current section is one of the DWARF sections and encloses the content of these sections into braces. | |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h | ||
28 | Ok, will add | |
lib/Target/NVPTX/NVPTXAsmPrinter.cpp | ||
878–880 | I mean, that the comment symbols // from the emitted //, debug must be removed once the debug info is properly generated. |
lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp | ||
---|---|---|
760 ↗ | (On Diff #130473) | After some thoughts, I realized that generally speaking, you're correct here. We need an additional offset for the compilation unit. I 'll fix this. |
lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp | ||
---|---|---|
416 ↗ | (On Diff #130473) | The .debug_ranges section was not defined in DWARF v2. Better to avoid emitting it based on the DWARF version. |
lib/CodeGen/AsmPrinter/DwarfDebug.cpp | ||
299 | Unconditionally ignoring a command-line option? Are there tests that fail if you don't do this? |
lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp | ||
---|---|---|
416 ↗ | (On Diff #130473) | It won't be emitted (at least for NVPTX). I'm not sure about other targets. Maybe they allow to use .debug_ranges sections even for DWARF2? |
lib/CodeGen/AsmPrinter/DwarfDebug.cpp | ||
299 | I'm not aware of any, did it just in case if there are such cases. |
At this point I'm looking at how to break this up for you. It's going to take more time because I have other things, but I don't want to put in all of this in a single patch because there are things I want to reason about separately.
-eric
Hi Eric, thanks for your help. My ping is not a ping for the review, it is a ping for my question.
It is very hard to split this patch. It will break the generation of the PTX files so that either the compiler will crash or ptxas won't be able to compile the generated file. I asked you, is this ok if as the first part of the patch I remove generation of the debug lines info? It will reduce the size of the patch.
Ahh, and I think I can remove the part for the strings from this patch and commit it later
include/llvm/MC/MCAsmInfo.h | ||
---|---|---|
586 ↗ | (On Diff #139597) | Let's separate this out into a separate patch as well? |
lib/CodeGen/AsmPrinter/DwarfDebug.cpp | ||
299 | Yeah, I don't think this should be here. | |
334–336 | Do you still need to use dwarf2 here? | |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h | ||
34 | So, this functionality is used for changing files within the line table. I think what's actually going on here is that nvptx, for some reason, only supports a single file directive. Basically you don't support the idea of: int foo (void) { // some code #include "somefile.def" // some more code } having a file switch in the middle of a function. That's arguably a bug, but I think you don't want to collect them and emit them outside the functions because then it would be more actively wrong. | |
lib/Target/NVPTX/NVPTXAsmPrinter.cpp | ||
472 | Seems odd that we're pulling the functionality across multiple functions that don't really work in pairs. |
include/llvm/MC/MCAsmInfo.h | ||
---|---|---|
586 ↗ | (On Diff #139597) | Ok |
lib/CodeGen/AsmPrinter/DwarfDebug.cpp | ||
334–336 | Yes, we need to force dwarf2 for NVPTX, this is what we agreed on with Paul Robinson reviewing clang part of the patch | |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h | ||
34 | No, it means that you may have the reference to the somefile.def, but the table of files should be emitted outside of the function, in the outer context | |
lib/Target/NVPTX/NVPTXAsmPrinter.cpp | ||
472 | Yes, maybe it is worth it to add some option that allows enclosing of the dwarf sections into braces on the emission? |
lib/Target/NVPTX/NVPTXAsmPrinter.cpp | ||
---|---|---|
472 | Ahh, no, it won't work. This is caused by the fact that we only perform switch to section but not doing exit from the section emission. Because of that we just can't form pair of functions to emit braces |
lib/Target/NVPTX/MCTargetDesc/NVPTXMCAsmInfo.cpp | ||
---|---|---|
42 | Do we still need these parts commented out? | |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp | ||
40 | Similar to my comment below, I think that the "which section am I in" part needs some work. I definitely don't think it should be a "!=" comparison, and perhaps should be checking for some other metadata/feature. | |
40 | This should be commented as to "only dwarf sections need comments according to 'some reason'" | |
44 | But why looking for particular sections this way? | |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h | ||
34 | I'm not sure what you mean. Can you show some examples and elaborate? I.e. you may have a file table entry, but if you're not emitting them before the function you might have a bad time? What do you do with this testcase? echristo@athyra ~> cat foo.c #include "foo.def" return; } | |
lib/Target/NVPTX/NVPTXAsmPrinter.h | ||
347 | Can you add a rationale here as to why we do it here rather than some other place? More detail in comment descriptions is good. |
lib/Target/NVPTX/MCTargetDesc/NVPTXMCAsmInfo.cpp | ||
---|---|---|
42 | Yes, cause the debug info is still not correct. I'm afraid that it may cause some problems in the ptxas | |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp | ||
40 | Ok | |
40 |
| |
44 | Do you have some better solution in mind? | |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h | ||
34 | I removed include <stdio.h> from your second file, it causes a lot of troubles. Here is what I got: In file included from /NVPTX_DEBUG_INFO/foo.c:3: ./foo.def:2:1: warning: implicitly declaring library function 'printf' with type 'int (const char *, ...)' [-Wimplicit-function-declaration] printf("hello world\n"); ^ ./foo.def:2:1: note: include the header <stdio.h> or explicitly provide a declaration for 'printf' // // Generated by LLVM NVPTX Back-End // .version 3.2 .target sm_20//, debug .address_size 64 // .globl foo .extern .func (.param .b32 func_retval0) vprintf ( .param .b64 vprintf_param_0, .param .b64 vprintf_param_1 ) ; .global .align 1 .b8 _$_str[13] = {104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 10, 0}; .visible .func foo() { .reg .b32 %r<2>; .reg .b64 %rd<4>; Lfunc_begin0: .loc 1 1 0 .loc 2 2 1 mov.u64 %rd1, _$_str; cvta.global.u64 %rd2, %rd1; mov.u64 %rd3, 0; { // callseq 0, 0 .reg .b32 temp_param_reg; .param .b64 param0; st.param.b64 [param0+0], %rd2; .param .b64 param1; st.param.b64 [param1+0], %rd3; .param .b32 retval0; call.uni (retval0), vprintf, ( param0, param1 ); ld.param.b32 %r1, [retval0+0]; } // callseq 0 .loc 1 4 1 ret; Ltmp0: Lfunc_end0: } .file 1 "/NVPTX_DEBUG_INFO/foo.c" .file 2 "/NVPTX_DEBUG_INFO/./foo.def" // .section .debug_abbrev // { // .b8 1 // .b8 17 // .b8 1 // .b8 37 // .b8 8 // .b8 19 // .b8 5 // .b8 3 // .b8 8 // .b8 16 // .b8 6 // .b8 27 // .b8 8 // .b8 17 // .b8 1 // .b8 18 // .b8 1 // .b8 0 // .b8 0 // .b8 2 // .b8 46 // .b8 0 // .b8 17 // .b8 1 // .b8 18 // .b8 1 // .b8 3 // .b8 8 // .b8 58 // .b8 11 // .b8 59 // .b8 11 // .b8 63 // .b8 12 // .b8 0 // .b8 0 // .b8 0 // } // .section .debug_info // { // .b32 166 // .b8 2 // .b8 0 // .b32 .debug_abbrev // .b8 8 // .b8 1 // .b8 99 // .b8 108 // .b8 97 // .b8 110 // .b8 103 // .b8 32 // .b8 118 // .b8 101 // .b8 114 // .b8 115 // .b8 105 // .b8 111 // .b8 110 // .b8 32 // .b8 55 // .b8 46 // .b8 48 // .b8 46 // .b8 48 // .b8 32 // .b8 40 // .b8 116 // .b8 114 // .b8 117 // .b8 110 // .b8 107 // .b8 32 // .b8 51 // .b8 50 // .b8 57 // .b8 56 // .b8 48 // .b8 52 // .b8 41 // .b8 32 // .b8 40 // .b8 108 // .b8 108 // .b8 118 // .b8 109 // .b8 47 // .b8 116 // .b8 114 // .b8 117 // .b8 110 // .b8 107 // .b8 32 // .b8 51 // .b8 50 // .b8 57 // .b8 56 // .b8 48 // .b8 53 // .b8 41 // .b8 0 // .b8 12 // .b8 0 // .b8 102 // .b8 111 // .b8 111 // .b8 46 // .b8 99 // .b8 0 // .b32 .debug_line // .b8 47 // .b8 103 // .b8 115 // .b8 97 // .b8 47 // .b8 121 // .b8 107 // .b8 116 // .b8 103 // .b8 115 // .b8 97 // .b8 47 // .b8 104 // .b8 111 // .b8 109 // .b8 101 // .b8 47 // .b8 97 // .b8 47 // .b8 108 // .b8 47 // .b8 97 // .b8 108 // .b8 101 // .b8 120 // .b8 101 // .b8 121 // .b8 47 // .b8 116 // .b8 101 // .b8 115 // .b8 116 // .b8 47 // .b8 78 // .b8 86 // .b8 80 // .b8 84 // .b8 88 // .b8 95 // .b8 68 // .b8 69 // .b8 66 // .b8 85 // .b8 71 // .b8 95 // .b8 73 // .b8 78 // .b8 70 // .b8 79 // .b8 0 // .b64 Lfunc_begin0 // .b64 Lfunc_end0 // .b8 2 // .b64 Lfunc_begin0 // .b64 Lfunc_end0 // .b8 102 // .b8 111 // .b8 111 // .b8 0 // .b8 1 // .b8 1 // .b8 1 // .b8 0 // } // .section .debug_macinfo // { // .b8 0 // } 1 warning generated. As you can see .loc directives are emitted with references to files 1 and 2 and after the .text section we have 2 .file directives: one for file foo.c and another one for file foo.def | |
lib/Target/NVPTX/NVPTXAsmPrinter.h | ||
347 | Ok, will add |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h | ||
---|---|---|
34 | Typically you don't have file directives listed after the function and still have it able to assemble. Doing that in the .s file created with my testcase gives: echristo@athyra ~> clang -c foo.s .loc 1 1 0 # foo.c:1:0 ^ foo.s:18:7: error: unassigned file number in '.loc' directive .loc 2 2 1 prologue_end # ./foo.def:2:1 ^ foo.s:22:7: error: unassigned file number in '.loc' directive .loc 1 3 3 # foo.c:3:3 which seems problematic? |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h | ||
---|---|---|
34 | Not for ptxas, ptxas supports this order of the .loc and .file directives. |
Couple of small comment then this is going to be ok.
One question: "What work needs to happen to uncomment the debug info output?" In other words, what isn't done right now?
-eric
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp | ||
---|---|---|
40 | For the record this is pretty fragile and should be fixed up in a followup patch. | |
lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h | ||
34 | Can you document this in comments? |
Good start for now. Let's see what we can do to start testing as we can and documenting any bugs in the cuda tools as we work around them.
Thanks for all of your efforts here.
-eric
FWIW I'm seeing internal correctness issues after this patch has been merged - I think it's related to the section changing part of the patches. I'm investigating.
So this diff:
- for (const std::string &S : DwarfFiles)
- getStreamer().EmitRawText(S.data());
+ for (const std::string &S : DwarfFiles)
+ getStreamer().EmitRawText(DwarfFiles.front().data());
causes the problem to go away.
The problem shows up this way:
There is more than one initializer with name 'file'
during program startup (within google so you won't be able to duplicate this part). which seems a little weird, but I'm guessing that there's something going on in the way that we're outputting the file changes at section switch time with how ptxas is dealing with it.
For now I'm going to go ahead and revert this because it's making it so that cuda compilation isn't working with clang and I'll continue working up a testcase and see if I can't narrow down the problem a little better and get back to you soon.
Eric, could you send the small part of the PTX file, that causes troubles? Not the whole file, just several lines.
Hmm. Not sure, of course, but looks like the corruption of the output buffer. Like the flush is required or something like this. Need the output result for better analysis.
Unconditionally ignoring a command-line option? Are there tests that fail if you don't do this?