Page MenuHomePhabricator

[Debug-Info][XCOFF] support dwarf for XCOFF for assembly output
ClosedPublic

Authored by shchenz on Jan 27 2021, 5:25 AM.

Details

Summary

This patch supports dwarf debugging info for XCOFF format when clang generates an assembly output. Object output will be handled in followup patches.

Since I can not get the assembly file been assembled, I verify the debug info by checking the assembly files.

I have done some testing in debugger on the AIX system, this patch works well for some cases at 32-bit XCOFF.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
llvm/include/llvm/MC/MCSectionXCOFF.h
109

Suggested corresponding change.

llvm/lib/MC/MCAsmStreamer.cpp
1437–1442

The comment only covers half the condition? That makes reading this confusing.

llvm/lib/MC/MCContext.cpp
699–703

Don't use the {} initialization for XCOFFSectionKey now that its constructors are more complicated.

llvm/lib/MC/MCSymbolXCOFF.cpp
16–18

This function (as named) is for csects. The assert should stay or some more extensive changes are needed.

30–32

This function (as named) is for csects. The assert should stay or some more extensive changes are needed.

hubert.reinterpretcast added inline comments.
llvm/test/DebugInfo/XCOFF/empty.ll
76–78

See comment about the size of the traceback table and the resulting alignment consequences.

277–279

Because the size of the traceback table is not a guaranteed to be a multiple of 4 (here it is 22 bytes), I think it may be more conservative to actually emit these labels immediately after the last code instruction (although it seems neither GCC nor XL on AIX do that). Alternatively (and GCC and XL both do so), padding should be emitted after the traceback table to maintain the alignment. @DiggerLin @jasonliu, if the second option (add padding) is chosen, I think it should be a separate patch that is a follow-up to the traceback table emission.

llvm/include/llvm/MC/MCContext.h
620–626

It may make sense to add a getXCOFFDwarfSection function that forwards to this.

llvm/lib/MC/MCAsmStreamer.cpp
2357

I tried this patch with -ffunction-sections, and the assembly I get generates a duplicate definition of L...dwline:

	.byte	3
	.byte	1
	.byte	1
	.csect .main[PR],2
L..sec_end1:

	.dwsect 0x20000
L...dwline:
	.byte	0
	.byte	5
	.byte	2
	.vbyte	4, L..sec_end1
	.byte	0
	.byte	1
	.byte	1
L..debug_line_end0:
shchenz updated this revision to Diff 327031.Feb 28 2021, 11:07 PM
shchenz marked 11 inline comments as done.

address @hubert.reinterpretcast comments

thanks for your review @hubert.reinterpretcast . See my inline reply.

llvm/include/llvm/MC/MCContext.h
620–626

hmm, I would let this function like this for now. There would be many redundant codes between getXCOFFDwarfSection and getXCOFFSection if we create a new getXCOFFDwarfSection. Let me know if you think it is profitable to add a new function. Maybe we need a big NFC patch for this change.

llvm/lib/MC/MCAsmStreamer.cpp
2357

Good catch. Fixed in the new update.

llvm/lib/MC/MCSymbolXCOFF.cpp
30–32

I have added a FIXME in the caller of setRepresentedCsect, I will fix this later in another patch.

llvm/test/DebugInfo/XCOFF/empty.ll
76–78

Can you help to point out which comments? Do you mean we will have a functionality issue if we don't align the text section?

shchenz updated this revision to Diff 327050.Mar 1 2021, 1:26 AM

1: handle neighbouring .loc

shchenz updated this revision to Diff 327067.Mar 1 2021, 3:18 AM

typo fix

jasonliu added inline comments.Mar 1 2021, 9:46 AM
llvm/include/llvm/MC/MCContext.h
620–626

I think what Hubert meant is to make a call to getXCOFFSection inside of getXCOFFDwarfSection so that you don't need to pass in so many default parameters when you just want to get the Dwarf Section. i.e. The getXCOFFDwarfSection signature could be

MCSectionXCOFF *
getXCOFFDwarfSection(StringRef Section, XCOFF::DwarfSectionSubtypeFlags DwarfSubtypeFlags, bool MultiSymbolsAllowed = true, const char *BeginSymName = nullptr);
llvm/lib/MC/MCContext.cpp
700–701

Use () to make it obvious that we are calling a constructor.

llvm/test/DebugInfo/XCOFF/empty.ll
76–78

I just meant that https://reviews.llvm.org/D95518?id=326625#inline-915279 applies here too. I'm not aware of a functionality issue with this.

llvm/include/llvm/MC/MCSectionXCOFF.h
109

Minor nit: We've been using Dwarf for camelCase.

llvm/lib/MC/MCContext.cpp
694

Minor nit: We've been using Dwarf with PascalCase.

695–697

Minor comment: == works for this check.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1780–1781

@jasonliu, I'd like your analysis of how this interacts with explicit sections.

shchenz updated this revision to Diff 327330.Mar 1 2021, 5:19 PM
shchenz marked 4 inline comments as done.

1: format fixing

llvm/include/llvm/MC/MCContext.h
620–626

Thanks for the review @jasonliu

That would make the API more confusing in my opinion. For that case, we have getXCOFFDwarfSection and getXCOFFSection. DWARF section should also be an XCOFF section, so why can't we call getXCOFFSection directly for DWARF section? getXCOFFSection obviously already has a parameter for DWARF section?

I am a little in favor of the current implementation, if you guys insist, I will do it in another NFC patch.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1780–1781

explicit sections are generated when -fdata-sections is enabled? If so, I think debugging functionality should not be impacted. We only need to worry about the "text" section for debug line. For explicit sections functionality, I guess it should not be impacted too as we only add new labels, not data, like vbyte, byte

Anyway, your input is appreciated @jasonliu

llvm/test/DebugInfo/XCOFF/empty.ll
76–78

Ah, OK, I was thinking you were saying "comments" in llvm source code

jasonliu added inline comments.Mar 1 2021, 7:49 PM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1780–1781

It seems that we want to generate a section end label on every text csect here.
If that's the case, then it doesn't seem like we are generating them correctly when explicit section for text section is enabled. An example would be:

int bar() {return 1;}
int __attribute__((section ("explicit_main_sec"))) main() {
  return bar();
}

I don't see a L..sec_end for .text[PR].

A side note when I was trying things: it seems we have a subtle bug when we enables -function-sections and specify an explicit section for a function. We don't generate a label for the function entry point in this case, and we don't have the correct function entry point stored in the function descriptor section.
The bug is not related to this patch, and we need to fix it in another patch. But it affects how we want to construct a test case in this patch.

llvm/test/DebugInfo/XCOFF/empty.ll
3

We should add test for -ffunction-sections, and possible explicit sections as well.

52

It might not be big deal, but it seems we are generating an extra label for no reason?

shchenz updated this revision to Diff 327366.Mar 1 2021, 11:46 PM
shchenz marked an inline comment as done.

address @jasonliu comments:
1: handle explicit section
2: add more testcases

shchenz added inline comments.Mar 1 2021, 11:46 PM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1780–1781

Good catch. We need to generate section end symbol for the explicit section too. Fixed in the new patch.

I found one issue, not sure if it is the same one as you found:
for the case you given, if we compile it with -ffunction-sections, we will get some linking error:

> clang 4.c -ffunction-sections          
ld: 0711-317 ERROR: Undefined symbol: .main
ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
clang-13: error: linker command failed with exit code 8 (use -v to see invocation)
llvm/test/DebugInfo/XCOFF/empty.ll
52

Not sure, this should be also exist without -g.

shchenz updated this revision to Diff 327373.Mar 2 2021, 12:31 AM

some simplify to the test case

shchenz added inline comments.Mar 2 2021, 3:03 AM
llvm/test/DebugInfo/XCOFF/empty.ll
52

Not sure, this should be also exist without -g.

What I mean is these two labels are caused by input to MC DWARF generation. The L..tmp2 is generated by a .loc directive. It will be used in dwlinesection. The L..tmp1 is generated for other usage., I have not checked into this yet. But I guess this should not be related for XCOFF DWARF.

jasonliu added inline comments.Mar 2 2021, 11:26 AM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1780–1781

Yes, it's the same bug.
I feel the current way of handling end of text csect is not very robust.
A few more ways to break it:

int __attribute__((section ("explict_text_sec"))) foo() {
  return 1;
}
int bar2() {return 1;}
int __attribute__((section ("explict_text_sec"))) bar() {
  return bar2();
}

It seems when C++ exception handling is enabled, and eh instruments are generated, then we will not have the label L..sec_end on the correct location.
A sample program would be:

int foo(){
  try { throw 1;}
  catch(...) { return 1;}
  return 0;
}
shchenz updated this revision to Diff 327686.Mar 2 2021, 11:58 PM

1: address comments

shchenz added inline comments.Mar 3 2021, 12:06 AM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1780–1781

issue 1 is indeed an issue. Fixed in the new patch. Now, I use a new method NOW. Use end symbol of .text section for all sections including any function sections or explicit sections.

issue 2 is as expected, we have to contain this inaccurate text end symbol, because we may generate other sections (maybe not for text, for example, eh, rodata sections) between functions, and we need an end symbol for all instructions in the current CU, so we have to add it in PPCAIXAsmPrinter::doFinalization() which runs after all functions being handled by AsmPrinter::runOnMachineFunction(). But this should not impact debugability as the inaccurate label only impact the last debug line entry in the current CU and this inaccurate label is out of any function, so before we reach this address in the debugger, we should jump to the caller of the last function in the section.

jasonliu added inline comments.Mar 3 2021, 10:39 AM
llvm/lib/MC/MCAsmStreamer.cpp
2323

I hope this FIXME is not going to stay here for long (and we will quickly have a patch to address this). As this is really a workaround here and it creates different behavior between object file generation and assembly generation which is undesirable.

2344

The assembly generated is hard for human to consume. It will be very difficult to debug if anything goes wrong between those lines. It would be great if we could use addComment to let people know what each of this byte means (It could be in a follow on patch if this is too much for this one).
Some of the existing dwarf code does not really have those comments because they were meant for object file generation only. But now, it might make sense to have those comments available.
Arguably, one could use some other tools to dump the assembled object file and figure out what each entry means. But those are extra steps and it requires time to map the dump output from object file to the original asm output.

2363

Should use dwarf::DW_LNS_extended_op instead of hardcoding 0?

2370–2373

We have the same 4 lines on these 3 blocks. Do we want to common them up?
Also, better yet, is it possible if we just use most of the logic from MCObjectStreamer::emitDwarfAdvanceLineAddr instead of making our own assembly version?
For example, put the common code that would work for both assembly and object file generation in a common function that could access by MCAsmStreamer and MCObjectStreamer. And in the override version of emitDwarfAdvanceLineAddr, we could call the common function and handle the differences between assembly and object file generation. Or, at least we want to see if we could reuse MCDwarfLineAddr::Emit which gets called in MCObjectStreamer::emitDwarfAdvanceLineAddr, instead of duplicating the logic here.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
2308

I wonder if doFinalizationAtSectionEnd really has to be in MCStreamer. At least for this patch, it could just be a helper static function, or private member function of PPCAIXAsmPrinter or even a lambda here, and no other target actually needs it. We could always refactor it to a larger scope later when we find the need for it.

jasonliu added inline comments.Mar 3 2021, 5:56 PM
llvm/lib/MC/MCAsmStreamer.cpp
2146

This is more of a question for my own understanding: when do we need to call MCDwarfLineEntry::make?
It seems in MCObjectStreamer, we would also call MCDwarfLineEntry::make in emitValueImpl and emitBytes. Are those calls not necessary in the MCAsmStreamer counterpart?

shchenz updated this revision to Diff 327980.Mar 3 2021, 6:06 PM
shchenz marked 3 inline comments as done.

1: address @jasonliu comments

llvm/lib/MC/MCAsmStreamer.cpp
2323

I left this as a FIXME because I can not find an easy way to fix this. Why we need this function is because of the difference of the changeSection function of MCAsmStreamer and MCObjectStreamer. MCObjectStreamer will change the insert point to the section end, while MCAsmStreamer will add a new section start in the assembly output. The MCAsmStreamer current behavior is not what we need for debug lines, we can not add another .text section when we generate assembly for .dwline.
So if we want to have the same behavior for MCAsmStreamer and MCObjectStreamer, we must make sure that we know a function is a section end. So we can add a section end symbol in the function emitFunctionBodyEnd. But this seems hard due to explicit sections. When we handle a function in AsmPrinter pass, it is hard to know there are other functions(which will be handled later) that also belong to the same section unless a CU level analysis is performed to fetch this info.

2344

Please check the assembly generated now matching your expectation? Since the prologue of debug line is not newly generated by this patch, so I left it without comment to avoid many current cases fail.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
2308

I know what you mean, but we must implement this based on MCStreamer. We only need this for MCAsmStreamer but not for MCObjectStreamer. Currently, there is no way to differentiate these two streamers. And we should not differentiate them, as it breaks the abstraction of MCSreamer.

shchenz added inline comments.Mar 3 2021, 6:23 PM
llvm/lib/MC/MCAsmStreamer.cpp
2146

I think the calling of MCDwarfLineEntry::make in MCObjectStreamer is for DW_LNS_advance_pc usage.
For MCAsmStreamer, we don't use DW_LNS_advance_pc, so we only need to add a line entry and a temp label where we generate a .loc directive.

jasonliu added inline comments.Mar 4 2021, 6:57 AM
llvm/lib/MC/MCAsmStreamer.cpp
2344

Thanks. It's much better for reading now.

2351

nit: AddComment for here too?

jasonliu added inline comments.Mar 4 2021, 7:12 AM
llvm/lib/MC/MCAsmStreamer.cpp
2323

I guess one way to do this for assembly would be:

  1. collect all the seen text csects somehow into an array.
  2. After all the text csects are emitted (doFinalization or emitEndOfAsmFile?), traverse (and switch to) each text csect in the array and emit the section end label for it.

I think in this way, you could make sure that the section end labels are indeed emitted to the section end.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
2308

You are right. Forgot about the fact that this is for assembly only.

jasonliu added inline comments.Mar 4 2021, 9:23 AM
llvm/test/DebugInfo/XCOFF/empty.ll
10

So I tried a simple program with this patch:

(gdb) list
1       int bar2() {return 1;}
2       int  main() {
3         int c = 10;
4         return bar2()+c;
5       }
(gdb) b 3
Breakpoint 1 at 0x100004d8: file /gsa/tlbgsa/home/j/a/jasonli/defect/XCOFF/main.c, line 3.
(gdb) r
Starting program: /gsa/tlbgsa-h1/06/jasonli/defect/XCOFF/main 

Breakpoint 1, main () at /gsa/tlbgsa/home/j/a/jasonli/defect/XCOFF/main.c:3
3         int c = 10;
(gdb) n
4         return bar2()+c;
(gdb) s
bar2 () at /gsa/tlbgsa/home/j/a/jasonli/defect/XCOFF/main.c:1
1       int bar2() {return 1;}
(gdb) n
0x100002fc in __start ()

It seems when we try the last "next", it skips the main function and goes directly into __start(). But gcc on AIX seems to be able to advance to main before goes to __start().

jasonliu accepted this revision.Mar 4 2021, 10:01 AM

LGTM if the nit comment is addressed
There are two items that we need to follow up after this patch:

  1. The corner case/bug in https://reviews.llvm.org/D95518#inline-919030
  2. The possibility of addressing the FIXME for https://reviews.llvm.org/D95518#inline-918884
This revision is now accepted and ready to land.Mar 4 2021, 10:01 AM
shchenz updated this revision to Diff 328342.Mar 4 2021, 5:16 PM

1: add more comment

@jasonliu Please see my inline comments for the corner case and the left FIXME. Thanks for your review.

llvm/lib/MC/MCAsmStreamer.cpp
2323

hmm, doing this in doFinalization for the explicit sections will be too late, just like the case we do this in dwarf line emission. We can not switch back to the previous section at the end of all text sections.

Collecting this info in doInitialization is a possible way. Before we run AsmPrinter::runOnMachineFunction() for any function, we go through all functions in the module and try to find out which function is the last one of a specific section.

But I am not sure that the function order (when we iterate them in the module ) is the same as the order they exist in the source file. We need more investigation here if we want to fix this later.

llvm/test/DebugInfo/XCOFF/empty.ll
10

Yeah, this is a difference between GCC and clang. We also get the same behavior for clang on Linux for this small case.

I think maybe clang's behavior is better than GCC. In the debugger, when we break a function name, we will always skip the function's prologue and stop after the prologue, so when we return back to a function at the address for a function's epilogue, should we just execute them and not stop there? If so, clang does the right thing, we don't stop at the function's epilogue and return to the function's caller.

This revision was landed with ongoing or failed builds.Mar 4 2021, 6:08 PM
This revision was automatically updated to reflect the committed changes.
jasonliu added inline comments.Mar 4 2021, 6:39 PM
llvm/lib/MC/MCAsmStreamer.cpp
2323

I'm not sure if I understand what you meant by we could not switch back to the previous section at the end of all text sections.
Assuming you have csects like this:

.csect main[PR]
xxx
.csect text[PR]
xxx
.csect main[PR]
xxx
.csect text[PR]
xxx

Then at finalization, we know no more text csect will be emitted. Then we could start emitting the end label for each of the csect above:

.csect main[PR]
.L..sec_end0
.csect text[PR]
.L..sec_end1

Then you could refer back to these labels in the dwarf sections. Those labels should give you the address of the end of that particular csect.
Wouldn't this work?

llvm/test/DebugInfo/XCOFF/empty.ll
10

I see. That's fine if this is just the behavior of clang. It's a corner case anyway.

shchenz added inline comments.Mar 4 2021, 7:03 PM
llvm/lib/MC/MCAsmStreamer.cpp
2323

Please keep in mind that there is no way to change the insert point for the assembly output. See MCAsmStreamer::changeSection() -> MCSectionXCOFF::PrintSwitchToSection(.

If we are in the position of the text end, for your example:

.csect main[PR]
xxx
.csect text[PR]
xxx

Assume .text is the end of all text sections, now we want to emit end symbol for section main.
Then we call switchSection(main), what we have now is generating another csect directive for main and then add the end label, so it becomes:

.csect main[PR]
xxx
.csect text[PR]
xxx
.csect main[PR]
L..sec_end    #for main

I think this is not what we expect, right? We expect:

.csect main[PR]
xxx
L..sec_end    #for main
.csect text[PR]
xxx

So we must do the section end lable adding when we just reach a section end.

FYI, MCObjectStreamer has the ability to change the insert point. See MCObjectStreamer::changeSectionImpl(

jasonliu added inline comments.Mar 4 2021, 7:21 PM
llvm/lib/MC/MCAsmStreamer.cpp
2323

So we must do the section end lable adding when we just reach a section end.

You don't have to.

.csect .text[PR]
    lwz 1,1
    lwz 2,2
.csect something[PR]
    lwz 3,3
.csect .text[PR]
end_label:
.csect .data[RW]
   .byte end_label

In this example, we switch back to .text section and emitted the section end label. And we print the end_label label at .data csect. And it correctly shows 8 in .data[rw] when you do the objdump of the assembled object file.