Page MenuHomePhabricator

[DebugInfo] Fix end_sequence of debug_line in LTO Object
Needs ReviewPublic

Authored by kyulee on Aug 17 2021, 5:59 PM.

Details

Summary

In a LTO build, the end_sequence in debug_line table for each compile unit (CU) points the end of text section which merged all CUs. The end_sequence needs to point to the end of each CU's range. This bug often causes invalid debug_line table in the final .dSYM binary for MachO after running dsymutil which tries to compensate an out-of-range address of end_sequence.
The fix is to set the end label of the last function range for each CU's line table. Use it to encode end_sequence address or fall back to use the section end label same as before.

Diff Detail

Event Timeline

kyulee created this revision.Aug 17 2021, 5:59 PM
kyulee requested review of this revision.Aug 17 2021, 5:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2021, 5:59 PM
ellis added a subscriber: ellis.Aug 17 2021, 6:23 PM

Would it be possible and easier to test this issue without any LTO concerns, with code like this:

void f1() { }
void __attribute__((nodebug)) void f2() { }

Looks like the bug manifests here by having the line table cover f2, when it should end at the end of f1 instead?

(admittedly there's still a problem, maybe this isn't sufficiently representative, if there was void f3() { } at the end - we don't end the sequence around f2 and restart it afterwards... )

Well, if you want to keep the existing LTO-related testing, you could link two IR files together with simple empty functions, rather than interesting functions like you have in your examples. Just void f1() { } and void f2() { } in two files, llvm-linkd together would suffice, I think?

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
2158–2160

What's this refactoring for? Looks like it causes duplicate lookups of the DwarfCompileUnit, that might be nice to avoid/not introduce? (if it is necessary/preferred for some reason, might be worth doing separately from the rest of this patch?)

(both callers already have the DwarfCompileUnit available - perhaps this function should take that instead of the MachineFunction? that way it could avoid the extra lookups)

2205–2209

Perhaps this could be done once outside the loop?

// MBBSectionRanges is always non-empty at this point, right? (could assert just in case, but I think it's a reasonable invariant)
Asm->OutStreamer->getContext()
        .getMCDwarfLineTable(getDwarfCompileUnitID(MF))
        .getMCLineSections()
        .updateEndLabel(Asm->MBBSectionRanges.back().second.EndLabel);
llvm/lib/MC/MCObjectStreamer.cpp
516–517

When is this condition false? (is it tested?)

llvm/test/DebugInfo/lto-debugline-main.ll
1 ↗(On Diff #367079)

This patch only applies to llvm's codegen - so the test probably shouldn't include opt or LTO tools - the post-LTO'd IR should be checked in directly & then fed into llc in the test to exercise the codegen codepath, most likely?

clayborg added inline comments.Aug 17 2021, 8:43 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
2209

Are the "Asm->MBBSectionRanges" sorted? If not, do we need to find the highest EndLabel and only add that one?

kyulee updated this revision to Diff 367121.Aug 17 2021, 11:07 PM

Update the change based on the feedback

  • Refactor NFC change to a separate revision, https://reviews.llvm.org/D108271
  • Use a non-LTO unit test
  • Set EndLabel once after the loop
  • Simpliy the logic to directly use EndLabel for end_sequence address
kyulee marked 4 inline comments as done.
kyulee edited the summary of this revision. (Show Details)Aug 17 2021, 11:13 PM
kyulee updated this revision to Diff 367123.Aug 17 2021, 11:37 PM

Remove unnecessary MCObjectWriter.h inclusion.

kyulee updated this revision to Diff 367249.Aug 18 2021, 10:39 AM

Restore symbol diff logic to fix Window failures

kyulee updated this revision to Diff 367372.Aug 18 2021, 5:38 PM

Rebase. getDwarfCompileUnitID -> getDwarfCompileUnitIDForLineTable in https://reviews.llvm.org/D108271

dblaikie added inline comments.Aug 18 2021, 10:35 PM
llvm/lib/MC/MCObjectStreamer.cpp
516–517

Still curious about this question ^

kyulee added inline comments.Aug 19 2021, 1:52 AM
llvm/lib/MC/MCObjectStreamer.cpp
516–517

I've attempted to delete this symbol diff check in https://reviews.llvm.org/D108261?id=367123 but
https://buildkite.com/llvm-project/premerge-checks/builds/53063#d29a2537-c103-4035-86a0-4e01e0188dd4 showed many builds failed like

Assertion failed: Abs && "We created a line delta with an invalid expression"

So, I restored this logic back.

As for InSet = false, I think it confirms the absolute symbol difference, or this check can be true optimistically by https://github.com/llvm/llvm-project/blob/d480f968ad8b56d3ee4a6b6df5532d485b0ad01e/llvm/lib/MC/MachObjectWriter.cpp#L680-L681.
I tested the following test could fail if I set InSet = true.

DebugInfo/X86/gmlt.test
dblaikie added inline comments.Aug 19 2021, 12:31 PM
llvm/lib/MC/MCObjectStreamer.cpp
516–517

This seems suspicious to me - "isSymbolRefDifferenceFullyResolvedImpl" (also a bit suspicious that this is calling an "Impl" function - usually those are only used as implementation details for some non-impl function, where the latter is intended as the general entry point) should only return false if the two symbols aren't in the same section. I wouldn't've thought that should happen - when/where/how/why does that happen?

kyulee updated this revision to Diff 367613.Aug 19 2021, 2:12 PM

Use isSymbolRefDifferenceFullyResolved instead of isSymbolRefDifferenceFullyResolvedImpl

kyulee added inline comments.Aug 19 2021, 2:15 PM
llvm/lib/MC/MCObjectStreamer.cpp
516–517

I've just replaced isSymbolRefDifferenceFullyResolvedImpl by isSymbolRefDifferenceFullyResolved.
As shown https://github.com/llvm/llvm-project/blob/d480f968ad8b56d3ee4a6b6df5532d485b0ad01e/llvm/include/llvm/MC/MCObjectWriter.h#L67-L86, the arguments are very similar in these two except that one is with MCSymbolRefExpr and the other is with MCSymbol.

Are you suggesting a new API isSymbolRefDifferenceFullyResolved(constMCAssembler &Asm, constMCSymbolRefExpr *A, constMCSymbolRefExpr *B) that defaults with InSet = false, and use it here?

dblaikie added inline comments.Aug 19 2021, 3:20 PM
llvm/lib/MC/MCObjectStreamer.cpp
516–517

Sorry, no I don't mean to rename or refactor this API.

I'd like to understand why this function call ever produces a result of false - it seems to me that EndLabel and LastLabel should always be from the same section, and if they are, I think this function should never return false - so I think I'm misunderstanding something/don't understand where these labels are coming from such that they could end up being from different sections. I'd like to understand how that situation can arise before approving this patch - to understand better why this approach/test is suitable.

kyulee added inline comments.Aug 19 2021, 9:03 PM
llvm/lib/MC/MCObjectStreamer.cpp
516–517

I'm not familiar with the whole logic, but I've traced a particular case below.

DebugInfo/X86/cu-ranges.ll

This test runs with -function-sections so it appears each function goes to each section (comdat?).
I found LastLabel and EndLabel are in the same section for the last function only because EndLabel is set from the last function.
This means all the prior functions go with the end section label because this symbol difference check fails.
For the last function, EndLabel is actually the same offset to the end section label because each section has its own function only.
So, I think this logic seems to cover this case correctly resulting in a valid line table. I'm not sure if there is a better way to guard this case.

kyulee updated this revision to Diff 367733.Aug 20 2021, 1:21 AM

Set EndLabel per Section. A compilation unit (CU) can have multiple sections (e.g., comdat) so track EndLabel per section.
This seems a way to avoid unnecessary symbol diff check because EndLabel and LastLabel would be in the same section.

kyulee updated this revision to Diff 368021.Aug 22 2021, 7:25 PM

Handle DebugLineEntries and EndLabel per Section

@dblaikie Could you take another look? Thanks!

I'm a bit confused/unclear how this algorithm works/how this fixes things and whether it's the right direction. It'll be a while before I can find the time/brain bandwidth to really dig into this - so perhaps you can help me.

What I'm confused by is why multiple line tables would be unterminated & then end up needing to terminate them all/figure out which one to terminate? Instead I'd expect whenever another line table got an entry, the previously current line table entry would need to end - so there shouldn't be a need for the extra maps & maybe we can just keep track of the current "open" line table (whichever was the last one to emit an entry)?

I'm a bit confused/unclear how this algorithm works/how this fixes things and whether it's the right direction. It'll be a while before I can find the time/brain bandwidth to really dig into this - so perhaps you can help me.

What I'm confused by is why multiple line tables would be unterminated & then end up needing to terminate them all/figure out which one to terminate? Instead I'd expect whenever another line table got an entry, the previously current line table entry would need to end - so there shouldn't be a need for the extra maps & maybe we can just keep track of the current "open" line table (whichever was the last one to emit an entry)?

Here is my understanding. Basically a line table is emitted per compilation unit (CU). For simplicity, I consider two cases:

  1. When multiple sections exist in a CU, we still want to emit them into a single line table (with multiple end_sequences per section).
  2. In a LTO case, multiple CUs appear in a single (merged) object. In that case, we want to emit multiple line tables per CU.

So, this algorithm tries to track the end label of each function range per section in a CU. Likewise each line entries are already aggregated per section in a CU. This ensures emitting the end label per section per CU.
In case for asm parser path (instead of normal compilation path), those line entries/function range may not be emitted, so in that case (EndLabel = null), I fall back to the old logic that just uses the section end label conservatively.

This is two examples that this change tries to fix/improve.

  1. Multiple sections in a CU case -- similar to Comdat (function per section).
void f2()
{
}
// The end of .text in CU line table

__attribute__((section(".s1")))
void f1()
{
}
// The end of .s1 in CU line table
__attribute__((nodebug,section(".s1")))
void f1nodebug()
{
}
// --> Before this change: The line table pointed to here because it's the end of the section .s1.

#if 0
// llvm-objdump -d t1.o
Disassembly of section .text:

0000000000000000 <f2>:
..
       5: c3                            retq

Disassembly of section .s1:

0000000000000000 <f1>:
..
       5: c3                            retq

0000000000000010 <f1nodebug>:
..
      15: c3                            retq


// llvm-dwarfdump --debugline t1.o
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000      2      0      1   0             0  is_stmt
0x0000000000000004      3      1      1   0             0  is_stmt prologue_end
0x0000000000000006      3      1      1   0             0  is_stmt end_sequence
0x0000000000000000      8      0      1   0             0  is_stmt
0x0000000000000004      9      1      1   0             0  is_stmt prologue_end
0x0000000000000006      9      1      1   0             0  is_stmt end_sequence ---> Before this change, it pointed to offset 16, the end of the section .s1.
#endif
  1. LTO case: Multiple CUs in a single section
// LTO object effectively has the view of the merged CUs.
// CU1 from t1.c
void t1()
{
}
// End of .text in CU1 line table

// CU2 from t2.c
void t2()
{
}
// End of .text in CU2 line table

#if 0
$ llvm-objdump -d lto.o
0000000000000000 <t1>:
...
       5: c3                            retq

0000000000000010 <t2>:
..
      15: c3                            retq

$ llvm-dwarfdump --debug-line lto.o
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000      4      0      1   0             0  is_stmt
0x0000000000000004      5      1      1   0             0  is_stmt prologue_end
0x0000000000000006      5      1      1   0             0  is_stmt end_sequence --> Before this change, it pointed to offset 16 (the end of .text section).

Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000010      4      0      1   0             0  is_stmt
0x0000000000000014      5      1      1   0             0  is_stmt prologue_end
0x0000000000000016      5      1      1   0             0  is_stmt end_sequence
#endif

I'm a bit confused/unclear how this algorithm works/how this fixes things and whether it's the right direction. It'll be a while before I can find the time/brain bandwidth to really dig into this - so perhaps you can help me.

What I'm confused by is why multiple line tables would be unterminated & then end up needing to terminate them all/figure out which one to terminate? Instead I'd expect whenever another line table got an entry, the previously current line table entry would need to end - so there shouldn't be a need for the extra maps & maybe we can just keep track of the current "open" line table (whichever was the last one to emit an entry)?

Here is my understanding. Basically a line table is emitted per compilation unit (CU). For simplicity, I consider two cases:

  1. When multiple sections exist in a CU, we still want to emit them into a single line table (with multiple end_sequences per section).
  2. In a LTO case, multiple CUs appear in a single (merged) object. In that case, we want to emit multiple line tables per CU.

^ Presumably multiple line tables per Module, one per CU.

So, this algorithm tries to track the end label of each function range per section in a CU. Likewise each line entries are already aggregated per section in a CU. This ensures emitting the end label per section per CU.
In case for asm parser path (instead of normal compilation path), those line entries/function range may not be emitted, so in that case (EndLabel = null), I fall back to the old logic that just uses the section end label conservatively.

Ah, I see, and you've in the "normal compilation path" the change here in DwarfDebug's endFunctionImpl to add an end label for the section. Wouldn't this have issues if you interleave functions from CUs in the same section - like CU1:func1:.text, then CU2:func2:.text, then CU1:func3:.text. (ah, right, I mentioned this in my first comment, but I'm feeling more like fixing the issue your seeing should involve the more general fix to that interleaved issue too)

Yeah, in that case you get CU1's line table covering the whole range, including func2, which isn't intended/desirable.

So I think this solution you have is incomplete & I'd say it's essentially the same bug - at least I think of it that way. But I guess it depends what sort of "invalid debug_line table in the final dsym" you're dealing with - they're all not great, but maybe there's a more severe invalidity in the end case?

I guess what I'm suggesting would probably still require that extra signal from DwarfDebug in the normal compilation path, but it could be more robust/address the interleaved issue which wouldn't be fixed by this approach.

What if whenever the section changes (in raw MC) or when a new line entry is emitted to another CU (or in DwarfDebug, if a nodebug function starts) - then emit an end_prologue to whatever the current open line table is? (have to keep track of that, I guess - "last emitted line table MCLineDivision")

That would miss some opportunities for shorter line table encodings (in the func1/2/3 scenario above, if func2 was in another section, then the line table for func1/3 wouldn't actually need to have two separate chunks - they could be contiguous) - so more advanced would be keeping "Last MCLineDivision per section". Terminate the last MCLineDivision whenever somethingr is emitted to that section that isn't the same division. DwarfDebug would have one extra bonus: If it starts a nodebug function, it could also pre-emptively terminate any current MCLineDivision.

I think that'd fix all these issues, probably?

I'm a bit confused/unclear how this algorithm works/how this fixes things and whether it's the right direction. It'll be a while before I can find the time/brain bandwidth to really dig into this - so perhaps you can help me.

What I'm confused by is why multiple line tables would be unterminated & then end up needing to terminate them all/figure out which one to terminate? Instead I'd expect whenever another line table got an entry, the previously current line table entry would need to end - so there shouldn't be a need for the extra maps & maybe we can just keep track of the current "open" line table (whichever was the last one to emit an entry)?

Here is my understanding. Basically a line table is emitted per compilation unit (CU). For simplicity, I consider two cases:

  1. When multiple sections exist in a CU, we still want to emit them into a single line table (with multiple end_sequences per section).
  2. In a LTO case, multiple CUs appear in a single (merged) object. In that case, we want to emit multiple line tables per CU.

^ Presumably multiple line tables per Module, one per CU.

So, this algorithm tries to track the end label of each function range per section in a CU. Likewise each line entries are already aggregated per section in a CU. This ensures emitting the end label per section per CU.
In case for asm parser path (instead of normal compilation path), those line entries/function range may not be emitted, so in that case (EndLabel = null), I fall back to the old logic that just uses the section end label conservatively.

Ah, I see, and you've in the "normal compilation path" the change here in DwarfDebug's endFunctionImpl to add an end label for the section. Wouldn't this have issues if you interleave functions from CUs in the same section - like CU1:func1:.text, then CU2:func2:.text, then CU1:func3:.text. (ah, right, I mentioned this in my first comment, but I'm feeling more like fixing the issue your seeing should involve the more general fix to that interleaved issue too)

Yeah, in that case you get CU1's line table covering the whole range, including func2, which isn't intended/desirable.

So I think this solution you have is incomplete & I'd say it's essentially the same bug - at least I think of it that way. But I guess it depends what sort of "invalid debug_line table in the final dsym" you're dealing with - they're all not great, but maybe there's a more severe invalidity in the end case?

I guess what I'm suggesting would probably still require that extra signal from DwarfDebug in the normal compilation path, but it could be more robust/address the interleaved issue which wouldn't be fixed by this approach.

What if whenever the section changes (in raw MC) or when a new line entry is emitted to another CU (or in DwarfDebug, if a nodebug function starts) - then emit an end_prologue to whatever the current open line table is? (have to keep track of that, I guess - "last emitted line table MCLineDivision")

That would miss some opportunities for shorter line table encodings (in the func1/2/3 scenario above, if func2 was in another section, then the line table for func1/3 wouldn't actually need to have two separate chunks - they could be contiguous) - so more advanced would be keeping "Last MCLineDivision per section". Terminate the last MCLineDivision whenever somethingr is emitted to that section that isn't the same division. DwarfDebug would have one extra bonus: If it starts a nodebug function, it could also pre-emptively terminate any current MCLineDivision.

I think that'd fix all these issues, probably?

Thanks for comments! I roughly get your points but may need sometime to digest it.
I may misunderstand an important piece on your example -- CU1:func1:.text, then CU2:func2:.text, then CU1:func3:.text.
My understanding is CU is a compilation unit which seems to be processed sequentially (in LLVM). How could we interleave CUs like this? I thought we only interleave sections within a CU.
Does this interleaving come from a ThinLTO compilation? I would appreciate it if you can provide an example.

Thanks for comments! I roughly get your points but may need sometime to digest it.
I may misunderstand an important piece on your example -- CU1:func1:.text, then CU2:func2:.text, then CU1:func3:.text.
My understanding is CU is a compilation unit which seems to be processed sequentially (in LLVM).

Ah, that's not quite correct - a CU exists both by its existence in the cu list named module metadata, and by its reference from DISubprograms (which exist by reference from llvm::Functions). A CU is not processed atomically - some handling is done up front (at the start of the module, iterating the named module metadata list of CUs) and some is done at the end of the module - but between those, for all the DISubprograms, they're visited in the order of the llvm::Functions that reference them - so they can be arbitrarily interleaved.

How could we interleave CUs like this? I thought we only interleave sections within a CU.
Does this interleaving come from a ThinLTO compilation? I would appreciate it if you can provide an example.

I don't know that I have a concrete example that creates the interleaving I can reproduce by hand - but it's not an IR invariant that all Functions that reference CUs through DISubprograms must be contiguous for that CU.

To create such a situation by hand, create two files, let's say a.cpp and b.cpp and have two functions a1 and a2 in a.cpp and main in b.cpp, have main call a1 and a2. Compile to IR, link the IR, hand-modify the IR to reorder the functions so it goes a1, then main, then a2 - then you should be able to reproduce/see the strange situation where the line table for a.cpp continues/extends over main/b.cpp.