This is an archive of the discontinued LLVM Phabricator instance.

[DebugLine] save one debug line entry for empty prologue
ClosedPublic

Authored by shchenz on Apr 3 2023, 11:46 PM.

Details

Summary

Some debuggers like DBX on AIX assume the address in debug line entries is always incremental. But clang generates two entries (entry for file scope line and entry for prologue end) with same address if prologue is empty

And if the prologue is empty, seems the first debug line entry for the function is unnecessary(i.e. removing the first entry won't impact the behavior in GDB on Linux), so I implement this for all debuggers.

Diff Detail

Event Timeline

shchenz created this revision.Apr 3 2023, 11:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 3 2023, 11:46 PM
shchenz requested review of this revision.Apr 3 2023, 11:46 PM

gentle ping

I think this seems reasonable. Dis you run the LLDB test suite against this change?

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
2160

Are we allowed to use structured bindings? In any case, could you assign this to named variables, the .first and .second make the code harder to read.

shchenz marked an inline comment as done.Apr 12 2023, 8:35 PM

I think this seems reasonable. Dis you run the LLDB test suite against this change?

Just run a make check-lldb on PPC64LE target, no regression found.

Looks a duplicate of https://reviews.llvm.org/D49426

oops, I really didn't realize that patch. Good to know we also get an agreement there that what this patch proposes is correct. Considering that patch is pending there for more than 4 years (the last comment is ping to the author about 2 year ago), I will continue with this patch.

shchenz updated this revision to Diff 513055.Apr 12 2023, 8:35 PM
shchenz added a reviewer: TWeaver.

address @aprantl comments.

I guess I'll provide the same feedback here as I did on the other one (@aprantl thanks for referencing it/finding the duplicate):

I'm OK with this change from a "slightly reduces debug info size" but I hesitate to suggest that this isn't a thing your consumer should support & this "When an assembly file with multiple .loc directives with no interveaning instructions are assembled, GAS produces a DWARF line table that maps one instruction to multiple entries." sounds like a bit of a mischaracterization of how I'd read the line table entries in question - I think this line table doesn't "map one instruction to multiple entries" but instead has an empty entry that contains no instructions. Everything between one .loc and the next is at that first location - if there's nothing between them, then nothing is at that location. I suspect that's how other consumers are viewing this situation.

If this is an invariant you want (that there are no zero-length ranges in the line table), maybe it makes sense to implement it as an option in the integrated assembler, rather than special casing this particular instance of zero-length ranges in the line table?

They seem useless enough, but important to reproduce accurately if that's what the assembly says and other assemblers portray that information - but for our own uses we can say "these are meaningless, elide any zero-length location range"?

(do other assemblers reproduce this information? Does anyone care if we preserve it or drop it? My take on it is that this information is meaningless - it describes a location for zero instructions... so dropping it seems fine to me)

do other assemblers reproduce this information?

I agree with you that some assembler, like GAS on Linux will generate zero-length ranges in the line table if no instructions between two .loc. I guess this can also be improved in these producers like what we did here in compiler?
And on AIX where this patch is motivated, the system assembler does not have pseudo like .loc, it can only get debug line info in the .dwline section, see case llvm/test/DebugInfo/XCOFF/empty.ll. So on AIX, I think the debug lines should be always generated by compiler in the assembly file.

Does anyone care if we preserve it or drop it?

Hmm, at least DBX cares about the zero-length ranges since it assumes the addresses in the debug line entries are always incremental. And I also tested on GDB, seems no behavior change with/without this patch. Since this patch does some small optimization in the debug line section, so I think we should use this patch for all debuggers?

Thanks for review @dblaikie . Hope I am understanding your concerns right : )

do other assemblers reproduce this information?

I agree with you that some assembler, like GAS on Linux will generate zero-length ranges in the line table if no instructions between two .loc. I guess this can also be improved in these producers like what we did here in compiler?

My concern is that we should possibly have a more general solution in the compiler - this patch is touching specific prologue handling code, but I think we should fix this generally, so that we never produce zero length entries no matter where they come from. (with the exception that we /might/ need to preserve them when we're assembling from user assembly for consistency with GAS, for instance - maybe someone's relying on it - but for our own internal uses (when using LLVM's integrated assembler in... an integrated manner) we should skip zero length entries - or, maybe, even at a higher level - like we should ensure we don't emit zero length .loc directives even if we're emitting assembly - but again, I'd hope that would be a general solution, not finding the spots (like prologue emission) where we happen to be creating zero length entries today - that the solution should be more general/robust/lower level)

And on AIX where this patch is motivated, the system assembler does not have pseudo like .loc, it can only get debug line info in the .dwline section, see case llvm/test/DebugInfo/XCOFF/empty.ll. So on AIX, I think the debug lines should be always generated by compiler in the assembly file.

I don't understand this, sorry. You mean on AIX the compiler produces the line table directly with the .byte directives, etc?Not sure how that relates to this issue..

Does anyone care if we preserve it or drop it?

Hmm, at least DBX cares about the zero-length ranges since it assumes the addresses in the debug line entries are always incremental. And I also tested on GDB, seems no behavior change with/without this patch. Since this patch does some small optimization in the debug line section, so I think we should use this patch for all debuggers?

Yeah, I'm good with it being used for all tools/debuggers - but I think the solution should be more general, especially if your debugger has real trouble with these zero length entries - fixing each case of zero length entries as they're found seems not good, and we should have a general solution so it's not possible to create these zero length entries.

Yeah, I'm good with it being used for all tools/debuggers - but I think the solution should be more general, especially if your debugger has real trouble with these zero length entries - fixing each case of zero length entries as they're found seems not good, and we should have a general solution so it's not possible to create these zero length entries.

Agreed. First time when I got this "DBX assumes the addresses in the debug line entries are always incremental", I was thinking the same thing, there may be other cases where clang generates two debug line entries with same address especially when optimization is enabled. At that time I was thinking MCDwarfLineTable::emitOne() should be able to optimize the debug lines if there are two adjacent lines with same address. Seems this is not the case?

but for our own internal uses (when using LLVM's integrated assembler in... an integrated manner) we should skip zero length entries - or, maybe, even at a higher level - like we should ensure we don't emit zero length .loc directives even if we're emitting assembly - but again, I'd hope that would be a general solution, not finding the spots (like prologue emission) where we happen to be creating zero length entries today - that the solution should be more general/robust/lower level)

I think for the motivated case on AIX, we need this to be in the higher level, i.e. in the place where we generate the .loc directives. AIX for now uses non integrated-assembler mode by default, so fixing this inside integrated assembler will not make our issue gone. I will try to do by this way.

Thanks for your good suggestion. @dblaikie

I think for the motivated case on AIX, we need this to be in the higher level, i.e. in the place where we generate the .loc directives. AIX for now uses non integrated-assembler mode by default, so fixing this inside integrated assembler will not make our issue gone.

In that case, maybe in depth would be to fix the AIX assembler too, if it's producing invalid output for the platform? But, yeah, if that's out of scope and/or to fix this as robustly as possible, would be good to get this fixed in LLVM at the assembly emission (somewhere generic for all MCStreamer (ie: above MCAsmStreamer/MCObjectStreamer, but below any user of MCStreamer - so presumably that puts the fix somewhere /in/ MCStreamer, generally speaking (maybe some side utilities, etc, I haven't looked at the API in detail))

Hi @dblaikie , in the MCStreamer when debug line is handled in DwarfDebug::beginInstruction(), it already ignores the case for zero length entries, see https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L2020-L2024 . DwarfDebug::beginInstruction() is called for each instruction and if the instruction is not a "real" instruction on the platform(which means there will be zero length entry in the debug line section), no debug line entry will be added. If the instruction is a "real" instruction, then the address must be changed. So, seems emitInitialLocDirective() is the only place where zero length entry may be generated. What do you think? Thank you!

Added a simple assert in MCObjectStreamer::emitDwarfAdvanceLineAddr() for the absolute address diffs, and run a unittest check in llvm. All the failures are for assembly file (hardcode two .loc without any "real" instructions between). The ll input can not produce 0 address diff.

Added a simple assert in MCObjectStreamer::emitDwarfAdvanceLineAddr() for the absolute address diffs, and run a unittest check in llvm. All the failures are for assembly file (hardcode two .loc without any "real" instructions between). The ll input can not produce 0 address diff.

Hi @dblaikie , in the MCStreamer when debug line is handled in DwarfDebug::beginInstruction(), it already ignores the case for zero length entries, see https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L2020-L2024 . DwarfDebug::beginInstruction() is called for each instruction and if the instruction is not a "real" instruction on the platform(which means there will be zero length entry in the debug line section), no debug line entry will be added. If the instruction is a "real" instruction, then the address must be changed. So, seems emitInitialLocDirective() is the only place where zero length entry may be generated. What do you think? Thank you!

I'd still prefer this to be at a lower layer - because I worry we'll end up missing cases (we have prologue and DwarfDebug so far - maybe we'll end up with a case in some other situation like epilogues, or some other "special case")
Like maybe MCObjectStreamer::emitDwarfADvanceLineAddr() could be a non-virtual wrapper up in MCStreamer that delegates to the virtual one (asm or object) and handles the zero-length skipping, probably under a flag (if we need to preserve the behavior/opt-in to the behavior when we're assembling - though I'm not 100% sure we need to bother with that degree of compatibility either - I think we'd be doing the DWARF community a service by removing them earlier and removing the confusion/debate they create later on - but that opinion might be too strong/opinionated)

Hi @dblaikie , for now there are two kinds of debug line generations for non-integrated mode.

In MCAsmStreamer::emitDwarfLocDirective():

  • if MAI->usesDwarfFileAndLocDirectives() is false, for AIX XCOFF(AIX assembly does not have .loc and .file pseudos), the debug line entries will be collected to a container, same way with integrated-as mode does, and later in finishImpl(), the lower level function like MCDwarfLineTable::emitOne() will handle all collected debug line entries. I think there should be opportunity to handle zero-length skipping in MCDwarfLineTable::emitOne().
  • if MAI->usesDwarfFileAndLocDirectives() is true, for most other target, the .loc will be directly emitted to the output assembly file. So if we want to handle zero-length skipping for this case, we must do it before the .loc is emitted.

Due to this, handling the zero-length skipping in the higher level, when recordSourceLine() is called for each instruction and prologue, seems be consistent for all cases. So DwarfDebug::beginInstruction() seems like a right place. What do you think? Thanks!

Due to this, handling the zero-length skipping in the higher level, when recordSourceLine() is called for each instruction and prologue, seems be consistent for all cases. So DwarfDebug::beginInstruction() seems like a right place.

Could we do it down in recordSourceLine? Somewhere it can be done once and handle both the beginInstruction and emitInitialLocDirective cases in a single place, so there's a better chance that these things don't get out of sync, or a 3rd place doesn't cause the same problem later on.

(Ideally, maybe, even lower in the MCStreamer - but above either the MCAsm or MCObjectStreamer (so there'd be some non-virtual MCStreamer function that keeps track of this/skips the duplicate entry, then delegates to MCAsmStreamer/MCOBjectStreamer) - though not sure there's a convenient way to track whether the entry is zero-length or not there)

Like maybe recordSourceLine raises a flag and beginInstruction checks the flag, and if the flag is raised, it calls Asm.OutStreamer->emitDwarfLocDirective and lowers the flag. Then any redundant line directives aren't emitted, they are just overwritten.

shchenz added a comment.EditedApr 26 2023, 7:53 AM
Like maybe recordSourceLine raises a flag and beginInstruction checks the flag, and if the flag is raised, it calls Asm.OutStreamer->emitDwarfLocDirective and lowers the flag. Then any redundant line directives aren't emitted, they are just overwritten.

Hi @dblaikie Sorry, I don't quite understand the method you suggested. Could you please share more details? And I am also sorry for the long comment...

I also tried a way in recordSourceLine() (See the code snippet at last), it can handle empty prologue case, but it can not handle non-empty prologue case now and I can not find a better way to fix this.

The truncated input for non-empty prologue case:

; Function Attrs: noinline nounwind optnone uwtable
define dso_local signext i32 @main() #0 !dbg !10 {
entry:
  %retval = alloca i32, align 4  ;; prologue
  store i32 0, ptr %retval, align 4 ;;prologue
  ret i32 0, !dbg !15
}
main:                                   # @main
.Lfunc_begin0:
	.file	0 "empty_prologue" "1.c" md5 0xd1035b543709f23158df094b6d49742b
	.cfi_startproc
# %bb.0:                                # %entry
	li 3, 0
	stw 3, -12(1)
.Ltmp0:
	.loc	0 2 0                           # 1.c:2:0     ######now the first line entry of the file is put after prologue instructions.
	.loc	0 3 3 prologue_end              # 1.c:3:3
	li 3, 0
	blr

The issue is for assembly code generation, we must emit the .loc once the recordSourceLine() is called, we can not delay it otherwise the .loc will be emitted in the wrong place like above case shows. AsmPrinter::emitFunctionBody() emits one instruction in each iteration while it iterates the instructions in a BB. The debug handler DwarfDebug::beginInstruction() must keep pace with AsmPrinter::emitFunctionBody(). Otherwise the .loc can not be emitted in the right place. I thought we can not change how we write the assembly file, it has to be written line by line? That is, in above example, after AsmPrinter::emitFunctionBody() emits prologue instructions li and stw, when recordSourceLine() tries to emit the first .loc, it can not reset the file position to the front of prologue, right?

diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index 98f24599970d..cd4dd504e1e8 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -2048,7 +2048,7 @@ void DwarfDebug::beginInstruction(const MachineInstr *MI) {
     if ((LastAsmLine == 0 && DL.getLine() != 0) || Flags) {
       // Reinstate the source location but not marked as a statement.
       const MDNode *Scope = DL.getScope();
-      recordSourceLine(DL.getLine(), DL.getCol(), Scope, Flags);
+      recordSourceLine(DL.getLine(), DL.getCol(), Scope, Flags, MI);
     }
     return;
   }
@@ -2079,7 +2079,7 @@ void DwarfDebug::beginInstruction(const MachineInstr *MI) {
         Scope = PrevInstLoc.getScope();
         Column = PrevInstLoc.getCol();
       }
-      recordSourceLine(/*Line=*/0, Column, Scope, /*Flags=*/0);
+      recordSourceLine(/*Line=*/0, Column, Scope, /*Flags=*/0, MI);
     }
     return;
   }
@@ -2091,7 +2091,7 @@ void DwarfDebug::beginInstruction(const MachineInstr *MI) {
     return;
   if (DL == PrologEndLoc) {
     Flags |= DWARF2_FLAG_PROLOGUE_END | DWARF2_FLAG_IS_STMT;
-    PrologEndLoc = DebugLoc();
+    //PrologEndLoc = DebugLoc();
   }
   // If the line changed, we call that a new statement; unless we went to
   // line 0 and came back, in which case it is not a new statement.
@@ -2100,7 +2100,7 @@ void DwarfDebug::beginInstruction(const MachineInstr *MI) {
     Flags |= DWARF2_FLAG_IS_STMT;

   const MDNode *Scope = DL.getScope();
-  recordSourceLine(DL.getLine(), DL.getCol(), Scope, Flags);
+  recordSourceLine(DL.getLine(), DL.getCol(), Scope, Flags, MI);

   // If we're not at line 0, remember this location.
   if (DL.getLine())
@@ -2128,28 +2128,6 @@ static DebugLoc findPrologueEndLoc(const MachineFunction *MF) {
   return LineZeroLoc;
 }

-/// Register a source line with debug info. Returns the  unique label that was
-/// emitted and which provides correspondence to the source line list.
-static void recordSourceLine(AsmPrinter &Asm, unsigned Line, unsigned Col,
-                             const MDNode *S, unsigned Flags, unsigned CUID,
-                             uint16_t DwarfVersion,
-                             ArrayRef<std::unique_ptr<DwarfCompileUnit>> DCUs) {
-  StringRef Fn;
-  unsigned FileNo = 1;
-  unsigned Discriminator = 0;
-  if (auto *Scope = cast_or_null<DIScope>(S)) {
-    Fn = Scope->getFilename();
-    if (Line != 0 && DwarfVersion >= 4)
-      if (auto *LBF = dyn_cast<DILexicalBlockFile>(Scope))
-        Discriminator = LBF->getDiscriminator();
-
-    FileNo = static_cast<DwarfCompileUnit &>(*DCUs[CUID])
-                 .getOrCreateSourceID(Scope->getFile());
-  }
-  Asm.OutStreamer->emitDwarfLocDirective(FileNo, Line, Col, Flags, 0,
-                                         Discriminator, Fn);
-}
-
 DebugLoc DwarfDebug::emitInitialLocDirective(const MachineFunction &MF,
                                              unsigned CUID) {
   // Get beginning of function.
@@ -2161,8 +2139,7 @@ DebugLoc DwarfDebug::emitInitialLocDirective(const MachineFunction &MF,
     // We'd like to list the prologue as "not statements" but GDB behaves
     // poorly if we do that. Revisit this with caution/GDB (7.5+) testing.
     const DISubprogram *SP = PrologEndLoc->getInlinedAtScope()->getSubprogram();
-    ::recordSourceLine(*Asm, SP->getScopeLine(), 0, SP, DWARF2_FLAG_IS_STMT,
-                       CUID, getDwarfVersion(), getUnits());
+    recordSourceLine(SP->getScopeLine(), 0, SP, DWARF2_FLAG_IS_STMT);
     return PrologEndLoc;
   }
   return DebugLoc();
@@ -2308,13 +2285,64 @@ void DwarfDebug::endFunctionImpl(const MachineFunction *MF) {
   CurFn = nullptr;
 }

-// Register a source line with debug info. Returns the  unique label that was
+// Register a source line with debug info. Returns the unique label that was
 // emitted and which provides correspondence to the source line list.
 void DwarfDebug::recordSourceLine(unsigned Line, unsigned Col, const MDNode *S,
-                                  unsigned Flags) {
-  ::recordSourceLine(*Asm, Line, Col, S, Flags,
-                     Asm->OutStreamer->getContext().getDwarfCompileUnitID(),
-                     getDwarfVersion(), getUnits());
+                                  unsigned Flags, const MachineInstr *MI) {
+
+  auto EmitLoc = [&](unsigned Line, unsigned Col, const MDNode *S,
+                     unsigned Flags) {
+    StringRef Fn;
+    unsigned FileNo = 1;
+    unsigned Discriminator = 0;
+    ArrayRef<std::unique_ptr<DwarfCompileUnit>> DCUs = getUnits();
+    unsigned CUID = Asm->OutStreamer->getContext().getDwarfCompileUnitID();
+
+    if (auto *Scope = cast_or_null<DIScope>(S)) {
+      Fn = Scope->getFilename();
+      if (Line != 0 && getDwarfVersion() >= 4)
+        if (auto *LBF = dyn_cast<DILexicalBlockFile>(Scope))
+          Discriminator = LBF->getDiscriminator();
+
+      FileNo = static_cast<DwarfCompileUnit &>(*DCUs[CUID])
+                   .getOrCreateSourceID(Scope->getFile());
+    }
+    Asm->OutStreamer->emitDwarfLocDirective(FileNo, Line, Col, Flags, 0,
+                                            Discriminator, Fn);
+  };
+
+  // Before handling instructions, multiple debug lines may be recorded and
+  // these debug lines may have same addresses, for example the prologue is
+  // empty. Ignore these lines for now until we know what the prologue looks
+  // like.
+  if (!PrevInstLoc && !MI)
+    return;
+
+  // If the function's first instruction does not have debug loc info, prologue
+  // is not empty. Generate the line for the function start first before
+  // generating prologue end line.
+  if (!PrevInstLoc && MI && PrologEndLoc) {
+    const MachineInstr *FirstInstr =
+        [](const MachineInstr *MI) -> const MachineInstr * {
+      for (auto &BB : *MI->getParent()->getParent()) {
+        for (auto &I : BB) {
+          if (!I.isMetaInstruction())
+            return &I;
+        }
+      }
+      return nullptr;
+    }(MI);
+
+    if (FirstInstr && FirstInstr != MI) {
+      // FirstInstr must be in prologue.
+      const DISubprogram *SP =
+          PrologEndLoc->getInlinedAtScope()->getSubprogram();
+      EmitLoc(SP->getScopeLine(), 0, SP, DWARF2_FLAG_IS_STMT);
+    }
+    PrologEndLoc = DebugLoc();
+  }
+
+  EmitLoc(Line, Col, S, Flags);
 }

 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
index 0e71c216e958..fdac1134abe7 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
@@ -606,7 +606,7 @@ private:
   /// label that was emitted and which provides correspondence to the
   /// source line list.
   void recordSourceLine(unsigned Line, unsigned Col, const MDNode *Scope,
-                        unsigned Flags);
+                        unsigned Flags, const MachineInstr *MI = nullptr);

   /// Populate LexicalScope entries with variables' info.
   void collectEntityInfo(DwarfCompileUnit &TheCU, const DISubprogram *SP,
Like maybe recordSourceLine raises a flag and beginInstruction checks the flag, and if the flag is raised, it calls Asm.OutStreamer->emitDwarfLocDirective and lowers the flag. Then any redundant line directives aren't emitted, they are just overwritten.

Hi @dblaikie Sorry, I don't quite understand the method you suggested. Could you please share more details? And I am also sorry for the long comment...

I also tried a way in recordSourceLine() (See the code snippet at last), it can handle empty prologue case, but it can not handle non-empty prologue case now and I can not find a better way to fix this.

The truncated input for non-empty prologue case:

; Function Attrs: noinline nounwind optnone uwtable
define dso_local signext i32 @main() #0 !dbg !10 {
entry:
  %retval = alloca i32, align 4  ;; prologue
  store i32 0, ptr %retval, align 4 ;;prologue
  ret i32 0, !dbg !15
}
main:                                   # @main
.Lfunc_begin0:
	.file	0 "empty_prologue" "1.c" md5 0xd1035b543709f23158df094b6d49742b
	.cfi_startproc
# %bb.0:                                # %entry
	li 3, 0
	stw 3, -12(1)
.Ltmp0:
	.loc	0 2 0                           # 1.c:2:0     ######now the first line entry of the file is put after prologue instructions.
	.loc	0 3 3 prologue_end              # 1.c:3:3
	li 3, 0
	blr

The issue is for assembly code generation, we must emit the .loc once the recordSourceLine() is called, we can not delay it otherwise the .loc will be emitted in the wrong place like above case shows. AsmPrinter::emitFunctionBody() emits one instruction in each iteration while it iterates the instructions in a BB. The debug handler DwarfDebug::beginInstruction() must keep pace with AsmPrinter::emitFunctionBody(). Otherwise the .loc can not be emitted in the right place. I thought we can not change how we write the assembly file, it has to be written line by line? That is, in above example, after AsmPrinter::emitFunctionBody() emits prologue instructions li and stw, when recordSourceLine() tries to emit the first .loc, it can not reset the file position to the front of prologue, right?

Right, I think the solution would have to coordinate with changes to DwarfDebug::beginInstruction per my comment here:

Like maybe recordSourceLine raises a flag and beginInstruction checks the flag, and if the flag is raised, it calls Asm.OutStreamer->emitDwarfLocDirective and lowers the flag. Then any redundant line directives aren't emitted, they are just overwritten.

So recordSourceLine wouldn't call emitDwarfLocDirective - it would only set some members to represent the line info.
Then DwarfDebug::beginInstruction would check if the line info was present - if it's present, it would call emitDwarfLocDirective and then zero/null/nullopt the line info, so it wouldn't get re-emitted on the next instruction.

That way if two recordSourceLine calls happen with no instruction between them, no emitDwarfLocDirective is called - but on the next insntruction that's emitted, the source line info would be emitted at that point.

recordSourceLine() {
  this->lineInfo = {line, file};
}
beginInstruction() {
  if (this->lineInfo) {
    emitDwarfLocDirective(this->lineInfo)
    this->lineInfo = std::nullopt;
  }
  ...
}

Basically that. (very rough on the code - I know it's more than just file/line, etc... but a rough idea at least)

That way if two recordSourceLine calls happen with no instruction between them, no emitDwarfLocDirective is called - but on the next insntruction that's emitted, the source line info would be emitted at that point.

recordSourceLine() {
  this->lineInfo = {line, file};
}
beginInstruction() {
  if (this->lineInfo) {
    emitDwarfLocDirective(this->lineInfo)
    this->lineInfo = std::nullopt;
  }
  ...
}

Thanks! I know your point now. This should be able to handle duplicated lines for the empty prologue case. But seems it also has same issue for the case when generating loc for instructions. (Currently, when starting to handle instructions, recordSourceLine is always called by a new instruction emitting, so there are always instructions between two recordSourceLine() calls in beginInstruction()). For example:

Inst1;  // line1
Inst2;  // line2

When handle Inst1, in the DwarfDebug::beginInstruction(), the line info for Inst1 is cached to this->lineInfo() in recordSourceLine() and then Inst1 is emitted to assembly file. This delay will cause issue, loc for Inst1 is supposed to be in front of Inst1. Even at the very beginning of beginInstruction() for Instr2, we call emitDwarfLocDirective() for Inst1, it is still put the .loc after Inst1.

dblaikie accepted this revision.May 1 2023, 5:53 PM

After talking it through (thanks for your patience with this) I guess this is OK - the solution I ended up at below still involves fairly specific prologue handling anyway... though it doesn't explicitly have to detect if the prologue is empty and maybe reduces the explicit prologue handling in general too, but is maybe more bother than it's worth - if you want to have a go at that refactoring, it might be nice, but I won't hold up this patch on that work. Maybe I'll have a go at it myself if you don't get to it, after this patch is committed.

That way if two recordSourceLine calls happen with no instruction between them, no emitDwarfLocDirective is called - but on the next insntruction that's emitted, the source line info would be emitted at that point.

recordSourceLine() {
  this->lineInfo = {line, file};
}
beginInstruction() {
  if (this->lineInfo) {
    emitDwarfLocDirective(this->lineInfo)
    this->lineInfo = std::nullopt;
  }
  ...
}

Thanks! I know your point now. This should be able to handle duplicated lines for the empty prologue case. But seems it also has same issue for the case when generating loc for instructions. (Currently, when starting to handle instructions, recordSourceLine is always called by a new instruction emitting, so there are always instructions between two recordSourceLine() calls in beginInstruction()). For example:

Inst1;  // line1
Inst2;  // line2

When handle Inst1, in the DwarfDebug::beginInstruction(), the line info for Inst1 is cached to this->lineInfo() in recordSourceLine() and then Inst1 is emitted to assembly file. This delay will cause issue, loc for Inst1 is supposed to be in front of Inst1. Even at the very beginning of beginInstruction() for Instr2, we call emitDwarfLocDirective() for Inst1, it is still put the .loc after Inst1.

Ah, right, sorry. I'll try to clarify further.

Maybe a more accurate pseudocode would be this:

beginInstruction() {
  if (this->InPrologue) {
    this->InPrologue = IsInPrologue(~= is frame setup and doesn't have a debug location);
    if (this->InPrologue) {
      if (!MetaInstruction) {
        this->PrologueLocationEmitted = true;
        apply the location we use for the prologue
      }
    }
  }
  ...
}

So I guess it comes out to still being a special case for the prologue, unfortunately... and this ^ solution could rewrite the prologue handling code such that it doesn't need a specific walk ahead of time, the end of the prologue would be detected during the instruction walk. But requires an extra member variable/more mutable state. Not sure if it's substantially better.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
2117–2130

Perhaps this'd be written as:

if (MetaInstruction) {
  if (!FrameSetup &&DebugLoc) {
    ...
  }
  IsEmptyPrologue = false;
}
This revision is now accepted and ready to land.May 1 2023, 5:53 PM
This revision was landed with ongoing or failed builds.May 3 2023, 9:37 PM
This revision was automatically updated to reflect the committed changes.
shchenz marked an inline comment as done.

if you want to have a go at that refactoring, it might be nice, but I won't hold up this patch on that work. Maybe I'll have a go at it myself if you don't get to it, after this patch is committed.

Thanks for your good comments @dblaikie . I can sure do this refactor in following patch. Just that I was assigned with some high priority issues recently, so I might do this in later time.

dyung added a subscriber: dyung.May 3 2023, 11:36 PM

@shchenz your change seems to be causing 8 test failures on a build bot, can you take a look and revert if you need time to investigate?

https://lab.llvm.org/buildbot/#/builders/247/builds/4125

@shchenz your change seems to be causing 8 test failures on a build bot, can you take a look and revert if you need time to investigate?

https://lab.llvm.org/buildbot/#/builders/247/builds/4125

I think I know what's wrong here. Will post a fix soon.

@shchenz your change seems to be causing 8 test failures on a build bot, can you take a look and revert if you need time to investigate?

https://lab.llvm.org/buildbot/#/builders/247/builds/4125

I think I know what's wrong here. Will post a fix soon.

The failure should be caused by the line number changes in the compiler-rt/test/ubsan/TestCases/TypeCheck/Function/function.cpp, specifically in line:

// CHECK-NEXT: function.cpp:[[@LINE-11]]: note: f() defined here

But the failure seems does not happen on PPC linux. I can not find a machine to reproduce this issue, could someone who can reproduce this failure help me to fix the case? @dyung ? Thank you!

Hello and good morning from the UK,

My thanks to @shchenz for acknowledging the concerns of @dyung . It's unfortunate that we're unable to reproduce and get a fix it in sooner but I'm afraid I'm going to have to revert this change for now until the author can figure out a fix for the following buildbot failure:

https://lab.llvm.org/buildbot/#/builders/247/builds/4125

The bot has been failing for five hours now and we need to get it green again.

My apologies,
Tom W

https://reviews.llvm.org/rG6a808270e8a9becc07857240ebb

has also been reverted as the added test relies on the behaviour laid out in this change.

After digging deeper, this is not just a test case issue. Seems the Diag object in ubsan_diag.h can not get the line info for function f() in the test compiler-rt/test/ubsan/TestCases/TypeCheck/Function/function.cpp now. Not sure if it is an issue related to llvm-symbolizer after the line table is changed.

on a sidenote, it's good to see this old dragon is finally being slain! @shchenz , I agree that the issue of eliding zero length line entries has much broader scope than just prologues in line with blaikies comments, but I'm keen to see this particular issue go away in the meantime with some additional work planned down the line to fix this in the more general sense.

dyung added a comment.May 4 2023, 9:45 AM

@shchenz your change seems to be causing 8 test failures on a build bot, can you take a look and revert if you need time to investigate?

https://lab.llvm.org/buildbot/#/builders/247/builds/4125

I think I know what's wrong here. Will post a fix soon.

The failure should be caused by the line number changes in the compiler-rt/test/ubsan/TestCases/TypeCheck/Function/function.cpp, specifically in line:

// CHECK-NEXT: function.cpp:[[@LINE-11]]: note: f() defined here

But the failure seems does not happen on PPC linux. I can not find a machine to reproduce this issue, could someone who can reproduce this failure help me to fix the case? @dyung ? Thank you!

Sorry, was away from my computer. @shchenz do you still need help reproducing the failures?

This comment was removed by shchenz.
uabelho added a subscriber: uabelho.May 4 2023, 9:43 PM
shchenz updated this revision to Diff 519806.May 5 2023, 4:39 AM

fix the buildbot failure

The buildbot failure is caused by instructions are inserted into prologue after empty prologue is checked. So if instructions are inserted to a empty prologue, because we save the first loc for empty prologue, then the inserted instructions will not appear in the debug line section because they are all after the first loc. So with such dwarf debug line, llvm-symbolizer can not find the location for these prologue instructions.

shchenz reopened this revision.May 5 2023, 4:45 AM
This revision is now accepted and ready to land.May 5 2023, 4:45 AM

The buildbot failure is caused by instructions are inserted into prologue after empty prologue is checked. So if instructions are inserted to a empty prologue, because we save the first loc for empty prologue, then the inserted instructions will not appear in the debug line section because they are all after the first loc. So with such dwarf debug line, llvm-symbolizer can not find the location for these prologue instructions.

ah, good to know. Guess that favors the "lazy" approach I was suggesting, then? (since that'll work more dynamically/based on the instructions that are emitted, rather than trying to compute it ahead of time)

The buildbot failure is caused by instructions are inserted into prologue after empty prologue is checked. So if instructions are inserted to a empty prologue, because we save the first loc for empty prologue, then the inserted instructions will not appear in the debug line section because they are all after the first loc. So with such dwarf debug line, llvm-symbolizer can not find the location for these prologue instructions.

ah, good to know. Guess that favors the "lazy" approach I was suggesting, then? (since that'll work more dynamically/based on the instructions that are emitted, rather than trying to compute it ahead of time)

Hi @dblaikie Thanks for comments. I actually had same thought about check the prologue at a later time, for example, moving line 1012 ~ 1022 after line 1024 ~ 1045 . So that when checking the prologue, the instructions should already be inserted. However this does not work, because the codes inserted at line 1024 ~ 1045 are directly emitted to the final assembly/object, they are not linked to the MachineFunction. Iterating all instructions inside a function by using:

for (const auto &MBB : *MF)
  for (const auto &MI : MBB)

can not iterate through these instructions. So even in a lazy time removal(also apply to your proposal about handling in beginInstruction() + recordSourceLine() which is also driven by above iterating), we still have to explicit check in which cases instructions may be inserted to the prologue. And seems the instruction insertion at line 1024 ~ 1045 looks good, I don't see other places in AsmPrinter calls BuildMI or something like that to add instructions to MachineFunction.

The buildbot failure is caused by instructions are inserted into prologue after empty prologue is checked. So if instructions are inserted to a empty prologue, because we save the first loc for empty prologue, then the inserted instructions will not appear in the debug line section because they are all after the first loc. So with such dwarf debug line, llvm-symbolizer can not find the location for these prologue instructions.

ah, good to know. Guess that favors the "lazy" approach I was suggesting, then? (since that'll work more dynamically/based on the instructions that are emitted, rather than trying to compute it ahead of time)

Hi @dblaikie Thanks for comments. I actually had same thought about check the prologue at a later time, for example, moving line 1012 ~ 1022 after line 1024 ~ 1045 . So that when checking the prologue, the instructions should already be inserted. However this does not work, because the codes inserted at line 1024 ~ 1045 are directly emitted to the final assembly/object, they are not linked to the MachineFunction. Iterating all instructions inside a function by using:

for (const auto &MBB : *MF)
  for (const auto &MI : MBB)

can not iterate through these instructions. So even in a lazy time removal(also apply to your proposal about handling in beginInstruction() + recordSourceLine() which is also driven by above iterating), we still have to explicit check in which cases instructions may be inserted to the prologue. And seems the instruction insertion at line 1024 ~ 1045 looks good, I don't see other places in AsmPrinter calls BuildMI or something like that to add instructions to MachineFunction.

Ah :/ this again, feels to me like wanting something lower-level down in MC/MI to handle this... (what's the codepath that leads from this AsmPrinter.cpp 1024-1045 to BuildMI calls?)

Ah :/ this again, feels to me like wanting something lower-level down in MC/MI to handle this... (what's the codepath that leads from this AsmPrinter.cpp 1024-1045 to BuildMI calls?)

Oh, sorry for the confusion. AsmPrinter.cpp 1024-1045 does not call BuildMI, so the instructions added there will not be linked to the MachineFunction. My thought is if we can change lines(1024-1045) to call BuildMI() like other passes do, the new instructions will be part of MachineFunction, so lazy methods (which iterates all instructions in a MachineFunction with MI-MBB-MF) will know these prologue instructions. But seems calling BuildMI in AsmPrinter is not a right way. So we have to explicitly check the cases where instructions are inserted to prologue...

Do you feel OK I commit the current patch? Or you would post a new fix in near future? Thanks. @dblaikie

dblaikie accepted this revision.May 9 2023, 1:53 PM

Ah :/ this again, feels to me like wanting something lower-level down in MC/MI to handle this... (what's the codepath that leads from this AsmPrinter.cpp 1024-1045 to BuildMI calls?)

Oh, sorry for the confusion. AsmPrinter.cpp 1024-1045 does not call BuildMI, so the instructions added there will not be linked to the MachineFunction. My thought is if we can change lines(1024-1045) to call BuildMI() like other passes do, the new instructions will be part of MachineFunction, so lazy methods (which iterates all instructions in a MachineFunction with MI-MBB-MF) will know these prologue instructions. But seems calling BuildMI in AsmPrinter is not a right way. So we have to explicitly check the cases where instructions are inserted to prologue...

Do you feel OK I commit the current patch? Or you would post a new fix in near future? Thanks. @dblaikie

Ah, thanks for the context.

I guess... it all seems a bit unfortunate, but I guess we can go with this.

This revision was landed with ongoing or failed builds.May 9 2023, 6:21 PM
This revision was automatically updated to reflect the committed changes.

This appears to break Godbolt's strategy for filtering code to only include your own functions. As a simple example, https://godbolt.org/z/6cbGeqc11 only displays the ret, whereas unchecking the filter for library functions (https://godbolt.org/z/qzGv7nGKq) displays the entire function. See https://github.com/compiler-explorer/compiler-explorer/issues/5020 and https://github.com/compiler-explorer/compiler-explorer/issues/5050 for more examples.

For the code in my Godbolt link, we'd previously emit debug info like:

_Z1fOSt10shared_ptrIiE:                 # @_Z1fOSt10shared_ptrIiE
.Lfunc_begin0:
        .file   15 "llvm-bisect/inputs" "godbolt-filtering.cpp" md5 0xa1c9e6751ece14286108589bebeb1200
        .loc    15 3 0                          # llvm-bisect/inputs/godbolt-filtering.cpp:3:0
        .cfi_startproc
# %bb.0:
        #DEBUG_VALUE: f:sp <- $rdi
        #DEBUG_VALUE: operator bool:this <- $rdi
        .loc    2 1300 23 prologue_end          # /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/shared_ptr_base.h:1300:23
        cmpq    $0, (%rdi)
        setne   %al
.Ltmp0:
        .loc    15 4 5                          # llvm-bisect/inputs/godbolt-filtering.cpp:4:5
        retq

So it was obvious that the function was associated with user code and not library code. We now emit the following debug info:

_Z1fOSt10shared_ptrIiE:                 # @_Z1fOSt10shared_ptrIiE
.Lfunc_begin0:
        .cfi_startproc
# %bb.0:                                # %entry
        #DEBUG_VALUE: f:sp <- $rdi
        #DEBUG_VALUE: operator bool:this <- $rdi
        .loc    2 1300 23 prologue_end          # /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/shared_ptr_base.h:1300:23
        cmpq    $0, (%rdi)
        setne   %al
.Ltmp0:
        .file   15 "llvm-bisect/inputs" "godbolt-filtering.cpp" md5 0xa1c9e6751ece14286108589bebeb1200
        .loc    15 4 5                          # llvm-bisect/inputs/godbolt-filtering.cpp:4:5
        retq

So the retq is the first place we see the association of the assembly with the user's input file, hence the truncated Godbolt output. How should Godbolt be handling the association of functions with user code (vs. library code) instead?

This appears to break Godbolt's strategy for filtering code to only include your own functions. As a simple example, https://godbolt.org/z/6cbGeqc11 only displays the ret, whereas unchecking the filter for library functions (https://godbolt.org/z/qzGv7nGKq) displays the entire function. See https://github.com/compiler-explorer/compiler-explorer/issues/5020 and https://github.com/compiler-explorer/compiler-explorer/issues/5050 for more examples.

For the code in my Godbolt link, we'd previously emit debug info like:

_Z1fOSt10shared_ptrIiE:                 # @_Z1fOSt10shared_ptrIiE
.Lfunc_begin0:
        .file   15 "llvm-bisect/inputs" "godbolt-filtering.cpp" md5 0xa1c9e6751ece14286108589bebeb1200
        .loc    15 3 0                          # llvm-bisect/inputs/godbolt-filtering.cpp:3:0
        .cfi_startproc
# %bb.0:
        #DEBUG_VALUE: f:sp <- $rdi
        #DEBUG_VALUE: operator bool:this <- $rdi
        .loc    2 1300 23 prologue_end          # /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/shared_ptr_base.h:1300:23
        cmpq    $0, (%rdi)
        setne   %al
.Ltmp0:
        .loc    15 4 5                          # llvm-bisect/inputs/godbolt-filtering.cpp:4:5
        retq

So it was obvious that the function was associated with user code and not library code. We now emit the following debug info:

_Z1fOSt10shared_ptrIiE:                 # @_Z1fOSt10shared_ptrIiE
.Lfunc_begin0:
        .cfi_startproc
# %bb.0:                                # %entry
        #DEBUG_VALUE: f:sp <- $rdi
        #DEBUG_VALUE: operator bool:this <- $rdi
        .loc    2 1300 23 prologue_end          # /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/shared_ptr_base.h:1300:23
        cmpq    $0, (%rdi)
        setne   %al
.Ltmp0:
        .file   15 "llvm-bisect/inputs" "godbolt-filtering.cpp" md5 0xa1c9e6751ece14286108589bebeb1200
        .loc    15 4 5                          # llvm-bisect/inputs/godbolt-filtering.cpp:4:5
        retq

So the retq is the first place we see the association of the assembly with the user's input file, hence the truncated Godbolt output. How should Godbolt be handling the association of functions with user code (vs. library code) instead?

Better to discuss this here, or on one of those bugs?

In any case, what's godbolt doing - scanning the .debug_line table only? & checking the first entry related to the function symbol/label? If it could scan through the whole function's line table and check if any of the code came from the user's code, that might be an option? (though it could still end up detecting library functions - like templates that are instantiated with user defined types that then get inlined)

If you wanted to reliably detect functions defined in the user's source code, you'd have to use more than just the .debug_line section, you'd have to use the .debug_info section - Symbolizing just the starting address, then iterating up any stack frames you get (in case you get more than one, which would happen if the first instruction were inlined, for instance) and checking the function name/the file it's declared in is the file you're interested in.

Thanks for taking a look! I'm just a messenger here and don't know the details of Godbolt's logic. If you could comment on https://github.com/compiler-explorer/compiler-explorer/issues/5020 that'd be really appreciated.

@dblaikie Thanks for the explanation.