This is an archive of the discontinued LLVM Phabricator instance.

Fix prologue end handling when code compiled by gcc
ClosedPublic

Authored by tberghammer on Sep 10 2015, 7:01 AM.

Details

Summary

Fix prologue end handling when code compiled by gcc

GCC doesn't use the is_prologue_end flag to mark the first instruction
after the prologue. Instead, it issues a line table entry for the first
instruction of the prologue and one for the first instruction after the
prologue. If the prologue is 0 instructions long, the two line entries
will have the same file address.
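The C++ sketch below shows the failure mode. It uses a simplified, hypothetical `Entry` row type (not LLDB's actual line table entry) and a hypothetical fallback that treats a function's second row as its first post-prologue instruction. Naively collapsing GCC's two rows at the same address makes an empty prologue look 4 bytes long.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line table row; not LLDB's real entry type.
struct Entry {
  uint64_t file_addr;
  uint32_t file;  // file index
  uint32_t line;
  bool is_prologue_end;
};

// Naive de-duplication: a row at the same address as the previous row
// replaces it, so only the last row for each address survives.
std::vector<Entry> DedupNaive(const std::vector<Entry>& rows) {
  std::vector<Entry> out;
  for (const Entry& e : rows) {
    if (!out.empty() && out.back().file_addr == e.file_addr)
      out.back() = e;
    else
      out.push_back(e);
  }
  return out;
}

// Hypothetical fallback: without an is_prologue_end row, treat the second
// row of the function as the first instruction after the prologue.
// `rows` holds a single function's rows, starting at its entry point.
uint64_t PrologueSize(const std::vector<Entry>& rows) {
  assert(rows.size() >= 2);
  for (const Entry& e : rows)
    if (e.is_prologue_end) return e.file_addr - rows[0].file_addr;
  return rows[1].file_addr - rows[0].file_addr;
}
```

With GCC's rows `{0x1000, line 10}`, `{0x1000, line 11}`, `{0x1004, line 12}`, the fallback yields 0 before de-duplication and 4 after it.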

We remove these duplicate entries because they violate the DWARF spec
and can confuse the debugger. To avoid losing the information about the
end of the prologue, we set the prologue_end flag on line entries that
represent more than one original entry.
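Here is a minimal sketch of the summarized fix. It uses a hypothetical, simplified `Entry` and an illustrative `AppendRow` helper, not the code in the diff. When a row shares its address with the previous one, the later row replaces it and is flagged as the prologue end, so the empty prologue survives de-duplication.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line table row; not LLDB's real entry type.
struct Entry {
  uint64_t file_addr;
  uint32_t file;  // file index
  uint32_t line;
  bool is_prologue_end;
};

// A row at the same address as the previous row replaces it, and the
// surviving row is flagged so the zero-length prologue is still recorded.
void AppendRow(std::vector<Entry>& rows, Entry e) {
  if (!rows.empty() && rows.back().file_addr == e.file_addr) {
    e.is_prologue_end = true;
    rows.back() = e;
  } else {
    rows.push_back(e);
  }
}

// Prologue size of one function whose rows start at its entry point:
// the first flagged row wins, otherwise fall back to the second row.
uint64_t PrologueSize(const std::vector<Entry>& rows) {
  assert(rows.size() >= 2);
  for (const Entry& e : rows)
    if (e.is_prologue_end) return e.file_addr - rows[0].file_addr;
  return rows[1].file_addr - rows[0].file_addr;
}
```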

Diff Detail

Repository
rL LLVM

Event Timeline

tberghammer retitled this revision to Fix prologue end handling when code compiled by gcc.
tberghammer updated this object.
tberghammer added a reviewer: clayborg.
tberghammer added a subscriber: lldb-commits.
clayborg requested changes to this revision (Sep 10 2015, 10:46 AM).
clayborg edited edge metadata.

See inline comments.

source/Symbol/LineTable.cpp
107–117 (On Diff #34435)

I am not sure I like this solution. Now, if we ever have two line entries with the same address, we will automatically mark the item as the prologue end? This seems like a hack, and it will mark all sorts of line entries as "is_prologue_end = true" throughout the line table.
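The concern can be reproduced with the same kind of rule, sketched here with hypothetical names. Suppose a later pair of rows inside the function body shares an address (placed at 0x1008 purely for illustration). That pair is flagged too, so one function ends up with two prologue_end rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line table row; not LLDB's real entry type.
struct Entry {
  uint64_t file_addr;
  uint32_t file;  // file index
  uint32_t line;
  bool is_prologue_end;
};

// Unconditional rule: every same-address replacement sets the flag.
void AppendRow(std::vector<Entry>& rows, Entry e) {
  if (!rows.empty() && rows.back().file_addr == e.file_addr) {
    e.is_prologue_end = true;
    rows.back() = e;
  } else {
    rows.push_back(e);
  }
}

// Number of rows that claim to be the end of a prologue.
std::size_t CountPrologueEnds(const std::vector<Entry>& rows) {
  std::size_t n = 0;
  for (const Entry& e : rows)
    if (e.is_prologue_end) ++n;
  return n;
}
```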

This revision now requires changes to proceed (Sep 10 2015, 10:46 AM).
tberghammer added inline comments (Sep 10 2015, 1:18 PM).
source/Symbol/LineTable.cpp
107–117 (On Diff #34435)

I agree that this is a hack to work around a bug in GCC (and I don't really like it either), but there is one thing to consider, based on the comment in lines 100–105:

  • It is invalid DWARF to have two line entries with the same file address, so if the compiler is correct, this code path will never be activated. Apart from the issue of generating duplicate entries for functions without a prologue, I have never seen a case where we have two line entries for the same address (which line would we show to the user in that case?).

If we don't want to use this (hackish) solution, the other option is to not remove duplicate line entries at all, because we use the second line entry as the end of the prologue when none of the entries has the is_prologue_end flag. That would be a cleaner solution for this specific issue (which is a workaround for a GCC bug), but it gives up the one-to-one mapping that we want to keep based on the comment in lines 100–105 (it was added by rL211212).
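Here is a sketch of this alternative, in which duplicates are kept. It assumes the function's address range `[func_start, func_end)` is already known; in a real debugger that range would come from the debug info. `FindPrologueEnd` and `Entry` are hypothetical. The helper prefers a flagged row and otherwise falls back to the function's second row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified line table row; not LLDB's real entry type.
struct Entry {
  uint64_t file_addr;
  uint32_t file;  // file index
  uint32_t line;
  bool is_prologue_end;
};

// Returns the address of the first instruction after the prologue of the
// function covering [func_start, func_end), or nullopt if the function has
// no flagged row and fewer than two rows.
std::optional<uint64_t> FindPrologueEnd(const std::vector<Entry>& rows,
                                        uint64_t func_start,
                                        uint64_t func_end) {
  std::optional<uint64_t> second_row_addr;
  std::size_t rows_in_func = 0;
  for (const Entry& e : rows) {
    if (e.file_addr < func_start || e.file_addr >= func_end) continue;
    if (e.is_prologue_end) return e.file_addr;  // explicit marker wins
    if (++rows_in_func == 2) second_row_addr = e.file_addr;
  }
  return second_row_addr;
}
```

Because the duplicate GCC row is kept, the second row of an empty-prologue function sits at the entry address, so the result is correct without any extra flag.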

Hi Greg, what do you think about my inline suggestion? Are you fine with removing the original hack that removes duplicate entries from the line table, and then solving the problem of duplicate line entries by always returning the last entry when there are multiple line entries for the same address?
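Returning the last entry for an address can be sketched as a binary search over rows sorted by address. `std::upper_bound` finds the first row past the address, and the row before it is the last row that starts at or before that address. `FindRowIndex` and `Entry` are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical, simplified line table row; not LLDB's real entry type.
struct Entry {
  uint64_t file_addr;
  uint32_t file;  // file index
  uint32_t line;
  bool is_prologue_end;
};

// Rows must be sorted by file_addr (duplicates kept). Returns the index of
// the last row whose address is <= addr, so among rows sharing an address
// the last one wins. Returns -1 if addr precedes every row.
std::ptrdiff_t FindRowIndex(const std::vector<Entry>& rows, uint64_t addr) {
  auto it = std::upper_bound(
      rows.begin(), rows.end(), addr,
      [](uint64_t a, const Entry& e) { return a < e.file_addr; });
  if (it == rows.begin()) return -1;
  return std::distance(rows.begin(), std::prev(it));
}
```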

Maybe we can still try removing duplicates, but remember the first index where we had a duplicate line entry. If we don't get a prologue end, then we go back to the index we remembered for the first duplicate and, if it is valid, modify that entry to say "prologue_end = true"?
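Here is a sketch of this suggestion for a single line table sequence, using a hypothetical `DedupRememberFirst` and a simplified `Entry`. Because only one index is remembered per sequence, only the first function in a sequence can receive the flag. The example sequence covers two functions with empty prologues, and the second one is left unflagged. That is the limitation tberghammer raises in his reply.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line table row; not LLDB's real entry type.
struct Entry {
  uint64_t file_addr;
  uint32_t file;  // file index
  uint32_t line;
  bool is_prologue_end;
};

// Duplicates are still removed (the later row wins). If no surviving row
// carries is_prologue_end, the row that absorbed the first duplicate in
// the sequence is flagged.
std::vector<Entry> DedupRememberFirst(const std::vector<Entry>& seq) {
  std::vector<Entry> out;
  bool have_dup = false;
  std::size_t first_dup = 0;
  for (const Entry& e : seq) {
    if (!out.empty() && out.back().file_addr == e.file_addr) {
      if (!have_dup) {
        have_dup = true;
        first_dup = out.size() - 1;
      }
      out.back() = e;
    } else {
      out.push_back(e);
    }
  }
  bool any_flag = std::any_of(out.begin(), out.end(),
                              [](const Entry& e) { return e.is_prologue_end; });
  if (!any_flag && have_dup) out[first_dup].is_prologue_end = true;
  return out;
}
```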

One other question for clarification: is GCC emitting prologue_end, but only on the first line entry, so that we overwrite it with the second entry and lose the prologue_end? Or does GCC just plain not emit prologue_end? If so, what happens when we have a line table that doesn't have two entries with the same address for the prologue? Do we just not have a prologue_end in a sequence in that case?

> Maybe we can still try removing duplicates, but remember the first index where we had a duplicate line entry. If we don't get a prologue end, then we go back to the index we remembered for the first duplicate and, if it is valid, modify that entry to say "prologue_end = true"?

Remembering the first duplicate entry isn't really possible because a line table covers several functions and we need the prologue_end marker for each function. If we want to go in this direction, we have to couple the line table with the function ranges (including the ranges of inlined functions), which I am pretty sure we want to avoid: it would cause a significant performance hit because it would require full DWARF parsing.

> One other question for clarification: is GCC emitting prologue_end, but only on the first line entry, so that we overwrite it with the second entry and lose the prologue_end? Or does GCC just plain not emit prologue_end? If so, what happens when we have a line table that doesn't have two entries with the same address for the prologue? Do we just not have a prologue_end in a sequence in that case?

I have never seen GCC emit the prologue_end marker on any architecture I tested, and based on some online threads I am pretty sure that is the case for all architectures. It emits one line entry for the first address of the function and then another line entry for the first non-prologue instruction of the function.

Blech... OK, one more try: does GCC always emit the same line and file at the same address? If so, we could do:

{
    // GCC doesn't use the is_prologue_end flag to mark the first instruction after the prologue.
    // Instead, it issues a line table entry for the first instruction of the prologue
    // and one for the first instruction after the prologue. If the size of the prologue is 0
    // instructions, then the two line entries will have the same file address. Removing one would remove
    // our ability to properly detect the location of the end of the prologue, so we set the prologue_end
    // flag to preserve this information (setting the prologue_end flag for an entry that is after
    // the prologue end doesn't have any effect).
    entry.is_prologue_end = entry.file == entries.back().file && entry.line == entries.back().line;
    entries.back() = entry;
}

Otherwise, we could settle on just making sure the file is the same:

{
    // GCC doesn't use the is_prologue_end flag to mark the first instruction after the prologue.
    // Instead, it issues a line table entry for the first instruction of the prologue
    // and one for the first instruction after the prologue. If the size of the prologue is 0
    // instructions, then the two line entries will have the same file address. Removing one would remove
    // our ability to properly detect the location of the end of the prologue, so we set the prologue_end
    // flag to preserve this information (setting the prologue_end flag for an entry that is after
    // the prologue end doesn't have any effect).
    entry.is_prologue_end = entry.file == entries.back().file;
    entries.back() = entry;
}

The line number is not always the same for the two entries (the first one points to the opening '{' and the second one to the first instruction of the function). I added the check for the file name, but I would be quite surprised to find a scenario where the file names don't match (it would imply some LTO, which would kill most of the debugging experience anyway).
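The two proposed conditions can be written as hypothetical predicates over a simplified `Entry`. This shows why the line comparison was dropped: in GCC's pair of rows, the line of the '{' and the line of the first statement differ, so only the file-only check fires.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified line table row; not LLDB's real entry type.
struct Entry {
  uint64_t file_addr;
  uint32_t file;  // file index
  uint32_t line;
  bool is_prologue_end;
};

// Condition from the first proposal: same file and same line.
bool SameFileAndLine(const Entry& prev, const Entry& next) {
  return prev.file == next.file && prev.line == next.line;
}

// Condition from the second proposal: same file only.
bool SameFile(const Entry& prev, const Entry& next) {
  return prev.file == next.file;
}
```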

This revision was automatically updated to reflect the committed changes.

The point of the file name check is to catch the case where you had nested inlines that share the same address, and the compiler (errantly, but...) decided to emit duplicate entries at the same address for the two levels of inlining. The inlining could of course be from the current file, but at least at -O0 it is much more common that they will come from somewhere else (like std::whatever).
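Here is a sketch of the file-only rule, using hypothetical names. A duplicate row is flagged only when both rows come from the same file. File index 2 stands in for an inlined callee from another file, for example a standard library header; this is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line table row; not LLDB's real entry type.
struct Entry {
  uint64_t file_addr;
  uint32_t file;  // file index
  uint32_t line;
  bool is_prologue_end;
};

// A row at the same address as the previous row replaces it; it is
// flagged as the prologue end only if both rows come from the same file.
void AppendRow(std::vector<Entry>& rows, Entry e) {
  if (!rows.empty() && rows.back().file_addr == e.file_addr) {
    e.is_prologue_end = e.file == rows.back().file;
    rows.back() = e;
  } else {
    rows.push_back(e);
  }
}
```

With this rule, GCC's '{' and first-statement pair is flagged. A same-address pair where the second row comes from an inlined function in another file is merged without the flag.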

I see. Thank you for the clarification.