This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] getMergedLocation: Maintain the line number if they match
ClosedPublic

Authored by jmmartinez on Oct 4 2022, 8:52 AM.

Details

Summary

getMergedLocation returns a 'line 0' DILocaiton if the two locations
being merged don't perfecly match, even if they are in the same line but
a different column.

This commit adds support to keep the line number if it matches (but only
the column differs). The merged column number is the leftmost between the
two.

For comparison,

Notice that gcc seems to use the location of "a" in both cases, while Clang return a line 0 location even if both locations are on the same line.

Should Clang give the correct line number but with column 0? or the left-most? or the line 0 location is in fact more accurate? And should it use the inlined-at scope+line+column in the case it matches?

I have yet to read the source-code of gcc to understand exactly how it works.

Any advice on the subject is welcomed.

Diff Detail

Event Timeline

jmmartinez created this revision.Oct 4 2022, 8:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2022, 8:52 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
jmmartinez edited the summary of this revision. (Show Details)Oct 4 2022, 8:58 AM
jmmartinez retitled this revision from [DebugInfo] getMergedLocation: Maintain the line number if they match to [Draft][DebugInfo] getMergedLocation: Maintain the line number if they match.
jmmartinez added reviewers: probinson, dblaikie.
jmmartinez added a project: debug-info.
jmmartinez retitled this revision from [Draft][DebugInfo] getMergedLocation: Maintain the line number if they match to [DebugInfo] getMergedLocation: Maintain the line number if they match.Oct 4 2022, 9:01 AM
jmmartinez edited the summary of this revision. (Show Details)Oct 4 2022, 9:06 AM
jmmartinez edited the summary of this revision. (Show Details)
jmmartinez edited the summary of this revision. (Show Details)
jmmartinez edited the summary of this revision. (Show Details)Oct 4 2022, 9:10 AM
jmmartinez edited the summary of this revision. (Show Details)Oct 4 2022, 9:13 AM
jmmartinez updated this revision to Diff 465043.Oct 4 2022, 9:17 AM

Remove from drafts

jmmartinez published this revision for review.Oct 4 2022, 9:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2022, 9:23 AM

Could we use zero column instead of the leftmost - less chance of misleading someone, perhaps?

I'm cautious about adding a one-off iterator abstraction as is done here - meeting iterator requirements is non-trivial (eg: returning values from op* is questionable - since then to the iterator requirements you also need op-> which can't return by value... ). So might be best sticking with the existing code style of an explicit loop.

jmmartinez updated this revision to Diff 465706.Oct 6 2022, 5:53 AM
  • Remove iterators
  • Use column 0 instead of leftmost-column
jmmartinez added inline comments.Oct 6 2022, 5:57 AM
llvm/lib/IR/DebugInfoMetadata.cpp
108

Style: Remove the brackets

dblaikie added inline comments.Oct 6 2022, 3:47 PM
llvm/lib/IR/DebugInfoMetadata.cpp
122

Could you walk me through why the line/col get set to zero at this point? I'd maybe have thought they should get set to the line/col of the InlinedAt location?

jmmartinez updated this revision to Diff 465999.Oct 7 2022, 1:23 AM
  • Keep the line and column number from the inlined at instead of setting it to 0
jmmartinez added inline comments.Oct 7 2022, 1:27 AM
llvm/lib/IR/DebugInfoMetadata.cpp
122

It was my mistake. Your comment made me have second thoughts about that change.

I added some extra unit-tests for the cases where the InlinedAt line/column is used.

Looks pretty reasonable to me.

Got a rough sense of how this changes debug info size (especially .debug_line growth) - maybe on a clang self-host?

Hi, sorry, I have been meaning to comment here but it's taken me a while to get to.

Use column 0 instead of leftmost-column

+1 I think this is the right move (please update the patch description / comment before you commit).

"line 0" has a special meaning - that the instruction can't be attributed a single line number (e.g. because the instruction represents some merged instruction). I don't think that is actually mentioned anywhere in the DWARF spec but is more of a convention between DWARF emitters and consumers. Does column 0 convey the same meaning I wonder? I guess if your debugger uses column info and points at the start of the line (column 0) then that is an intuitive indication that column number isn't known, even if it's not explicit.

I think this kind of change might be worth documenting somewhere (release notes? I'm not sure how they work), given that it could reasonably affect DWARF consuming tools, @dblaikie wdyt?

"line 0" has a special meaning - that the instruction can't be attributed a single line number (e.g. because the instruction represents some merged instruction). I don't think that is actually mentioned anywhere in the DWARF spec but is more of a convention between DWARF emitters and consumers. Does column 0 convey the same meaning I wonder? I guess if your debugger uses column info and points at the start of the line (column 0) then that is an intuitive indication that column number isn't known, even if it's not explicit.

As far as I understand it's in the DWARF standard (in section 6.2.2 https://dwarfstd.org/doc/DWARF5.pdf ):

  • line: An unsigned integer indicating a source line number. Lines are numbered beginning at 1. The compiler may emit the value 0 in cases where an instruction cannot be attributed to any source line.
  • column: An unsigned integer indicating a column number within a source line. Columns are numbered beginning at 1. The value 0 is reserved to indicate that a statement begins at the “left edge” of the line.

The behaviour for the column is slightly different for declarations (in section 2.14): The value of the DW_AT_decl_column attribute represents the source column number at which the first character of the identifier of the declared object appears. The value 0 indicates that no column has been specified.

"line 0" has a special meaning - that the instruction can't be attributed a single line number (e.g. because the instruction represents some merged instruction). I don't think that is actually mentioned anywhere in the DWARF spec but is more of a convention between DWARF emitters and consumers. Does column 0 convey the same meaning I wonder? I guess if your debugger uses column info and points at the start of the line (column 0) then that is an intuitive indication that column number isn't known, even if it's not explicit.

As far as I understand it's in the DWARF standard (in section 6.2.2 https://dwarfstd.org/doc/DWARF5.pdf ):

  • line: An unsigned integer indicating a source line number. Lines are numbered beginning at 1. The compiler may emit the value 0 in cases where an instruction cannot be attributed to any source line.
  • column: An unsigned integer indicating a column number within a source line. Columns are numbered beginning at 1. The value 0 is reserved to indicate that a statement begins at the “left edge” of the line.

The behaviour for the column is slightly different for declarations (in section 2.14): The value of the DW_AT_decl_column attribute represents the source column number at which the first character of the identifier of the declared object appears. The value 0 indicates that no column has been specified.

Oh, my mistake, thanks for finding that. In which case this change does seem to slightly bend the spec. IIUC, with your patch the compiler is trying to convey "the line is known but the column is unspecifiable", and DWARF doesn't give us a way to express that. Using column 0 does sound like a reasonable work-around to me (and looks like it should improve developer experience) but I think it's worth getting an opinion from someone who works on a DWARF-consumer. I've asked our (Sony's) debugger people what they think.

Perhaps we should seek input from an LLDB developer too (cc @JDevlieghere).

Got a rough sense of how this changes debug info size (especially .debug_line growth) - maybe on a clang self-host?

I just finished mesuring this. I compiled in Release+-gline-tables-only and RelWithDebInfo. In both cases the size difference in clang's binary was pretty small (less than 100 bytes).

dblaikie accepted this revision.Oct 10 2022, 11:35 AM

I'm generally OK with this.

Might be worth getting some feedback from folks - but I think it's probably fairly harmless (lots of code is built without column info anyway, so tools are likely fine with it) & things can be addressed in post-commit review. But also doesn't seem like it'd need to be rushed in either - so waiting for some Sony/Apple feedback for a bit seems fine - either way.

This revision is now accepted and ready to land.Oct 10 2022, 11:35 AM

@JDevlieghere Sorry to ping again, but would it be possible to get some input on this patch? I'm not sure about what could be the implications that this could have on LLDB's side.

Hi, sorry for the delay following up on this.

I just wanted to mention that it turns out that SCE debugger tuning (-gsce) supresses column info by default and that our debugger ignores it. So no objections from me on that front.

I've left some inline nits, and I think it would be a good idea to update the function's comment in llvm/IR/DebugInfoMetadata.h too. Otherwise, I'm happy, thanks!

llvm/lib/IR/DebugInfoMetadata.cpp
119

If I'm reading this right, AdvanceToParentLoc walks the entire scope chain from the source location up to file scope. Then walks the entire scope chain of the inlined-at location of the original source location up to file scope. Then, the entire scope chain of the inline-at location of that one, and so on.

Is that right? I think it could be useful to add a comment explaining the traversal.

134

IMO it would be useful to add a comment here. Something along the lines of "Traverse the scope tree of LocB looking for the innermost match with LocB" maybe?

143

nit

  • Added comments
jmmartinez marked 3 inline comments as done.Oct 25 2022, 1:08 AM
jmmartinez added inline comments.
llvm/lib/IR/DebugInfoMetadata.cpp
119

Exactly. I added a comment to explain the traversal. Thanks!

Orlando accepted this revision.Oct 25 2022, 5:35 AM

There are a few outstanding review comments to be addressed:

  1. I think it would be a good idea to update the function's comment in llvm/IR/DebugInfoMetadata.h.
  2. The comments should end with a full stops (more info).
  3. Your comment Style: Remove the brackets.

Feel free to address those before pushing (i.e. IMO no need for another round of review).

LGTM with those changes.

llvm/lib/IR/DebugInfoMetadata.cpp
121

nit: full stop.

135

nit: full stop.

145

nit: full stop.

156

nit: full stop.

llvm/unittests/IR/MetadataTest.cpp
1031

nit: full stop.

1061

nit: full stop.

This revision was landed with ongoing or failed builds.Oct 25 2022, 7:13 AM
This revision was automatically updated to reflect the committed changes.
jmmartinez marked 7 inline comments as done.