Page MenuHomePhabricator

[ELF] - Teach LLD to report line numbers for data symbols.

Authored by grimar on Oct 10 2017, 5:07 AM.



Currently LLD is unable to report line number when reporting duplicate
declaration of some variable.

That happens because for extracting line information we always use .debug_line section
content which describes mapping from machine instructions to source file locations,
what does not help for variables as does not describe them.

In this patch I am taking the approproate information about data symbol locations
from the .debug_info section.

It is PR34826.

Diff Detail


Event Timeline

grimar created this revision.Oct 10 2017, 5:07 AM
dblaikie added inline comments.Oct 10 2017, 7:26 AM
91 ↗(On Diff #118352)

Why are the children being visited here? the 'dies' collection in the outer loop should visit all the DIEs, right?

grimar updated this revision to Diff 118395.Oct 10 2017, 8:05 AM
  • Fixed issue found by David.
91 ↗(On Diff #118352)

Ah, you right, my mistake. Thanks !

grimar updated this revision to Diff 118399.Oct 10 2017, 8:26 AM
  • Removed dead method, simplified testcase.
ruiu edited edge metadata.Oct 10 2017, 12:31 PM

Some parts of lld's source code is quite hard to understand now as an accumulated result of numerous patches. I'm trying to fix it now, so please spend a bit more time too to improve code quality so that it is understandable for future readers.

99 ↗(On Diff #118399)

Skip something ... skip other thing... and now what? This block needs comment too.

109 ↗(On Diff #118399)

What does this do?

110 ↗(On Diff #118399)

Remove llvm::

281 ↗(On Diff #118399)

get -> create

(Unless it is an accessor, "get" is not a good prefix.)

281 ↗(On Diff #118399)

It needs comment.

309 ↗(On Diff #118399)

Remove llvm::.

1 ↗(On Diff #118399)

I don't think we want to have this machine-generated assembly in the test directory. It is hard to verify/edit. Can you do this with llvm-as?

45 ↗(On Diff #118399)

Please create a new test file.

grimar updated this revision to Diff 118580.Oct 11 2017, 3:27 AM
grimar marked 8 inline comments as done.
  • Addressed review comments.
1 ↗(On Diff #118399)

Problem to do that is next. There is a difference in debug information produced by gcc and clang.

clang encodes variables names using DW_FORM_strp (represents an offset in .debug_str):

0x0000001e:   DW_TAG_variable [2]  
                DW_AT_name [DW_FORM_strp]	( .debug_str[0x00000048] = "foo")
                DW_AT_type [DW_FORM_ref4]	(cu + 0x0033 => {0x00000033})
                DW_AT_external [DW_FORM_flag_present]	(true)
                DW_AT_decl_file [DW_FORM_data1]	("/home/umb/tests/566_duplicatesym/1.c")
                DW_AT_decl_line [DW_FORM_data1]	(1)
                DW_AT_location [DW_FORM_exprloc]	(DW_OP_addr 0x0)

gcc places string in place using DW_FORM_string:

0x0000001d:   DW_TAG_variable [2]  
                 DW_AT_name [DW_FORM_string]	("foo")
                 DW_AT_decl_file [DW_FORM_data1]	("/home/umb/tests/566_duplicatesym/1.c")
                 DW_AT_decl_line [DW_FORM_data1]	(1)
                 DW_AT_type [DW_FORM_ref4]	(cu + 0x0032 => {0x00000032})
                DW_AT_external [DW_FORM_flag_present]	(true)
                 DW_AT_location [DW_FORM_exprloc]	(DW_OP_addr 0x0)

We want to support both ways, and this patch supports only first one, because our LLDDwarfObj
does not provide .debug_str section for DWARF parsers yet.

I tried to use llvm-as + llc and found no way to make output to use gcc way (DW_FORM_string for name attribute).
(used clang -S -emit-llvm -g to produce IR). I am not sure is it possible ? Seems - not.
I think clang has no flags to drop use of .debug_str.

I am going to implement support for DW_FORM_strp (clang way) after this patch be landed, but we need testcases
for both ways I believe. For following patch I can use llvm-as.

I updated this testcase so it should be more readable. It also should be no need to edit it in future I think.

ruiu added inline comments.Oct 11 2017, 11:21 AM
1 ↗(On Diff #118399)

I'm confused. You are saying that this patch supports only the first one (which is DW_FORM_strp and that's what clang generates), but you are also saying that you will implement a DW_FORM_strp support after this patch. Which is correct?

Sorry, I wanted to say "We want to support both ways, and this patch supports only second one, because ...", so it supports DW_FORM_string.

grimar updated this revision to Diff 119656.Oct 20 2017, 6:58 AM


  • Rebased.
  • Added detailed comments for DWARF data in testcase.
ruiu added inline comments.Oct 25 2017, 4:35 PM
90 ↗(On Diff #119656)

Die(CU, &Entry)

120 ↗(On Diff #119656)

Data doesn't feel quite right -- for example, an unnamed region of a section contains data, but that wasn't something you are thinking about when you wrote this function. Variable is better.

123 ↗(On Diff #119656)

I don't get the meaning of this comment.

261–265 ↗(On Diff #119656)

This comment is not quite right, because it returns a string similar to the example only when you give a pathname and a line number.

Your comment gives a false impression that it always return a string consisting with a pathname and a line number. That is not true, because it is not really a property of this function but a property of a caller of this function. What this function does is just blindly concatenating strings.

274–275 ↗(On Diff #119656)

What is "this", and what do you mean by "normally"?

I'd keep the original comment

// This function is intended to be used for constructing an error message. 
// The returned message looks like this:
//   foo.c:42 (/home/alice/possibly/very/long/path/foo.c:42)
//  Returns an empty string if there's no way to get line info.
grimar updated this revision to Diff 120399.Oct 26 2017, 6:01 AM
grimar marked 5 inline comments as done.
  • Addressed review comments.
90 ↗(On Diff #119656)


123 ↗(On Diff #119656)

It is the same as:

We always have one compilation unit here, so offset is always 0.
I supposed it is fine because we have equal comment here:

I extended this comment.

261–265 ↗(On Diff #119656)

I see. Rewrote the comment.

274–275 ↗(On Diff #119656)

Restored original comment.
I used word "normally" because when no debug information is available, it tries
to take information about location from STT_FILE symbol.
So if I would omit "normally", it would be not entirely correct.

ruiu added inline comments.Oct 26 2017, 1:52 PM
123 ↗(On Diff #119656)

The have -> There is always only one

204 ↗(On Diff #120399)

Omit "Data" as DWARF is always data.

220 ↗(On Diff #120399)


264–265 ↗(On Diff #120399)

Function -> "A function" or "this function"

But maybe it is better to omit the subject.

// Concatenates arguments to construct a string representing an error location.

289 ↗(On Diff #120399)

association between location in sourcef iles and machine instructions


table from offsets to source locations

290–291 ↗(On Diff #120399)

For variables, we have to use .debug_info instead.

292 ↗(On Diff #120399)

Are you sure that functions would never have type STT_OBJECT?

grimar updated this revision to Diff 120585.Oct 27 2017, 6:31 AM
grimar marked 7 inline comments as done.
  • Addressed review comments.
292 ↗(On Diff #120399)

It's interesting. Specification says functions and other executable code should be of type STT_FUNC:
So at least it does not seem correct.

At the same time it looks possible to do that:

.section .text.fn,"ax",@progbits,unique,0
.globl fn
.type fn,@object

.section .text.fn2,"ax",@progbits,unique,0
.globl fn2
.type fn2,@function
 callq fn@PLT

And you'll get

4: 0000000000400079     0 FUNC    GLOBAL DEFAULT    1 fn2
7: 0000000000400078     0 OBJECT  GLOBAL DEFAULT    1 fn

We probably can fallback to always use .debug_line:

if (Sym.Type == STT_OBJECT)
  if (Optional<std::pair<std::string, unsigned>> FileLine =
    return createFileLineMsg(FileLine->first, FileLine->second);

if (Optional<DILineInfo> Info = File->getDILineInfo(this, Offset))
  return createFileLineMsg(Info->FileName, Info->Line);

At least such behavior does not seem harmful for me.
Do you think we should ? (I am not sure because with assembly it is possible
to do many things, and particulary this one seems violates ELF spec, not sure we
should even think about supporting that until it is reported useful).

ruiu added inline comments.Oct 27 2017, 10:17 AM
292 ↗(On Diff #120399)

Well, I think what I was trying to say is that we don't need to think about it too deeply. Why don't you just attempt both? First, try to get line info as a function, and if it fails, try as a variable.

grimar updated this revision to Diff 120807.Oct 30 2017, 5:49 AM
  • Addressed review comments.
292 ↗(On Diff #120399)


ruiu accepted this revision.Oct 31 2017, 8:39 PM


87 ↗(On Diff #120807)

Loop over variable records and insert them to VariableLoc.

95 ↗(On Diff #120807)

Skip if a local variable because we don't need them for generating error messages. In general, only non-local symbols can fail to be linked.

99–100 ↗(On Diff #120807)

Get the source filename index for the variable.

105 ↗(On Diff #120807)

Get the line number on which the variable is declared.

108–109 ↗(On Diff #120807)

Get the name of the variable and add the collected information to VariableLoc. Usually Name is non-empty, but it can be empty if the input object file lacks some debug info.

287–288 ↗(On Diff #120807)

// In DWARF, functions and variables are stored to different places. First, lookup a function for a given offset.

292–293 ↗(On Diff #120807)

// If it failed, lookup again as a variable.

This revision is now accepted and ready to land.Oct 31 2017, 8:39 PM
This revision was automatically updated to reflect the committed changes.
grimar marked 7 inline comments as done.