This is an archive of the discontinued LLVM Phabricator instance.

[WIP/RFC] lld LTO drops variables in namespaces from .debug_names
Changes Planned · Public

Authored by jankratochvil on Apr 3 2021, 5:59 PM.

Details

Summary

Adding -flto drops a variable in a namespace from .debug_names:

echo 'namespace N { int varname; int func() { return varname; } } int main(){ N::func(); }'|clang++ -Wall -Werror -gdwarf-5 -gpubnames -fuse-ld=lld -flto -x c++ -;llvm-dwarfdump -debug-names

In .debug_names, N::func is always present, but N::varname is present only when -flto is not used.
Unfortunately I do not know what else to do with it, so I am posting it FYI.
The testcase should use the *.ll format, but then I could not make it use LLD LTO.

Event Timeline

jankratochvil created this revision. Apr 3 2021, 5:59 PM
jankratochvil requested review of this revision. Apr 3 2021, 5:59 PM

Yeah, that looks like it breaks the unattached-global.ll because this causes a DW_AT_location to be added (probably with some kind of broken/empty expression) for a variable without a location/that's been optimized away.

How does the index behavior compare to GCC's .debug_names behavior under similar circumstances? Or to LLVM's behavior with a static (file-local) variable that's similarly optimized away?

jankratochvil planned changes to this revision. (Edited) Apr 4 2021, 3:28 PM

Yeah, that looks like it breaks the unattached-global.ll because this causes a DW_AT_location to be added (probably with some kind of broken/empty expression) for a variable without a location/that's been optimized away.

I did not expect that, but IMO unattached-global.ll has been wrong since its inception in D20147:

!4 = !DIExpression(DW_OP_plus_uconst, 4)
                DW_AT_location	(DW_OP_plus_uconst 0x4)

That is not a valid DWARF expression for DW_TAG_variable. There should be an empty DWARF expression (or completely missing DW_AT_location), right?
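
For context on why the expression is invalid: a variable's DW_AT_location expression is evaluated starting from an empty stack, and DW_OP_plus_uconst pops the top entry and adds its operand, so on its own it has nothing to operate on. (DW_OP_plus_uconst is normal in DW_AT_data_member_location, where the containing object's address is pushed before evaluation.) A minimal sketch of that check, assuming a toy evaluator that models only DW_OP_addr and DW_OP_plus_uconst; `Op`, `Instr`, and `evalLocation` are hypothetical names, not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified evaluator: only the two operators
// relevant here are modeled, with their DWARF 5 opcode values.
enum class Op : std::uint8_t {
  Addr = 0x03,        // DW_OP_addr: push a constant address
  PlusUconst = 0x23,  // DW_OP_plus_uconst: pop x, push x + operand
};

struct Instr {
  Op op;
  std::uint64_t operand;
};

// Evaluates a variable's DW_AT_location expression, which starts with an
// empty stack. Returns std::nullopt if an operator needs a value that is not
// on the stack, or if nothing is left to serve as the location.
std::optional<std::uint64_t> evalLocation(const std::vector<Instr>& expr) {
  std::vector<std::uint64_t> stack;
  for (const Instr& in : expr) {
    switch (in.op) {
      case Op::Addr:
        stack.push_back(in.operand);
        break;
      case Op::PlusUconst:
        if (stack.empty()) return std::nullopt;  // nothing to add the constant to
        stack.back() += in.operand;
        break;
    }
  }
  if (stack.empty()) return std::nullopt;
  return stack.back();
}
```

With this model, `evalLocation({{Op::PlusUconst, 4}})` yields `std::nullopt`, while `{{Op::Addr, 0x1000}, {Op::PlusUconst, 4}}` yields `0x1004`.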

How does the index behavior compare to GCC's .debug_names behavior under similar circumstances? Or to LLVM's behavior with a static (file-local) variable that's similarly optimized away?

GCC cannot produce .debug_names. In the GNU toolchain, .debug_names is produced by GDB. I wrote the GDB .debug_names producer and consumer, but it took a year before they were accepted, and by then I had already switched to LLDB. The file format of GDB's .debug_names is correct, but semantically it is wrong (it is currently just a dump of GDB's internal ManualDWARFIndex). I planned to fix the semantics after the producer and consumer were accepted. For example, linkage names in GDB's .debug_names are currently stored in demangled form, whereas according to DWARF-5 they should be mangled. Currently .debug_names is not in use: GDB still uses .gdb_index, which is also produced post-compilation by gdb-add-index rather than by GCC. My future plan is to make the LLVM and GDB .debug_names formats compatible so that Linux OSes do not need to carry both .gdb_index and .debug_names.
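
To illustrate the mangled/demangled distinction with the names from the original repro, assuming the Itanium C++ ABI that GCC and Clang use on Linux: N::func and N::varname have the linkage names `_ZN1N4funcEv` and `_ZN1N7varnameE`, and a DWARF-5 index entry for a linkage name would use that mangled string rather than a demangled form such as `N::func()`. The sketch uses `abi::__cxa_demangle` from `<cxxabi.h>` (provided by libstdc++ and libc++abi); the `demangle` wrapper is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Frees the buffer that abi::__cxa_demangle allocates with malloc.
struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};

// Demangles an Itanium C++ ABI name; returns the input unchanged if it is
// not a valid mangled name (status != 0), e.g. for "main".
std::string demangle(const std::string& mangled) {
  int status = 0;
  std::unique_ptr<char, FreeDeleter> out(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
  return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

For example, `demangle("_ZN1N4funcEv")` gives `N::func()` and `demangle("_ZN1N7varnameE")` gives `N::varname`; the index itself would keep the left-hand, mangled form.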

LLVM does not produce N::i for an optimized-out variable:

(set -ex;echo 'namespace N { const int i=42; } int main(void) { return N::i; }'|clang -g -gdwarf-5 -gpubnames -O2 -fuse-ld=lld -flto -x c++ -;llvm-dwarfdump -debug-info -debug-names)
      String: 0x0000002c "i"
        Tag: DW_TAG_variable
      String: 0x00000030 "N"
        Tag: DW_TAG_namespace

That is correct, because such a variable has no linkage name. I am not aware of how DW_IDX_parent could be used for efficient lookups; the DWARF-5 spec does not describe it. This means N::i is missing from the index.

GDB does produce N::i, but that name is not mangled and is therefore invalid according to the DWARF-5 spec:

(set -ex;echo 'namespace N { const int i=42; } int main(void) { return N::i; }'|g++ -g -O2 -x c++ -;gdb-add-index -dwarf-5 ./a.out;llvm-dwarfdump -debug-info -debug-names)
      String: 0x00000067 "N::i" // invalid: it should have been mangled, but this variable has no linkage name
        Tag: DW_TAG_variable
      String: 0x00000070 "N"
        Tag: DW_TAG_typedef // that should be DW_TAG_namespace but GDB IR is insufficient for that
// missing entry for "i"

Yeah, that looks like it breaks the unattached-global.ll because this causes a DW_AT_location to be added (probably with some kind of broken/empty expression) for a variable without a location/that's been optimized away.

I did not expect that, but IMO unattached-global.ll has been wrong since its inception in D20147:

!4 = !DIExpression(DW_OP_plus_uconst, 4)
                DW_AT_location	(DW_OP_plus_uconst 0x4)

That is not a valid DWARF expression for DW_TAG_variable.

Fair enough - agreed!

There should be an empty DWARF expression (or completely missing DW_AT_location), right?

Yep.

It's probably good to clean up that test in a separate preparatory commit. And maybe you or @aprantl might be interested in shoring up the LLVM IR debug info verifier to catch this?

How does the index behavior compare to GCC's .debug_names behavior under similar circumstances? Or to LLVM's behavior with a static (file-local) variable that's similarly optimized away?

GCC cannot produce .debug_names. In the GNU toolchain, .debug_names is produced by GDB. I wrote the GDB .debug_names producer and consumer, but it took a year before they were accepted, and by then I had already switched to LLDB. The file format of GDB's .debug_names is correct, but semantically it is wrong (it is currently just a dump of GDB's internal ManualDWARFIndex). I planned to fix the semantics after the producer and consumer were accepted. For example, linkage names in GDB's .debug_names are currently stored in demangled form, whereas according to DWARF-5 they should be mangled. Currently .debug_names is not in use: GDB still uses .gdb_index, which is also produced post-compilation by gdb-add-index rather than by GCC.

It's also producible from .debug_gnu_pubnames/.debug_gnu_pubtypes, which are produced by GCC and Clang and contain the same information (ostensibly, at least).

My future plan is to make the LLVM and GDB .debug_names formats compatible so that Linux OSes do not need to carry both .gdb_index and .debug_names.

Thanks for all the context on the current state; I certainly appreciate and look forward to that work.

LLVM does not produce N::i for an optimized-out variable:

(set -ex;echo 'namespace N { const int i=42; } int main(void) { return N::i; }'|clang -g -gdwarf-5 -gpubnames -O2 -fuse-ld=lld -flto -x c++ -;llvm-dwarfdump -debug-info -debug-names)
      String: 0x0000002c "i"
        Tag: DW_TAG_variable
      String: 0x00000030 "N"
        Tag: DW_TAG_namespace

That is correct, because such a variable has no linkage name.

Not sure I follow: why would the absence or presence of a linkage name change whether the name should appear in the index? Oh, you mean 'i' does appear in the index but its mangled name does not. OK.

I am not aware of how DW_IDX_parent could be used for efficient lookups; the DWARF-5 spec does not describe it. This means N::i is missing from the index.

I think the spec does touch on it, perhaps not especially well:

The standard attributes are:
...
Parent debugging information entry, a reference to the index entry for the parent. This is represented as the offset of the entry relative to the start of the entry pool.
...
It is possible that an indexed debugging information entry has a parent that is not indexed (for example, if its parent does not have a name attribute). In such a case, a parent attribute may point to a nameless index entry (that is, one that cannot be reached from any entry in the name table), or it may point to the nearest ancestor that does have an index entry.
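
A minimal sketch of the second option in that paragraph, where DW_IDX_parent refers to the nearest ancestor that has its own index entry. The `Die` struct and `nearestIndexedAncestor` are hypothetical and do not reflect how any particular producer builds its index:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical minimal DIE model: parents are referenced by position in the
// vector, and `indexed` says whether the DIE got its own index entry.
struct Die {
  std::string name;                   // empty if there is no DW_AT_name
  std::optional<std::size_t> parent;  // std::nullopt for the unit DIE
  bool indexed;
};

// Walks up the DIE tree and returns the nearest ancestor that has an index
// entry. Returns std::nullopt when no ancestor is indexed; in this sketch
// such an entry simply gets no parent.
std::optional<std::size_t> nearestIndexedAncestor(const std::vector<Die>& dies,
                                                  std::size_t i) {
  assert(i < dies.size());
  for (std::optional<std::size_t> p = dies[i].parent; p; p = dies[*p].parent) {
    if (dies[*p].indexed) return p;
  }
  return std::nullopt;
}
```

For a variable `i` nested in a nameless, unindexed DIE inside namespace `N`, this skips the nameless DIE and picks `N`.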

So in theory, DW_IDX_parent would tell you that i is inside N. I guess if your consumer were trying to look up N::i, it would look up N and i separately, then check the parent of each i entry to see which one has N as its parent.
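
The lookup described above could be sketched roughly as follows, assuming each entry's DW_IDX_parent already points at a named entry (the nearest-indexed-ancestor option) and that an unqualified name means no enclosing indexed scope. `Entry`, `NameIndex`, `splitQualified`, and `lookupQualified` are hypothetical names; this is not a description of any existing LLVM or GDB consumer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory view of a .debug_names index: a name table mapping
// each string to its entries, where each entry may carry a DW_IDX_parent.
struct Entry {
  std::string name;
  std::optional<std::size_t> parent;  // index of the parent entry, if any
};

struct NameIndex {
  std::vector<Entry> entries;
  std::map<std::string, std::vector<std::size_t>> table;

  std::size_t add(const std::string& name, std::optional<std::size_t> parent) {
    entries.push_back({name, parent});
    table[name].push_back(entries.size() - 1);
    return entries.size() - 1;
  }
};

// Splits "N::i" into {"N", "i"}.
std::vector<std::string> splitQualified(const std::string& q) {
  std::vector<std::string> parts;
  std::size_t start = 0, pos;
  while ((pos = q.find("::", start)) != std::string::npos) {
    parts.push_back(q.substr(start, pos - start));
    start = pos + 2;
  }
  parts.push_back(q.substr(start));
  return parts;
}

// Looks up the unqualified last component, then keeps only the entries whose
// chain of DW_IDX_parent names matches the preceding components exactly.
std::vector<std::size_t> lookupQualified(const NameIndex& idx, const std::string& q) {
  std::vector<std::string> parts = splitQualified(q);
  std::vector<std::size_t> result;
  auto it = idx.table.find(parts.back());
  if (it == idx.table.end()) return result;
  for (std::size_t e : it->second) {
    std::optional<std::size_t> p = idx.entries[e].parent;
    std::size_t k = parts.size() - 1;  // number of qualifiers still to match
    bool ok = true;
    while (k > 0) {
      if (!p || idx.entries[*p].name != parts[k - 1]) { ok = false; break; }
      p = idx.entries[*p].parent;
      --k;
    }
    if (ok && !p) result.push_back(e);  // no extra enclosing scopes left over
  }
  return result;
}
```

For example, with entries for `N`, `N::i`, `::i`, and `M::i`, `lookupQualified(idx, "N::i")` returns only the entry whose parent is `N`. Requiring the parent chain to end exactly, rather than tolerating extra outer scopes, is a design choice of this sketch.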

I'm guessing that's not implemented in LLVM yet.

GDB does produce N::i, but that name is not mangled and is therefore invalid according to the DWARF-5 spec:

(set -ex;echo 'namespace N { const int i=42; } int main(void) { return N::i; }'|g++ -g -O2 -x c++ -;gdb-add-index -dwarf-5 ./a.out;llvm-dwarfdump -debug-info -debug-names)
      String: 0x00000067 "N::i" // invalid: it should have been mangled, but this variable has no linkage name
        Tag: DW_TAG_variable
      String: 0x00000070 "N"
        Tag: DW_TAG_typedef // that should be DW_TAG_namespace but GDB IR is insufficient for that
// missing entry for "i"

Coming back to the spec, I don't think it agrees with the claim that a DW_TAG_variable without a DW_AT_location should have an entry. The spec says:

DW_TAG_variable debugging information entries with a DW_AT_location attribute that includes a DW_OP_addr or DW_OP_form_tls_address operator are included; otherwise, they are excluded.
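
That rule can be written as a small predicate. A minimal sketch, assuming a simplified variable model in which `locationOps` lists only the operators of the DW_AT_location expression (a real consumer would have to decode operands to walk the expression correctly); `VariableDie` and `shouldIndexVariable` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// DWARF 5 opcode values for the two operators the rule mentions.
constexpr std::uint8_t DW_OP_addr = 0x03;
constexpr std::uint8_t DW_OP_form_tls_address = 0x9b;

// Hypothetical simplified variable DIE: `hasLocation` models the presence of
// DW_AT_location, and `locationOps` lists the operators of its expression
// (operands omitted).
struct VariableDie {
  bool hasLocation;
  std::vector<std::uint8_t> locationOps;
};

// A DW_TAG_variable is indexed only if its DW_AT_location includes
// DW_OP_addr or DW_OP_form_tls_address; otherwise it is excluded.
bool shouldIndexVariable(const VariableDie& v) {
  if (!v.hasLocation) return false;
  for (std::uint8_t op : v.locationOps)
    if (op == DW_OP_addr || op == DW_OP_form_tls_address) return true;
  return false;
}
```

Under this predicate, a variable with no DW_AT_location at all (for example one that has been optimized away) is excluded, as is one whose expression never mentions an address operator.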

So according to the spec, it seems the example shouldn't have an entry. Perhaps that's a mistake in the spec, and such variables should be indexed too? I guess these names are present in .gdb_index even when the variable doesn't have a location?

And the bit about linkage names seems to apply only to subprograms, not variables; perhaps that's a mistake and it should apply to both.
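
For reference, a sketch of which strings an included DIE would be indexed under: its DW_AT_name, plus an additional entry for its DW_AT_linkage_name when the DIE is a subprogram or inlined subroutine, which is the DWARF 5 rule referred to here. The `alsoForVariables` flag is a hypothetical knob modeling the extension to variables floated above; `Tag`, `NamedDie`, and `indexNames` are likewise hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified DIE: only the fields relevant to choosing the
// strings under which an already-included DIE is indexed.
enum class Tag { Subprogram, InlinedSubroutine, Variable };

struct NamedDie {
  Tag tag;
  std::string name;         // DW_AT_name (empty if absent)
  std::string linkageName;  // DW_AT_linkage_name (empty if absent)
};

// Returns DW_AT_name, plus DW_AT_linkage_name for subprograms and inlined
// subroutines; with alsoForVariables set, variables get it as well.
std::vector<std::string> indexNames(const NamedDie& d, bool alsoForVariables) {
  std::vector<std::string> names;
  if (!d.name.empty()) names.push_back(d.name);
  bool linkageRuleApplies = d.tag == Tag::Subprogram ||
                            d.tag == Tag::InlinedSubroutine ||
                            (alsoForVariables && d.tag == Tag::Variable);
  if (linkageRuleApplies && !d.linkageName.empty())
    names.push_back(d.linkageName);
  return names;
}
```

With the names from the original repro, `indexNames` returns {"func", "_ZN1N4funcEv"} for N::func, but only {"varname"} for N::varname unless `alsoForVariables` is set.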