This is an archive of the discontinued LLVM Phabricator instance.

[lldb/DWARF] Fix implementation of DW_OP_convert
Needs ReviewPublic

Authored by labath on Nov 2 2020, 5:27 AM.

Details

Summary

According to the standard the DW_OP_convert operand is an offset
relative to the current compilation unit, not to the start of the
debug_info section.

To test this, extend YAMLModuleTester with the ability to hold multiple
DWARF units.

Diff Detail

Unit TestsFailed

Event Timeline

labath created this revision.Nov 2 2020, 5:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 2 2020, 5:27 AM
labath requested review of this revision.Nov 2 2020, 5:27 AM
labath updated this revision to Diff 302269.Nov 2 2020, 6:42 AM

Add (test) changes I accidentally left out.

That's my understanding. It means I'm not sure (though I'm not the expert here - certainly leave final decisions up to @aprantl and @JDevlieghere ) how this will be fixed, since lldb will presumably still need to be able to parse/understand old dsyms using the incorrect absolute offsets. (we had a similar problem once with DW_OP_bit_piece, where it was implemented incorrectly in both LLVM and lldb and I'm not sure how the backwards compatibility story was addressed there).

That's my understanding. It means I'm not sure (though I'm not the expert here - certainly leave final decisions up to @aprantl and @JDevlieghere ) how this will be fixed, since lldb will presumably still need to be able to parse/understand old dsyms using the incorrect absolute offsets. (we had a similar problem once with DW_OP_bit_piece, where it was implemented incorrectly in both LLVM and lldb and I'm not sure how the backwards compatibility story was addressed there).

I think @aprantl has proposed this in the past, but we should make dsymutil convert DW_OP_convert to DW_OP_LLVM_convert in its output. I also don't have a good plan for backward compatibility. If there was a way we could "know" that the DWARF came from a dSYM we could continue to support the old behavior knowing that it has always been wrong, but I'm not aware of anything like that in LLDB and I'm also not sure that's something I'd like to have in the first place...

labath added a comment.Nov 3 2020, 1:14 AM

That's my understanding. It means I'm not sure (though I'm not the expert here - certainly leave final decisions up to @aprantl and @JDevlieghere ) how this will be fixed, since lldb will presumably still need to be able to parse/understand old dsyms using the incorrect absolute offsets. (we had a similar problem once with DW_OP_bit_piece, where it was implemented incorrectly in both LLVM and lldb and I'm not sure how the backwards compatibility story was addressed there).

I think @aprantl has proposed this in the past, but we should make dsymutil convert DW_OP_convert to DW_OP_LLVM_convert in its output.

How much space are we talking about saving here? A single DW_TAG_base_type DIE is like 4--8 bytes long (depending of whether it contains the name field, and how its encoded, etc.). If every compile unit needed a dozen of these entries, that's still like under 100 bytes per CU. Is that too much? Could we just put these entries into every CU that needs them? I'd expect this to be negligible compared to the rest of DWARF...

If the space savings are important enough, then staking out a new dwarf opcode seems like the best solution. However, since DWARF Linker is also coming to other platforms and those platforms may have consumers which may not understand llvm extensions, I think we may have to implement both solutions anyway. Also, to support non-macho use cases, I think the operand to this extension should not be uleb-encoded (so that it can be relocated by a linker), which can make it larger than usual.

Or.. here's a crazy idea. Do all compile units produced by dsymutil share the same abbreviation table? (If not, why not?) We could encode the DW_TAG_base_types "into" the abbreviation table via DW_FORM_implicit_const. Then the DW_TAG_base_types would be just sizeof(uleb) ≈ 2 (plus maybe the size of a string form, if they have a name).

I also don't have a good plan for backward compatibility. If there was a way we could "know" that the DWARF came from a dSYM we could continue to support the old behavior knowing that it has always been wrong, but I'm not aware of anything like that in LLDB and I'm also not sure that's something I'd like to have in the first place...

Detecting dSYM files should not be that hard. triple.isMachO() && !symbol_file_dwarf->GetDebugMapSymfile() should be a pretty good approximation. It won't handle case the case of opening .o files directly, but that's only useful for tests I believe, and we could throw in an additional condition for that if needed. If we want to have backward compatibility, I think we're going to have to have something like that at least for a transitional period. The question is how to ensure that this code can be removed eventually. One possibility would be for dsymutil to signal that it has this fixed via some cu-level attribute (DW_AT_LLVM_convert_is_fixed) -- after a certain time, we can remove the fallback, and assume the correct behavior everywhere...