This is an archive of the discontinued LLVM Phabricator instance.

Emit a .debug_str_offsets section with dsymutil to support DW_FORM_strx in dsymutil.
ClosedPublic

Authored by rastogishubham on Aug 3 2023, 2:15 PM.

Details

Summary

With this change, dsymutil can emit a .debug_str_offsets section, therefore it can also emit DW_FORM_strx in the .debug_info section.

I elected to emit only one .debug_str_offset contribution for all compile units, because any shared strings between compile units wouldn't be assigned different offsets for different compile units. Therefore, dsymutil rewrites every DW_AT_str_offsets_base to 8 which is the size of the .debug_str_offsets section header and the start of the offsets for all compile units.

Before this change, the size of the .debug_info section + .debug_str section was for clang.dSYM ~1.17 GB
After this change, the size of the .debug_info section + .debug_str section + .debug_str_offsets section is ~1.12 GB

This results in a 4.35% size decrease

Diff Detail

Event Timeline

rastogishubham created this revision.Aug 3 2023, 2:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 2:15 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
rastogishubham requested review of this revision.Aug 3 2023, 2:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 2:15 PM
avl added inline comments.Aug 4 2023, 3:48 AM
llvm/lib/DWARFLinker/DWARFLinker.cpp
1063

I would suggest to use DwarfVersion as a sign of which resulting form would be used:

if (DwarfVersion >= 5) {
      // Switch everything to DW_FORM_strx strings.
      auto StringOffsetIndex =
          StringOffsetPool.getValueIndex(StringEntry.getOffset());
      return Die
          .addValue(DIEAlloc, AttrSpec.Attr,
                    dwarf::DW_FORM_strx, DIEInteger(StringOffsetIndex))
          ->sizeOf(U.getFormParams());
}

    // Switch everything to DW_FORM_strp strings.
    return Die
          .addValue(DIEAlloc, AttrSpec.Attr,
                    dwarf::DW_FORM_strp, DIEInteger(StringEntry.getOffset()))
          ->sizeOf(U.getFormParams());

This allows to save more space if original CU is of 5 version but uses DW_FORM_strp forms. Also it would allow to simplify patch by removing StrxFoundInCU and AttrStrOffsetBaseSeen flags.

Though we would need to pay attention to the case when original CU of 5 dwarf version does not have DW_AT_str_offsets_base because it uses DW_FORM_strp forms. We would need to add DW_AT_str_offsets_base in that case.

This results in a 4.35% size decrease

Sweet.

llvm/lib/DWARFLinker/DWARFLinker.cpp
1063

That seems reasonable — if makes the patch simpler.

1068

Very nitpicky nit: it's DIE because it's an abbreviation for Debug Info Entry ;-)

2025

Just double-checking that this is really unreachable, even with malformed input?

2592

If I'm reading the DWARF spec correctly, isn't the header size 4+2+2 in DWARF32 and 4+8+2+2 in DWARF64?

rastogishubham marked 3 inline comments as done.

All dwarf 5 cus get their strings converted to strxs.

rastogishubham added inline comments.Aug 9 2023, 4:36 PM
llvm/lib/DWARFLinker/DWARFLinker.cpp
2025

With the updates to the patch, this should truly be unreachable

2592

Yes, but we only emit DWARF32 on our platforms

avl added a comment.Aug 10 2023, 4:26 AM

Thank you for the update. Would you mind creating the test case for DW_AT_str_offsets_base and debug_str_offsets, please?(similar to dwarf5-addr_base.test, i.e. several compile units, update/no-update). And, probably, add checks for strings forms into the dwarf5-dwarf4-combination-macho.test(that dwarf5 has all required new forms, and dwarf4 still uses old forms), please.

llvm/include/llvm/DWARFLinker/DWARFLinker.h
665

there is AttributesInfo structure which keeps exactly that kind of info. it would be good to make AttrStrOffsetBaseSeen to be member of AttributesInfo.

880

the DWARFLinkerOptions::TargetDWARFVersion version could be used instead of this additional flag. It would be set into the version 5 in case any DWARFv5 CU seen.

llvm/include/llvm/DWARFLinker/DWARFStreamer.h
85
llvm/lib/DWARFLinker/DWARFLinker.cpp
1845

It should be added to only DW_TAG_compile_unit DIEs.

2015

usage of patchStrOffsetsBase looks a bit reversed. That patch* routines are used when the value is not known at the moment the input DIE is clonned. In this concrete case the offset value is known. It would be more straightforward if proper value of DW_AT_str_offsets_base would be set in cloneScalarAttribute routine(and patchStrOffsetsBase would be removed):

unsigned DWARFLinker::DIECloner::cloneScalarAttribute()
...
if (AttrSpec.Form == dwarf::DW_AT_str_offsets_base) {
    // DWARFLinker generates common .debug_str_offsets table used for all compile units.
    // The offset to the common .debug_str_offsets table is 8 on DWARF32.
    AttrSpec.AttrStrOffsetBaseSeen = true;
    return Die.addValue(DIEAlloc, dwarf::DW_AT_str_offsets_base,
                  dwarf::DW_FORM_sec_offset, DIEInteger(8))->sizeOf(U.getFormParams())
}
...
2944

It would probably look cleaner if checking for above conditions would be hidden inside emitStringOffsets like in emitDebugAddrSection.

llvm/lib/DWARFLinker/DWARFStreamer.cpp
262
rastogishubham marked 9 inline comments as done.

Thanks for the review Alexey.

Added a test to specifically check for strx and str_offsets_base, similar to dwarf5-addr_base.test. One thing to note, right now, when using --update, the strx1's get converted to strxs, I don't know if we want to change that.

Moved AttrStrOffsetBaseSeen to AttributesInfo

Removed Dwarf5CUSeen

Added check for DW_TAG_compile_unit tag when adding a DW_AT_str_offsets_base in a CU

Removed patchStrOffsetsBase, added the code to cloneScalarAttribute

Moved Dwarf5 check to emitStringOffsets

avl added a comment.Aug 10 2023, 3:15 PM

Added a test to specifically check for strx and str_offsets_base, similar to dwarf5-addr_base.test. One thing to note, right now, when using --update, the strx1's get converted to strxs, I don't know if we want to change that.

I think that is OK. We already have the similar situation for update DWARFv4. DW_FORM_string is replaced with DW_FORM_strp.

llvm/include/llvm/DWARFLinker/DWARFLinker.h
117
llvm/lib/DWARFLinker/DWARFLinker.cpp
2024

It looks patchStrOffsetsBase is not removed still.

llvm/lib/DWARFLinker/DWARFStreamer.cpp
268

should not we check here for empty StringOffsets and return?

if (StringOffsets.empty())
  return;
llvm/test/tools/dsymutil/ARM/dwarf5-addr-base.test
48–49

Oh, excuse me, I did not notice it earlier : please, do not use underscore in file names. It is really inconvenient if underscores mixed up with hyphens. Other file names use hyphens. Please rename all of this:

Inputs/DWARF5-addr_base-str_off_base -> Inputs/DWARF5-addr-base-str-off-base 
Inputs/DWARF5-addr_base/ -> Inputs/DWARF5-addr-base/
dwarf5-addr_base.dSYM -> dwarf5-addr-base.dSYM
dwarf5-str_offsets_base-strx.test -> dwarf5-str-offsets-base-strx.test
dwarf5-addr_base.test -> dwarf5-addr-base.test
llvm/test/tools/dsymutil/ARM/dwarf5-dwarf4-combination-macho.test
66

probably:

CHECK-NOT: DW_AT_str_offsets_base

llvm/test/tools/dsymutil/ARM/dwarf5-str_offsets_base-strx.test
1 ↗(On Diff #549131)
rastogishubham marked 6 inline comments as done.
rastogishubham added inline comments.
llvm/lib/DWARFLinker/DWARFLinker.cpp
2024

Apologies, I forgot to remove the code, but I did add the proper code to cloneScalarAttribute

llvm/lib/DWARFLinker/DWARFStreamer.cpp
268

My thought was that if there is a DWARF5 cu, then there has to be a debug_str_offset section, but I am wrong, what if the CU doesn't have any strings?

avl accepted this revision.Aug 11 2023, 5:01 AM

Thanks! LGTM.

llvm/lib/DWARFLinker/DWARFStreamer.cpp
268

Actually, that case(when CU does not have any string) looks very specific. Usually at least DW_TAG_compile_unit has a name. Anyway, it would still be good to not create empty table in this case.

This revision is now accepted and ready to land.Aug 11 2023, 5:01 AM
This revision was automatically updated to reflect the committed changes.