[DWARF] v5 implementation of string offsets tables - producer side

Authored by wolfgangp on Fri, Jan 12, 5:01 PM.



Generate DWARF v5 string offsets tables and use the DW_FORM_strx? forms to refer to indexed strings. All compile and type units in the compilation share one contribution. There are 3 test cases:

str-offsets-table.ll: Simple test for one unit in non-split and split scenarios.
str-offsets-multiple-cus.ll: Multiple compile units and a type unit
str-offsets-form.ll: Test if we generate DW_FORM_strx2 (in addition to DW_FORM_strx1).

One caveat is that there is no testing for the DW_FORM_strx{3,4} forms. I'm not quite sure how to effect this without a humongous source that uses > 64K strings (for strx3) or > 24M strings (for strx4), other than perhaps a generating script. However, the code that triggers these forms is very simple, so a test for strx2 may be sufficient.

Diff Detail

wolfgangp created this revision.Fri, Jan 12, 5:01 PM
aprantl accepted this revision.Fri, Jan 12, 5:57 PM

The code looks fairly straightforward. LGTM, thanks!

This revision is now accepted and ready to land.Fri, Jan 12, 5:57 PM
dblaikie added inline comments.Mon, Jan 15, 11:41 AM

Remove the return after unreachable.


This looks like it's inconsistent with the code.

The code looks like it sets the symbol for both the skeleton and non-skeleton holders in the case of split DWARF, doesn't it?

Perhaps it should be:

if (useSplitDwarf)


(or that could be rewritten as:

(useSplitDwarf ? SkeletonHolder : InfoHolder).setStringOffsetsStartSym(...);

This is duplicated in a couple of places - could it be reasonably moved into some common place, like the DwarfUnit ctor? (not sure if that has some common setup code for adding attributes common to all units - maybe there is no such common place? not sure)


What's "segmented" mean/imply/add/clarify in this context?


maybe just say "split units"


Would these comparisons be better written as "Index >= 0xff", etc?


To get a really large string index that doesn't need lots of IR to be cleaned up, maybe just use one giant enum with lots of enumerators? Tie the enum to the IR with a single global, same as the struct here.


Could these be simplified to all use the same construct for the strings. Otherwise it makes the test a bit more confusing/cryptic by implying that functions, global variahbles, and structs, might be meaningfully different for string emission - which shouldn't be true.

(eg: all use global variables, all use enums (I'd say enums are probably a good call), or whatever)

Ah, I see, type units are relevant - you could probably make this work by having named/external linkage enums (which will go in type units) and anonymous/internal linkage enums (which won't go in type units - but can still be emitted by using an enumerator to initialize a global/external linkage variable, I think? That should cause the enumeration to go into the enum list on the CU)


Maybe rather than naming it CU1_STROFF, name it STROFFBASE or something, since it's meant to be the same for all the units?


Is each type unit getting its own string offset? I'd expect type units to use the string offset table of the CU they were attached to, so this should be the same as CU1_STROFF?

I mean it's a tradeoff - using separate string offset sections per TU means they can be dropped if this copy of the TU is dropped. But it also means more duplication - the same string offset having to be duplicated in the CU and every TU if the same string appears in all/several of them.


Would it be worth printing out the index number in the section dumping to make it easier to look them up? Then the matching wouldn't need 5 trivial matches, you could match the index from the unit dumping to the string here. (& similarly in the non-TU case above)


Some comments about which parts of this code are important/what situation is being created/tested for (& ensuring the code is as simple as possible for that goal) might be handy.