llvm-dwp needs to remap string offsets from input .dwo or .dwp files, so it needs to be made aware of DWARF v5 string offsets tables now that LLVM generates them. Unlike the pre-v5 GNU-style string offsets tables, DWARF v5 string offsets tables have a header describing the length, format, and version of the table. This patch preserves the header and performs some simple validation when it rewrites the string offsets table contributions.
It also makes llvm-dwp handle v5 compile unit headers.
One thing to note is that the current implementation recreates the string offsets table in a single loop over the entire string offsets section. Since we want to support a mixture of v5 and pre-v5 CUs, we can't parse the section without knowing where the pre-v5 units' contributions are. llvm-dwarfdump has a similar problem, so we go unit by unit when rewriting the table.
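As a rough illustration of the header handling described above, here is a minimal Python sketch, not the llvm-dwp code itself. It assumes little-endian input and the DWARF v5 layout of a contribution header (unit_length, 2-byte version, 2-byte padding). The names `StrOffsetsHeader` and `parse_v5_header` are hypothetical.

```python
import struct
from typing import NamedTuple


class StrOffsetsHeader(NamedTuple):
    unit_length: int  # bytes that follow the length field
    offset_size: int  # 4 for DWARF32, 8 for DWARF64
    version: int
    header_size: int  # total header bytes, including the length field


def parse_v5_header(data: bytes, pos: int = 0) -> StrOffsetsHeader:
    """Parse and sanity-check one v5 string offsets contribution header.

    Hypothetical helper; assumes little-endian data.
    """
    (length,) = struct.unpack_from("<I", data, pos)
    if length == 0xFFFFFFFF:
        # DWARF64: the escape value is followed by an 8-byte length.
        (length,) = struct.unpack_from("<Q", data, pos + 4)
        offset_size, length_field = 8, 12
    else:
        offset_size, length_field = 4, 4

    version, _padding = struct.unpack_from("<HH", data, pos + length_field)
    if version != 5:
        raise ValueError(f"unexpected string offsets table version {version}")
    # unit_length covers version + padding (4 bytes) plus the offsets.
    if length < 4 or (length - 4) % offset_size != 0:
        raise ValueError("contribution length is not a whole number of offsets")
    if pos + length_field + length > len(data):
        raise ValueError("contribution runs past the end of the section")
    return StrOffsetsHeader(length, offset_size, version, length_field + 4)


# One DWARF32 contribution holding two offsets (0 and 6).
table = struct.pack("<IHH", 12, 5, 0) + struct.pack("<II", 0, 6)
header = parse_v5_header(table)
```

A rewriter along these lines would copy the first `header_size` bytes unchanged and remap only the offsets that follow.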
Oh... I hadn't realized/understood/thought about this.
That's kind of awkward - mixing blobs with headers and blobs without headers in the same section (str_offsets) & then having to use the CU/TUs to disambiguate/dictate how to parse those chunks.
Can we avoid that? Could we just say v4 and v5 are incompatible? Have a flag or something that checks.
Then we could always walk the str_offsets section on its own, expecting either header'd sections (which I hope are self-descriptive: once you know you're reading a v5 str_offsets, you don't need to consult the CU for anything to do that?) or non-header'd sections (where you just treat every word as a string offset, without considering how those are divided into contributions).
(Ah, I see you mentioned/highlighted that in the patch description too)
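To make the suggestion concrete, here is a minimal Python sketch of walking the section standalone in one of two modes. It illustrates the idea only and is not what the patch does. It assumes little-endian data, 4-byte (DWARF32) offsets, and a caller-supplied `v5` flag that says which kind of section is being read. `remap_str_offsets` and the `remap` dictionary (old string offset to new string offset) are hypothetical stand-ins.

```python
import struct


def remap_str_offsets(section: bytes, remap: dict, v5: bool) -> bytes:
    """Rewrite a string offsets section without consulting any CU/TU.

    Hypothetical sketch: little-endian, DWARF32 only.
    """
    out = bytearray()

    if not v5:
        # Pre-v5: no headers, so every 4-byte word is a string offset.
        if len(section) % 4 != 0:
            raise ValueError("section is not a whole number of 4-byte offsets")
        for (old,) in struct.iter_unpack("<I", section):
            out += struct.pack("<I", remap[old])
        return bytes(out)

    # v5: each contribution describes itself through its header.
    pos = 0
    while pos < len(section):
        (length,) = struct.unpack_from("<I", section, pos)
        if length == 0xFFFFFFFF:
            raise ValueError("DWARF64 is not handled in this sketch")
        version, _padding = struct.unpack_from("<HH", section, pos + 4)
        end = pos + 4 + length
        if version != 5 or length < 4 or (length - 4) % 4 != 0 or end > len(section):
            raise ValueError(f"malformed contribution at offset {pos}")
        out += section[pos:pos + 8]  # keep the header unchanged
        for (old,) in struct.iter_unpack("<I", section[pos + 8:end]):
            out += struct.pack("<I", remap[old])
        pos = end
    return bytes(out)


remap = {0: 100, 6: 200}
legacy = struct.pack("<II", 0, 6)
v5_table = struct.pack("<IHH", 12, 5, 0) + struct.pack("<II", 0, 6)
new_legacy = remap_str_offsets(legacy, remap, v5=False)
new_v5 = remap_str_offsets(v5_table * 2, remap, v5=True)
```

The sketch only works if the two kinds are never mixed in one section: in the pre-v5 mode a v5 header would be misread as offsets, which is why something outside the section has to pick the mode.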