The current consumer-side implementation of DWARF v5 string offsets tables is
flawed. According to the DWARF standard, contributions to the string offsets table
have their own format (either DWARF32 or DWARF64), independent of the format of the
contributing unit (CU or TU). The current implementation incorrectly derives the string
offsets table's format from that of its contributing unit.
This patch corrects the problem as follows. In full compilation units, full type units and
skeleton units, we obtain the start of the unit's contribution from the DW_AT_str_offsets_base
attribute as before. Since the standard mandates that a contribution header immediately
precede this location, we attempt to detect a well-formed contribution header either 16 bytes
(DWARF64) or 8 bytes (DWARF32) before it. These sizes correspond to a 12-byte or 4-byte
initial length field followed by a 2-byte version and 2 bytes of padding. This establishes
the format and length of the unit's string offsets table contribution and enables us to
validate the length as well as any accesses through the table. Each unit's contribution is
captured in a descriptor that holds all the relevant information about the contribution. If
parsing of the header fails, the descriptor is left in an error state, which prevents
consumers from extracting strings through it. llvm-dwarfdump reports errors based on the
state of the descriptor.
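A minimal Python sketch of the header detection and descriptor described above may help. It assumes a little-endian section held in a `bytes` object; the names `StrOffsetsContribution`, `parse_contribution` and `_describe` are hypothetical, not LLVM's API. Trying the DWARF64 layout before the DWARF32 one is a choice made for this sketch, as is rejecting lengths that are not a multiple of the entry size.

```python
import struct
from dataclasses import dataclass
from typing import Optional

DW_LENGTH_DWARF64 = 0xFFFFFFFF      # escape value that introduces a 64-bit length
DW_LENGTH_LO_RESERVED = 0xFFFFFFF0  # 32-bit length values at or above this are reserved


@dataclass
class StrOffsetsContribution:
    """Hypothetical descriptor for one unit's .debug_str_offsets contribution."""
    base: int = 0                # offset of the first entry (DW_AT_str_offsets_base)
    size: int = 0                # number of entry bytes following the header
    entry_size: int = 0          # 4 for DWARF32, 8 for DWARF64
    error: Optional[str] = None  # set when no valid header was found

    def get_string_offset(self, section: bytes, index: int) -> int:
        # A descriptor in the error state refuses all lookups.
        if self.error is not None:
            raise ValueError(f"unusable contribution: {self.error}")
        start = index * self.entry_size
        if index < 0 or start + self.entry_size > self.size:
            raise IndexError(f"string offset index {index} is out of range")
        fmt = "<I" if self.entry_size == 4 else "<Q"
        return struct.unpack_from(fmt, section, self.base + start)[0]


def _describe(section: bytes, base: int, length: int, entry_size: int) -> StrOffsetsContribution:
    # The encoded length also counts the 2-byte version and 2-byte padding.
    size = length - 4
    if size < 0 or size % entry_size != 0:
        return StrOffsetsContribution(error=f"invalid contribution length {length:#x}")
    if base + size > len(section):
        return StrOffsetsContribution(error="contribution extends past the end of the section")
    return StrOffsetsContribution(base=base, size=size, entry_size=entry_size)


def parse_contribution(section: bytes, base: int) -> StrOffsetsContribution:
    """Look for a DWARF v5 header immediately preceding `base`."""
    if base > len(section):
        return StrOffsetsContribution(error=f"base {base:#x} is outside the section")
    # DWARF64: 0xffffffff, 8-byte length, 2-byte version, 2-byte padding (16 bytes).
    if base >= 16:
        escape, length, version = struct.unpack_from("<IQH", section, base - 16)
        if escape == DW_LENGTH_DWARF64 and version == 5:
            return _describe(section, base, length, 8)
    # DWARF32: 4-byte length, 2-byte version, 2-byte padding (8 bytes).
    if base >= 8:
        length, version = struct.unpack_from("<IH", section, base - 8)
        if length < DW_LENGTH_LO_RESERVED and version == 5:
            return _describe(section, base, length, 4)
    return StrOffsetsContribution(error=f"no DWARF v5 header before offset {base:#x}")


# Example: a DWARF32 contribution followed by a DWARF64 one in the same section.
sec32 = struct.pack("<IHH", 4 + 3 * 4, 5, 0) + struct.pack("<3I", 0, 10, 20)
sec64 = struct.pack("<IQHH", DW_LENGTH_DWARF64, 4 + 2 * 8, 5, 0) + struct.pack("<2Q", 7, 9)
section = sec32 + sec64
print(parse_contribution(section, 8).get_string_offset(section, 1))                # prints 10
print(parse_contribution(section, len(sec32) + 16).get_string_offset(section, 1))  # prints 9
```

Each contribution's format comes from its own header, so the two units in the example can use different formats within one section.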
llvm-dwarfdump can now also report gaps in the string offsets table as well as overlapping
contributions, because dumping the table is now driven by the existing units and their
DW_AT_str_offsets_base attributes rather than by a simple sequential scan of the section.
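Once every contribution's extent is known, gaps and overlaps fall out of a sort-and-sweep over those extents. The Python below is a hypothetical sketch: the function name and message text are illustrative and do not reproduce llvm-dwarfdump's output. Each range is assumed to be `(header_start, end_of_entries)`, and identical ranges (for example, one contribution shared by several units) are reported once.

```python
from typing import List, Tuple


def find_gaps_and_overlaps(ranges: List[Tuple[int, int]], section_size: int) -> List[str]:
    """Report gaps and overlaps among [start, end) contribution ranges."""
    issues = []
    cursor = 0  # furthest end offset covered so far
    for start, end in sorted(set(ranges)):
        if start > cursor:
            issues.append(f"gap: [{cursor:#x}, {start:#x})")
        elif start < cursor:
            issues.append(f"overlap: contribution at {start:#x} starts before {cursor:#x}")
        cursor = max(cursor, end)
    # Bytes after the last contribution are also unaccounted for.
    if cursor < section_size:
        issues.append(f"gap: [{cursor:#x}, {section_size:#x})")
    return issues


print(find_gaps_and_overlaps([(0, 20), (28, 40)], 40))  # ['gap: [0x14, 0x1c)']
```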
For split CUs and TUs, the standard says that the DW_AT_str_offsets_base attribute is
not used. The current implementation mistakenly honors it; the patch changes the
implementation to ignore the attribute instead. Additionally, since the standard
mandates that there is only one CU (but possibly multiple TUs) in a .dwo file,
the implementation now assumes that there is a single contribution to the string offsets
table in the .debug_str_offsets.dwo section, with a DWARF v5 header at offset 0, and
attempts to derive the format from that header. It also assumes that the CU and all TUs
in the .dwo file share this single contribution. The DWARF v5 standard is not completely
clear on this last point, but it is the only consistent interpretation I can think of.
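The .dwo handling can be sketched as parsing the single header at offset 0 of .debug_str_offsets.dwo and handing that one contribution to the CU and every TU, without consulting DW_AT_str_offsets_base. This is a hypothetical Python illustration: little-endian data is assumed, the result is a plain `(base, size, entry_size)` tuple, and the function and unit names are placeholders.

```python
import struct
from typing import Dict, List, Optional, Tuple

Contribution = Tuple[int, int, int]  # (base, entry bytes, entry size)


def parse_dwo_contribution(section: bytes) -> Optional[Contribution]:
    """Parse the DWARF v5 header expected at offset 0 of .debug_str_offsets.dwo."""
    if len(section) < 8:
        return None
    (first,) = struct.unpack_from("<I", section, 0)
    if first == 0xFFFFFFFF:  # DWARF64: an 8-byte length follows the escape
        if len(section) < 16:
            return None
        length, version = struct.unpack_from("<QH", section, 4)
        base, entry_size = 16, 8
    elif first < 0xFFFFFFF0:  # DWARF32: the first word is the length
        length = first
        (version,) = struct.unpack_from("<H", section, 4)
        base, entry_size = 8, 4
    else:  # reserved initial-length value
        return None
    size = length - 4  # the length also counts the version and padding fields
    if version != 5 or size < 0 or base + size > len(section):
        return None
    return base, size, entry_size


def assign_dwo_contributions(section: bytes, units: List[str]) -> Dict[str, Optional[Contribution]]:
    # DW_AT_str_offsets_base is ignored for split units: the CU and all TUs
    # in the .dwo file share the one contribution.
    shared = parse_dwo_contribution(section)
    return {unit: shared for unit in units}
```

If the header cannot be parsed, every unit in the .dwo file ends up without a usable contribution, mirroring the error-state descriptor used for non-split units.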
In a .dwp file, individual contributions to the string offsets table are identified
by the index tables. Since each contributing .dwo file has only one CU, we can also identify
pre-DWARF5 (GNU split DWARF) units and correctly parse their contributions to the string
offsets table, each of which is just a simple array of string offsets without a header.
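The .dwp case can be sketched as a dispatch on the unit's version once the index tables have supplied the offset and size of its slice of .debug_str_offsets.dwo. The Python below is a hypothetical illustration, not LLVM's code: the index lookup is assumed to have happened already, little-endian data is assumed, and pre-DWARF5 entries are assumed to be 4 bytes wide.

```python
import struct
from typing import Optional, Tuple


def describe_dwp_contribution(section: bytes, offset: int, size: int,
                              unit_version: int) -> Optional[Tuple[int, int, int]]:
    """Return (base, entry bytes, entry size) for one unit's slice, or None."""
    if offset < 0 or size < 0 or offset + size > len(section):
        return None
    if unit_version < 5:
        # GNU split DWARF: the slice is a bare array of offsets, no header.
        return (offset, size, 4) if size % 4 == 0 else None
    # DWARF v5: the slice begins with a string offsets header.
    if size < 8:
        return None
    (first,) = struct.unpack_from("<I", section, offset)
    if first == 0xFFFFFFFF:
        if size < 16:
            return None
        length, version = struct.unpack_from("<QH", section, offset + 4)
        header_size, entry_size = 16, 8
    elif first < 0xFFFFFFF0:
        length = first
        (version,) = struct.unpack_from("<H", section, offset + 4)
        header_size, entry_size = 8, 4
    else:
        return None
    entries = length - 4  # the length also counts the version and padding fields
    # The header's length must fit inside the slice the index reports.
    if version != 5 or entries < 0 or header_size + entries > size:
        return None
    return offset + header_size, entries, entry_size
```

With this dispatch, GNU split DWARF and DWARF v5 units can sit side by side in one .dwp file; only the unit version decides whether a header is expected.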
The tests have been extended to cover:
- Units whose string offsets table contributions have different formats
- Recognition of gaps in the table
- Mixing of DWARF v5 and existing GNU split DWARF units in a .dwp file
- Detection of overlapping contributions to the string offsets table (error)