D12353 added inline strings to the DWARF info produced by clang. This
turns out to break some debugging software that assumes that a
DW_TAG_variable *must* come with a DW_AT_name. Add a release note to
broadcast this change.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I am reminded of the perennial problem of "optional" protobuf fields that, when omitted, will cause production crashes.
Do you think it would be less disruptive to synthesize a name? I believe the string lives in a string pool, so naming all string literals <stringliteral> seems like it could be acceptable.
I'm going to try to summarize an offline discussion w/ @mcgrathr about this here:
There are some other considerations to think about w.r.t. emitting names for non-source language constructs, as would be the case here. In fact, DWARF already handles this case: the "this" pointer in C++. Despite the fact that it may not be directly used in the source program, it is tagged as artificial, and gets a name, so that programmers can reference it within the debugger. Debuggers don't know anything special about C++, the compiler just emitted a DWARF variable that has the DW_AT_name of "this". They just reference the name, and allow users to access it as if it were another variable.
The point is that the DW_AT_name has a special meaning to debuggers, which may be problematic if we try to emit a name when this isn't really a source level construct and isn't something that should be accessed like another variable. Generating a name for these items makes them addressable, and is probably more problematic for downstream consumers of DWARF than omitting the name. Generally, DWARF consumers probably shouldn't be relying on things like the name always being there. The standard is pretty clear that many fields are optional, i.e., they may be there or they may not be there.
It's probably fine to gate the new DWARF items behind a compiler flag if there is concern about incompatible consumers. It could even be on by default, but at least it would be possible to opt out of the new behavior. You may even be able to gate it based on the DWARF version. That isn't strictly correct, but is offered as a possible pragmatic convenience on the rationale that concern about incompatible consumers probably applies to "old" consumers and consumers new enough to handle v5 could be assumed new/well-maintained enough to either already handle, or quickly be fixed to handle, nameless DW_TAG_variable.
Unnamed variables are an oddity, sure; we've had to patch a downstream test or two that wasn't being careful enough. But not providing a name is entirely defensible, and consumers should be willing to cope with DWARF that doesn't fully meet their expectations.
While the string would be deduplicated, the offset in the DIEs (or index in DWARFv5, plus offset in .debug_str_offsets) for each of these strings would not. But perhaps that would still be low-cost enough (if we were getting really fancy, we could bake that string offset/index into the abbreviation - though we probably don't have the infrastructure in LLVM's DWARF emission to do that super easily)
DWARF doesn't have a way to encode a section offset into .debug_abbrev. Best you could do (for v5 and up) is pin the string-literal variable name to .debug_str_offsets index 0, which keeps the .debug_info overhead to one byte for DW_FORM_strx1.
It sounds like the consensus is what's represented in the release notes, "if this new feature breaks you, then you need to fix yourself".
Does anyone have any concerns with the specific text, or does it LGTY?
LGTM.
clang/docs/ReleaseNotes.rst | ||
---|---|---|
505 | nit: its likely not an assertion failure, but just invalid code. It's also fine w/ me to word this differently, or ignore the suggestion. |
I see some unrelated whitespace changes, we generally don't like mixing those with "real" changes. But the description seems fine to me.
Looks like someone has cleaned up the whitespace, so the diff is cleaner.
clang/docs/ReleaseNotes.rst | ||
---|---|---|
505 | changed the wording - it's just a common case of "invalid code" that we've seen - asserting that DW_TAG_variables must have a DW_AT_name. |
nit: its likely not an assertion failure, but just invalid code.
It's also fine w/ me to word this differently, or ignore the suggestion.