This is an archive of the discontinued LLVM Phabricator instance.

Add DWARF string debug to clang release notes.
ClosedPublic

Authored by hctim on May 23 2022, 10:12 AM.

Details

Summary

D12353 added inline strings to the DWARF info produced by clang. This
turns out to break some debugging software that assumes that a
DW_TAG_variable *must* come with a DW_AT_name. Add a release note to
broadcast this change.

Diff Detail

Event Timeline

hctim created this revision.May 23 2022, 10:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 10:12 AM
hctim requested review of this revision.May 23 2022, 10:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 10:12 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
rnk added a comment.May 23 2022, 10:30 AM

I am reminded of the perennial problem of "optional" protobuf fields that, when omitted, will cause production crashes.

Do you think it would be less disruptive to synthesize a name? I believe the string lives in a string pool, so naming all string literals <stringliteral> seems like it could be acceptable.

@rnk the standard even has an example of this exact behavior, so I think it's hard to say its wrong for LLVM to do this, but that may be more gentle. I'm going to defer to more expert opinions here, however, and loop in @mcgrathr.

I'm going to try to summarize an offline discussion w/ @mcgrathr about this here:

There are some other considerations to think about w.r.t. emitting names for non-source language constructs, as would be the case here. In fact, DWARF already handles this case: the "this" pointer in C++. Despite the fact that it may not be directly used in the source program, it is tagged as artificial, and gets a name, so that programmers can reference it within the debugger. Debuggers don't know anything special about C++, the compiler just emitted a DWARF variable that has the DW_AT_name of "this". They just reference the name, and allow users to access it as if it were another variable.

The point is that the DW_AT_name has a special meaning to debuggers, which may be problematic if we try to emit a name when this isn't really a source level construct and isn't something that should be accessed like another variable. Generating a name for these items makes them addressable, and is probably more problematic for downstream consumers of DWARF than omitting the name. Generally, DWARF consumers probably shouldn't be relying on things like the name always being there. The standard is pretty clear that many fields are optional, i.e., they may be there or they may not be there.

It's probably fine to gate the new DWARF items behind a compiler flag if there is concern about incompatible consumers. It could even be on by default, but at least it would be possible to opt out of the new behavior. You may even be able to gate it based on the DWARF version. That isn't strictly correct, but is offered as a possible pragmatic convenience on the rationale that concern about incompatible consumers probably applies to "old" consumers and consumers new enough to handle v5 could be assumed new/well-maintained enough to either already handle, or quickly be fixed to handle, nameless DW_TAG_variable.

Unnamed variables are an oddity, sure; we've had to patch a downstream test or two that wasn't being careful enough. But not providing a name is entirely defensible, and consumers should be willing to cope with DWARF that doesn't fully meet their expectations.

I am reminded of the perennial problem of "optional" protobuf fields that, when omitted, will cause production crashes.

Do you think it would be less disruptive to synthesize a name? I believe the string lives in a string pool, so naming all string literals <stringliteral> seems like it could be acceptable.

While the string would be deduplicated, the offset in the DIEs (or index in DWARFv5, plus offset in .debug_str_offsets) for each of these strings would not. But perhaps that would still be low-cost enough (if we were getting really fancy, we could bake that string offset/index into the abbreviation - though we probably don't have the infrastructure in LLVM's DWARF emission to do that super easily)

While the string would be deduplicated, the offset in the DIEs (or index in DWARFv5, plus offset in .debug_str_offsets) for each of these strings would not. But perhaps that would still be low-cost enough (if we were getting really fancy, we could bake that string offset/index into the abbreviation - though we probably don't have the infrastructure in LLVM's DWARF emission to do that super easily)

DWARF doesn't have a way to encode a section offset into .debug_abbrev. Best you could do (for v5 and up) is pin the string-literal variable name to .debug_str_offsets index 0, which keeps the .debug_info overhead to one byte for DW_FORM_strx1.

hctim added a comment.May 24 2022, 9:45 AM

It sounds like the consensus is what's represented in the release notes, "if this new feature breaks you, then you need to fix yourself".

Does anyone have any concerns with the specific text, or does it LGTY?

paulkirth accepted this revision.May 24 2022, 9:57 AM

LGTM.

clang/docs/ReleaseNotes.rst
505

nit: its likely not an assertion failure, but just invalid code.

It's also fine w/ me to word this differently, or ignore the suggestion.

This revision is now accepted and ready to land.May 24 2022, 9:57 AM

I see some unrelated whitespace changes, we generally don't like mixing those with "real" changes. But the description seems fine to me.

so do we want to land this? or are there some outstanding issues I'm unaware of?

Yeah, I'll look at landing it today.

(thanks for the bump - this one fell off the radar)

hctim marked an inline comment as done.Jun 16 2022, 2:53 PM

I see some unrelated whitespace changes, we generally don't like mixing those with "real" changes. But the description seems fine to me.

Looks like someone has cleaned up the whitespace, so the diff is cleaner.

clang/docs/ReleaseNotes.rst
505

changed the wording - it's just a common case of "invalid code" that we've seen - asserting that DW_TAG_variables must have a DW_AT_name.

hctim updated this revision to Diff 437717.Jun 16 2022, 2:53 PM
hctim marked an inline comment as done.

Change a small amount of wording, rebase.

This revision was landed with ongoing or failed builds.Jun 16 2022, 2:54 PM
This revision was automatically updated to reflect the committed changes.