This is an archive of the discontinued LLVM Phabricator instance.

DebugInfo: Add/support new DW_LANG codes for recent C and C++ versions
ClosedPublic

Authored by dblaikie on Nov 23 2022, 11:16 AM.

Details

Summary

This may be a breaking change for consumers if they're trying to detect
if code is C or C++, since it'll start using new codes that they may not
be ready to recognize, in which case they may fall back to non-C
handling.

Diff Detail

Event Timeline

dblaikie created this revision.Nov 23 2022, 11:16 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: hiraditya. · View Herald Transcript
dblaikie requested review of this revision.Nov 23 2022, 11:16 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptNov 23 2022, 11:16 AM

Hmmm I might be inclined to emit 17 and 20 only under not-strict-DWARF for v5, although it makes the logic more complicated. The codes have been allocated but AFAICT the website doesn't have the new codes listed (I looked at http://wiki.dwarfstd.org/index.php/DWARF_Language_Support which doesn't even have all the v5 codes yet). @aprantl do you have an opinion on this? I tend to lean to the pedantic side on this kind of thing, but I'm persuadable.

Hmmm I might be inclined to emit 17 and 20 only under not-strict-DWARF for v5, although it makes the logic more complicated. The codes have been allocated but AFAICT the website doesn't have the new codes listed (I looked at http://wiki.dwarfstd.org/index.php/DWARF_Language_Support which doesn't even have all the v5 codes yet).

I think they're up here: https://dwarfstd.org/Languages.php (linked from the dwarfstd.org front page "DWARF V5 Language Codes and Requests")

@aprantl do you have an opinion on this? I tend to lean to the pedantic side on this kind of thing, but I'm persuadable.

Yeah, I've certainly got mixed feelings - maybe we don't pick up these after-release language codes, and instead produce the new language encoding (which separates language from version) as an extension, alongside the old/in-the-dwarfv5-spec-document codes? So that a DWARFv5 consumer that contains only the DWARFv5-spec-document functionality and not the new codes, and a newer consumer can read the new parts.

The codes have been allocated but AFAICT the website doesn't have the new codes listed (I looked at http://wiki.dwarfstd.org/index.php/DWARF_Language_Support which doesn't even have all the v5 codes yet).

I think they're up here: https://dwarfstd.org/Languages.php (linked from the dwarfstd.org front page "DWARF V5 Language Codes and Requests")

Oh, thanks. I had it in my head they were on the wiki, but being on the main site is better.

@aprantl do you have an opinion on this? I tend to lean to the pedantic side on this kind of thing, but I'm persuadable.

Yeah, I've certainly got mixed feelings - maybe we don't pick up these after-release language codes, and instead produce the new language encoding (which separates language from version) as an extension, alongside the old/in-the-dwarfv5-spec-document codes? So that a DWARFv5 consumer that contains only the DWARFv5-spec-document functionality and not the new codes, and a newer consumer can read the new parts.

Except the new LNAME codes aren't 1-1 with the old codes and aren't published anywhere. Well, except in the issue resolution, I guess, but that's not definitive (subject to change until DWARF 6 is published). I'm not sure which way to go here.

@aprantl do you have an opinion on this? I tend to lean to the pedantic side on this kind of thing, but I'm persuadable.

Yeah, I've certainly got mixed feelings - maybe we don't pick up these after-release language codes, and instead produce the new language encoding (which separates language from version) as an extension, alongside the old/in-the-dwarfv5-spec-document codes? So that a DWARFv5 consumer that contains only the DWARFv5-spec-document functionality and not the new codes, and a newer consumer can read the new parts.

Except the new LNAME codes aren't 1-1 with the old codes

I'm not sure that'd be a problem? They're in different attributes, etc, such that they wouldn't be read by an old consumer without explicit support for them.

and aren't published anywhere.

They're published on the website too: https://dwarfstd.org/LanguagesV6.php

They're published on the website too: https://dwarfstd.org/LanguagesV6.php

(sigh) sorry, distracted by other things today.

Yeah, using the v6 codes with the v6 attributes for the new ones WFM.

@aprantl do you have an opinion on this? I tend to lean to the pedantic side on this kind of thing, but I'm persuadable.

As long as LLDB can deal with it I'm fine either way.
Emitting the separated DWARF 6 attribute as an extension sounds fine to me.

llvm/lib/IR/DIBuilder.cpp
159 ↗(On Diff #477570)

Should we define a DW_LANG_HI_DWARF in the .def file?

dblaikie updated this revision to Diff 479499.Dec 1 2022, 6:37 PM

Add llvm testing (but I'm going to rework this a lot, so this is just a snapshot for history)

Don't know that I've got it in me right now to do the more invasive change of updating the LLVM DebugInfo IR metadata to carry the DWARFv6 style language+version stuff (perhaps especially if that's going to go alongside the DWARFv5 and earlier pure language encoding)

I could do the minimum and fix the C version case to use the DWARFv5-shipped C11 encoding. And perhaps add the post-v5 backported language encodings to LLVM so at least llvm-dwarfdump can detect/render/handle them correctly for name printing, etc.

Or I could maybe do something in between - where the post-v5 backported language codes could be used in the IR and then LLVM could detect those and remap them to v5-shipped DW_AT_language + v6-not-ready-yet (using an extension form? Using the form that's not shipped in v6 yet?) attributes?

Seems like it's probably not worth emitting the v6 stuff until it's baked. And not worth emitting the post-v5 backported attributes due to risk of breakage? So maybe just do the minimum?

One not-yet-asked question, does gcc produce the C++17/20 codes? If so, does fstrict-dwarf affect that? The major consumers of this would be lldb (which we control) and gdb, so following gcc's lead here would seem appropriate. I can accept that listing the codes on the website "counts" as in-spec for fstrict-dwarf purposes.

Deferring the v6 language attributes seems totally fair.

dblaikie updated this revision to Diff 480292.Dec 5 2022, 5:33 PM

Rebase on top of committed LLVM changes

dblaikie updated this revision to Diff 480296.Dec 5 2022, 5:46 PM

Don't use the since-v5-published codes

One not-yet-asked question, does gcc produce the C++17/20 codes? If so, does fstrict-dwarf affect that?

Looks like GCC uses C++17 for both 17 and 20: https://gcc.gnu.org/git/?p=gcc.git;a=commit;f=gcc/testsuite/g%2B%2B.dg/debug/dwarf2/lang-cpp17.C;h=a90606ab979762ce4630777c2fce5c921d0a2b96 (apparently it used to use just C_plus_plus for 20: https://gcc.gnu.org/git/?p=gcc.git;a=commit;f=gcc/testsuite/g%2B%2B.dg/debug/dwarf2/lang-cpp17.C;h=a90606ab979762ce4630777c2fce5c921d0a2b96 )

The major consumers of this would be lldb (which we control) and gdb, so following gcc's lead here would seem appropriate. I can accept that listing the codes on the website "counts" as in-spec for fstrict-dwarf purposes.

Yeah, I'm leaning towards just not implementing the v5 bonus codes - they're good for new languages, but for existing languages the risk of regression seems not worth the benefit.

I left the test cases in - easy to tweak them when we decide how to render this - and left some minor refactoring in that I could pull out/commit separately.

I still haven't heard from the internal user that seemed pretty adamant that expressing these newer versions was important... I can't really picture (relative to all the other debug info incompletenessess) why it would be especially important.

llvm/lib/IR/DIBuilder.cpp
159 ↗(On Diff #477570)

Maybe? I'm not sure - if it has to explicitly write the constant or another enum value in there, I think it's as liable to get missed when updating as this code, perhaps? Open to ideas.

aprantl accepted this revision.Dec 6 2022, 12:26 PM

This minimal patch LGTM.

llvm/lib/IR/DIBuilder.cpp
159 ↗(On Diff #477570)

I guess this isn't important enough to spend too much time designing a solution for it. Let's leave it as is.

This revision is now accepted and ready to land.Dec 6 2022, 12:26 PM
This revision was landed with ongoing or failed builds.Dec 6 2022, 1:11 PM
This revision was automatically updated to reflect the committed changes.

LGTM. I agree with the commentary in the test.

Oh, right, PS4 defaults to C99. It's okay with me if you make those two unsupported for PS4.

dblaikie reopened this revision.Dec 8 2022, 9:27 AM

(just a side note I discovered while looking into this: seems someone else had this idea before & came to similar conclusions: https://reviews.llvm.org/D104118#2840490 )

This revision is now accepted and ready to land.Dec 8 2022, 9:27 AM