The tags DW_LANG_C_plus_plus_14 and DW_LANG_C_plus_plus_11, introduced in Dwarf-5, are unexpected in previous versions.
Details
- Reviewers
shchenz dblaikie aprantl - Group Reviewers
Restricted Project - Commits
- rG428b17ce7052: [LLDB] Fix buildbots breakage due to TestGuessLanguage.py
rG62fa9b9388aa: [DebugInfo] Fix the mismatching between C++ language tags and Dwarf versions.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/CodeGen/CGDebugInfo.cpp | ||
---|---|---|
572 | Seems we miss to handle DW_LANG_C_plus_plus_03 which is also a DWARF 5 language name value? We always generate DW_LANG_C_plus_plus for -std=c++03 even at -gdwarf-5? If so, maybe we also need to fix this in another patch. | |
clang/test/CodeGenCXX/debug-info-programming-language.cpp | ||
4 | the alignment is a little strange, 80 characters one line? | |
10 | is limited debug info kind enough for this test? |
Can you explain your motivation? There is usually no good reason not to emit more specific attributes from future DWARF versions if they aren't in direct conflict with the older version of the spec that clang is emitting. Most consumers (including LLDB and GDB) will accept DW_LANG_C_plus_plus_14 in DWARF v4 compile units without any problem.
Thx! @aprantl The motivation of the patch came from the crash of tag name mismatching when using DBX under AIX. And modifying the debugger doesn't seem to make sense?
A consumer should not crash when it sees an unexpected value. This is clearly a DBX problem. It should have some fallback, for example it could do the same thing it would do if the language attribute was missing entirely.
Yeah, I totally agree with this! A consumer should be friendly enough to handle any unexpected input. The crash is anyhow not a good response.
Back to the DWARF info issue, will it be an issue that a DWARF 5 language value be emitted when the compiling option explicitly specifies the DWARF 3 version? This time it is DW_LANG_C_plus_plus_14.
We plan to check the DWARF info is generated under the correct DWARF version, what if we meet some other issues? An improper example (NOT exist, just an example) is what if we generate constant class DW_AT_high_pc attribute under -gdwarf-3? I think we should fix this right? For DWARF 3, DW_AT_high_pc attribute should always be a relocatable address.
What you guys think?
I think there are degrees of compatibility with regard to DWARF, which is inherently supposed to be extensible if the extension can be safely ignored by a consumer. This is the "permissive" rule in the standard.
- A new FORM must *never* be used, because that makes it impossible to interpret the rest of the CU.
- A new attribute or tag may be used if it can be safely ignored.
- A new constant value for an existing attribute also may be used if it can be safely ignored.
For other cases, such as using existing forms/attributes in novel ways, it really depends on how we think (or know) consumers will react to it. DW_AT_high_pc with a constant value is an excellent example; a debugger would not have any obvious reasonable interpretation for that. On the other hand, something like a new language code happens all the time, and there's even a range for user-defined language codes. A debugger must be prepared for unknown values, and an unknown value inside the standard-defined range doesn't seem like a reason to crash, or even bail out.
If we implemented a "strict DWARF" mode, then in that mode we should be more careful about this. If you are implementing a DWARF verifier, it would be reasonable to report using language codes outside the range defined by a given DWARF version IMO, but a bug report about that would likely be rejected.
My 2c:
I don't have super strong feelings either way about this particular issue.
Yes, I agree a consumer shouldn't crash and probably should handle an unknown language code fairly gracefully - but equally it's a loss of quality for a conforming DWARFv3 consumer to see a language code it doesn't recognize and drop all C++ support it might have/been able to use if it had been given a code that existed in v3. (this'd be nicely addressed by separating language code from version - then it could warn the user "unknown C++ version, I'll just use the latest version I support - so you might not be able to write newer C++ features in the expression evaluator, like hyperlambdas or whatever weird things exist in future C++")
So it's a tradeoff between consumers that know about newer C++ language codes and those that don't. I suspect if you're compiling with DWARFv3 it's because you've got consumers who only understand v3 & could benefit from the old code (but maybe you have some sample profiler, or crash symbolizer that only understands that - which doesn't actually care about the language version - and the interactive debugger you use is totally capable of parsing DWARFv5 and recognizing all its variety of language codes even when parsing old DWARF).
So... dunno.
In this specific case, I guess then the question is for the user: @Esme: What would your DWARF consumer do with an unknown code if it didn't crash? Would the user experience be significantly worse/different because it didn't recognize the code?
I also agree that consumer *SHOULD* handle the unexpected value better. eg: newer DBX will handle such failures better.
However, there are situations that the end user might not get up to date debugger, then the user experiences will be BAD.
If updating these tags unconditionally not fitting well with the design philosophy,
will it be acceptable if we introduce *-debugger-tuning=dbx* and only tune these tags conditionally for dbx?
Thanks!
Does anyone else have a DWARFv3 consumer they care about? (@aprantl and @probinson)
Does anyone have a DWARF consumer that changes significantly based on the language version (I guess lldb (@aprantl can you confirm?)? I doubt gdb does (yeah, it just treats all C++ versions the same)? how about Sony?)
so if no one's got a particular reason that one C++ version is better than another, except lldb, perhaps we just emit the newer language codes in old DWARF when tuning for lldb but not otherwise?
But if folks supporting previous builds of DBX are likely going to need a bunch of other compatibility work - then maybe adding the debugger tuning may be suitable/necessary/worthwhile at this point.
Yes, there are a few of the feature flags setting for compatibility coming.. Thanks David!
Do you have a list of issues you're hoping to address? Might be helpful to look at them wholistically (perhaps on an llvm-dev thread) to see what the best overall direction is.
The list is growing, but sure, we will post a thread in llvm-dev about what we met so far.
Two big one would be that DBX not supporting string section(DW_FORM_strp) and column-info in line no table.
Our practice has been for a "tuning" option to, effectively, expand into other options to control various bits of behavior. Basically, there should never be behavior that can be controlled _only_ via the tuning option. This is a design decision that was debated at length when we introduced tuning.
There is already an option to turn off column-info in the line table, so a new tuning option could easily imply that. I'm not sure we support suppressing .debug_str, but if we do (or add it), that would be another option, and the new tuning would imply that. And so on.
If we want to emit different sets of language codes based on something other than the DWARF version, we'd need to invent a new option to control that, and then tuning could control whether that's on or off.
I believe Sony is only using v4/v5 at this point. Regarding dialect controlling debugger behavior, I think it's unlikely, as distinguishing C++ dialects is new in v5, its adoption is pretty recent (at least for us), and we'd want behavior to be consistent across v4/v5 as much as possible. I'll verify that with our debugger guys.
If LLDB actually does _not_ have behavior change based on dialect, then we might as well control language codes based on DWARF version, and the patch should proceed.
Our practice has been for a "tuning" option to, effectively, expand into other options to control various bits of behavior. Basically, there should never be behavior that can be controlled _only_ via the tuning option. This is a design decision that was debated at length when we introduced tuning.
Agree, this is a good practise, we will certainly follow.
I believe Sony is only using v4/v5 at this point. Regarding dialect controlling debugger behavior, I think it's unlikely, as distinguishing C++ dialects is new in v5, its adoption is pretty recent (at least for us), and we'd want behavior to be consistent across v4/v5 as much as possible. I'll verify that with our debugger guys.
Thank you Paul!
If LLDB actually does _not_ have behavior change based on dialect, then we might as well control language codes based on DWARF version, and the patch should proceed.
We will wait for LLDB's confirmation then.
Thanks for your comments @dblaikie @probinson .
Sent out a mail to llvm-dev for continuous discussion. https://lists.llvm.org/pipermail/llvm-dev/2021-March/149446.html
Welcome your comments.
I've confirmed that our intent is not to have dialect-specific behavior.
If LLDB actually does _not_ have behavior change based on dialect, then we might as well control language codes based on DWARF version, and the patch should proceed.
I grepped the lldb tree and found nothing... @aprantl can you confirm?
If LLDB actually does _not_ have behavior change based on dialect, then we might as well control language codes based on DWARF version, and the patch should proceed.
@shafik Can you think of situations where LLDB is sensitive to the C++ version number as encoded in DWARF?
Addressed Zheng's comments.
Thank you all for your comments.
Since no DWARF consumers has dialect-specific behavior, the patch will proceed.
clang/lib/CodeGen/CGDebugInfo.cpp | ||
---|---|---|
572 | How about we just do that in this patch? |
clang/lib/CodeGen/CGDebugInfo.cpp | ||
---|---|---|
572 | I don't think anyone will stop you. :) | |
clang/test/CodeGenCXX/debug-info-programming-language.cpp | ||
10 | Yes, limited is enough (and is the default). "Limited" in this context mostly means unused type info will not be emitted; the language is a CU attribute, and the CU is always emitted. |
clang/lib/CodeGen/CGDebugInfo.cpp | ||
---|---|---|
572 | It seems we need to add a new language mode for C++03, so I will post another patch to do it. |
If there are no more comments, I'll commit the patch soon.
What do you think? @aprantl
Sorry for the delay! Looks like this doesn't have any drawbacks for any other debuggers, but helps dbx.
This change has caused a test failure in lldb. On our Arm bots: http://lab.llvm.org:8011/#/builders/96/builds/6582
Also on Linux x86_64: http://lab.llvm.org:8011/#/builders/68/builds/10277
Not windows though because test/API/commands/frame/language/TestGuessLanguage.py is an expected failure there already.
Is there a mapping in lldb that needs to be updated as well?
I've reproduced the failure, but I couldn't solve it immediately because I'm not familiar with lldb. I'll revert the patch first. Thanks!
Seems we miss to handle DW_LANG_C_plus_plus_03 which is also a DWARF 5 language name value? We always generate DW_LANG_C_plus_plus for -std=c++03 even at -gdwarf-5? If so, maybe we also need to fix this in another patch.