This is an archive of the discontinued LLVM Phabricator instance.

[clang] Change the multi-character character constants from extension to implementation-defined.
ClosedPublic

Authored by nomanous on Sep 19 2020, 4:12 AM.

Details

Summary

The multi-character character constants should be implementation-defined according to the C standard but actually were made to be extension, which would cause errors when using clang with option "-pedantic-errors". This patch fixes it and it is for bug #46797 on the Bugzilla system.

Diff Detail

Event Timeline

nomanous created this revision.Sep 19 2020, 4:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2020, 4:12 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
nomanous requested review of this revision.Sep 19 2020, 4:12 AM

Multichar literals are implementation-defined in C and conditionally supported with implementation-defined semantics in C++. I agree that it may make sense to warn about their use for portability reasons, but I'm not certain whether it makes sense to promote their use to be always-on diagnostics. I'm curious to know if this change causes any issues with system headers (which may or may not still define four char codes) or popular libraries.

I was curious as to why this was an extension in the first place and found the original commits (https://github.com/llvm/llvm-project/commit/74c95e20af4838152a63010292d1063835176711 and https://github.com/llvm/llvm-project/commit/8577f62622d50183c7413d7507ec783d3c1486fc) but there's no justification as to why this was picked as an extension.

clang/include/clang/Basic/DiagnosticLexKinds.td
108–109

One potential reason why we don't want to warn on this by default is that four char codes were quite popular back in the Mac Classic days.

Multichar literals are implementation-defined in C and conditionally supported with implementation-defined semantics in C++. I agree that it may make sense to warn about their use for portability reasons, but I'm not certain whether it makes sense to promote their use to be always-on diagnostics. I'm curious to know if this change causes any issues with system headers (which may or may not still define four char codes) or popular libraries.

I was curious as to why this was an extension in the first place and found the original commits (https://github.com/llvm/llvm-project/commit/74c95e20af4838152a63010292d1063835176711 and https://github.com/llvm/llvm-project/commit/8577f62622d50183c7413d7507ec783d3c1486fc) but there's no justification as to why this was picked as an extension.

Or should we change the four character case to "Remark" so it wouldn't be warned even with the "-pandemic" option? There seems no other choice.

nomanous updated this revision to Diff 295559.Oct 1 2020, 6:54 AM

Or should we change the four character case to "Remark" so it wouldn't be warned even with the "-pandemic" option? There seems no other choice.

There's a DefaultIgnore modifier for warnings that turns then off by default. We use this sparingly, since it's hard to discover off-by-default warnings, but seems appropriate here.

aaron.ballman requested changes to this revision.Oct 1 2020, 7:30 AM

Multichar literals are implementation-defined in C and conditionally supported with implementation-defined semantics in C++. I agree that it may make sense to warn about their use for portability reasons, but I'm not certain whether it makes sense to promote their use to be always-on diagnostics. I'm curious to know if this change causes any issues with system headers (which may or may not still define four char codes) or popular libraries.

I was curious as to why this was an extension in the first place and found the original commits (https://github.com/llvm/llvm-project/commit/74c95e20af4838152a63010292d1063835176711 and https://github.com/llvm/llvm-project/commit/8577f62622d50183c7413d7507ec783d3c1486fc) but there's no justification as to why this was picked as an extension.

Or should we change the four character case to "Remark" so it wouldn't be warned even with the "-pandemic" option? There seems no other choice.

That doesn't sound like the right approach to me -- Remarks are usually for reporting backend decision-making to the frontend for things like optimization passes. In this case, it may make more sense to make it a Warning<> but make it DefaultIgnore so that you have to enable it explicitly.

Btw, it looks like when you generated this patch, it was generated against the previous diff and not trunk, so the diff view is misleading. When you update the patch, can you be sure to diff against the trunk?

This revision now requires changes to proceed.Oct 1 2020, 7:30 AM
lattner resigned from this revision.Oct 1 2020, 9:10 AM

That doesn't sound like the right approach to me -- Remarks are usually for reporting backend decision-making to the frontend for things like optimization passes.

To be clear: that's how they happen to most visibly be used, but the more general idea is that Remarks are for purely informational messages. One test for whether a diagnostic could reasonably be a remark is: can we imagine anyone ever reasonably wanting to promote it to an error as part of the flags for some project? If so, then use of a remark is inappropriate. That's certainly the case here: it's entirely reasonable to want to be able to reject use of non-portable functionality such as this. So I agree, this should not be a remark.

Btw, it looks like when you generated this patch, it was generated against the previous diff and not trunk, so the diff view is misleading. When you update the patch, can you be sure to diff against the trunk?

Please also ensure you upload a diff with full context.

nomanous updated this revision to Diff 295766.Oct 2 2020, 2:25 AM

I change the four-char constants back to "Warning" in this revision, but set it to be ignored by default.

nomanous updated this revision to Diff 295771.Oct 2 2020, 2:35 AM

I restore some needless modification in the previous patch.

aaron.ballman accepted this revision.Oct 2 2020, 6:37 AM

That doesn't sound like the right approach to me -- Remarks are usually for reporting backend decision-making to the frontend for things like optimization passes.

To be clear: that's how they happen to most visibly be used, but the more general idea is that Remarks are for purely informational messages. One test for whether a diagnostic could reasonably be a remark is: can we imagine anyone ever reasonably wanting to promote it to an error as part of the flags for some project? If so, then use of a remark is inappropriate. That's certainly the case here: it's entirely reasonable to want to be able to reject use of non-portable functionality such as this. So I agree, this should not be a remark.

Thank you for the explanation!

The patch LGTM with a minor nit about the test case.

clang/test/Lexer/multi-char-constants.c
7

You can remove the empty main().

This revision is now accepted and ready to land.Oct 2 2020, 6:37 AM
rsmith added inline comments.Oct 3 2020, 9:50 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
108–109

Please also rename these diagnostics from ext_ to warn_.

nomanous updated this revision to Diff 296118.Oct 5 2020, 2:00 AM

In this revision I delete the useless main function in the new test case and rename ext_multichar_character_literal & ext_four_char_character_literal to warn_multichar_character_literal & warn_four_char_character_literal.

aaron.ballman accepted this revision.Oct 5 2020, 12:56 PM

LGTM! Do you need someone to commit on your behalf?

LGTM! Do you need someone to commit on your behalf?

Yes I need!

aaron.ballman closed this revision.Oct 6 2020, 5:48 AM

LGTM! Do you need someone to commit on your behalf?

Yes I need!

I've committed on your behalf in 8fa45e1fd527269140c4e2a1652fef5500da16fd, thank you for the fix!