This is an archive of the discontinued LLVM Phabricator instance.

Improve handling of static assert messages.
ClosedPublic

Authored by cor3ntin on Aug 20 2021, 9:07 AM.

Details

Summary

Instead of dumping the string literal (which
quotes it and escape every non-ascii symbol),
we can use the content of the string when it is a
8 byte string.

Wide, UTF-8/UTF-16/32 strings are still completely
escaped, until we clarify how these entities should
behave (cf https://wg21.link/p2361).

FormatDiagnostic is modified to escape
non printable characters and invalid UTF-8.

This ensures that unicode characters, spaces and new
lines are properly rendered in static messages.
This make clang more consistent with other implementation
and fixes this tweet
https://twitter.com/jfbastien/status/1298307325443231744 :)

Of note, PaddingChecker did print out new lines that were
later removed by the diagnostic printing code.
To be consistent with its tests, the new lines are removed
from the diagnostic.

Unicode tables updated to both use the Unicode definitions
and the Unicode 14.0 data.

U+00AD SOFT HYPHEN is still considered a print character
to match existing practices in terminals, in addition of
being considered a formatting character as per Unicode.

Diff Detail

Event Timeline

cor3ntin created this revision.Aug 20 2021, 9:07 AM
cor3ntin requested review of this revision.Aug 20 2021, 9:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 20 2021, 9:07 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
cor3ntin updated this revision to Diff 367802.Aug 20 2021, 9:10 AM
cor3ntin retitled this revision from Improve handling of static assert messages. Instead of dumping the string literal (which quotes it and escape every non-ascii symbol), we can use the content of the string (which we know is valid UTF-8 by virtue if being a unevaluated string). to Improve handling of static assert messages..
cor3ntin edited the summary of this revision. (Show Details)
This comment was removed by cor3ntin.
jfb added a subscriber: hwright.Aug 20 2021, 9:45 AM

I worry that changing the general static_assert printing (adding a colon, and dropping the quotes) will get @hwright's law to drop on us. We can try and see if e.g. users of clang have automated checks for static_assert in their CI pipelines or something. I think your new format looks better, but Hyrum is finicky that way... What do others think?

Can you add tests for other special characters, to make sure they're all handled properly? Just copy/paste my twitter shitpost might be sufficient? I think I covered all the corner cases in it, gotta be thorough!

clang/lib/Basic/Diagnostic.cpp
834

We don't have a better hex formatter? 😟
Not a big deal, but I'd hoped that ADT had something!

clang/lib/Sema/SemaDeclCXX.cpp
16596

cast should already handle this.

I worry that changing the general static_assert printing (adding a colon, and dropping the quotes) will get @hwright's law to drop on us. We can try and see if e.g. users of clang have automated checks for static_assert in their CI pipelines or something. I think your new format looks better, but Hyrum is finicky that way... What do others think?

Can you add tests for other special characters, to make sure they're all handled properly? Just copy/paste my twitter shitpost might be sufficient? I think I covered all the corner cases in it, gotta be thorough!

It's funny that you mention that because my code filters out LTR/RTL... I should probably change that!

cor3ntin added inline comments.Aug 20 2021, 9:53 AM
clang/lib/Basic/Diagnostic.cpp
834

There are a few hex formatter, none that pads to a minimum size!

cor3ntin updated this revision to Diff 367840.Aug 20 2021, 11:26 AM
  • Add more tests
  • Add a table for format codepoints - which we want to output

as is. These include among other ZWJ (for emojis) and LTR/RTL
marks.

  • At the same time, update the unicode tables to Unicode 13.

(We may want to move all the unicode tables
(there are some in clang and some in llvm) in the same place for
better maintainance in the future!)

cor3ntin updated this revision to Diff 367844.Aug 20 2021, 11:35 AM

Remove superfluous assert

martong removed a subscriber: martong.Aug 30 2021, 5:33 AM
cor3ntin planned changes to this revision.Sep 15 2021, 11:27 AM

I'll fix the tests

I worry that changing the general static_assert printing (adding a colon, and dropping the quotes) will get @hwright's law to drop on us. We can try and see if e.g. users of clang have automated checks for static_assert in their CI pipelines or something. I think your new format looks better, but Hyrum is finicky that way... What do others think?

I think it's fine to change; we've never promised diagnostic formatting compatibility between versions. I'm sure *someone* is relying on this somewhere, but I'm not worried we're going to break a ton of people -- hopefully enough folks are tracking trunk that we can find any major issues before releasing.

Btw, it looks like the CI is currently failing the LLVM unit tests in interesting ways. That should be resolved.

There are changes in clang/test/Lexer/null-character-in-literal.c but Phab is unhelpful about showing what those changes are because it thinks the file is a binary file. Can you explain what's been changed there?

clang/lib/Basic/Diagnostic.cpp
807

Per coding conventions.

809

Please fix the clang-tidy and clang-format warnings.

Also, I think it is more commonly considered the name for an iterator, which this sort of isn't. I'd recommend going with Begin and End for names (that also fixes the coding style nit with the names).

819–822

You should fix these names up for the coding conventions (same suggestion applies elsewhere). Also, the local style is const unsigned char *.

835

I'm pretty sure you need to call flush() before using number.

clang/lib/Sema/SemaDeclCXX.cpp
16596–16598
cor3ntin updated this revision to Diff 373404.Sep 18 2021, 3:17 AM
cor3ntin marked 5 inline comments as done.
  • Fix tests, cland tidy
cor3ntin updated this revision to Diff 373405.Sep 18 2021, 3:18 AM

Formatting

I worry that changing the general static_assert printing (adding a colon, and dropping the quotes) will get @hwright's law to drop on us. We can try and see if e.g. users of clang have automated checks for static_assert in their CI pipelines or something. I think your new format looks better, but Hyrum is finicky that way... What do others think?

I think it's fine to change; we've never promised diagnostic formatting compatibility between versions. I'm sure *someone* is relying on this somewhere, but I'm not worried we're going to break a ton of people -- hopefully enough folks are tracking trunk that we can find any major issues before releasing.

Btw, it looks like the CI is currently failing the LLVM unit tests in interesting ways. That should be resolved.

There are changes in clang/test/Lexer/null-character-in-literal.c but Phab is unhelpful about showing what those changes are because it thinks the file is a binary file. Can you explain what's been changed there?

A verbatim NULL is replaced by <U+0000> in the expected-error of a test that checks the handling of null in header-name (!)

-                          // expected-error@-1 {{'null\^@character' file not found}}
+                          // expected-error@-1 {{'null\<U+0000>character' file not found}}
cor3ntin updated this revision to Diff 373408.Sep 18 2021, 3:45 AM

Properly comment isFormatting/isPrintable.

cor3ntin updated this revision to Diff 375244.Sep 27 2021, 6:53 AM
This comment was removed by cor3ntin.
cor3ntin updated this revision to Diff 375251.Sep 27 2021, 7:27 AM
This comment was removed by cor3ntin.
cor3ntin planned changes to this revision.Sep 27 2021, 8:11 AM
cor3ntin updated this revision to Diff 375273.Sep 27 2021, 8:18 AM

Correctly rebase on top of D105759

jfb added a comment.Sep 27 2021, 9:01 AM

Can you test all the values in this? https://godbolt.org/z/h7n54fa5x

clang/lib/Basic/Diagnostic.cpp
808

Can this addition overflow?

cor3ntin added inline comments.Sep 30 2021, 5:59 AM
clang/lib/Basic/Diagnostic.cpp
808

Technically yes, if you are trying to output a single string over 4GB I guess. Do we care?

Btw, this CI failure looks relevant: https://reviews.llvm.org/harbormaster/unit/view/1055822/ but... it looks more relevant to the parent patch than this one (and the parent seems to have a clean CI). May be worth looking into whether this is an issue or not.

clang/lib/Basic/Diagnostic.cpp
836
838
845–846
llvm/include/llvm/Support/Unicode.h
41
llvm/lib/Support/Unicode.cpp
285–286
aaron.ballman added inline comments.Sep 30 2021, 7:38 AM
clang/lib/Basic/Diagnostic.cpp
808

Personally, I'm not overly worried about that scenario (I suspect we will run into overflow like that far earlier). But if @jfb has a situation in mind where this might be a concern, checking for overflow and diagnosing it wouldn't be the worst outcome either.

aaron.ballman added inline comments.Sep 30 2021, 7:39 AM
clang/lib/Basic/Diagnostic.cpp
808

Though I should note: testing that scenario without causing undue impacts on running the tests may be interesting.

Btw, this CI failure looks relevant: https://reviews.llvm.org/harbormaster/unit/view/1055822/ but... it looks more relevant to the parent patch than this one (and the parent seems to have a clean CI). May be worth looking into whether this is an issue or not.

Yep, I'm waiting for the parent patch to get merge to cleanup this one.

cor3ntin abandoned this revision.Oct 1 2021, 12:01 PM
cor3ntin reclaimed this revision.Dec 2 2021, 10:32 AM
cor3ntin updated this revision to Diff 440063.Jun 26 2022, 6:52 AM
cor3ntin marked 4 inline comments as done.
  • Rebase
  • Address Aaron's comments.
  • Fix tests
  • Keep dumping wide/utf16/utf32 strings escaped until we can ensure these things cannot appear in static_assert and other compile time constructs
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2022, 6:52 AM
Herald added a subscriber: steakhal. · View Herald Transcript
cor3ntin updated this revision to Diff 440064.Jun 26 2022, 6:54 AM

Formatting

cor3ntin marked an inline comment as done.Jun 26 2022, 7:05 AM
aaron.ballman added reviewers: tahonermann, Restricted Project.Jun 27 2022, 8:15 AM

Thanks for picking this back up! I have mostly nits, a few questions, and am adding some extra reviewers who may have an opinion. Also, I think the summary needs to be updated (this is no longer waiting on the unevaluated strings stuff), and the changes need a release note.

clang/lib/Basic/Diagnostic.cpp
809
821

We don't really need this variable, we can use &CodepointValue in the two places it's needed.

825
828

It'd be worth adding a message here as well. Something like && "the sequence is legal UTF-8 but we couldn't convert it to UTF-32"

clang/lib/Sema/SemaDeclCXX.cpp
16596
16597

Do we want to use something like isASCII() here instead of just checking for a single-byte width? e.g., pascal strings are single byte width, but probably not something we want printed this way.

Another question is, do we want to do something slightly slower (since we know we're already on the failure path, the performance here shouldn't matter overly much) by checking !containsNonAsciiOrNull()?

llvm/lib/Support/Unicode.cpp
301

Unrelated formatting change?

377

Unrelated formatting change?

erichkeane added inline comments.
llvm/lib/Support/Unicode.cpp
268

As a nit, this condition might be worth reversing to get the 'fast thing' on the LHS of the short-circuit.

cor3ntin updated this revision to Diff 440261.Jun 27 2022, 9:22 AM
cor3ntin marked 5 inline comments as done.
  • Address Aaron's and Erich's comments
cor3ntin added inline comments.Jun 27 2022, 9:25 AM
clang/lib/Basic/Diagnostic.cpp
821

We do need a llvm::UTF32** though - that's why there is this otherwise not useful variable

clang/lib/Sema/SemaDeclCXX.cpp
16597

Do we want to use something like isASCII() here instead of just checking for a single-byte width? e.g., pascal strings are single byte width, but probably not something we want printed this way.

Maybe yes. I though about supporting u8, but we should definitively not make effort for string with a prefix

Another question is, do we want to do something slightly slower (since we know we're already on the failure path, the performance here shouldn't matter overly much) by checking !containsNonAsciiOrNull()

No, we want to render unicode characters properly.

erichkeane added inline comments.Jun 27 2022, 9:38 AM
clang/lib/Basic/Diagnostic.cpp
807

could we have some comment as to what this is for/what it is supposed to do? It isn't clear to me what the arguments mean here. The name 'Str' isn't particularly helpful.

809

I might prefer BeginItr, but I don't see us needing Begin for other things later, so I don't care much.

843

I find myself wondering if OutStr here should be just a stream as passed in (or immediately become one above?).

clang/lib/Sema/SemaDeclCXX.cpp
16597

why isAscii, which returns bool, compared to 1?

cor3ntin added inline comments.Jun 27 2022, 9:39 AM
clang/lib/Sema/SemaDeclCXX.cpp
16597

Done.
with this change

static_assert(false, "Ω");      // outputs Ω 
static_assert(false, u8"Ω");  // ouputs u8"\316\251"

I think this makes sense. Or rather using u8 in static)assert makes no sense so it's not an issue, even a good thing not to have any special handling. disuade people from using prefixes in that context.

cor3ntin updated this revision to Diff 440301.Jun 27 2022, 10:29 AM
cor3ntin marked an inline comment as done.
  • Add a comment to pushEscapedString
  • Use a stream interface which turns out to be simpler and cleaner,

thanks Erich!

  • Fix typos and small nits
cor3ntin edited the summary of this revision. (Show Details)Jun 27 2022, 12:07 PM
cor3ntin edited the summary of this revision. (Show Details)

I'm generally OK with this. BUT Aaron/JF/Tom should review it.

clang/lib/Basic/Diagnostic.cpp
817

OutStream << *Begin;

837

Something here that uses the stream?

cor3ntin updated this revision to Diff 440354.Jun 27 2022, 12:32 PM
  • Add release note
  • Address Erich's comment.
cor3ntin added inline comments.Jun 27 2022, 12:34 PM
clang/lib/Basic/Diagnostic.cpp
837

The stream is not buffered, it just directly write to OutStr.
So it seems counter productive to construct a StringRef here just to use the stream.

erichkeane added inline comments.Jun 27 2022, 12:36 PM
clang/lib/Basic/Diagnostic.cpp
837

What about...?

cor3ntin added inline comments.Jun 27 2022, 12:41 PM
clang/lib/Basic/Diagnostic.cpp
837

I doesn't seem much better but sure, I can do that

cor3ntin added inline comments.Jun 27 2022, 12:44 PM
clang/lib/Basic/Diagnostic.cpp
837

actually, you would need something like
OutStream.write(reinterpret_cast<const char*>(CodepointBegin), std::distance(CodepointBegin, CodepointEnd));

Unless you insist, I'd rather not change it.

erichkeane added inline comments.Jun 27 2022, 12:44 PM
clang/lib/Basic/Diagnostic.cpp
837

I just see value in having ALL accesses of the string be done 'the same' vs counting on it being unbuffered, which makes it harder to read. And I think the 'stream' versions of the other uses below are a vast improvement that makes this touch of ugliness perhaps worth it.

837

Hrmph.... yeah, thats not great, I won't insist.

Patch application is still failing the precommit CI; it'd be nice to get that corrected for some early test feedback.

clang/docs/ReleaseNotes.rst
276
clang/lib/Sema/SemaDeclCXX.cpp
16597

Another question is, do we want to do something slightly slower (since we know we're already on the failure path, the performance here shouldn't matter overly much) by checking !containsNonAsciiOrNull()

No, we want to render unicode characters properly.

Ah derp, good point.

Thanks for the switch to isASCII() -- I agree that we shouldn't go to great lengths to support prefixes there until WG21 decides what to do with compile-time strings.

cor3ntin updated this revision to Diff 440640.Jun 28 2022, 8:47 AM

Try to fix the patch application issue

I've removed the parent revision as this review no longer requires it to proceed. That should get precommit CI working again.

cor3ntin updated this revision to Diff 440642.Jun 28 2022, 8:52 AM

Trigger a CI rerun

cor3ntin updated this revision to Diff 440645.Jun 28 2022, 8:56 AM
cor3ntin marked an inline comment as done.

Fix rst syntax

aaron.ballman accepted this revision.Jun 28 2022, 9:57 AM

LGTM! The precommit CI failures appear to be unrelated as best I can tell.

This revision is now accepted and ready to land.Jun 28 2022, 9:57 AM
This revision was landed with ongoing or failed builds.Jun 28 2022, 1:26 PM
This revision was automatically updated to reflect the committed changes.

I think this might be the cause of the many libc++ static_assert failures we're seeing on our builders: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8810079977358640657/overview

I think this might be the cause of the many libc++ static_assert failures we're seeing on our builders: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8810079977358640657/overview

Thanks for the heads up, i reverted the patch until i can look into it. Sorry for the inconvenience

cor3ntin reopened this revision.Jun 28 2022, 4:16 PM
This revision is now accepted and ready to land.Jun 28 2022, 4:16 PM
cor3ntin updated this revision to Diff 440804.Jun 28 2022, 4:16 PM
  • Undo the change of diagnostic message formatting as they break

libc++ tests
so we will keep printing <static_assert failed "message"> instead
<of static_assert failed: message>

Changing the format should be done -if at all- in a separate
change.

This revision was landed with ongoing or failed builds.Jun 29 2022, 5:57 AM
This revision was automatically updated to reflect the committed changes.

I missed my opportunity to review while being on vacation last week. I reviewed to keep myself informed; spotted a typo.

llvm/lib/Support/Unicode.cpp
272
cor3ntin added inline comments.Jul 6 2022, 7:06 AM
llvm/lib/Support/Unicode.cpp
272

Thanks! I made a commit to fix it :)