Page MenuHomePhabricator

Improve handling of static assert messages.
AbandonedPublic

Authored by cor3ntin on Aug 20 2021, 9:07 AM.

Details

Reviewers
aaron.ballman
jfb
Summary

Instead of dumping the string literal (which
quotes it and escape every non-ascii symbol),
we can use the content of the string
(which we know is valid UTF-8 by virtue if being a
unevaluated string).

...g).

FormatDiagnostic is modified to escape
non printable characters and invalid UTF-8.

This ensures that unicode characters, spaces and new
lines are properly rendered in static messages.
This make clang more consistent with other implementation
and fixes this tweet
https://twitter.com/jfbastien/status/1298307325443231744 :)

Of note, PaddingChecker did print out new lines that were
later remove by the diagnostic printing code.
To be consistent with its tests, the new lines are removed
from the diagnostic.

Diff Detail

Unit TestsFailed

TimeTest
70 msx64 debian > Clang.Misc::wrong-encoding.c
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/14.0.0/include -nostdsysteminc -fsyntax-only -Wno-unused-value /var/lib/buildkite-agent/builds/llvm-project/clang/test/Misc/wrong-encoding.c 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck -strict-whitespace /var/lib/buildkite-agent/builds/llvm-project/clang/test/Misc/wrong-encoding.c
160 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-cxa-atexit.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-cxa-atexit.S
150 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-static-initializer.S
Script: -- : 'RUN: at line 7'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-static-initializer.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-static-initializer.S
170 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-tls.S
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-tls.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-tls.S
1,380 msx64 debian > libFuzzer.libFuzzer::fuzzer-finalstats.test
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/SimpleTest.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest

Event Timeline

cor3ntin created this revision.Aug 20 2021, 9:07 AM
cor3ntin requested review of this revision.Aug 20 2021, 9:07 AM
Herald added a project: Restricted Project. Β· View Herald TranscriptAug 20 2021, 9:07 AM
Herald added a subscriber: cfe-commits. Β· View Herald Transcript
cor3ntin updated this revision to Diff 367802.Aug 20 2021, 9:10 AM
cor3ntin retitled this revision from Improve handling of static assert messages. Instead of dumping the string literal (which quotes it and escape every non-ascii symbol), we can use the content of the string (which we know is valid UTF-8 by virtue if being a unevaluated string). to Improve handling of static assert messages..
cor3ntin edited the summary of this revision. (Show Details)
This comment was removed by cor3ntin.
jfb added a subscriber: hwright.Aug 20 2021, 9:45 AM

I worry that changing the general static_assert printing (adding a colon, and dropping the quotes) will get @hwright's law to drop on us. We can try and see if e.g. users of clang have automated checks for static_assert in their CI pipelines or something. I think your new format looks better, but Hyrum is finicky that way... What do others think?

Can you add tests for other special characters, to make sure they're all handled properly? Just copy/paste my twitter shitpost might be sufficient? I think I covered all the corner cases in it, gotta be thorough!

clang/lib/Basic/Diagnostic.cpp
818

We don't have a better hex formatter? 😟
Not a big deal, but I'd hoped that ADT had something!

clang/lib/Sema/SemaDeclCXX.cpp
16391

cast should already handle this.

I worry that changing the general static_assert printing (adding a colon, and dropping the quotes) will get @hwright's law to drop on us. We can try and see if e.g. users of clang have automated checks for static_assert in their CI pipelines or something. I think your new format looks better, but Hyrum is finicky that way... What do others think?

Can you add tests for other special characters, to make sure they're all handled properly? Just copy/paste my twitter shitpost might be sufficient? I think I covered all the corner cases in it, gotta be thorough!

It's funny that you mention that because my code filters out LTR/RTL... I should probably change that!

cor3ntin added inline comments.Aug 20 2021, 9:53 AM
clang/lib/Basic/Diagnostic.cpp
818

There are a few hex formatter, none that pads to a minimum size!

cor3ntin updated this revision to Diff 367840.Aug 20 2021, 11:26 AM
  • Add more tests
  • Add a table for format codepoints - which we want to output

as is. These include among other ZWJ (for emojis) and LTR/RTL
marks.

  • At the same time, update the unicode tables to Unicode 13.

(We may want to move all the unicode tables
(there are some in clang and some in llvm) in the same place for
better maintainance in the future!)

Herald added a project: Restricted Project. Β· View Herald TranscriptAug 20 2021, 11:26 AM
cor3ntin updated this revision to Diff 367844.Aug 20 2021, 11:35 AM

Remove superfluous assert

martong removed a subscriber: martong.Aug 30 2021, 5:33 AM
cor3ntin planned changes to this revision.Sep 15 2021, 11:27 AM

I'll fix the tests

I worry that changing the general static_assert printing (adding a colon, and dropping the quotes) will get @hwright's law to drop on us. We can try and see if e.g. users of clang have automated checks for static_assert in their CI pipelines or something. I think your new format looks better, but Hyrum is finicky that way... What do others think?

I think it's fine to change; we've never promised diagnostic formatting compatibility between versions. I'm sure *someone* is relying on this somewhere, but I'm not worried we're going to break a ton of people -- hopefully enough folks are tracking trunk that we can find any major issues before releasing.

Btw, it looks like the CI is currently failing the LLVM unit tests in interesting ways. That should be resolved.

There are changes in clang/test/Lexer/null-character-in-literal.c but Phab is unhelpful about showing what those changes are because it thinks the file is a binary file. Can you explain what's been changed there?

clang/lib/Basic/Diagnostic.cpp
791

Per coding conventions.

793

Please fix the clang-tidy and clang-format warnings.

Also, I think it is more commonly considered the name for an iterator, which this sort of isn't. I'd recommend going with Begin and End for names (that also fixes the coding style nit with the names).

803–806

You should fix these names up for the coding conventions (same suggestion applies elsewhere). Also, the local style is const unsigned char *.

819

I'm pretty sure you need to call flush() before using number.

clang/lib/Sema/SemaDeclCXX.cpp
16391–16393
cor3ntin updated this revision to Diff 373404.Sep 18 2021, 3:17 AM
cor3ntin marked 5 inline comments as done.
  • Fix tests, cland tidy
cor3ntin updated this revision to Diff 373405.Sep 18 2021, 3:18 AM

Formatting

I worry that changing the general static_assert printing (adding a colon, and dropping the quotes) will get @hwright's law to drop on us. We can try and see if e.g. users of clang have automated checks for static_assert in their CI pipelines or something. I think your new format looks better, but Hyrum is finicky that way... What do others think?

I think it's fine to change; we've never promised diagnostic formatting compatibility between versions. I'm sure *someone* is relying on this somewhere, but I'm not worried we're going to break a ton of people -- hopefully enough folks are tracking trunk that we can find any major issues before releasing.

Btw, it looks like the CI is currently failing the LLVM unit tests in interesting ways. That should be resolved.

There are changes in clang/test/Lexer/null-character-in-literal.c but Phab is unhelpful about showing what those changes are because it thinks the file is a binary file. Can you explain what's been changed there?

A verbatim NULL is replaced by <U+0000> in the expected-error of a test that checks the handling of null in header-name (!)

-                          // expected-error@-1 {{'null\^@character' file not found}}
+                          // expected-error@-1 {{'null\<U+0000>character' file not found}}
cor3ntin updated this revision to Diff 373408.Sep 18 2021, 3:45 AM

Properly comment isFormatting/isPrintable.

cor3ntin updated this revision to Diff 375244.Mon, Sep 27, 6:53 AM
This comment was removed by cor3ntin.
cor3ntin updated this revision to Diff 375251.Mon, Sep 27, 7:27 AM
This comment was removed by cor3ntin.
cor3ntin planned changes to this revision.Mon, Sep 27, 8:11 AM
cor3ntin updated this revision to Diff 375273.Mon, Sep 27, 8:18 AM

Correctly rebase on top of D105759

jfb added a comment.Mon, Sep 27, 9:01 AM

Can you test all the values in this? https://godbolt.org/z/h7n54fa5x

clang/lib/Basic/Diagnostic.cpp
792

Can this addition overflow?

cor3ntin added inline comments.Thu, Sep 30, 5:59 AM
clang/lib/Basic/Diagnostic.cpp
792

Technically yes, if you are trying to output a single string over 4GB I guess. Do we care?

Btw, this CI failure looks relevant: https://reviews.llvm.org/harbormaster/unit/view/1055822/ but... it looks more relevant to the parent patch than this one (and the parent seems to have a clean CI). May be worth looking into whether this is an issue or not.

clang/lib/Basic/Diagnostic.cpp
820
822
829–830
llvm/include/llvm/Support/Unicode.h
37
llvm/lib/Support/Unicode.cpp
271
aaron.ballman added inline comments.Thu, Sep 30, 7:38 AM
clang/lib/Basic/Diagnostic.cpp
792

Personally, I'm not overly worried about that scenario (I suspect we will run into overflow like that far earlier). But if @jfb has a situation in mind where this might be a concern, checking for overflow and diagnosing it wouldn't be the worst outcome either.

aaron.ballman added inline comments.Thu, Sep 30, 7:39 AM
clang/lib/Basic/Diagnostic.cpp
792

Though I should note: testing that scenario without causing undue impacts on running the tests may be interesting.

Btw, this CI failure looks relevant: https://reviews.llvm.org/harbormaster/unit/view/1055822/ but... it looks more relevant to the parent patch than this one (and the parent seems to have a clean CI). May be worth looking into whether this is an issue or not.

Yep, I'm waiting for the parent patch to get merge to cleanup this one.

cor3ntin abandoned this revision.Fri, Oct 1, 12:01 PM