This is an archive of the discontinued LLVM Phabricator instance.

[Lex] Warn about invisible Hangul whitespace
Abandoned · Public

Authored by modocache on Mar 25 2019, 6:38 AM.

Details

Reviewers
chandlerc
rsmith
Summary

On Twitter, @LunarLambda pointed out that Clang allows Hangul whitespace Unicode
characters in identifiers, which lets users write very confusing programs:
https://twitter.com/LunarLambda/status/1110097030423240705

Clang already warns about similar whitespace-like Unicode characters. This
patch adds the half-width and full-width Hangul whitespace characters to the
set that Clang warns about.
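A minimal sketch of the kind of check involved, assuming the two characters in question are U+3164 HANGUL FILLER (full-width) and U+FFA0 HALFWIDTH HANGUL FILLER: decode a UTF-8 identifier and report the byte offset of each such character. The function names are hypothetical, and this is not Clang's Lexer code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the two Hangul filler code points assumed to be the
// "full-width" and "half-width" invisible characters from the summary.
bool isInvisibleHangulFiller(char32_t cp) {
  return cp == 0x3164    // HANGUL FILLER (full-width)
      || cp == 0xFFA0;   // HALFWIDTH HANGUL FILLER
}

// Decodes the UTF-8 sequence that starts at s[i] and stores its byte length
// in len. Minimal decoder: assumes well-formed UTF-8 and does not reject
// overlong forms or surrogates; a truncated sequence yields U+FFFD.
char32_t decodeUtf8(const std::string &s, std::size_t i, std::size_t &len) {
  const unsigned char b0 = static_cast<unsigned char>(s[i]);
  if (b0 < 0x80) {
    len = 1;
    return b0;
  }
  const std::size_t extra = b0 >= 0xF0 ? 3 : b0 >= 0xE0 ? 2 : 1;
  char32_t cp = b0 & (0x3F >> extra);  // payload bits of the lead byte
  for (std::size_t k = 1; k <= extra; ++k) {
    if (i + k >= s.size()) {
      len = 1;
      return 0xFFFD;
    }
    cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
  }
  len = extra + 1;
  return cp;
}

// Returns the byte offsets of every invisible Hangul filler in a UTF-8
// identifier, so a caller could emit one warning per occurrence.
std::vector<std::size_t> findInvisibleHangul(const std::string &ident) {
  std::vector<std::size_t> offsets;
  for (std::size_t i = 0; i < ident.size();) {
    std::size_t len = 1;
    const char32_t cp = decodeUtf8(ident, i, len);
    if (isInvisibleHangulFiller(cp))
      offsets.push_back(i);
    i += len;
  }
  return offsets;
}
```

For example, `findInvisibleHangul(std::string("x") + "\xE3\x85\xA4" + "y")` returns a single offset, 1, because `E3 85 A4` is the UTF-8 encoding of U+3164.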

N.B.: Clang also warns about the Japanese (ideographic) space character
<U+3000>, but in a different way. That character is not a valid identifier
character according to the C++11 standard, so Clang warns that it will treat
<U+3000> as whitespace. The Korean Hangul whitespace characters, by
contrast, are valid identifier characters under C++11, so Clang instead warns
that such a character will be treated as an identifier character, not as
whitespace. In short, Clang's diagnostic differs slightly between the
Japanese space character and the Korean Hangul ones.
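The difference described in the note can be sketched as a classification of a single code point. This is a hypothetical illustration, not Clang's implementation: the identifier check covers only the CJK-area rows of the C++11 Annex E.1 allowed ranges, the whitespace set holds just two example characters, and the Hangul filler code points (U+3164, U+FFA0) are assumed.

```cpp
// What a lexer might do with one non-ASCII code point (hypothetical).
enum class CharAction {
  Identifier,           // allowed in identifiers; no diagnostic
  IdentifierInvisible,  // allowed in identifiers, but warn that it is invisible
  TreatAsWhitespace,    // not allowed in identifiers; warn, then skip as whitespace
  Invalid               // neither; some other diagnostic applies
};

struct CodePointRange {
  char32_t lo, hi;  // inclusive bounds
};

// Subset of the C++11 Annex E.1 allowed ranges: only the CJK-area rows.
constexpr CodePointRange kAllowedCjkSubset[] = {
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0x303F},
    {0x3040, 0xD7FF}, {0xF900, 0xFD3D}, {0xFD40, 0xFDCF},
    {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
};

bool isAllowedInIdentifier(char32_t cp) {
  for (const CodePointRange &r : kAllowedCjkSubset)
    if (cp >= r.lo && cp <= r.hi)
      return true;
  return false;
}

// Tiny illustrative subset of the Unicode whitespace characters.
bool isUnicodeWhitespace(char32_t cp) {
  return cp == 0x00A0 || cp == 0x3000;  // NO-BREAK SPACE, IDEOGRAPHIC SPACE
}

// Assumed code points: HANGUL FILLER and HALFWIDTH HANGUL FILLER.
bool isInvisibleHangulFiller(char32_t cp) {
  return cp == 0x3164 || cp == 0xFFA0;
}

// Identifier validity is checked first, which is why U+3164 never reaches
// the whitespace branch while U+3000 does.
CharAction classify(char32_t cp) {
  if (isAllowedInIdentifier(cp))
    return isInvisibleHangulFiller(cp) ? CharAction::IdentifierInvisible
                                       : CharAction::Identifier;
  if (isUnicodeWhitespace(cp))
    return CharAction::TreatAsWhitespace;
  return CharAction::Invalid;
}
```

Under these assumptions, `classify(0x3000)` yields `TreatAsWhitespace` because U+3000 falls outside every allowed range, while `classify(0x3164)` yields `IdentifierInvisible` because U+3164 lies inside 3040-D7FF.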


Event Timeline

modocache created this revision. Mar 25 2019, 6:38 AM
Herald added a project: Restricted Project. Mar 25 2019, 6:38 AM
Herald added a subscriber: jdoerfert.
modocache updated this revision to Diff 192092. Mar 25 2019, 6:41 AM

Remove unneeded change to test identifier 'xx'.

modocache edited the summary of this revision. Mar 25 2019, 6:41 AM
ruiu added a subscriber: ruiu. Mar 25 2019, 7:53 PM

I wonder if we should handle Unicode codepoints that are in the whitespace category as a whole, instead of handling each codepoint individually.
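One possible reading of this suggestion, sketched under stated assumptions: keep a sorted table of code point ranges for a Unicode property and look characters up with a binary search, instead of special-casing code points one by one. The table below lists the ranges of the Unicode White_Space property; the type and function names are hypothetical. Note that the Hangul fillers have general category Lo and are not White_Space, so a property-based check would need another property or table to cover them.

```cpp
#include <algorithm>
#include <iterator>

struct CodePointRange {
  char32_t lo, hi;  // inclusive bounds
};

// Ranges of the Unicode White_Space property, sorted and non-overlapping.
constexpr CodePointRange kWhiteSpace[] = {
    {0x0009, 0x000D}, {0x0020, 0x0020}, {0x0085, 0x0085}, {0x00A0, 0x00A0},
    {0x1680, 0x1680}, {0x2000, 0x200A}, {0x2028, 0x2029}, {0x202F, 0x202F},
    {0x205F, 0x205F}, {0x3000, 0x3000},
};

// Binary search: find the first range whose upper bound is >= cp, then check
// that cp is not below its lower bound.
bool inRangeTable(const CodePointRange *begin, const CodePointRange *end,
                  char32_t cp) {
  const CodePointRange *it = std::lower_bound(
      begin, end, cp,
      [](const CodePointRange &r, char32_t c) { return r.hi < c; });
  return it != end && it->lo <= cp;
}

bool isUnicodeWhiteSpace(char32_t cp) {
  return inRangeTable(std::begin(kWhiteSpace), std::end(kWhiteSpace), cp);
}
```

Adding a character class then means adding rows to a table rather than another comparison, and lookups stay logarithmic in the number of ranges.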

modocache abandoned this revision. Nov 21 2019, 7:40 PM

I'm not super interested in this patch anymore, someone else feel free to work on this! :)