This is an archive of the discontinued LLVM Phabricator instance.

[Lex] Don't assert when decoding invalid UCNs.
ClosedPublic

Authored by sammccall on May 5 2022, 4:44 PM.

Details

Summary

Currently, if a lexically valid UCN encodes an invalid codepoint, we
diagnose it and then hit an assertion while trying to decode it.

Since nothing prevents us from reaching this state, remove the
assertion. With this change, expandUCNs("X\UAAAAAAAAY") produces "XY": the invalid UCN is simply dropped.
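To illustrate the intended behavior, here is a minimal standalone sketch of UCN expansion that drops invalid codepoints instead of asserting. It is not Clang's actual code. `expandUCNsSketch`, `isValidCodepoint` and `appendUTF8` are hypothetical names, and the sketch assumes "invalid" means above U+10FFFF or in the UTF-16 surrogate range (U+D800 to U+DFFF).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true if Cp is a Unicode scalar value
// (at most U+10FFFF and not a UTF-16 surrogate).
static bool isValidCodepoint(uint32_t Cp) {
  return Cp <= 0x10FFFF && !(Cp >= 0xD800 && Cp <= 0xDFFF);
}

// Hypothetical helper: append the UTF-8 encoding of Cp to Out.
// Callers must only pass valid codepoints (checked here by assert).
static void appendUTF8(uint32_t Cp, std::string &Out) {
  assert(isValidCodepoint(Cp) && "caller must filter invalid codepoints");
  if (Cp < 0x80) {
    Out += static_cast<char>(Cp);
  } else if (Cp < 0x800) {
    Out += static_cast<char>(0xC0 | (Cp >> 6));
    Out += static_cast<char>(0x80 | (Cp & 0x3F));
  } else if (Cp < 0x10000) {
    Out += static_cast<char>(0xE0 | (Cp >> 12));
    Out += static_cast<char>(0x80 | ((Cp >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (Cp & 0x3F));
  } else {
    Out += static_cast<char>(0xF0 | (Cp >> 18));
    Out += static_cast<char>(0x80 | ((Cp >> 12) & 0x3F));
    Out += static_cast<char>(0x80 | ((Cp >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (Cp & 0x3F));
  }
}

// Hypothetical stand-in for expandUCNs: replaces \uXXXX and \UXXXXXXXX
// with UTF-8. A well-formed UCN whose value is not a valid codepoint is
// dropped silently rather than triggering an assertion.
std::string expandUCNsSketch(const std::string &In) {
  std::string Out;
  for (std::size_t I = 0; I < In.size(); ++I) {
    if (In[I] == '\\' && I + 1 < In.size() &&
        (In[I + 1] == 'u' || In[I + 1] == 'U')) {
      std::size_t NumDigits = In[I + 1] == 'u' ? 4 : 8;
      if (I + 2 + NumDigits <= In.size()) {
        uint32_t Cp = 0; // 8 hex digits fit exactly in 32 bits.
        bool AllHex = true;
        for (std::size_t J = 0; J < NumDigits; ++J) {
          unsigned char C = static_cast<unsigned char>(In[I + 2 + J]);
          if (!std::isxdigit(C)) {
            AllHex = false;
            break;
          }
          uint32_t Digit = std::isdigit(C) ? C - '0' : std::tolower(C) - 'a' + 10;
          Cp = Cp * 16 + Digit;
        }
        if (AllHex) {
          if (isValidCodepoint(Cp))
            appendUTF8(Cp, Out);
          // Skip the backslash, the 'u'/'U' and the hex digits.
          I += 1 + NumDigits;
          continue;
        }
      }
    }
    Out += In[I]; // Ordinary character, or not a complete UCN.
  }
  return Out;
}
```

With this sketch, `expandUCNsSketch("X\\UAAAAAAAAY")` returns `"XY"`, matching the summary, while a valid UCN such as `\u0041` still expands to `A`.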

Diff Detail

Event Timeline

sammccall created this revision. May 5 2022, 4:44 PM
Herald added a project: Restricted Project. May 5 2022, 4:44 PM
sammccall requested review of this revision. May 5 2022, 4:44 PM
Herald added projects: Restricted Project, Restricted Project. May 5 2022, 4:44 PM
Herald added a subscriber: cfe-commits.

With this and D125049, I no longer get any low-hanging fuzzer crashes.

(Though we're probably mostly fuzzing the lexer. Providing a list of tokens as a dictionary may help.)

hokein accepted this revision. May 5 2022, 10:19 PM
hokein added inline comments.
clang-tools-extra/pseudo/test/crash/bad-ucn.c, line 4

nit: this test seems to duplicate the one in unicode.c; I'd just remove it.

This revision is now accepted and ready to land. May 5 2022, 10:19 PM
This revision was landed with ongoing or failed builds. May 5 2022, 11:52 PM
This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.