This is an archive of the discontinued LLVM Phabricator instance.

Fix handling of medial hyphens in Unicode Names.
ClosedPublic

Authored by cor3ntin on Jul 28 2023, 2:28 AM.

Details

Summary

If a Unicode name was stored in a way that caused
a medial hyphen to be at the end of a chunk, the hyphen would not
be properly ignored by the loose matching algorithm.
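
As a rough illustration of what loose matching means here, the sketch below follows rule UAX44-LM2 from Unicode Standard Annex #44: ignore case, spaces, underscores and medial hyphens. It is not the code in llvm/lib/Support/UnicodeNameToCodepoint.cpp. The helper names `isMedialHyphen`, `looseKey` and `looseMatch` are hypothetical. The sketch also makes some simplifying assumptions: digits are treated like letters when deciding whether a hyphen is medial, the rule is applied to both strings, and the U+1180 HANGUL JUNGSEONG O-E exception is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: true if Name[I] is a hyphen-minus with an ASCII letter
// or digit immediately on both sides (a "medial" hyphen). Treating digits like
// letters is an assumption of this sketch.
bool isMedialHyphen(std::string_view Name, std::size_t I) {
  if (I == 0 || I + 1 >= Name.size() || Name[I] != '-')
    return false;
  auto IsAlnum = [](char C) {
    return std::isalnum(static_cast<unsigned char>(C)) != 0;
  };
  return IsAlnum(Name[I - 1]) && IsAlnum(Name[I + 1]);
}

// Hypothetical helper: builds a loose-matching key by uppercasing and dropping
// spaces, underscores and medial hyphens (the U+1180 exception is omitted).
std::string looseKey(std::string_view Name) {
  std::string Key;
  for (std::size_t I = 0; I < Name.size(); ++I) {
    char C = Name[I];
    if (C == ' ' || C == '_' || isMedialHyphen(Name, I))
      continue;
    Key += static_cast<char>(std::toupper(static_cast<unsigned char>(C)));
  }
  return Key;
}

// Two names loosely match if their keys are identical.
bool looseMatch(std::string_view A, std::string_view B) {
  return looseKey(A) == looseKey(B);
}
```

With this key, `looseMatch("LEFT-TO-RIGHT OVERRIDE", "left to right override")` holds because both names reduce to `LEFTTORIGHTOVERRIDE`. By contrast, the hyphen in TIBETAN MARK TSA -PHRU is preceded by a space, so it is not medial and is kept.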

For example, if LEFT-TO-RIGHT OVERRIDE was stored as
LEFT- [...], the trailing - would not be ignored.
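
The failure mode can be reproduced with a deliberately simplified model. In this model, the name is stored as a sequence of chunks, and each hyphen is classified by looking only inside its own chunk. The chunk layout and the names `isMedialWithinChunk` and `chunkedLooseKey` are hypothetical and do not mirror LLVM's actual node format.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical: classifies Chunk[I] as a medial hyphen by looking only at its
// neighbours inside the same chunk, which is the flaw being modelled.
bool isMedialWithinChunk(std::string_view Chunk, std::size_t I) {
  if (I == 0 || I + 1 >= Chunk.size() || Chunk[I] != '-')
    return false;
  auto IsAlnum = [](char C) {
    return std::isalnum(static_cast<unsigned char>(C)) != 0;
  };
  return IsAlnum(Chunk[I - 1]) && IsAlnum(Chunk[I + 1]);
}

// Hypothetical: builds the loose-matching key chunk by chunk, dropping spaces,
// underscores and any hyphen that looks medial from inside its own chunk.
std::string chunkedLooseKey(const std::vector<std::string_view> &Chunks) {
  std::string Key;
  for (std::string_view Chunk : Chunks) {
    for (std::size_t I = 0; I < Chunk.size(); ++I) {
      char C = Chunk[I];
      if (C == ' ' || C == '_' || isMedialWithinChunk(Chunk, I))
        continue;
      Key += static_cast<char>(std::toupper(static_cast<unsigned char>(C)));
    }
  }
  return Key;
}
```

Stored whole, the name reduces to `LEFTTORIGHTOVERRIDE`. When it is split as `LEFT-` + `TO-RIGHT OVERRIDE`, the first hyphen has no right neighbour inside its chunk. That hyphen therefore survives, and the key becomes `LEFT-TORIGHTOVERRIDE`, which no longer matches `left to right override`.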

The generator now ensures that nodes are not split
across medial hyphen boundaries.
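
On the generator side, one way to express this constraint is to reject any cut that would put a medial hyphen at either edge of a chunk, and to shorten a shared prefix until it ends at an acceptable cut. This is a hypothetical sketch: `canSplitAt` and `sharedPrefixLength` are illustrative names, not functions from UnicodeNameMappingGenerator.cpp. It is also an assumption that a chunk starting with a medial hyphen must be rejected in the same way as one ending with it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Medial-hyphen test evaluated on the full name (hypothetical helper).
bool isMedialHyphen(std::string_view Name, std::size_t I) {
  if (I == 0 || I + 1 >= Name.size() || Name[I] != '-')
    return false;
  auto IsAlnum = [](char C) {
    return std::isalnum(static_cast<unsigned char>(C)) != 0;
  };
  return IsAlnum(Name[I - 1]) && IsAlnum(Name[I + 1]);
}

// Hypothetical: a cut at Pos splits Name into [0, Pos) and [Pos, end).
// Reject it if the character on either side of the cut is a medial hyphen,
// since that hyphen would otherwise sit on a chunk edge.
bool canSplitAt(std::string_view Name, std::size_t Pos) {
  if (Pos == 0 || Pos >= Name.size())
    return true;
  return !isMedialHyphen(Name, Pos - 1) && !isMedialHyphen(Name, Pos);
}

// Hypothetical: length of the common prefix of A and B, shortened until the
// cut is acceptable in both names.
std::size_t sharedPrefixLength(std::string_view A, std::string_view B) {
  std::size_t Len = 0;
  while (Len < A.size() && Len < B.size() && A[Len] == B[Len])
    ++Len;
  while (Len > 0 && !(canSplitAt(A, Len) && canSplitAt(B, Len)))
    --Len;
  return Len;
}
```

For LEFT-TO-RIGHT OVERRIDE and LEFT-POINTING ANGLE BRACKET, the raw common prefix is `LEFT-`. Both `LEFT-` and `LEFT` would leave the medial hyphen on a chunk edge, so the sketch falls back to `LEF` (length 3). For LEFT-TO-RIGHT OVERRIDE and LEFT-TO-RIGHT MARK, the prefix `LEFT-TO-RIGHT ` ends in a space and is kept at length 14.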

Fixes #64161

Event Timeline

cor3ntin created this revision. Jul 28 2023, 2:28 AM
Herald added a project: Restricted Project. Jul 28 2023, 2:28 AM
cor3ntin requested review of this revision. Jul 28 2023, 2:28 AM
Herald added projects: Restricted Project, Restricted Project. Jul 28 2023, 2:28 AM
tbaeder added inline comments.
llvm/lib/Support/UnicodeNameToCodepoint.cpp:350
cor3ntin updated this revision to Diff 545078. Jul 28 2023, 2:38 AM

Fix comment

aaron.ballman accepted this revision. Jul 28 2023, 5:04 AM

LGTM

llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:100

Parens here might be a kindness to folks who don't keep precedence rules in their head very well. ;-)

This revision is now accepted and ready to land. Jul 28 2023, 5:04 AM
This revision was landed with ongoing or failed builds. Jul 28 2023, 6:09 AM
This revision was automatically updated to reflect the committed changes.
cor3ntin added inline comments. Jul 28 2023, 6:13 AM
llvm/utils/UnicodeData/UnicodeNameMappingGenerator.cpp:100

you are not wrong, I added parens!