Implements P2071 Named Universal Character Escapes as an extension in all language modes. A follow-up patch will stop warning in C++23 mode once the paper is approved in plenary (in July).
We add:
- A code generator that transforms UnicodeData.txt and NameAliases.txt into a space-efficient data structure that can be queried in O(NameLength)
- A set of functions in Unicode.h to query that data, including:
  - A function to find an exact match for a given Unicode character name
  - A function to perform loose matching (ignoring case, spaces, underscores, and medial hyphens)
  - A function returning the best-matching code point for a given string by edit distance
- Support for \N{} escape sequences in string and character literals, with diagnostics and fix-its for loose matches and typos
- Support for \N{} as a UCN, with diagnostics and fix-its for loose matches
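The layout of the generated table isn't described here, so the following is only a minimal sketch of why an exact-name lookup can cost O(NameLength). It uses a plain, uncompressed character trie; `NameTrie`, `insert`, and `lookup` are hypothetical names, and the structure is far larger than a space-efficient generated table would be.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string_view>

// Hypothetical, uncompressed trie with one node per character of a name.
// It is far larger than a compact generated table; it only illustrates
// why an exact lookup costs O(NameLength).
class NameTrie {
  struct Node {
    std::map<char, std::unique_ptr<Node>> Children;
    std::optional<char32_t> CodePoint; // set when a complete name ends here
  };
  Node Root;

public:
  void insert(std::string_view Name, char32_t CodePoint) {
    assert(!Name.empty() && "character names are never empty");
    Node *N = &Root;
    for (char C : Name) {
      std::unique_ptr<Node> &Child = N->Children[C];
      if (!Child)
        Child = std::make_unique<Node>();
      N = Child.get();
    }
    N->CodePoint = CodePoint;
  }

  // Exact, case-sensitive match: one child step per character, so the cost
  // grows with the length of the queried name (the alphabet is small and fixed).
  std::optional<char32_t> lookup(std::string_view Name) const {
    const Node *N = &Root;
    for (char C : Name) {
      auto It = N->Children.find(C);
      if (It == N->Children.end())
        return std::nullopt;
      N = It->second.get();
    }
    return N->CodePoint; // nullopt if Name is only a prefix of a real name
  }
};
```

A prefix such as `LATIN CAPITAL LETTER` walks existing nodes but finds no code point, so it is rejected just like an unknown name.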
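The description only says the best match is chosen "by edit distance". As a hedged illustration of that idea, here is a textbook Levenshtein distance plus a hypothetical `closestMatch` helper that linearly scans a candidate list; `NamedCodePoint` and both function names are assumptions, and a real implementation over the full name table would likely prune the search rather than scan every name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Classic Levenshtein distance using a single rolling row.
std::size_t editDistance(std::string_view A, std::string_view B) {
  std::vector<std::size_t> Row(B.size() + 1);
  for (std::size_t J = 0; J <= B.size(); ++J)
    Row[J] = J;
  for (std::size_t I = 1; I <= A.size(); ++I) {
    std::size_t Diag = Row[0]; // distance for (I-1, J-1)
    Row[0] = I;
    for (std::size_t J = 1; J <= B.size(); ++J) {
      std::size_t Up = Row[J]; // distance for (I-1, J)
      std::size_t Cost = A[I - 1] == B[J - 1] ? 0 : 1;
      Row[J] = std::min({Up + 1, Row[J - 1] + 1, Diag + Cost});
      Diag = Up;
    }
  }
  return Row[B.size()];
}

// Hypothetical record pairing a character name with its code point.
struct NamedCodePoint {
  std::string Name;
  char32_t CodePoint;
};

// Hypothetical helper: returns the candidate whose name is closest to Query.
// Ties keep the earliest candidate.
const NamedCodePoint *closestMatch(std::string_view Query,
                                   const std::vector<NamedCodePoint> &Candidates) {
  assert(!Candidates.empty() && "need at least one candidate");
  const NamedCodePoint *Best = nullptr;
  std::size_t BestDist = 0;
  for (const NamedCodePoint &C : Candidates) {
    std::size_t D = editDistance(Query, C.Name);
    if (!Best || D < BestDist) {
      Best = &C;
      BestDist = D;
    }
  }
  return Best;
}
```

A typo such as `LATIN SMALL LETTR B` is one insertion away from `LATIN SMALL LETTER B`, which is what a "did you mean" fix-it would suggest.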
Loose matches are treated as errors, to closely follow the semantics of P2071.
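To illustrate what a "loose" match means here, the following sketch builds a normalized key in the spirit of Unicode's UAX44-LM2 rule (ignore case, whitespace, underscores, and medial hyphens); two names match loosely when their keys are equal. `looseKey` and `looselyEqual` are hypothetical names, and the sketch simplifies the rule: a hyphen counts as medial when both neighbours in the input are ASCII letters or digits, and the U+1180 exception is omitted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: loose-matching key, simplified from UAX44-LM2.
std::string looseKey(std::string_view Name) {
  auto IsAlnum = [](char C) {
    return std::isalnum(static_cast<unsigned char>(C)) != 0;
  };
  std::string Key;
  for (std::size_t I = 0; I < Name.size(); ++I) {
    char C = Name[I];
    if (std::isspace(static_cast<unsigned char>(C)) || C == '_')
      continue; // whitespace and underscores are ignored
    if (C == '-' && I > 0 && I + 1 < Name.size() && IsAlnum(Name[I - 1]) &&
        IsAlnum(Name[I + 1]))
      continue; // medial hyphen (checked against the original neighbours)
    Key.push_back(static_cast<char>(std::toupper(static_cast<unsigned char>(C))));
  }
  return Key;
}

// Names that differ only in ignored details share a key. Per the description,
// such a match is diagnosed as an error (with a fix-it) rather than accepted.
bool looselyEqual(std::string_view A, std::string_view B) {
  return looseKey(A) == looseKey(B);
}
```

Checking hyphens against the original string, rather than after spaces are stripped, keeps a hyphen that follows a space (as in `TSA -PHRU`) significant.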
The generated data adds 280 kB to the binaries.
UnicodeData.txt and NameAliases.txt are not committed to the repository in this patch, and regenerating the data is a manual process.
I don't see much value in combining these diagnostics, since they are distinct features. The ext_delimited_escape_sequence name also seems odd for named escape sequences (even though both features use { and } as delimiters).