This is an archive of the discontinued LLVM Phabricator instance.

[llvm-rc] Support UTF inputs.
Needs Review (Public)

Authored by mnbvmar on Sep 29 2017, 3:10 PM.



This enables llvm-rc to consume UTF-8 and UTF-16 little-endian (UTF-16LE) inputs. The latter is a widely used Microsoft standard and accounts for a large part of the inputs. However, the original rc tool also accepts UTF-8. Therefore, we need a way to determine the file encoding and convert the input script to UTF-8.

We settle on the following algorithm: if the file starts with a UTF-16 byte order mark, or its second byte is equal to 0 (i.e., the first character in the file is in the range [0x00, 0xFF]), we guess that the file is UTF-16LE. Otherwise, we guess that it is UTF-8.
This heuristic should be sufficient for all feasible .rc inputs.
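
For illustration only, the heuristic can be sketched in a few lines of Python. This is not the code from the patch: the function names `guess_encoding` and `to_utf8` are hypothetical, and the sketch assumes that only the little-endian BOM (bytes FF FE) counts as the "UTF-16 Byte Order Mark" and that a leading BOM is dropped during conversion.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess the encoding of an .rc script: 'utf-16-le' or 'utf-8'."""
    # Assumption: only the little-endian BOM (FF FE) is recognized.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # A zero second byte means the first UTF-16LE code unit lies in
    # the range [0x00, 0xFF].
    if len(data) >= 2 and data[1] == 0:
        return "utf-16-le"
    # Everything else, including empty and one-byte inputs, is treated as UTF-8.
    return "utf-8"


def to_utf8(data: bytes) -> bytes:
    """Convert the input to UTF-8 according to the guessed encoding."""
    if guess_encoding(data) == "utf-8":
        return data  # already in the target encoding; passed through unchanged
    text = data.decode("utf-16-le")
    # Assumption: the BOM is not part of the script, so it is stripped.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("utf-8")
```

Under these assumptions, `to_utf8` returns the bytes of a UTF-8 file unchanged and re-encodes a UTF-16LE file, with or without a BOM. A UTF-16LE file that has no BOM and whose first character lies above U+00FF would be misclassified as UTF-8, which is the case the heuristic treats as infeasible for .rc inputs.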

Diff Detail