Size of paths in utf8 is possibly bigger than in utf16.
Details
Diff Detail
Event Timeline
The usual practice is to include the entire context in the diff. However, the change itself LGTM.
Thanks for doing that, makes it easier for reviewers to see around the code without having to load up the file in question.
Perhaps this particular use case needs to be added to the unit testing ("llvm/unittests/Support/Path.cpp")?
It's just so obvious.
The unit-test TEST_F(FileSystemTest, widenPath) in llvm/unittests/Support/Path.cpp:2080-2146
has already demonstrated this.
#ifdef _WIN32 TEST_F(FileSystemTest, widenPath) { const std::wstring LongPathPrefix(L"\\\\?\\"); // Test that the length limit is checked against the UTF-16 length and not the // UTF-8 length. std::string Input("C:\\foldername\\"); const std::string Pi("\xcf\x80"); // UTF-8 lower case pi. // Add Pi up to the MAX_PATH limit. const size_t NumChars = MAX_PATH - Input.size() - 1; for (size_t i = 0; i < NumChars; ++i) Input += Pi; // Check that UTF-8 length already exceeds MAX_PATH. EXPECT_TRUE(Input.size() > MAX_PATH); SmallVector<wchar_t, MAX_PATH + 16> Result; ASSERT_NO_ERROR(windows::widenPath(Input, Result)); // Result should not start with the long path prefix. EXPECT_TRUE(std::wmemcmp(Result.data(), LongPathPrefix.c_str(), LongPathPrefix.size()) != 0); EXPECT_EQ(Result.size(), (size_t)MAX_PATH - 1); // Add another Pi to exceed the MAX_PATH limit.
Input.size() > MAX_PATH is expected to be true and Result.size() is expected to be MAX_PATH - 1, meaning
that PathUTF16[Path.size() - 1] is an out-of-bound array access.
That test is for widenPath and it does cover a similar situation (I know because I wrote it). The code you have changed relates to directory_iterator_construct which does not have coverage of the situation where the UTF16 length is less than the UTF8 length, i.e. the issue that you are fixing.
If another reviewer is happy to approve this change without any test coverage, then that's fine with me.