This fixes https://github.com/llvm/llvm-project/issues/55771.
Directly using StringLiteral::getString for wide string is not currently supported; therefore in ASTDiff, getStmtValue will fail when asserting that the StringLiteral has a width of 1. This patch also covers cases for UTF16 and UTF32 encoding, along with corresponding test cases.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I wonder if clang-diff is useful in its current state, I remember there are many edge cases left over.
Anyway, I've left some minor comments. I'm not sure how to download a patch that Git can apply so I didn't try it yet.
| clang/lib/Tooling/ASTDiff/ASTDiff.cpp | ||
|---|---|---|
| 468 | I think there are some other cases that need fixing too.
Can you fix them too? Should be quite similar. | |
| 471 | How about using "auto" here to avoid repeating the type? | |
| 474 | Instead of calling substr() it seems better to move this to the constructor std::wstring wstr(temp, wsize); (I think the reason why we need wsize is because strings can have embedded null characters) | |
Refactored some code and add support of U16 & U32 characters, as well as tests for them.
I've suggested some more refactorings but otherwise this should be good to go
| clang/lib/Tooling/ASTDiff/ASTDiff.cpp | ||
|---|---|---|
| 497 | Sorry for the delay. Looks pretty good already. think there's a bit of redundancy now, I tried to simplify it, WDYT? if (String->isWide() || String->isUTF16() || String->isUTF32()) {
std::string UTF8Str;
unsigned int NumChars = String->getLength();
const char *Bytes = String->getBytes().data();
if (String->isWide()) {
const auto *Chars = reinterpret_cast<const wchar_t *>(Bytes);
if (!convertWideToUTF8({Chars, NumChars}, UTF8Str))
return "";
} else if (String->isUTF16()) {
const auto *Chars = reinterpret_cast<const unsigned short *>(Bytes);
if (!convertUTF16ToUTF8String({Chars, NumChars}, UTF8Str))
return "";
} else {
assert(String->isUTF32() && "Unsupported string encoding.");
const auto *Chars = reinterpret_cast<const unsigned int *>(Bytes);
if (!convertUTF32ToUTF8String({Chars, NumChars}, UTF8Str))
return "";
}
return UTF8Str;
} | |
| clang/test/Tooling/clang-diff-ast.cpp | ||
| 80 | I think two lines per encoding should be enough, so how about doing this right next to the existing string literal? const char *foo(int i) {
if (i == 0)
// CHECK: StringLiteral: foo(
return "foo";
// CHECK: StringLiteral: wide(
(void)L"wide";
// CHECK: StringLiteral: utf-16(
(void)u"utf-16";
// CHECK: StringLiteral: utf-32(
(void)U"utf-32";
// CHECK-NOT: ImplicitCastExpr
return 0;
} | |
Sorry! I'm a novice at LLVM and I just didn't realize that those types can be implicitly cast to ArrayRef ... I have changed those and it should be fine now!
Hi, the test you modified in your change seems to be failing on the PS4 linux build bot at https://lab.llvm.org/buildbot/#/builders/139/builds/22964.
******************** TEST 'Clang :: Tooling/clang-diff-ast.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1'; /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/clang-diff -ast-dump /home/buildbot/buildbot-root/llvm-project/clang/test/Tooling/clang-diff-ast.cpp -- -std=c++11 | /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-project/clang/test/Tooling/clang-diff-ast.cpp
--
Exit Code: 1
Command Output (stderr):
--
/home/buildbot/buildbot-root/llvm-project/clang/test/Tooling/clang-diff-ast.cpp:50:12: error: CHECK: expected string not found in input
// CHECK: StringLiteral: wide(
^
<stdin>:35:21: note: scanning from here
StringLiteral: foo(34)
^
<stdin>:39:2: note: possible intended match here
StringLiteral: utf-16(38)
^
Input file: <stdin>
Check file: /home/buildbot/buildbot-root/llvm-project/clang/test/Tooling/clang-diff-ast.cpp
-dump-input=help explains the following input dump.
Input was:
<<<<<<
.
.
.
30: IfStmt(29)
31: BinaryOperator: ==(30)
32: DeclRefExpr: i(31)
33: IntegerLiteral: 0(32)
34: ReturnStmt(33)
35: StringLiteral: foo(34)
check:50'0 X~~~ error: no match found
36: CStyleCastExpr(35)
check:50'0 ~~~~~~~~~~~~~~~~~~~~
37: StringLiteral(36)
check:50'0 ~~~~~~~~~~~~~~~~~~~
38: CStyleCastExpr(37)
check:50'0 ~~~~~~~~~~~~~~~~~~~~
39: StringLiteral: utf-16(38)
check:50'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:50'1 ? possible intended match
40: CStyleCastExpr(39)
check:50'0 ~~~~~~~~~~~~~~~~~~~~
41: StringLiteral: utf-32(40)
check:50'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
42: ReturnStmt(41)
check:50'0 ~~~~~~~~~~~~~~~~
43: IntegerLiteral: 0(42)
check:50'0 ~~~~~~~~~~~~~~~~~~~~~~~
44: AccessSpecDecl: public(43)
check:50'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.
.
.
>>>>>>
--Can you take a look and revert if you need time to investigate?
No idea why the conversion fails for the wide string.
First step is to reproduce the problem. I guess we should try in a Ubuntu VM.
Is there an easy way to see if other builders succeeded?
Unfortunately I think you would have to search through the history for it.
I can tell you that our PS4 Windows buildbot did succeed though. https://lab.llvm.org/buildbot/#/builders/216/builds/5492
I think there are some other cases that need fixing too.
Can you fix them too? Should be quite similar.