This fixes https://github.com/llvm/llvm-project/issues/55771.
Directly using StringLiteral::getString for wide string is not currently supported; therefore in ASTDiff, getStmtValue will fail when asserting that the StringLiteral has a width of 1. This patch also covers cases for UTF16 and UTF32 encoding, along with corresponding test cases.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I wonder if clang-diff is useful in its current state, I remember there are many edge cases left over.
Anyway, I've left some minor comments. I'm not sure how to download a patch that Git can apply so I didn't try it yet.
clang/lib/Tooling/ASTDiff/ASTDiff.cpp | ||
---|---|---|
468 | I think there are some other cases that need fixing too.
Can you fix them too? Should be quite similar. | |
471 | How about using "auto" here to avoid repeating the type? | |
474 | Instead of calling substr() it seems better to move this to the constructor std::wstring wstr(temp, wsize); (I think the reason why we need wsize is because strings can have embedded null characters) |
Refactored some code and add support of U16 & U32 characters, as well as tests for them.
I've suggested some more refactorings but otherwise this should be good to go
clang/lib/Tooling/ASTDiff/ASTDiff.cpp | ||
---|---|---|
497 | Sorry for the delay. Looks pretty good already. think there's a bit of redundancy now, I tried to simplify it, WDYT? if (String->isWide() || String->isUTF16() || String->isUTF32()) { std::string UTF8Str; unsigned int NumChars = String->getLength(); const char *Bytes = String->getBytes().data(); if (String->isWide()) { const auto *Chars = reinterpret_cast<const wchar_t *>(Bytes); if (!convertWideToUTF8({Chars, NumChars}, UTF8Str)) return ""; } else if (String->isUTF16()) { const auto *Chars = reinterpret_cast<const unsigned short *>(Bytes); if (!convertUTF16ToUTF8String({Chars, NumChars}, UTF8Str)) return ""; } else { assert(String->isUTF32() && "Unsupported string encoding."); const auto *Chars = reinterpret_cast<const unsigned int *>(Bytes); if (!convertUTF32ToUTF8String({Chars, NumChars}, UTF8Str)) return ""; } return UTF8Str; } | |
clang/test/Tooling/clang-diff-ast.cpp | ||
80 | I think two lines per encoding should be enough, so how about doing this right next to the existing string literal? const char *foo(int i) { if (i == 0) // CHECK: StringLiteral: foo( return "foo"; // CHECK: StringLiteral: wide( (void)L"wide"; // CHECK: StringLiteral: utf-16( (void)u"utf-16"; // CHECK: StringLiteral: utf-32( (void)U"utf-32"; // CHECK-NOT: ImplicitCastExpr return 0; } |
Sorry! I'm a novice at LLVM and I just didn't realize that those types can be implicitly cast to ArrayRef ... I have changed those and it should be fine now!
Hi, the test you modified in your change seems to be failing on the PS4 linux build bot at https://lab.llvm.org/buildbot/#/builders/139/builds/22964.
******************** TEST 'Clang :: Tooling/clang-diff-ast.cpp' FAILED ******************** Script: -- : 'RUN: at line 1'; /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/clang-diff -ast-dump /home/buildbot/buildbot-root/llvm-project/clang/test/Tooling/clang-diff-ast.cpp -- -std=c++11 | /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-project/clang/test/Tooling/clang-diff-ast.cpp -- Exit Code: 1 Command Output (stderr): -- /home/buildbot/buildbot-root/llvm-project/clang/test/Tooling/clang-diff-ast.cpp:50:12: error: CHECK: expected string not found in input // CHECK: StringLiteral: wide( ^ <stdin>:35:21: note: scanning from here StringLiteral: foo(34) ^ <stdin>:39:2: note: possible intended match here StringLiteral: utf-16(38) ^ Input file: <stdin> Check file: /home/buildbot/buildbot-root/llvm-project/clang/test/Tooling/clang-diff-ast.cpp -dump-input=help explains the following input dump. Input was: <<<<<< . . . 30: IfStmt(29) 31: BinaryOperator: ==(30) 32: DeclRefExpr: i(31) 33: IntegerLiteral: 0(32) 34: ReturnStmt(33) 35: StringLiteral: foo(34) check:50'0 X~~~ error: no match found 36: CStyleCastExpr(35) check:50'0 ~~~~~~~~~~~~~~~~~~~~ 37: StringLiteral(36) check:50'0 ~~~~~~~~~~~~~~~~~~~ 38: CStyleCastExpr(37) check:50'0 ~~~~~~~~~~~~~~~~~~~~ 39: StringLiteral: utf-16(38) check:50'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~ check:50'1 ? possible intended match 40: CStyleCastExpr(39) check:50'0 ~~~~~~~~~~~~~~~~~~~~ 41: StringLiteral: utf-32(40) check:50'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 42: ReturnStmt(41) check:50'0 ~~~~~~~~~~~~~~~~ 43: IntegerLiteral: 0(42) check:50'0 ~~~~~~~~~~~~~~~~~~~~~~~ 44: AccessSpecDecl: public(43) check:50'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ . . . >>>>>> --
Can you take a look and revert if you need time to investigate?
No idea why the conversion fails for the wide string.
First step is to reproduce the problem. I guess we should try in a Ubuntu VM.
Is there an easy way to see if other builders succeeded?
Unfortunately I think you would have to search through the history for it.
I can tell you that our PS4 Windows buildbot did succeed though. https://lab.llvm.org/buildbot/#/builders/216/builds/5492
I think there are some other cases that need fixing too.
Can you fix them too? Should be quite similar.