Details
Diff Detail
Event Timeline
clang/lib/AST/Interp/ByteCodeExprGen.cpp | ||
---|---|---|
447 | The 0 here is unused in emitConst anyway. I have a follow-up NFC patch locally to remove that parameter from emitConst altogether. |
clang/lib/AST/Interp/ByteCodeExprGen.cpp | ||
---|---|---|
446 | Should this be concerned about the sign of the promoted literal? | |
clang/test/AST/Interp/arrays.cpp | ||
135 | I think you need a more coverage for character literals. Some test cases that are interesting: multichar literals ('abcd'), character literals with UCNs ('\uFFFF'), character literals with numeric escapes ('\xFF'). I'm especially interested in seeing whether we handle integer promotions properly, especially when promoting to unsigned int. |
clang/lib/AST/Interp/ByteCodeExprGen.cpp | ||
---|---|---|
446 | Switched to emitConst() that takes into account signedness. | |
clang/test/AST/Interp/arrays.cpp | ||
135 | I added two more test cases but I'm generally not that familiar with character literal edge cases and integer promotion, so if you have concrete test cases in mind, that would be great :) |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
128 | Also 2["foobar"] | |
135 | We can find GNU documentation of multi-char literals here: https://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html#Implementation-defined-behavior and I believe we follow the same scheme. There are some weird cases like '\8' which compilers seem to treat consistently but generate a diagnostic for. |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
135 | CC @tahonermann and @cor3ntin as text encoding code owners -- they likely know all the worst test cases to be thinking about in this space. |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
135 | Most weirdness are taken care of during lexing.
Note that wide characters literals were removed from C++ so it's no longer a concern |
clang/lib/AST/Interp/ByteCodeExprGen.cpp | ||
---|---|---|
444–445 | T isn't used and classify() already returns an Optional<PrimType>, so no conversion coercion appear to be intended here. I think this can be simplified to: if (classify(E)) this->emitConst(E, E->getValue()); | |
clang/test/AST/Interp/arrays.cpp | ||
135 | I believe Corentin meant that wide multicharacter literals were removed from C++. That is true for C++23 (they were removed by the adoption of P2362R3 during the July 2022 WG21 plenary) but that change was not adopted as a DR, so they remain part of the language for prior language modes. However, they were a conditionally supported feature that, as of Clang 14, are diagnosed as an error in all language modes. All that to say that I agree; no test for them needed. | |
138 | This check depends on the source file being UTF-8 encoded. Perhaps compare against 0x0001F60EL instead. | |
143 | As others already noted, additional testing of multicharacter literals and UCNs (including named universal characters like \N{LATIN_CAPITAL_LETTER_E} would be beneficial. Some tests of character escapes like \t wouldn't hurt either. Clang does not yet support use of -fexec-charset to set the literal encoding (execution character set) to anything other than UTF-8 though work on that has been done (see D93031). If such work was completed, it would be useful to run some tests against a non-UTF-8 encoding. Maybe next year. |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
143 | Yes, wide multicharacter literals, that's was important information missing, thanks for spotting that. I have mixed feeling about adding tests for escape sequences. Their replacement doesn't happen during constant evaluation. but we should compare string literal with byte values. Testing a string literal against another one doesn't ensure the code units are correct if both are equally miss evaluated. Also we could add explicit tests for null termination here as they are added as part of evaluation in theory - but then again that's also something clang does earlier. If we want we could consider enabling the byte code interpreter on the existing lexing test files, i actually think that's the better way to deal with the escape sequences tests. |
clang/lib/AST/Interp/ByteCodeExprGen.cpp | ||
---|---|---|
444–445 | Good point. Since the value of the expression is always an APInt, might was well skip the classify() altogether. emitConst() does the classify anyway. | |
clang/test/AST/Interp/arrays.cpp | ||
143 | I changed the first test that inspects all characters of a string to comparing with integers instead. Do you have a suggestion for what lexing tests to enable the constant interpreter in? |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
143 | I think good candidates are Lexer/char-escapes.c |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
143 | Of those, only Lexer/char-escapes.c does much validation of literal values. I prefer the approach Timm has already taken relative to those tests. It looks like we don't have an existing set of Sema tests for character and string literals. How about we move this test under clang/test/Sema? That would be the appropriate place to exercise values relative to -fexec-charset support for non-UTF-8 encodings in the future. If that sounds amenable, then I'd like the test split to exercise character and string literals separately. The character literal tests don't really belong in a test named arrays.cpp as is. |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
143 |
We can do both, I was not arguing against the test we have here, I'm happy with those :) I don't have opinions how the new interpreter tests are organized. |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
143 | I've added a new run line to test/Lexer/char-escapes.c, which works fine. To summarize, the plan is to add a new test to clang/test/Sema? Or split the char tests out from arrays.cpp and add some for multicharacter char sequences? |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
143 | I'm a bit uncertain regarding the purpose of tests in the Interp directory as opposed to tests under the Sema* directories. Basically, I'm unsure why we wouldn't want just one set of tests that exercise the semantics of the language, at least with regard to constant evaluation. If the interpreter is also intended to (eventually) interpret full programs (e.g., evaluating main()), I think such tests would be appropriate for an Interp directory. But I say that having not closely followed the development and previous reviews of the new interpreter. At a minimum, I would like to see the character and string literal testing pulled out of arrays.cpp. For the rest, I'll defer to others that have been more involved in these reviews. |
clang/test/AST/Interp/arrays.cpp | ||
---|---|---|
143 |
The line is a bit blurry; there's certainly semantic tests happening for the constant expression interpreter. I think where I draw the line is on what the goal of the test is. If the goal is to test new interpreter behavior, it goes in Interp. If the goal is to test semantic behavior (and happens to test the new interpreter as well), then Sema seems more reasonable. Ultimately, the really important bit is getting the tests *at all*, so thank you both for your advice on this!
I agree, it'd be nice to put them into a file more focused on literals. It's fine to have some string literal and character literal testing in arrays (a string is an array of characters, after all), but it shouldn't be the primary place where that coverage lives. |
Give some time for Aaron/Tom to have the chance to raise further comment but I'm happy with it otherwise.
Thanks for doing that, and sorry I missed you earlier ping.
LGTM aside from a testing request.
clang/test/AST/Interp/literals.cpp | ||
---|---|---|
342 ↗ | (On Diff #467399) | I'd like to see the same test for u, U, and u8 prefixes (which should all fail the same was as L). |
One of your commits doesn't build on windows with clang-cl as host compiler: http://45.33.8.238/win/68965/step_4.txt
Please take a look and revert for now if it takes a while to fix.
Give some time for Aaron/Tom to have the chance to raise further comment but I'm happy with it otherwise.
I'm sorry for the delay getting back to this and I see this has already been pushed (and a fix then pushed). The changes look good to me too.
T isn't used and classify() already returns an Optional<PrimType>, so no conversion coercion appear to be intended here. I think this can be simplified to: