Download Raw Diff

Details

Reviewers

aaron.ballman
erichkeane
tahonermann
shafik
cor3ntin

Commits

rG52ba50c24c55: [clang][Interp] Implement String- and CharacterLiterals
rG6fa1795d1fd4: [clang][Interp] Implement String- and CharacterLiterals

Diff Detail

Event Timeline

tbaeder created this revision.Oct 6 2022, 8:01 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 6 2022, 8:01 AM

tbaeder requested review of this revision.Oct 6 2022, 8:01 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 6 2022, 8:01 AM

Herald added a subscriber: cfe-commits. · View Herald Transcript

Harbormaster completed remote builds in B190735: Diff 465741.Oct 6 2022, 8:02 AM

tbaeder added inline comments.Oct 6 2022, 11:13 PM

clang/lib/AST/Interp/ByteCodeExprGen.cpp
448	The `0` here is unused in `emitConst` anyway. I have a follow-up NFC patch locally to remove that parameter from `emitConst` altogether.

aaron.ballman added inline comments.Oct 7 2022, 10:14 AM

clang/lib/AST/Interp/ByteCodeExprGen.cpp
447	Should this be concerned about the sign of the promoted literal?
clang/test/AST/Interp/arrays.cpp
135 ↗	(On Diff #465741)	I think you need a more coverage for character literals. Some test cases that are interesting: multichar literals (`'abcd'`), character literals with UCNs (`'\uFFFF'`), character literals with numeric escapes (`'\xFF'`). I'm especially interested in seeing whether we handle integer promotions properly, especially when promoting to `unsigned int`.

tbaeder updated this revision to Diff 466262.Oct 8 2022, 12:20 AM

Harbormaster completed remote builds in B191078: Diff 466262.Oct 8 2022, 12:21 AM

tbaeder added inline comments.Oct 8 2022, 12:22 AM

clang/lib/AST/Interp/ByteCodeExprGen.cpp
447	Switched to `emitConst()` that takes into account signedness.
clang/test/AST/Interp/arrays.cpp
135 ↗	(On Diff #465741)	I added two more test cases but I'm generally not that familiar with character literal edge cases and integer promotion, so if you have concrete test cases in mind, that would be great :)

shafik added inline comments.Oct 9 2022, 4:06 PM

clang/test/AST/Interp/arrays.cpp
128 ↗	(On Diff #466262)	Also `2["foobar"]`
135 ↗	(On Diff #465741)	We can find GNU documentation of multi-char literals here: https://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html#Implementation-defined-behavior and I believe we follow the same scheme. There are some weird cases like `'\8'` which compilers seem to treat consistently but generate a diagnostic for.

tbaeder updated this revision to Diff 466461.Oct 10 2022, 4:01 AM

tbaeder marked an inline comment as done.

Harbormaster completed remote builds in B191243: Diff 466461.Oct 10 2022, 4:02 AM

aaron.ballman added a subscriber: cor3ntin.Oct 10 2022, 6:59 AM

aaron.ballman added inline comments.

clang/test/AST/Interp/arrays.cpp
135 ↗	(On Diff #465741)	CC @tahonermann and @cor3ntin as text encoding code owners -- they likely know all the worst test cases to be thinking about in this space.

cor3ntin added inline comments.Oct 10 2022, 7:50 AM

clang/test/AST/Interp/arrays.cpp
135 ↗	(On Diff #465741)	Most weirdness are taken care of during lexing. What is interesting to test during evaluation multi character literals ie `'abcd'` `u8` (utf8) and `u` literals (utf16) Note that wide characters literals were removed from C++ so it's no longer a concern

tahonermann added inline comments.Oct 10 2022, 2:06 PM

clang/lib/AST/Interp/ByteCodeExprGen.cpp
445–446	`T` isn't used and `classify()` already returns an `Optional<PrimType>`, so no conversion coercion appear to be intended here. I think this can be simplified to: if (classify(E)) this->emitConst(E, E->getValue());
clang/test/AST/Interp/arrays.cpp
138 ↗	(On Diff #466461)	This check depends on the source file being UTF-8 encoded. Perhaps compare against `0x0001F60EL` instead.
143 ↗	(On Diff #466461)	As others already noted, additional testing of multicharacter literals and UCNs (including named universal characters like `\N{LATIN_CAPITAL_LETTER_E}` would be beneficial. Some tests of character escapes like `\t` wouldn't hurt either. Clang does not yet support use of `-fexec-charset` to set the literal encoding (execution character set) to anything other than UTF-8 though work on that has been done (see D93031). If such work was completed, it would be useful to run some tests against a non-UTF-8 encoding. Maybe next year.
135 ↗	(On Diff #465741)	I believe Corentin meant that wide multicharacter literals were removed from C++. That is true for C++23 (they were removed by the adoption of P2362R3 during the July 2022 WG21 plenary) but that change was not adopted as a DR, so they remain part of the language for prior language modes. However, they were a conditionally supported feature that, as of Clang 14, are diagnosed as an error in all language modes. All that to say that I agree; no test for them needed.

cor3ntin added inline comments.Oct 10 2022, 2:29 PM

clang/test/AST/Interp/arrays.cpp
143 ↗	(On Diff #466461)	Yes, wide multicharacter literals, that's was important information missing, thanks for spotting that. I have mixed feeling about adding tests for escape sequences. Their replacement doesn't happen during constant evaluation. We shouldn't replicate the lexing tests here. but we should compare string literal with byte values. Testing a string literal against another one doesn't ensure the code units are correct if both are equally miss evaluated. Also we could add explicit tests for null termination here as they are added as part of evaluation in theory - but then again that's also something clang does earlier. If we want we could consider enabling the byte code interpreter on the existing lexing test files, i actually think that's the better way to deal with the escape sequences tests.

tbaeder marked 3 inline comments as done.Oct 10 2022, 10:29 PM

tbaeder added inline comments.

clang/lib/AST/Interp/ByteCodeExprGen.cpp
445–446	Good point. Since the value of the expression is always an `APInt`, might was well skip the `classify()` altogether. `emitConst()` does the classify anyway.
clang/test/AST/Interp/arrays.cpp
143 ↗	(On Diff #466461)	I changed the first test that inspects all characters of a string to comparing with integers instead. Do you have a suggestion for what lexing tests to enable the constant interpreter in?

tbaeder updated this revision to Diff 466700.Oct 10 2022, 10:29 PM

tbaeder marked an inline comment as done.

Harbormaster completed remote builds in B191425: Diff 466700.Oct 10 2022, 10:30 PM

cor3ntin added inline comments.Oct 11 2022, 8:16 AM

clang/test/AST/Interp/arrays.cpp
143 ↗	(On Diff #466461)	I think good candidates are Lexer/char-escapes.c Lexer/char-escapes-delimited.c Lexer/char-literal.cpp

tahonermann added inline comments.Oct 11 2022, 8:49 AM

clang/test/AST/Interp/arrays.cpp
143 ↗	(On Diff #466461)	Of those, only `Lexer/char-escapes.c` does much validation of literal values. I prefer the approach Timm has already taken relative to those tests. It looks like we don't have an existing set of Sema tests for character and string literals. How about we move this test under `clang/test/Sema`? That would be the appropriate place to exercise values relative to `-fexec-charset` support for non-UTF-8 encodings in the future. If that sounds amenable, then I'd like the test split to exercise character and string literals separately. The character literal tests don't really belong in a test named `arrays.cpp` as is.

cor3ntin added inline comments.Oct 11 2022, 9:00 AM

clang/test/AST/Interp/arrays.cpp
143 ↗	(On Diff #466461)	Of those, only Lexer/char-escapes.c does much validation of literal values. I prefer the approach Timm has already taken relative to those tests. We can do both, I was not arguing against the test we have here, I'm happy with those :) I'm opposed to duplicate tests for escape sequences here. using the new interpreter on tests that already exist (in addition of the existing run commands) is pretty easy and cheap to do. I don't have opinions how the new interpreter tests are organized. Ideally we would have a complete set of test that is equally suitable for both constexpr engines, but maybe that's something that does not need to be done as part of this PR

tbaeder added inline comments.Oct 12 2022, 12:10 AM

clang/test/AST/Interp/arrays.cpp
143 ↗	(On Diff #466461)	I've added a new run line to `test/Lexer/char-escapes.c`, which works fine. To summarize, the plan is to add a new test to `clang/test/Sema`? Or split the char tests out from `arrays.cpp` and add some for multicharacter char sequences?

tahonermann added inline comments.Oct 12 2022, 11:26 AM

clang/test/AST/Interp/arrays.cpp
143 ↗	(On Diff #466461)	I'm a bit uncertain regarding the purpose of tests in the `Interp` directory as opposed to tests under the `Sema*` directories. Basically, I'm unsure why we wouldn't want just one set of tests that exercise the semantics of the language, at least with regard to constant evaluation. If the interpreter is also intended to (eventually) interpret full programs (e.g., evaluating `main()`), I think such tests would be appropriate for an `Interp` directory. But I say that having not closely followed the development and previous reviews of the new interpreter. At a minimum, I would like to see the character and string literal testing pulled out of `arrays.cpp`. For the rest, I'll defer to others that have been more involved in these reviews.

aaron.ballman added inline comments.Oct 12 2022, 11:48 AM

clang/test/AST/Interp/arrays.cpp
143 ↗	(On Diff #466461)	I'm a bit uncertain regarding the purpose of tests in the Interp directory as opposed to tests under the Sema* directories. Basically, I'm unsure why we wouldn't want just one set of tests that exercise the semantics of the language, at least with regard to constant evaluation. The line is a bit blurry; there's certainly semantic tests happening for the constant expression interpreter. I think where I draw the line is on what the goal of the test is. If the goal is to test new interpreter behavior, it goes in Interp. If the goal is to test semantic behavior (and happens to test the new interpreter as well), then Sema seems more reasonable. Ultimately, the really important bit is getting the tests at all, so thank you both for your advice on this! At a minimum, I would like to see the character and string literal testing pulled out of arrays.cpp. I agree, it'd be nice to put them into a file more focused on literals. It's fine to have some string literal and character literal testing in arrays (a string is an array of characters, after all), but it shouldn't be the primary place where that coverage lives.

tbaeder updated this revision to Diff 467399.Oct 13 2022, 1:43 AM

Harbormaster completed remote builds in B191916: Diff 467399.Oct 13 2022, 1:43 AM

I've moved the tests to literals.cpp, is there anything else left?

Ping

Give some time for Aaron/Tom to have the chance to raise further comment but I'm happy with it otherwise.
Thanks for doing that, and sorry I missed you earlier ping.

This revision is now accepted and ready to land.Oct 27 2022, 1:53 AM

LGTM aside from a testing request.

clang/test/AST/Interp/literals.cpp
342	I'd like to see the same test for `u`, `U`, and `u8` prefixes (which should all fail the same was as `L`).

tbaeder updated this revision to Diff 471144.Oct 27 2022, 5:54 AM

Harbormaster completed remote builds in B194641: Diff 471144.Oct 27 2022, 7:18 AM

Closed by commit rG6fa1795d1fd4: [clang][Interp] Implement String- and CharacterLiterals (authored by tbaeder). · Explain WhyOct 28 2022, 3:50 AM

This revision was automatically updated to reflect the committed changes.

tbaeder added a commit: rG6fa1795d1fd4: [clang][Interp] Implement String- and CharacterLiterals.

One of your commits doesn't build on windows with clang-cl as host compiler: http://45.33.8.238/win/68965/step_4.txt

Please take a look and revert for now if it takes a while to fix.

tbaeder added a reverting change: rG63e3fe108824: Revert "[clang][Interp] Implement String- and CharacterLiterals".Oct 28 2022, 4:34 AM

tbaeder added a commit: rG52ba50c24c55: [clang][Interp] Implement String- and CharacterLiterals.Oct 28 2022, 6:05 AM

Give some time for Aaron/Tom to have the chance to raise further comment but I'm happy with it otherwise.

I'm sorry for the delay getting back to this and I see this has already been pushed (and a fix then pushed). The changes look good to me too.

Diff 467399

clang/lib/AST/Interp/ByteCodeExprGen.h

Show First 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	public:
bool VisitArraySubscriptExpr(const ArraySubscriptExpr *E);		bool VisitArraySubscriptExpr(const ArraySubscriptExpr *E);
bool VisitInitListExpr(const InitListExpr *E);		bool VisitInitListExpr(const InitListExpr *E);
bool VisitConstantExpr(const ConstantExpr *E);		bool VisitConstantExpr(const ConstantExpr *E);
bool VisitUnaryExprOrTypeTraitExpr(const UnaryExprOrTypeTraitExpr *E);		bool VisitUnaryExprOrTypeTraitExpr(const UnaryExprOrTypeTraitExpr *E);
bool VisitMemberExpr(const MemberExpr *E);		bool VisitMemberExpr(const MemberExpr *E);
bool VisitArrayInitIndexExpr(const ArrayInitIndexExpr *E);		bool VisitArrayInitIndexExpr(const ArrayInitIndexExpr *E);
bool VisitOpaqueValueExpr(const OpaqueValueExpr *E);		bool VisitOpaqueValueExpr(const OpaqueValueExpr *E);
bool VisitAbstractConditionalOperator(const AbstractConditionalOperator *E);		bool VisitAbstractConditionalOperator(const AbstractConditionalOperator *E);
		bool VisitStringLiteral(const StringLiteral *E);
		bool VisitCharacterLiteral(const CharacterLiteral *E);

protected:		protected:
bool visitExpr(const Expr *E) override;		bool visitExpr(const Expr *E) override;
bool visitDecl(const VarDecl *VD) override;		bool visitDecl(const VarDecl *VD) override;

protected:		protected:
/// Emits scope cleanup instructions.		/// Emits scope cleanup instructions.
void emitCleanup();		void emitCleanup();
▲ Show 20 Lines • Show All 296 Lines • Show Last 20 Lines

clang/lib/AST/Interp/ByteCodeExprGen.cpp

Show First 20 Lines • Show All 427 Lines • ▼ Show 20 Lines	if (!this->visit(FalseExpr))
return false;		return false;

this->fallthrough(LabelEnd);		this->fallthrough(LabelEnd);
this->emitLabel(LabelEnd);		this->emitLabel(LabelEnd);

return true;		return true;
}		}

		template <class Emitter>
		bool ByteCodeExprGen<Emitter>::VisitStringLiteral(const StringLiteral *E) {
		unsigned StringIndex = P.createGlobalString(E);
		return this->emitGetPtrGlobal(StringIndex, E);
		}

		template <class Emitter>
		bool ByteCodeExprGen<Emitter>::VisitCharacterLiteral(
		const CharacterLiteral *E) {
		return this->emitConst(E, E->getValue());
		}
		tahonermannUnsubmitted Done Reply Inline Actions `T` isn't used and `classify()` already returns an `Optional<PrimType>`, so no conversion coercion appear to be intended here. I think this can be simplified to: if (classify(E)) this->emitConst(E, E->getValue()); tahonermann: `T` isn't used and `classify()` already returns an `Optional<PrimType>`, so no conversion…
		tbaederAuthorUnsubmitted Done Reply Inline Actions Good point. Since the value of the expression is always an `APInt`, might was well skip the `classify()` altogether. `emitConst()` does the classify anyway. tbaeder: Good point. Since the value of the expression is always an `APInt`, might was well skip the…

		aaron.ballmanUnsubmitted Done Reply Inline Actions Should this be concerned about the sign of the promoted literal? aaron.ballman: Should this be concerned about the sign of the promoted literal?
		tbaederAuthorUnsubmitted Done Reply Inline Actions Switched to `emitConst()` that takes into account signedness. tbaeder: Switched to `emitConst()` that takes into account signedness.
template <class Emitter> bool ByteCodeExprGen<Emitter>::discard(const Expr *E) {		template <class Emitter> bool ByteCodeExprGen<Emitter>::discard(const Expr *E) {
		tbaederAuthorUnsubmitted Done Reply Inline Actions The `0` here is unused in `emitConst` anyway. I have a follow-up NFC patch locally to remove that parameter from `emitConst` altogether. tbaeder: The `0` here is unused in `emitConst` anyway. I have a follow-up NFC patch locally to remove…
OptionScope<Emitter> Scope(this, /NewDiscardResult=/true);		OptionScope<Emitter> Scope(this, /NewDiscardResult=/true);
return this->Visit(E);		return this->Visit(E);
}		}

template <class Emitter>		template <class Emitter>
bool ByteCodeExprGen<Emitter>::visit(const Expr *E) {		bool ByteCodeExprGen<Emitter>::visit(const Expr *E) {
OptionScope<Emitter> Scope(this, /NewDiscardResult=/false);		OptionScope<Emitter> Scope(this, /NewDiscardResult=/false);
return this->Visit(E);		return this->Visit(E);
▲ Show 20 Lines • Show All 762 Lines • Show Last 20 Lines

clang/test/AST/Interp/literals.cpp

Show First 20 Lines • Show All 303 Lines • ▼ Show 20 Lines	namespace floats {

/// Initialized by a double.		/// Initialized by a double.
constexpr float df = 0.0;		constexpr float df = 0.0;
/// The other way around.		/// The other way around.
constexpr double fd = 0.0f;		constexpr double fd = 0.0f;

static_assert(0.0f == -0.0f, "");		static_assert(0.0f == -0.0f, "");
};		};

		namespace strings {
		constexpr const char *S = "abc";
		static_assert(S[0] == 97, "");
		static_assert(S[1] == 98, "");
		static_assert(S[2] == 99, "");
		static_assert(S[3] == 0, "");

		static_assert("foobar"[2] == 'o', "");
		static_assert(2["foobar"] == 'o', "");

		constexpr const wchar_t *wide = L"bar";
		static_assert(wide[0] == L'b', "");

		constexpr const char32_t *u32 = U"abc";
		static_assert(u32[1] == U'b', "");

		constexpr char32_t c = U'\U0001F60E';
		static_assert(c == 0x0001F60EL, "");

		constexpr char k = -1;
		static_assert(k == -1, "");

		static_assert('\N{LATIN CAPITAL LETTER E}' == 'E', "");
		static_assert('\t' == 9, "");

		#pragma clang diagnostic push
		#pragma clang diagnostic ignored "-Wmultichar"
		constexpr int mc = 'abc';
		static_assert(mc == 'abc', "");
		__WCHAR_TYPE__ wm = L'abc'; // ref-error{{wide character literals may not contain multiple characters}} \
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions I'd like to see the same test for `u`, `U`, and `u8` prefixes (which should all fail the same was as `L`). aaron.ballman: I'd like to see the same test for `u`, `U`, and `u8` prefixes (which should all fail the same…
		// expected-error{{wide character literals may not contain multiple characters}}
		#pragma clang diagnostic pop
		};

clang/test/Lexer/char-escapes.c

	// RUN: %clang_cc1 -fsyntax-only -pedantic -verify %s			// RUN: %clang_cc1 -fsyntax-only -pedantic -verify %s
				// RUN: %clang_cc1 -fsyntax-only -pedantic -fexperimental-new-constant-interpreter -verify %s

	int test['\\' == 92 ? 1 : -1];			int test['\\' == 92 ? 1 : -1];
	int test['\"' == 34 ? 1 : -1];			int test['\"' == 34 ? 1 : -1];
	int test['\'' == 39 ? 1 : -1];			int test['\'' == 39 ? 1 : -1];
	int test['\?' == 63 ? 1 : -1];			int test['\?' == 63 ? 1 : -1];
	int test['\a' == 7 ? 1 : -1];			int test['\a' == 7 ? 1 : -1];
	int test['\b' == 8 ? 1 : -1];			int test['\b' == 8 ? 1 : -1];
	int test['\e' == 27 ? 1 : -1]; // expected-warning {{non-standard escape}}			int test['\e' == 27 ? 1 : -1]; // expected-warning {{non-standard escape}}
	Show All 13 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[clang][Interp] Implement String- and CharacterLiterals
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 467399

clang/lib/AST/Interp/ByteCodeExprGen.h

clang/lib/AST/Interp/ByteCodeExprGen.cpp

clang/test/AST/Interp/literals.cpp

clang/test/Lexer/char-escapes.c

This is an archive of the discontinued LLVM Phabricator instance.

[clang][Interp] Implement String- and CharacterLiteralsClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 467399

clang/lib/AST/Interp/ByteCodeExprGen.h

clang/lib/AST/Interp/ByteCodeExprGen.cpp

clang/test/AST/Interp/literals.cpp

clang/test/Lexer/char-escapes.c

[clang][Interp] Implement String- and CharacterLiterals
ClosedPublic