This is an archive of the discontinued LLVM Phabricator instance.

[clang][Interp] Array initialization via string literal
ClosedPublic

Authored by tbaeder on Nov 5 2022, 6:06 AM.

Diff Detail

Event Timeline

tbaeder created this revision.Nov 5 2022, 6:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2022, 6:06 AM
tbaeder requested review of this revision.Nov 5 2022, 6:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2022, 6:06 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
tschuett added inline comments.
clang/lib/AST/Interp/ByteCodeExprGen.cpp
1081

Program

tbaeder marked an inline comment as done.Nov 5 2022, 10:38 AM
tschuett added inline comments.Nov 5 2022, 11:29 AM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
1081

Probably I misunderstood something. It says:

Porgram::createGlobalString

I would expected it to start with Program and not Porgram.

tbaeder added inline comments.Nov 5 2022, 2:09 PM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
1081

Yep, I fixed it locally.

aaron.ballman added inline comments.Nov 8 2022, 9:49 AM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
1076–1077
1082
1086

Should we be looking at the sign of char to decide whether to use a uint8 or an sint8?

1100
clang/test/AST/Interp/literals.cpp
354–359

I'd like to see some tests for the other encodings, as well as a test with embedded null characters in the literal.

Testing a string literal that's longer than the array is something we should think about. That code is ill-formed in C++, so I don't think we can add a test for it yet, but it's only a warning in C.

tbaeder updated this revision to Diff 474177.Nov 9 2022, 12:35 AM
tbaeder marked 4 inline comments as done.
tbaeder added inline comments.
clang/lib/AST/Interp/ByteCodeExprGen.cpp
1076–1077

I feel like every time I write the code to get the ConstantArrayType from some array expression, I use a different version. I've never used getAsConstantArrayType() before :)

1086

I was wondering about that too, but this code is copy/paste from Program.cpp where we create global storage for string literals. That code works, so I assume it will work here, too. Getting rid of the duplication might be nice though.

tahonermann added inline comments.Nov 15 2022, 12:36 PM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
1098–1099

Aren't N and NumElems guaranteed to have the same value here? Both are derived from SL. The code seems to be written with the expectation that NumElems corresponds to the number of elements to be iniitialized in the target array.

1100

CodePoint should be CodeUnit here. char and friends all hold code units (in some cases, those code units may also constitute a code point).

clang/test/AST/Interp/literals.cpp
354–359

I agree with Aaron's requests. Please also extend the test to include a char element that would be negative for a signed 8-bit char. Something like:

constexpr char foo[12] = "abc\xff";
...
#if defined(__CHAR_UNSIGNED__) || __CHAR_BIT__ > 8
static_assert(foo[3] == 255, "");
#else
static_assert(foo[3] == -1, "");
#endif

A couple of more tests to add:

  • One where the string literal has the same length (including the implicit terminator) as the array; to ensure that the implicit terminator is properly accounted for.
  • One where the target array size is deduced from the string literal; to ensure there are no dependencies on an explicit array size.
tbaeder updated this revision to Diff 475697.Nov 15 2022, 11:39 PM
tbaeder marked 2 inline comments as done.
tahonermann added inline comments.Nov 18 2022, 3:34 PM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
1098–1099

I see the change to now use the minimum of SL->getLength() and CAT->getSize().getZExtValue(). Based on https://godbolt.org/z/5sTWExTac this looks to be unnecessary. When a string literal is used as an array initializer, it appears that the type of the string literal is adjusted to match the size of the array being initialized. I suggest using only CAT->getSize().getZExtValue() and adding a comment that this code depends on that adjustment.

clang/test/AST/Interp/literals.cpp
354–359

These cases all look to have been added now. Thank you!

tbaeder updated this revision to Diff 476660.Nov 18 2022, 9:57 PM
tbaeder marked 3 inline comments as done.
tbaeder added inline comments.Nov 18 2022, 10:03 PM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
1098–1099

That is good to know and makes sense, thanks!

tbaeder added inline comments.Nov 23 2022, 12:55 AM
clang/lib/AST/Interp/ByteCodeExprGen.cpp
1098–1099

That actually doesn't work. They type might be adjusted, but getCodeUnit() still asserts that the index is < getLength(). :(

tbaeder updated this revision to Diff 477409.Nov 23 2022, 1:03 AM

@tahonermann Anything still missing here?

tahonermann accepted this revision.Dec 7 2022, 2:37 PM

This looks good to me now. I'm sorry for taking so long to return to review.

clang/lib/AST/Interp/ByteCodeExprGen.cpp
1098–1099

Ah, ok, that makes sense, thanks. I agree this is the right approach for enumerating just the code units in the string literal that are used to initialize the target now.

This revision is now accepted and ready to land.Dec 7 2022, 2:37 PM
This revision was landed with ongoing or failed builds.Jan 25 2023, 5:24 AM
This revision was automatically updated to reflect the committed changes.