This is an archive of the discontinued LLVM Phabricator instance.

Emit boxed expression as a compile-time constant if the expression inside the parentheses is a string literal
ClosedPublic

Authored by ahatanak on Feb 27 2019, 11:09 AM.

Details

Summary

clang currently emits an expression like @("abc") as a message send to stringWithUTF8String. This patch makes changes so that a compile-time constant is emitted instead when the expression inside the parentheses is a string literal. I think it's possible to do the same optimization for constant expressions that evaluate to zero-terminated strings (see the example below), but I'm leaving that for future work.

constexpr const char *foo() { return "abc"; }

void test() {
  NSString *s = @(foo());
}

The original motivation for the patch is to silence the -Wnullable-to-nonnull-conversion warning that clang started emitting after r317727:
The https://oleb.net/2018/@keypath/.

rdar://problem/42684601

Diff Detail

Repository
rL LLVM

Event Timeline

ahatanak created this revision.Feb 27 2019, 11:09 AM
rjmccall added inline comments.Feb 27 2019, 12:50 PM
lib/CodeGen/CGExprConstant.cpp
1793 ↗(On Diff #188578)

IgnoreParens.

lib/Sema/SemaExprObjC.cpp
534 ↗(On Diff #188578)

You're implicitly dropping sugar from the source expression here; I really think you should preserve that, and if it means you end up having to look through array-decay expressions, that's not the end of the world.

The need for a special case here is just to allow us to compile these even if there isn't a boxing method available?

Do you need to restrict the type of the string literal so that it doesn't apply to e.g. wide strings?

ahatanak marked an inline comment as done.Feb 27 2019, 6:35 PM
ahatanak added inline comments.
lib/Sema/SemaExprObjC.cpp
534 ↗(On Diff #188578)

Line 516 checks that the pointee type has the same unqualified type as 'char'. If the string literal inside the parentheses is of a wider type (e.g., @(u"abc")), this piece of code won't be executed, and a diagnostic is emitted later ("illegal type 'const char16_t *' used in a boxed expression").

The original motivation for special-casing string literals inside boxed expressions was to silence the -Wnullable-to-nonnull-conversion warning mentioned here: https://oleb.net/2018/@keypath/. The warning is issued because the return type of stringWithUTF8String is nullable.

...
+ (nullable instancetype)stringWithUTF8String:(const char *)nullTerminatedCString;
...

NSString * _Nonnull ptr = @("abc");  // expected-error {{implicit conversion from nullable pointer 'NSString * _Nullable' to non-nullable pointer type 'NSString * _Nonnull'}}
rjmccall added inline comments.Feb 27 2019, 10:05 PM
lib/Sema/SemaExprObjC.cpp
534 ↗(On Diff #188578)

I see. So in addition to guaranteeing to skip the boxing call, you'd like to adjust the type to report the result as non-null, which we can do because we're unconditionally making a constant string. Makes sense.

However, I think you need to check that the string is valid UTF-8 or else this is semantics-changing, since such strings are rejected by +stringWithUTF8String: (which is why it returns a nullable pointer at all). It should be alright to warn about strings that are statically known to be invalid UTF-8, but you'll then need to emit them using the normal boxing method.

ahatanak updated this revision to Diff 188794.Feb 28 2019, 2:46 PM
ahatanak marked 4 inline comments as done.

Address review comments.

  • Preserve sugar when creating an ObjCBoxedExpr in Sema::BuildObjCBoxedExpr.
  • Add a test case to test/SemaObjC/boxing-illegal.m that shows clang rejects a string literal in a boxed expression if its character encoding is incompatible with UTF-8.
lib/Sema/SemaExprObjC.cpp
534 ↗(On Diff #188578)

clang already rejects boxed expressions that are string literals with a character encoding incompatible with UTF-8. Also, we reach this part of the code only if the string is UTF-8 or ASCII (see the assertion I added). I added a test case to test/SemaObjC/boxing-illegal.m that shows clang rejects strings incompatible with UTF-8.

rjmccall added inline comments.Mar 4 2019, 10:11 PM
test/SemaObjC/boxing-illegal.m
70 ↗(On Diff #188794)

I don't know what \p is supposed to be or why it apparently changes the type of the literal to unsigned char *, but none of these are ordinary string literals that are invalid as UTF-8. I mean something like "\xFF", which still has type char * but will fail to parse as UTF-8, which will cause normal boxing to fail and return nil.

ahatanak updated this revision to Diff 189436.Mar 5 2019, 6:30 PM
ahatanak marked 2 inline comments as done.

If the string literal used for the boxed expression isn't a valid UTF-8 string, don't emit a compile-time constant.

ahatanak added inline comments.Mar 5 2019, 6:34 PM
test/SemaObjC/boxing-illegal.m
70 ↗(On Diff #188794)

Fixed and added test cases. Boxed expressions now have to be valid UTF-8 string literals in order to be emitted as compile-time constants.

If the string literal in a boxed expression is an invalid UTF-8 string, should we reject it the same way we reject other kinds of string literals (e.g., UTF-16)?

ahatanak marked an inline comment as done.Mar 5 2019, 6:37 PM
ahatanak added inline comments.
test/SemaObjC/boxing-illegal.m
70 ↗(On Diff #188794)

Or at least warn about it.

rjmccall added inline comments.Mar 5 2019, 7:01 PM
test/SemaObjC/boxing-illegal.m
70 ↗(On Diff #188794)

Thanks, fix looks good. I think a warning would be extremely reasonable.

ahatanak updated this revision to Diff 189630.Mar 6 2019, 5:47 PM

Diagnose invalid UTF-8 strings in boxed expressions.

rjmccall added inline comments.Mar 6 2019, 7:12 PM
include/clang/Basic/DiagnosticGroups.td
322 ↗(On Diff #189630)

I think this warning group should mention that this is specific to ObjC boxing somehow.

include/clang/Basic/DiagnosticSemaKinds.td
5807 ↗(On Diff #189630)

Maybe "string is ill-formed as UTF-8 and will become a null NSString* when boxed", where the type is whatever the type of the box-expression would be?

ahatanak updated this revision to Diff 189657.Mar 6 2019, 11:45 PM
ahatanak marked 2 inline comments as done.

Improve diagnostic message.

rjmccall added inline comments.Mar 7 2019, 5:47 PM
include/clang/Basic/DiagnosticGroups.td
407 ↗(On Diff #189657)

Sure. If we decide to add more things to this warning group, we can pull this one out into a sub-group with a more targeted name.

include/clang/Basic/DiagnosticSemaKinds.td
922 ↗(On Diff #189657)

Might as well use the NSStringPointer type we stash on Sema so that this will be the right type even in more obscure environments.

ahatanak updated this revision to Diff 189809.Mar 7 2019, 6:25 PM
ahatanak marked an inline comment as done.

Use the type of NSStringPointer in the diagnostic string.

ahatanak added inline comments.Mar 7 2019, 6:27 PM
include/clang/Basic/DiagnosticSemaKinds.td
922 ↗(On Diff #189657)

I wasn't exactly sure when NSStringPointer can be anything other than NSString *.

rjmccall accepted this revision.Mar 7 2019, 7:06 PM
rjmccall added inline comments.
include/clang/Basic/DiagnosticSemaKinds.td
922 ↗(On Diff #189657)

Hmm. It's possible that it can't be for boxing, but I'm pretty sure it can be overridden for string literals, and it's not ridiculous that we'd allow that to be changed for arbitrary boxing as well, so this is good idea regardless.

This revision is now accepted and ready to land.Mar 7 2019, 7:06 PM
ahatanak marked an inline comment as done.Mar 7 2019, 8:29 PM
ahatanak added inline comments.
include/clang/Basic/DiagnosticSemaKinds.td
922 ↗(On Diff #189657)

I agree, it's probably a good idea.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2019, 8:44 PM