This is an archive of the discontinued LLVM Phabricator instance.

[Sema] List conversion validate character array
ClosedPublic

Authored by Mordante on Sep 12 2020, 9:44 AM.

Details

Summary

The function TryListConversion didn't properly validate the following part of the standard:

Otherwise, if the parameter type is a character array [... ]
and the initializer list has a single element that is an
appropriately-typed string literal (8.5.2 [dcl.init.string]), the
implicit conversion sequence is the identity conversion.

This caused the following call to f() to be ambiguous.

void f(int(&&)[1]);
void f(unsigned(&&)[1]);

void g(unsigned i) {
  f({i});
}

This issue only occurrs when the initializer list had one element.

Diff Detail

Event Timeline

Mordante requested review of this revision.Sep 12 2020, 9:44 AM
Mordante created this revision.
rsmith added inline comments.Sep 13 2020, 1:11 PM
clang/lib/Sema/SemaOverload.cpp
4988

isCharType is too narrow a check here. We also need to check for all the other types that can be initialized from a string literal (wchar_t, char16_t, ... -- including unsigned short in some cases). These details are handled by [IsStringInit](https://github.com/llvm/llvm-project/blob/master/clang/lib/Sema/SemaInit.cpp#L136). Maybe it's worth exposing that as a bool Sema::isStringInit function or similar to use from here?

4989

This is too narrow a check in two ways: we should allow parenthesized string literals here, and we should allow ObjCEncodeExpr.

4991–4992

Phabricator thinks you converted spaces to tabs here. Please can you check and convert back if necessary?

Mordante marked 3 inline comments as done.Sep 14 2020, 10:50 AM

Thanks for the feedback!

clang/lib/Sema/SemaOverload.cpp
4988

I think exposing this function sounds good.

4991–4992

That's odd in my editor I see spaces. I ran git clang-format on an earlier version where I made some changes here. I'll revert the hunk.

Mordante marked 2 inline comments as done.Sep 15 2020, 11:42 AM
Mordante added inline comments.
clang/lib/Sema/SemaOverload.cpp
4989

Actually it seems the code isn't behaving properly at all. It seems the conversion is done by
https://github.com/llvm/llvm-project/blob/master/clang/lib/Sema/SemaOverload.cpp#L5174
resulting in an array-to-pointer conversion instead of an identity conversion.

It can solve it by manually removing the decay like:

if (const auto *DT = dyn_cast<DecayedType>(ToType))
  if (const auto *AT = S.Context.getAsArrayType(DT->getOriginalType()))
    if (S.IsStringInit(From->getInit(0), AT) {
       ...

This code works and results in an identity conversion. But it feels a bit odd to manually peel away the array-to-pointer decay. Is this the best solution or do you have a better suggestions?

rsmith added inline comments.Sep 16 2020, 5:33 PM
clang/lib/Sema/SemaOverload.cpp
4989

I think this is a bug in your testcases. I'll comment below.

clang/test/CXX/drs/dr14xx.cpp
411–414

These should presumably be references to arrays, rather than arrays, or the parameter type is as if you wrote (for example) void f(const char *), which shouldn't get the special treatment here.

[over.ics.list]p4 mentions this in its footnote:

"Otherwise, if the parameter type is a character array [Footnote: Since there are no parameters of array type, this will only occur as the referenced type of a reference parameter.] and the initializer list has a single element that is an appropriately-typed string-literal (9.4.3), the implicit conversion sequence is the identity conversion."

Mordante added inline comments.Sep 19 2020, 7:48 AM
clang/test/CXX/drs/dr14xx.cpp
411–414

Ah I missed that footnote. I reread the standard and can you confirm some cases?

namespace A { 
  void f(const char(&)[4]);
  void g() { f({"abc"}); }
}
namespace B { 
  void f(const char(&&)[4]);
  void g() { f({"abc"}); }
}

both should work and with P0388 the array without bounds will also work.

I ask since this is my interpretation of the standard, but it seems there's a difference between implementations and void f(const char(&&)[4]); fails for "abc" but works for "ab".
It seems ICC and consider "abc" an lvalue in this case and not when using "ab".

Here's a gotbolt link with the examples https://godbolt.org/z/r1TKfx

rsmith added inline comments.Sep 20 2020, 11:28 AM
clang/test/CXX/drs/dr14xx.cpp
411–414

That's a really interesting example :)

The initializer is a list containing an lvalue of type const char[4]. Per [dcl.init.list]/3.9 and /3.10, the behavior depends on whether the referenced type is reference-related to const char[4] -- if so, then the reference can only bind directly and a && reference will be invalid, because it's binding an rvalue reference to an lvalue, and if not, then we create an array temporary and the && binding is fine.

Per [dcl.init.ref]/4, const char[???] is reference-related to const char[4] if they are similar types, and per [conv.qual]/2, the types are similar if ??? is omitted or 4, and not similar otherwise.

So:

  • const char (&&r)[4] = {"abc"} is ill-formed (types are the same, binds rvalue reference to lvalue)
  • const char (&&)[] = {"abc"} is ill-formed (types are similar, binds rvalue reference to lvalue)
  • const char (&&r)[5] = {"abc"} is OK (types are not similar, creates temporary)

That seems like a very surprising outcome to me. Perhaps we should probably ignore array bounds entirely when determining whether two types are reference-related. I'll take this to CWG for discussion.

rsmith added inline comments.Sep 20 2020, 11:51 AM
clang/test/CXX/drs/dr14xx.cpp
411–414

I think that's only an answer to half of your question. The other half is that [over.ics.list]p4 does not apply (directly) to either of your testcases, because the "parameter type" is a reference type, not an array type. For:

namespace A { 
  void f(const char(&)[4]);
  void g() { f({"abc"}); }
}

... we reach [over.ics.list]p9, which says to use the rules in [over.ics.ref]. Those rules say that if the reference binds directly to an argument expression (ignore the "expression" here; this is very old wording that predates braced initializers), then we form an identity conversion. So that's what happens in this case.

For

namespace B { 
  void f(const char(&&)[4]);
  void g() { f({"abc"}); }
}

the same thing happens, but now [over.ics.ref]p3 says "an implicit conversion sequence cannot be formed if it requires binding an lvalue reference other than a reference to a non-volatile const type to an rvalue or binding an rvalue reference to an lvalue other than a function lvalue", so the candidate is not viable.

If the array bound were omitted or were not 4, then the reference would not bind directly, and we would instead consider initializing a temporary. [over.ics.ref]p2 says "When a parameter of reference type is not bound directly to an argument expression, the conversion sequence is the one required to convert the argument expression to the referenced type according to 12.4.4.2.", which takes us back around to [over.ics.list] with the "parameter type" now being the array type. That's how we can reach [over.ics.list]p4 and consider string literal -> array initialization.

So I think that, according to the current rules, for

void f(const char (&&)[4]); // #1
void f(const char (&&)[5]); // #2

... a call to f({"abc"}) remarkably calls #2, because #1 is not viable (would bind rvalue reference to lvalue string literal), but #2 is, just like const char (&&r)[4] = {"abc"}; is ill-formed but const char (&&r)[5] = {"abc"}; is valid.

Mordante marked 2 inline comments as done.Sep 27 2020, 9:47 AM
Mordante added inline comments.
clang/test/CXX/drs/dr14xx.cpp
411–414

That's a really interesting example :)

Thanks :-)
Also thanks for the detailed explanation of the rules applying to these cases.

So:

  • const char (&&r)[4] = {"abc"} is ill-formed (types are the same, binds rvalue reference to lvalue)
  • const char (&&)[] = {"abc"} is ill-formed (types are similar, binds rvalue reference to lvalue)

Interesting, wasn't P0388 trying to solve the issues with array of unknown bound?
Am I correct to assume it's intended that this code will become valid in the future?

  • const char (&&r)[5] = {"abc"} is OK (types are not similar, creates temporary)

That seems like a very surprising outcome to me. Perhaps we should probably ignore array bounds entirely when determining whether two types are reference-related. I'll take this to CWG for discussion.

I agree this seems unexpected behaviour, thanks for discussing it with CWG.

So I think that, according to the current rules, for

void f(const char (&&)[4]); #1
void f(const char (&&)[5]);
#2

... a call to f({"abc"}) remarkably calls #2, because #1 is not viable (would bind rvalue reference to lvalue string literal), but #2 is, just like const char (&&r)[4] = {"abc"}; is ill-formed but const char (&&r)[5] = {"abc"}; is valid.

MSVC agrees with your observation, GCC picks #1 and fails to create the call, and ICC aborts with an ICE
https://godbolt.org/z/s7nTPP

Mordante updated this revision to Diff 294556.Sep 27 2020, 10:26 AM

Addresses the review comments.
Adds an extra test case to test whether the proper overload is called. The proper overload is a bit of a surprise so when the expected behaviour changes the overload test can be adjusted.

rsmith accepted this revision.Sep 27 2020, 5:19 PM

Thanks, only nits here. Please feel free to submit after addressing them, or request another round of review if you prefer.

clang/lib/Sema/SemaInit.cpp
3115–3118 ↗(On Diff #294556)

Consider moving this up to around line 144 (just after the definition of ::IsStringInit).

clang/lib/Sema/SemaOverload.cpp
4999

Nit: the if body is long, please add braces.

clang/test/CXX/drs/dr14xx.cpp
338–340

It would be useful to also test that the right overload is used here, and for g2.

411–414

Am I correct to assume it's intended that this code will become valid in the future?

I think it's more likely that the corresponding cases with other array bounds will become ill-formed. String literals are lvalues, so binding an rvalue reference to a string literal should not be valid based on our normal principles; allowing the wrong-array-bound case is then pretty surprising emergent behavior of the form that we usually disallow (on the basis that it "looks sufficiently like" the intention was for the reference to bind directly, so it's an error if the reference can't bind directly).

I think the likely outcomes are either that we ignore array bounds when determining reference-relatedness, or that we always create a temporary when binding an rvalue-reference-to-char-array to a string literal, regardless of whether the bound matches. Or maybe that we set the array bound of the string literal to that of the reference-to-array, allowing the const & cases but not the && cases with an incorrect bound. But I wouldn't want to guess which.

424–431

Please test that the correct overload is called in each of these cases (especially considering the unintuitive outcome of these calls).

clang/test/SemaCXX/overload-array-size.cpp
3–5 ↗(On Diff #294556)
6–13 ↗(On Diff #294556)

It's more common to test such things via diagnostics rather than by AST dumps. (AST dump tests tend to be more fragile in practice.) You could mark the [5] case as =delete and check that you get the "call to a deleted function" diagnostic, or you could change the [5] case to return int & and check that you can initialize an int & from a call to it, or something like that.

This revision is now accepted and ready to land.Sep 27 2020, 5:19 PM
Mordante marked 6 inline comments as done.Oct 3 2020, 5:31 AM
Mordante added inline comments.
clang/test/CXX/drs/dr14xx.cpp
411–414

Thanks for your insight.

clang/test/SemaCXX/overload-array-size.cpp
6–13 ↗(On Diff #294556)

I'm also not too fond of the AST checks, but forgot I could use = delete to detect whether the wanted overload is tested. Since I adjusted the other tests as suggested this test has no additional value and I removed it.

This revision was automatically updated to reflect the committed changes.
Mordante marked 2 inline comments as done.