This is an archive of the discontinued LLVM Phabricator instance.

[clang] pre-0388 array parm list initialization
ClosedPublic

Authored by urnathan on May 25 2021, 7:49 AM.

Details

Summary

in continuing on p0388 (unbounded array conversions), I discovered we don't deal with list initialization of bounded array parameters. Indeed there's a smoking gun:

// FIXME: We're missing a lot of these checks.

This patch implements those checks. I discovered that the existing implementation, when marking an implicit conversion sequence as being to std::initializer_list, is broken in that it'll think initializing 'std::initializer_list<T> parm[N]' is doing that. But it's not, it's initializing an array of std::initializer. There's a test for that, (clang/tests/CXX/drs/dr15xx.cpp, around line 490) but it's working due to the other missing checks! (It continues to work after this patch too.)

Anyway, rather than have a single bit flag 'StdInitializerListElement' we need to record the type being initialized by a list, so we can compare array bounds. Sadly this does grow ImplicitConversionSequence.

TryListConversion now checks the target array type has sufficient bounds, and if it has additional bounds, there is a default conversion from {} available. Note the FIXME there -- is there a better way to determine that?

CompareImplicitConversionSequence now also compares array size, for two conversions to array type.

The structure of the code changes in SemaOverload is ready for changes that p0388 introduces involving IncompleteArrayType -- the llvm_unreachables in CompareImplicitConversionSequence in particular.

... let's try adding some other reviewers who've had their fingers in SemaOverload ...

Diff Detail

Event Timeline

urnathan created this revision.May 25 2021, 7:49 AM
urnathan requested review of this revision.May 25 2021, 7:49 AM
urnathan edited the summary of this revision. (Show Details)Jun 7 2021, 5:15 AM
urnathan added reviewers: curdeius, jlebar, Anastasia.
jlebar removed a reviewer: jlebar.Jun 15 2021, 2:06 PM
jlebar added a subscriber: jlebar.

... let's try adding some other reviewers who've had their fingers in SemaOverload ...

Way beyond my ken, but fingers crossed you get some help with this one!

This is changing Larisse's code of:
19d08672844ee (Larisse Voufo 2015-01-27 18:47:05 +0000 3801) // list-initialization sequence L2 if:
so adding as a reviewer

rsmith added inline comments.Jun 21 2021, 4:33 PM
clang/include/clang/Sema/Overload.h
542

Can we store this directly as a QualType instead? (I think the only reason we store types as void*s in *ConversionSequence is a historical artifact from a time when union members couldn't have non-trivial default constructors.)

It's also not clear to me whether we need to store an additional type here, in addition to the types we're already tracking. I think I would expect the type we need to inspect is the overall destination type of the implicit conversion sequence (the "to type" for the final standard conversion sequence). Is this redundant, or are there cases where the type we want isn't already tracked? Is the problem case a conversion to T (&&)[], where we invent an array bound?

678–681

I'm not sure how to parse this: is this "is this a conversion from an initializer list to a type", or is this "do we have a destination type to convert an initializer list from"? If the latter, hasInitializerListToType might be clearer.

clang/lib/Sema/SemaOverload.cpp
3821

Use Context.hasSameType or Context.hasSameUnqualifiedType here to ignore type sugar in this comparison.

3829

You're also effectively asserting that the array is not a VLA and doesn't have a value-dependent bound here. I think that's fine (we shouldn't have a list initialization conversion for those cases); can you use a broader assertion failure message?

It looks like we would also reach this if the array bounds are equal. I would imagine that it's probably possible for us to end up comparing two initializer list conversions to the same type.

5152–5154

It would be nice to have custom FailureKind for this so that we can diagnose that there were too few / too many inits in the initializer list instead of saying there was no conversion.

5162

Hm, conversion sequences aren't totally ordered; what happens if there are two incomparable conversion sequences (eg, using different conversion functions)?

I think that's a pre-existing issue that this patch is not making any worse, and the problem is the language rules just don't make sense here. I've posted an example on the core reflector.

5167–5169

"bad" is not the worst conversion sequence we support; we treat "disallowed by language rules" conversion sequences (string literal -> non-const char*) as worse than a bad sequence. With this change, we won't bail if we hit a bad conversion after a "disallowed by language rules" conversion any more. Eg:

#include <initializer_list>
struct any { template<typename T> any(T); };
void f(std::initializer_list<char*>); // #1
void f(std::initializer_list<any>); // #2
void g() {
  // Used to pick #2; I think this will now pick #1 and reject during initialization because the worst
  // conversion sequence is string literal -> char* and we never set `Result` to the bad conversion sequence.
  f({"hello", 3});
}

I think we should go back to checking ICS.isBad() before comparing implicit conversion sequences to avoid that.

urnathan marked 4 inline comments as done.Jun 22 2021, 8:02 AM
urnathan added inline comments.
clang/include/clang/Sema/Overload.h
542

It is quite possible a QualType is fine -- I matched the void *'s already there (thanks for the explanation)

We need to store 4 pieces of information (eventually, when all of p0388 is landed)
0) whether we're initializing from an initializer-list

  1. whether we're initializing to a std::initializer_list.
  2. the array bound for the unbound case you mention
  3. whether we determined a bound (an inferred bound loses to a fixed bound of the same value).

IIUC, array bounds are presumed to be within the host's 'unsigned' (inferred from the iterator in TryListConversion), so we could use 3 bool flags[*] and an unsigned, rather than a whole type -- would that be preferrable? It looks like there's already 32 bits of padding in the structure on an I32LP64 machine.

The existing bug I ran into is the lack of clearing the flag for #1, when initializing an array of std::initializer_list (we still think it's an initializer_list, not an array). But of course there are more ways to fix that than the one I chose.

  • we could use a zero value of the inferred array bound to indicate 'not inferred', as we can never infer a zero-sized array, right? But then we'd need to re-extract the bound from fixed-sized arrays when comparing sequences. Seems more efficient to extract once and store in an unsigned.
678–681

ok.

clang/lib/Sema/SemaOverload.cpp
3829

Good catch, let's not try and assert here, given a later patch adds the c++20 behaviour. I'll add an equal-size testcase though.

5152–5154

I have added too_{few,many}_initializers. Is it ok to go directly poking into Result.Bad.Kind like that? There seem to be other cases of such peeking.

5162

Thanks, I'll amend the comment to point this out.

5167–5169

I'm confused. The ImplicitConversionSequence::Kind has BadConversion as the only non-conversion, AFAICT. isBad would cause us to bail on the attempted string-literal->char *, wouldn't it? Your example selects #2 with this patch.

urnathan updated this revision to Diff 353654.Jun 22 2021, 8:04 AM
urnathan marked 3 inline comments as done.

Updated diff addressing most of Richard's comments, but still open:
a) how to record the list conversion bound and kind (rather than as a type)
b) A FIXME about figuring out if there's a list conversion from empty {}

urnathan added inline comments.Jun 22 2021, 10:35 AM
clang/lib/Sema/SemaOverload.cpp
5157–5158

/Is/ this the simplest way of finding {}?

rsmith added inline comments.Jun 23 2021, 2:04 PM
clang/include/clang/Sema/Overload.h
542

The use of host unsigned for braced initializers is a longstanding bug; things like int arr[] = {[0x100000001] = 1}; tend to crash or miscompile because of this. (But even if those things worked, they'd use a huge amount of memory and compile extremely slowly as a result, so fixing this hasn't been a high priority.) If possible I'd prefer to not replicate it in more places, because I think we'll eventually want to fix it :)

I think storing a type here to represent the type of the actual entity being initialized, but it'd be useful to retain the part of the old comment that says the rest of the sequence (including its "to" type) represents only the worst conversion for any element of the list. (It wasn't obvious to me when reading through this how the InitializerListToType differs from the final ToType of the conversion sequence; I think this would have been clearer if there were a reminder that the conversion sequence describes only the worst element conversion rather than the overall conversion.)

clang/lib/Sema/SemaOverload.cpp
3817–3819

Is it possible to simplify this by using getAsConstantArrayType here and eliding the ConstantArrayType checks below?

5152–5154

I think it's OK to poke at Result.Bad.Kind directly here, given that this code basically owns that type anyway, but it might be more obvious to use Result.setBad(BadConversionSequence::too_many_initializers, From, ToType) so that we don't need a reader of the code to remember that we already had a bad conversion sequence (mostly) properly set up at the start of the function.

5157–5158

I think so. The checks below, especially for aggregate classes and user-defined conversions, are pretty involved; reusing them directly like this seems best.

5167–5169

The problem is that a string literal to char* conversion is not a bad conversion, but ranks worse than bad conversions, so in particular taking the worst conversion in a sequence containing both a string literal to char* conversion and a bad conversion will pick the string literal to char* conversion as the worst conversion in the sequence, and in so doing will lose the information that the conversion was not possible at all.

Concretely, what I'm worried about is that we treat "hello" -> char* as a valid conversion sequence X, and 3 -> char* as a bad conversion sequence Y, and we consider X to be worse than Y. We used to bail out if we ever saw a bad ICS, which meant that we'd treat a sequence containing X and Y as bad. But with the new approach, we don't check whether each ICS in isolation is bad, and only check if the worst sequence so far is bad, so I was expecting that we would:

  • Start with Result being good
  • Compare Result with X, find X is worse, and remember that as Result
  • Not bail out because Result is not bad
  • Compare Result with Y, find Result is worse, and keep Result as-is

... and then end up with a non-bad conversion sequence for f. So my worry is that we'd incorrectly treat #1 as being viable.

I think my example is not properly demonstrating this, because we still end up with X as the worst conversion sequence for #1, and X compares worse than the user-defined conversion sequence to any. A correct example might be more like this:

#include <initializer_list>
struct ugly { ugly(char*); ugly(int); };
void f(std::initializer_list<char*>); // #1
void f(std::initializer_list<ugly>); // #2
void g() {
  f({"hello", 3});
}

... for which I think this patch will prefer #1, because for #1 we'll pick conversion X and for #2 we'll pick a user-defined conversion sequence starting with conversion X. Whereas we should pick #2, because #1 is not viable.

I think we could address this either by checking for ICS.isBad() on each conversion, and bailing out if we find a bad conversion (even if it's not the worst one), or by changing CompareImplicitConversionSequences to consider bad conversions to be worse than the string literal -> char* conversion. The latter is probably the better approach; maybe there's a reason we don't already do that, but it seems worth trying. It doesn't make sense to me to treat a formally-bad-but-supported conversion as being *worse* than a formally-bad-and-unsupported conversion.

5175–5176

I wonder if it would be cleaner to move this call earlier and convert some of the breaks and if (!Result.isBad()) checks into early returns?

urnathan marked 7 inline comments as done.Jun 29 2021, 7:23 AM
urnathan added inline comments.
clang/lib/Sema/SemaOverload.cpp
5167–5169

Thanks, your non-template example does exhibit the problem. The template one didn't.

5175–5176

I also realized that the order we chose to compare any implicit{} initialization to the explicit element initializations is significant, hence my email reply CWG. I've changed this patch to compare {} after the element conversions for the reasons described there.

urnathan updated this revision to Diff 355226.Jun 29 2021, 7:25 AM
urnathan marked 2 inline comments as done.

Thanks for your review. I think I've addressed all your comments, but let me know if I've missed something. I renamed 'hasInitializerListToType' and friends to 'hasInitializerListContainerType', because there are other conversion sequences from initializer lists to non-array non-std::initializer_list types. Container seemed the best shortest word that didn't just say 'Array'.

Added comment about maybe keeping StdInitializerListElement flag

clang/include/clang/Sema/Overload.h
542

thinking about the path forward, perhaps it would be better to
(a) keep StdInitializerListElement bitflag
(b) store the (maybe deduced) array dimension, (and not the type)

For the last patch in the series, I needed to add a 'is unbound array' bit flag. Keeping the StdInitializerListElement flag means reducing duplicate calls to isStdInitializerList. Storing just the bound means we can use ToType to check the element types are the same, and we don't need to fish the array bound out of the type when comparing.

That would most likely be more efficient, and remove the weird name for 'InitializerListContainerType' -- 'ArrayBound' is probably more mnemonic?

This revision was not accepted when it landed; it landed in state Needs Review.Sep 9 2021, 8:30 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2021, 8:30 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Commited without approval?