This is an archive of the discontinued LLVM Phabricator instance.

[clang] p0388 conversion to incomplete array
ClosedPublic

Authored by urnathan on May 17 2021, 12:24 PM.

Details

Summary

This implements the new implicit conversion sequence to an incomplete (unbounded) array type. It is mostly Richard Smith's work (from D102650), updated to trunk, testcases added and a few bugs fixed found in such testing.

It is not a complete implementation of p0388.

Diff Detail

Event Timeline

urnathan created this revision.May 17 2021, 12:24 PM
urnathan requested review of this revision.May 17 2021, 12:24 PM
rsmith added inline comments.May 17 2021, 1:40 PM
clang/include/clang/Sema/Initialization.h
1288 ↗(On Diff #345963)
clang/lib/AST/ExprConstant.cpp
7799 ↗(On Diff #345963)

Are there any cases here where we're adding an array bound and we need to check that it's correct? Presumably casts that add an array bound can result in wrongly-typed pointers during constant evaluation, but they'll only be wrong by virtue of having the wrong array bounds, and we should be using the stored value rather than the pointer type for our bounds checks, so maybe that's OK?

13203 ↗(On Diff #345963)

I'd be inclined to add this to the above "invalid cast kind for integral value" list instead, since we have such a list here.

13935 ↗(On Diff #345963)

Similarly, this is an invalid cast kind for complex so we could move this to the previous set of cases.

clang/lib/CodeGen/CGExpr.cpp
4692–4693

Instead of manually preserving the address space, you could compute it from the destination reference type, like we do for CK_LValueBitCast below.

4760

I think you will need to do this too, in case the destination type is a variably-modified type. Eg, in:

extern int arr[];
int g();
void f() {
  return ((int(&)[g()])arr)[0];
}

... we need to emit the call to g().

clang/lib/CodeGen/CGExprAgg.cpp
861 ↗(On Diff #345963)

Hm. Aggregate IR emission can be used for lvalues of aggregate types (IR generation will step through temporary materialization and lvalue-to-rvalue conversions looking for the source of the value, to optimize away some obviously-unnecessary copies). But... I think this is still impossible, because array lvalues are never used to initialize arrays.

clang/lib/CodeGen/CGExprConstant.cpp
1129 ↗(On Diff #345963)

(due to your changes to ExprConstant.cpp, yes)

clang/lib/Sema/SemaExprCXX.cpp
4592–4593

Do we need to handle the case where there's both an address-space change and an array-bound change? I'd imagine we want two ImpCastExprs in that case, with a suitably-determined intermediate type (either the source type with the destination address space or the destination type with the source address space; I can't think of a good reason to prefer one over the other).

clang/lib/Sema/SemaInit.cpp
8386

We may need to preserve the value kind, like we do for QualificationConversion. We could use the value kind of CurInit.get(), but I don't think that will do the right thing for initializing a const lvalue reference from an array xvalue:

using T = int[3];
const int (&arr)[] = T{1, 2, 3}; // materialized temporary is an xvalue, but converted initializer should be an lvalue
clang/lib/Sema/SemaOverload.cpp
292

Yes, this seems right to me. We're only trying to strip off casts that either might be, or might follow, a narrowing conversion (to get to the pre-narrowing type), and an array bound conversion can't be either.

3278

It'd be consistent with ObjCLifetimeConversion to use a reference here. I mean, in an ideal world I'd prefer we didn't have three bool parameters in a row, especially not with two of them being out-parameters. Maybe we could return a bitfield enum, eg:

enum QualificationConversionKind {
  QCK_NotQualificationConversion = 0x0,
  QCK_QualificationConversion = 0x1,
  QCK_IncludesObjCLifetimeConversion = 0x2,
  QCK_IncludesArrayBoundConversion = 0x4
};
3292–3304

Style nits:

  • we generally use while (true) not for (;;)
  • please use braces for ifs containing (transitively) more than one statement
4249

Yes, I think we do need a comparison here.

I made a start on P0388 a couple of years ago. I never got very far, but it looks like the parts I covered are mostly non-overlapping with the parts you've covered, and do include this. I've uploaded my work here: https://reviews.llvm.org/D102650; please feel free to take all / some / none of that :)

4578

Yeah, it should be correct to isa / cast directly.

Richard, thanks for the comments -- I'd not realized I'd left quite so many FIXMEs lying around! The high bit I'm taking is that you're fine with the approach I'm taking.

clang/lib/AST/ExprConstant.cpp
7799 ↗(On Diff #345963)

This patch only allows the bound->unbound (even though the code emission is more general), but yes, when unbound->bound is permitted we could end up with such invalid constant evaluations. Making sure that DTRT is on the ToDo list.

clang/lib/CodeGen/CGExpr.cpp
4692–4693

Sounds good.

clang/lib/Sema/SemaExprCXX.cpp
4592–4593

Ok, The code emission for ArrayBound will handle changing the address space, but a 2 step AST makes sense. I think I have a slight preference for placing the address-space change outermost.

Question: Is the address space change only concerning the outermost pointer? What about such changes that appear in the middle of a cv-decomposition?

clang/lib/Sema/SemaOverload.cpp
3278

was wondering about that too. Generally I dislike such out parameters and a bitflag return would be nicer. Presumably with a rename from IsFOO to FOOKind or similar?

3292–3304

thanks for guiding.

Question: What about collating blocks of cascading case labels? They seem to be unordered, which does make navigating them unclear.

urnathan updated this revision to Diff 350627.Jun 8 2021, 9:31 AM
urnathan marked an inline comment as done.
urnathan retitled this revision from [RFC] p0388 implicit cast array bounds (non-overload cases) to [clang] p0388 conversion to incomplete array.
urnathan edited the summary of this revision. (Show Details)
urnathan updated this revision to Diff 351166.Jun 10 2021, 7:29 AM

Cleanup diff, I forgot to review the actual diff :(

If you want to skip the VLA handling for now, that's fine by me, but I think we do eventually want to allow this kind of conversion for VLAs too, and it looks like it might be pretty straightforward to do as part of this change. I think we should at least handle the composite pointer type case for two constant arrays here, though. Eg:

int (*a)[3];
int (*b)[4];
auto *p = true ? a : b; // should be int (*)[] in C++20
clang/lib/AST/ASTContext.cpp
5804

(A little more VLA work here.)

clang/lib/Sema/SemaExprCXX.cpp
4592–4593

I think a non-top-level address space change would necessarily be a reinterpret cast: a T AS(1) * and a T AS(2) * might have different representations, so a T AS(1) ** cannot in general be meaningfully cast to T AS(2) **. (I don't think we currently special-case the situation where the pointer representations are the same between address spaces for such casts, but I could be wrong about that.)

6708

We seem to not handle the case where both types are constant array types, but with different bounds (where, based on the comment below, we're supposed to form an array of unknown bound). We should probably also unwrap here if one of the types is an incomplete array type and the other is a VLA. And maybe also if we have two distinct VLA types, or a VLA and a constant-bound array. So I think perhaps the logic should be:

  • Prior to C++20: if both types to be incomplete array types, unwrap and add an incomplete array step.
  • In C++20 onwards: if we get this far (if we have two array types that don't have the same constant bound) and the types aren't equal, unwrap and add an incomplete array step.
clang/lib/Sema/SemaOverload.cpp
3256

I think we should also allow VLA -> array of unknown bound conversions as qualification conversions.

urnathan updated this revision to Diff 355943.Jul 1 2021, 10:59 AM
urnathan marked an inline comment as done.

This adds support for the cond ? arya : aryb cases you pointed out. There is a problem with decaying VLAs to arrays of unknown bound -- Implicit Conversion Sequence was unprepared to deal with that. It's starting to look a little rat-holey, so if you don't mind I've not handled that.

urnathan marked 3 inline comments as done.Jul 1 2021, 11:02 AM
urnathan added inline comments.
clang/lib/Sema/SemaExprCXX.cpp
6708

Handling VLA -> [] fell into problems with Implicit Conversion Sequence, so I punted that. Constant array -> [] is handled though.

urnathan marked 14 inline comments as done.Jul 1 2021, 11:08 AM

I've marked the comments from the v1 patch as addressed, but most of them were mooted by the v2 patch based on your D102650 patch. I find Phabricator a little hard to use :(

ping? as commented, the v2 patch's absorption of Richard's D102650 diff mooted many of the original comments.

martong removed a subscriber: martong.Sep 28 2021, 1:56 AM

Can you also update the cxx_status page to show that we're getting a partial implementation of P0388?

The changes LGTM, though I spotted an unrelated change that can go in as an NFC commit while we finalize this review. I'm adding Erich as a second set of eyes, in case Richard is busy at the moment.

clang/lib/Sema/SemaInit.cpp
8255

Unrelated typo fix (feel free to land as an NFC change though).

the cxx_status change is in https://reviews.llvm.org/D103908, which is ready to land once this one is done :) Then you get the whole of p0388.

This revision is now accepted and ready to land.Oct 12 2021, 6:21 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2021, 7:35 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript