Page MenuHomePhabricator

Rewrite ConstStructBuilder with a mechanism that can cope with splitting and updating constants.
ClosedPublic

Authored by rsmith on Jun 14 2019, 5:36 PM.

Details

Summary

This adds a ConstantBuilder class that deals with incrementally building
an aggregate constant, including support for overwriting
previously-emitted parts of the aggregate with new values.

This fixes a bunch of cases where we used to be unable to reduce a
DesignatedInitUpdateExpr down to an IR constant, and also lays some
groundwork for emission of class constants with [[no_unique_address]]
members.

Diff Detail

Repository
rL LLVM

Event Timeline

rsmith created this revision.Jun 14 2019, 5:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2019, 5:36 PM

Isn't [[no_unique_address]] only significant for empty members? I'm not sure why they need significant support from constant-building, since they expand to no meaningful initializer.

We have some code we'll hopefully be upstreaming soon that relies on being able to do things with address-of-position placeholder values; I'm a little worried that the new structure here doesn't really support them.

Isn't [[no_unique_address]] only significant for empty members? I'm not sure why they need significant support from constant-building, since they expand to no meaningful initializer.

It also permits reuse of tail padding for non-static data members, which is the complexity that this patch is dealing with (in addition to improving and generalizing the support for non-trivial designated initializers).

We have some code we'll hopefully be upstreaming soon that relies on being able to do things with address-of-position placeholder values; I'm a little worried that the new structure here doesn't really support them.

Can you say a bit more about that? (Do you want to be able to emit a constant that denotes a pointer to somewhere else within the same constant being emitted, or something like that?) I think this approach should be strictly more general than what we had before, but perhaps that means it can't be extended in the direction you need?

Isn't [[no_unique_address]] only significant for empty members? I'm not sure why they need significant support from constant-building, since they expand to no meaningful initializer.

It also permits reuse of tail padding for non-static data members, which is the complexity that this patch is dealing with (in addition to improving and generalizing the support for non-trivial designated initializers).

I see.

We have some code we'll hopefully be upstreaming soon that relies on being able to do things with address-of-position placeholder values; I'm a little worried that the new structure here doesn't really support them.

Can you say a bit more about that? (Do you want to be able to emit a constant that denotes a pointer to somewhere else within the same constant being emitted, or something like that?) I think this approach should be strictly more general than what we had before, but perhaps that means it can't be extended in the direction you need?

We need to be able to construct constant initializers for certain fields in terms of (among other things) a pointer to the current field. My concern was whether this might mess up the indexing to the current field; but it looks like we actually discover the right GEP indices retroactively for these aggregate constants (unlike e.g. ConstantInitBuilder), so the fact that the GEP indices might change as we build the constant is not a problem. As long as the ConstantEmitter is being passed around and used to build the individual fields appropriately, it should be fine.

rjmccall added inline comments.Jun 17 2019, 12:11 PM
lib/CodeGen/CGExprConstant.cpp
73 ↗(On Diff #204887)

This seems like a very generic name for this type.

75 ↗(On Diff #204887)

Are there invariants about these? I assume they're parallel arrays; are they kept sorted?

76 ↗(On Diff #204887)

This is one past the last byte that's been covered by an actual Constant* value, or does it include unoccupied padding, or does it exclude even occupied padding?

98 ↗(On Diff #204887)

Might be worth clarifying what N is here.

190 ↗(On Diff #204887)

AllowOversized (which you used in the interface) seems like a better name.

196 ↗(On Diff #204887)

OffsetWithinChar?

201 ↗(On Diff #204887)

OffsetInChars?

237 ↗(On Diff #204887)

Especially in the context of the comment above, I think it would be good to clarify that both of these are hard "we can't emit this constant" bail-outs.

258 ↗(On Diff #204887)

Oh, because we're splitting before and after a single-CharUnits range? That seems worthy of a somewhat clearer explanation in the code.

I guess we could have a non-ConstantInt single-byte value. Unlikely but not impossible. :)

375 ↗(On Diff #204887)

Does this not come up all the time with bit-fields? I guess we emit them in single-char chunks, so it wouldn't. Probably worth a comment.

rsmith updated this revision to Diff 205161.Jun 17 2019, 1:13 PM
rsmith marked 13 inline comments as done.
  • Address review comments from rjmccall.
lib/CodeGen/CGExprConstant.cpp
73 ↗(On Diff #204887)

It is intended to be a very generic type. (I was trying to arrange it so that it could possibly be moved to LLVM eventually. I heard that globalopt would benefit from being able to do this kind of constant splitting / reforming.) Is ConstantAggregateBuilder sufficiently more precise?

75 ↗(On Diff #204887)

Added comments to explain.

76 ↗(On Diff #204887)

Added comment to explain.

98 ↗(On Diff #204887)

Looks like this ended up being unused; removed.

190 ↗(On Diff #204887)

AllowOversized is used to mean "the size of the constant may be larger than the size of the type", and is a parameter to build / buildFrom.
AllowOverwrite is used to mean "adding this constant may overwrite something you've already been given", and is a parameter to add / addBits.

I can make these names more different from each other if that would help?

258 ↗(On Diff #204887)

Yes. It's not too hard to craft a testcase where we'd get an explicit undef here; added handling for that and for all-zeroes constants, which are both correctly handled by overwriting the whole byte.

375 ↗(On Diff #204887)

Done.

We could hit this case for cases such as:

union U { int a; int b : 3; };
struct S { U u; };
S s = {(union U){1234}, .u.b = 5};

(which CodeGen currently rejects with "cannot compile this static initializer yet" in C), and splitting the ConstantInt would allow us to emit that initializer as a constant, but I'm not sure it's worthwhile unless it lets us simplify or improve bitfield emission in general. (The above isn't a case that C requires us to treat as a constant initializer, so rejecting it is not a conformance issue.)

Maybe instead of splitting bitfields into 1-byte chunks like we currently do, we should try to combine them into a single iN, like CGRecordLayoutBuilder does. But splitting to i8 maintains the status quo, which is what I was aiming for in this patch.

rsmith updated this revision to Diff 205162.Jun 17 2019, 1:17 PM
  • Fix somewhat-incorrect comment on Size member.
efriedma added inline comments.
lib/CodeGen/CGExprConstant.cpp
967 ↗(On Diff #204887)

Like the comment here says, we're awfully close to being able to just completely kill off ConstExprEmitter etc. Do we really want to continue to invest effort into this code?

rsmith marked an inline comment as done.Jun 17 2019, 1:20 PM
rsmith added inline comments.
lib/CodeGen/CGExprConstant.cpp
967 ↗(On Diff #204887)

The new code is also the mechanism by which we emit APValue constants, which is the replacement for ConstExprEmitter.

Also I don't think we have a fix for the performance issues we saw from performing frontend evaluation of large InitListExprs yet, which is another blocker for removing ConstExprEmitter in favor of APValue emission (though I don't expect that to be a problem forever; APValue is long overdue an overhaul).

Minor requests, then LGTM.

lib/CodeGen/CGExprConstant.cpp
73 ↗(On Diff #204887)

Yeah, that makes sense, since it aligns with the LLVM type name. I guess the big blocker there is the lack of a CharUnits equivalent in LLVM.

190 ↗(On Diff #204887)

Oh, oops. No, sorry, no need to do anything here.

258 ↗(On Diff #204887)

Ah, yeah, good catch; I was thinking it was okay to be suboptimal if we saw a <1 x i8> or a 1-byte pointer or whatever, but undef and zero are definitely worth covering here.

375 ↗(On Diff #204887)

I'm fine with single-byte emission for constant bit-fields; ugliness here shouldn't really have significant consequences. The comment is good enough.

162 ↗(On Diff #205162)

Can these go to STLExtras or somewhere similar?

221 ↗(On Diff #205162)

Should this be BitsThisChar for consistency?

rsmith updated this revision to Diff 205182.Jun 17 2019, 2:02 PM
rsmith marked 2 inline comments as done.
  • Address additional review comments from rjmccall.
rsmith marked an inline comment as done.Jun 17 2019, 2:03 PM
rsmith added inline comments.
lib/CodeGen/CGExprConstant.cpp
162 ↗(On Diff #205162)

Done. The use of offsets here is a bit special-case, so I've moved an iterator version to STLExtras and just left an offsets -> iterators adaptor here.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 17 2019, 2:05 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2019, 2:05 PM