This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Refactor "Designators" into a unified implementation [NFC]

Authored by void on Dec 22 2022, 1:08 PM.



The interfaces for designators (i.e. C99 designated initializers) was
done in two slightly different ways. This was rather wasteful as the
differences could be combined into one.

Diff Detail

Event Timeline

void created this revision.Dec 22 2022, 1:08 PM
Herald added a project: Restricted Project. · View Herald Transcript
void requested review of this revision.Dec 22 2022, 1:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2022, 1:08 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
void added a comment.Jan 3 2023, 3:05 PM

Friendly ping.

rsmith added inline comments.Jan 3 2023, 5:22 PM

This looks very odd -- these two index fields represent completely different kinds of information, one of which is specific to an implementation detail of InitListExpr / DesignatedInitExpr. Perhaps this class could be templated over its representation of an expression, instead of having both cases be possibilities here, and having the user of this class keep track of whether they're using the kind of designator that stores an expression pointer or the kind that stores an expression index? (We'd end up storing two different indexes for array range designators this way, but that seems like it should not be problematic.)


Does this really need to be mutable? The AST version shouldn't be exposing a way to mutate this field! It seems like we should have a non-const overload of Designation::getDesignator for Sema to use if it really needs to modify this field.


AST classes shouldn't reference Sema. But these functions do nothing; can they be removed?


AST classes shouldn't reference Sema. But these functions do nothing; can they be removed?

void updated this revision to Diff 486436.Jan 4 2023, 5:06 PM
void marked 4 inline comments as done.

Combine the array designator internals and templatize them. It should make this a bit simpler.

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 4 2023, 5:06 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
void added inline comments.Jan 4 2023, 5:06 PM

Looks like they're dead. Removed.

Drive by comment (I'll give a more thorough review when I have a moment): precommit CI seems to have found an issue to address.

Assertion failed: false && "Invalid accessor", file C:\ws\w5\llvm-project\premerge-checks\clang\include\clang/AST/Designator.h, line 276

void updated this revision to Diff 486710.Jan 5 2023, 4:29 PM

Put assert in correct place.

void added a comment.Jan 5 2023, 4:29 PM

Drive by comment (I'll give a more thorough review when I have a moment): precommit CI seems to have found an issue to address.

Assertion failed: false && "Invalid accessor", file C:\ws\w5\llvm-project\premerge-checks\clang\include\clang/AST/Designator.h, line 276

doh! I did a think-o. Easy fix.

void added a comment.Jan 6 2023, 1:05 PM

BTW, the naming of the enums sucks. Feel free to offer alternatives. :-)

rsmith added inline comments.Jan 6 2023, 3:46 PM

Can we move the templating out from here to the whole Designator and Designation classes? It shouldn't be possible to mix the two kinds in the same Designation.

void planned changes to this revision.Jan 9 2023, 11:52 AM
void added inline comments.Jan 9 2023, 11:55 AM

Grr...My previous comment was eaten.

I'll give it a shot.

However, I'm a bit surprised at how designators are handled by Clang. I expected that a Designation would be an Expr with the Designators being L-values (e.g. MemberExprs / ArraySubscriptExprs), but instead the Designation exists just long enough to be turned into an explicit initialization list. Is there a reason to do it that way instead of using expressions?

void added inline comments.Jan 9 2023, 1:01 PM

So it looks like moving the template outside of the class won't work. The ability to switch between Expr and unsigned while retaining the same overall type is hardwired into things like the ASTImporter.

This is kind of a massive mess. Maybe we shouldn't even allow them to use both Expr and unsigned but instead require them to use one or the other? Maybe we could require unsigned with the understanding that the Expr can be converted into a constant?

void updated this revision to Diff 488005.Jan 10 2023, 3:03 PM

Use EllipsisLoc instead of one of the Brackets.

void added a comment.Jan 11 2023, 1:36 PM

Sonar ping. :-)

void added a comment.Jan 20 2023, 2:04 AM

Friendly ping!

BTW, I'm planning on future changes to designated initializers. This is the first step. I hope to get rid of the template stuff in the followup.

void added a comment.Feb 1 2023, 1:52 PM

@rsmith Ping. This is blocking some patches I'd like to submit afterwards. :-)

rsmith added inline comments.Feb 1 2023, 2:06 PM

I'm not understanding something. Currently the ASTImporter only deals with DesignatedInitExpr::Designators , which only ever store integer indexes.

Basically, today, we have two different classes:

  • A class that's specific to DesignatedInitExpr, and tracks array index expressions by storing the index of the expression within the DesignatedInitExpr's list of children; this is also what ASTImporter can import, because it's the one that's used in the AST's representation.
  • A class that's specific to Sema's processing that tracks array index expressions as Expr* instead.

You want to refactor them to share code, which makes sense, because they are basically the same other than how they refer to expressions. (Not quite: DesignatedInitExpr can apparently refer to a field either as an IdentifierInfo* or as a FieldDecl*, whereas the Sema version always uses the IdentifierInfo* representation.)

Each current user of one of these two classes uses only one of the two, which means they're either exclusively using integers to refer to expressions or exclusively using Expr*. So it seems to me that you should be able to update each user to use either Designator<unsigned> or Designator<Expr*>, depending on which class they used before.

What am I missing?

void added inline comments.Feb 1 2023, 2:37 PM

I'm still allowing them to use a Designator<unsigned> / Designator<Expr*> as they see fit, only it's hidden from them via the Create methods. I personally find the use of two different versions (one using unsigned and one using Expr*) completely baffling. Why can't they all use Expr*? Also the ASTImporter only outputs the start of an array init range, which is at the very least counter-intuitive. That's one of the issues I'd like to tackle with follow-up patches, hopefully getting rid of the need for this template all together. This does mean that in the interim a non-array range designator will have extra End & EllisisLoc fields that aren't used, but that shouldn't be too horrible, given that they'd be there anyway because of the union.

rsmith added inline comments.Feb 1 2023, 5:52 PM

The reason that's jumping out at me for having separate integer / Expr* implementations here is space-efficiency -- we get to make array range designators (and hence designators overall) be only 16 bytes rather than the 32 bytes they occupy in this patch (assuming 64-bit pointers) by storing indexes instead of pointers.

If your eventual plan is to remove the children list from DesignatedInitExpr, and store the pointers only in the designators, that seems to cost 8 bytes per designator in the two common cases:

  • For a field designator: 32 bytes (with 16 bytes of padding) versus 16 bytes + 8 bytes for the child pointer today
  • For an array designator: 32 bytes (with 16 bytes of padding) versus 16 bytes + 8 bytes for the child pointer today
  • For an array range designator: 32 bytes (4 bytes of padding) versus 16 bytes + 16 bytes for two child pointers today

... plus it'll presumably be painful to make the Stmt child iterator be able to handle this.

If you don't remove the separate children list from DesignatedInitExpr, then it seems like this approach will cost 16 bytes per designator in all cases, and we'll need to be careful in AST serialization / deserialization that we don't accidentally duplicate the Exprs that now have two pointers pointing to them instead of one, and likewise anywhere else that assumes each Expr is only reachable by one path through the AST (eg, TreeTransform, the recursive AST visitor).

I think some more visibility into the eventual plan would help.

void added inline comments.Feb 2 2023, 12:06 PM

The plan isn't detailed, but I basically want to address several of the points you mentioned here. In particular, I think the structure of DesignatedInitExpr is backwards from how every other Expr is handled in Clang. For instance, the Expr for something like s.t.u is a MemberExpr with a MemberExpr as its sub-expression and so on. DesignatedInitExpr on the other hand basically has a list of maybe expressions, maybe integers that refer to parts of the structure / array. It seems cleaner to me to use the MemberExpr / ArraySubscriptExpr way of referring to the member being initialized rather than using a specialized list that has to be handled differently from other Expr's.

The first step in my evil plot is to do this simple refactoring, so that there's no initial functionality change, before I do the more invasive changes that may break things.

I'm doing this because I'm working on a feature that uses the DIE syntax, and it would be much simpler to have it be a MemberExpr.

Am I completely off base here?

rsmith accepted this revision.Feb 3 2023, 12:51 PM

I think this is fine as a short-term stepping stone to a different representation.


Thanks, that's really helpful.

OK, so the end design would be something like (just building out some details here so I can think about this better; I'm not expecting you would necessarily do exactly this!):

  • DesignatedInitExpr becomes a base class with derived classes for member designators, array designators, and array range designators (And maybe also there's a different representation for unresolved versus resolved member designators.)
  • These work much like MemberExpr / ArraySubscriptExpr, except that they don't have a "base" object (and instead store an initializer for the nominated field / array element(s)), and are "inside-out": for .t.u = x we want the top-level expression to be a DesignatedInitMemberExpr that names t and whose initializer is a DesignatedInitMemberExpr that names u and whose initializer is x, whereas for s.t.u the top-level expression is a MemberExpr that names u and its base subexpression is a MemberExpr that names t. That is, while we model s.t.u = x as ((s.t).u) = x, we model .t.u = x more like .t = (.u = x)
  • We use the same representation both in DesignatedInitExpr and in Sema (and everywhere else that looks at the syntactic form of an initializer list).
  • Designation and Designator are removed entirely.

If something like that is the plan, then yes, I think that's completely reasonable, and seems like a nice improvement -- and I think it's fine if the intermediate state increases the AST size for designated initializers, as this patch does, especially if you're aiming for this to be completed within a single release cycle.

(Just quickly checking the size of the representation: a single-field DesignatedInitExpr is currently 40 bytes + 16 bytes for the Designator array, and I think could easily be 40 bytes *total* with the new representation; with two fields, it's currently 40 + 32 and could be 40 + 40 with the new representation. I think that's going to be a win essentially always.)

void added inline comments.Feb 3 2023, 1:21 PM

Thanks! I'm going to dig into the DIE stuff more and will have a much more detailed plan that I can give you to see what you think before I go too far down the wrong path. :-)

This revision was not accepted when it landed; it landed in state Needs Review.Feb 7 2023, 12:59 PM
This revision was automatically updated to reflect the committed changes.
ychen added a subscriber: ychen.Mar 6 2023, 3:36 PM