This is an archive of the discontinued LLVM Phabricator instance.

[AST] Update/correct the static_asserts for the bit-fields in Type
ClosedPublic

Authored by riccibruno on Aug 13 2018, 3:48 AM.

Details

Summary

The current static_assert only checks that ObjCObjectTypeBitfields
fits into an unsigned. However it turns out that FunctionTypeBitfields
do not currently fits into an unsigned. Therefore the anonymous
union containing the bit-fields always use 8 bytes instead of 4.

This patch remove the lone misguided static_assert and systematically
check the size of each bit-field.

Diff Detail

Repository
rC Clang

Event Timeline

riccibruno created this revision.Aug 13 2018, 3:48 AM
erichkeane added inline comments.Aug 13 2018, 6:01 AM
include/clang/AST/Type.h
1621

I don't really see value in ensuring that TypeBitfields is <= 4 bytes, it seems to me that all we care about is the union size, so just <=8 is fine.

riccibruno marked an inline comment as done.Aug 13 2018, 6:18 AM
riccibruno added inline comments.
include/clang/AST/Type.h
1621

I originally did this in the hope to provide more information
to someone who trigger one of the various static_assert.

But you are completely right that we only care about <= 8 bytes

Will update the diff with this change.

riccibruno marked an inline comment as done.
erichkeane accepted this revision.Aug 13 2018, 9:10 AM
This revision is now accepted and ready to land.Aug 13 2018, 9:10 AM
This revision was automatically updated to reflect the committed changes.

Oh, I missed that there was a separate review for this. A lot of the important subclasses that need extra storage have been designed with the expectation that these bit-fields fit within 32 bits. For example, FunctionType starts with a bunch of bit-fields because the expectation is that they'll fit into the tail-padding of Type. So this assertion should really be testing <= 4, and if we need to land a few patches first to make that true, we should do so.

riccibruno added a comment.EditedAug 13 2018, 12:51 PM

@rjmccall

I would argue that we should make these bit-fields take 8 bytes for the following reasons:

  1. On 64 bits archs, this is free since we have already a little less than 8 bytes of padding here, assuming Type keep its 18 bits.
  2. On 32 bits archs, this a priori increase the size of each Type by 4 bytes. However, types are aligned to 16 bytes because of the fast CVR qualifiers. Doing an fsyntax-only on all of Boost with -print-stats I get that by far the most commonly used Types have size 40 or 48 (on a 64 bits arch). That is 5 or 6 pointers. This would be 20 or 24 bytes on 32 bits archs if we assume that every member of the most commonly used types are pointers and that we shrink the bitfields to 32 bits. But since Types are aligned to 16 bytes we do not gain anything.
This comment was removed by riccibruno.

@rjmccall

I would argue that we should make these bit-fields take 8 bytes for the following reasons:

  1. On 64 bits archs, this is free since we have already a little less than 8 bytes of padding here, assuming Type keep its 18 bits.

First, Type is not 16-byte aligned as far as the compiler is concerned; it is merely
dynamically aligned to a 16-byte boundary when allocated. You actually verified this
when you did your experiment of printing out the sizes of various Type subclasses,
because the value of sizeof is always rounded up to the alignment, meaning that a
size of 40 would not be legal if Type were actually 16-byte-aligned.

Because Type is only 4/8-byte aligned, its only tail padding is what's needed to
fill up a pointer after these bit-fields, which is supposed to be 4 bytes but is
instead apparently 0 bytes because we overflowed one of the bit-fields.

Not officially aligning Type to 16 bytes is fairly important for memory efficiency
even on 64-bit hosts because many of our types use tail-allocated storage, which
naturally starts at sizeof(T) bytes from the address point; if we formally aligned
Type subclasses to 16 bytes, we'd often waste a pointer of that storage.

Second, tail padding of base classes is not necessarily wasted under the Itanium C++ ABI.
Itanium will start placing sub-objects in the tail-padding of the last non-empty base class
as long as that base is POD under the rules of C++03; Type is not POD because it has a
user-provided constructor.

Looking at Type.h, we don't seem to be taking advantage of this as much as I thought,
at least in the Type hierarchy (maybe we do in the Stmt or Decl hierarchies). We
have a lot of subclasses that have 32 bits of miscellaneous storage but either (1) don't
order their fields correctly to allow subclasses to fit in or (2) also subclass FoldingSetNode,
which breaks up the optimization.

Since we do care primarily about 64-bit hosts, I'm leaning towards agreeing that we should
just accept that we have 64 bits of storage here, 46 of which are available to subclasses.
If you're interested in optimizing this, it would be nice if you could go through all the
subclasses looking for 32-bit chunks that could be profitably moved up into Type.

  1. On 32 bits archs, this a priori increase the size of each Type by 4 bytes. However, types are aligned to 16 bytes because of the fast CVR qualifiers. Doing an fsyntax-only on all of Boost with -print-stats I get that by far the most commonly used Types have size 40 or 48 (on a 64 bits arch). That is 5 or 6 pointers. This would be 20 or 24 bytes on 32 bits archs if we assume that every member of the most commonly used types are pointers and that we shrink the bitfields to 32 bits. But since Types are aligned to 16 bytes we do not gain anything.

Types aren't the only thing allocated in the ASTContext, so allocating a 16-byte-aligned
chunk with a non-multiple-of-16 size doesn't mean that any difference between the size
and the next 16-byte boundary is necessarily wasted. But you're probably right that the
difference between Type being 12 and 16 bytes is probably not a big deal.

John.

@rjmccall

I would argue that we should make these bit-fields take 8 bytes for the following reasons:

  1. On 64 bits archs, this is free since we have already a little less than 8 bytes of padding here, assuming Type keep its 18 bits.

First, Type is not 16-byte aligned as far as the compiler is concerned; it is merely
dynamically aligned to a 16-byte boundary when allocated. You actually verified this
when you did your experiment of printing out the sizes of various Type subclasses,
because the value of sizeof is always rounded up to the alignment, meaning that a
size of 40 would not be legal if Type were actually 16-byte-aligned.

Because Type is only 4/8-byte aligned, its only tail padding is what's needed to
fill up a pointer after these bit-fields, which is supposed to be 4 bytes but is
instead apparently 0 bytes because we overflowed one of the bit-fields.

Not officially aligning Type to 16 bytes is fairly important for memory efficiency
even on 64-bit hosts because many of our types use tail-allocated storage, which
naturally starts at sizeof(T) bytes from the address point; if we formally aligned
Type subclasses to 16 bytes, we'd often waste a pointer of that storage.

Right,

Second, tail padding of base classes is not necessarily wasted under the Itanium C++ ABI.
Itanium will start placing sub-objects in the tail-padding of the last non-empty base class
as long as that base is POD under the rules of C++03; Type is not POD because it has a
user-provided constructor.

I just realized that something like this was happening when playing with various classes on godbolt.
I should go read the Itanium spec ! However as you say the condition for this happening are
not necessarily satisfied and moreover other ABIs might not do this packing (but you almost
certainly are more familiar with this than me)

Looking at Type.h, we don't seem to be taking advantage of this as much as I thought,
at least in the Type hierarchy (maybe we do in the Stmt or Decl hierarchies). We
have a lot of subclasses that have 32 bits of miscellaneous storage but either (1) don't
order their fields correctly to allow subclasses to fit in or (2) also subclass FoldingSetNode,
which breaks up the optimization.

Since we do care primarily about 64-bit hosts, I'm leaning towards agreeing that we should
just accept that we have 64 bits of storage here, 46 of which are available to subclasses.
If you're interested in optimizing this, it would be nice if you could go through all the
subclasses looking for 32-bit chunks that could be profitably moved up into Type.

  1. On 32 bits archs, this a priori increase the size of each Type by 4 bytes. However, types are aligned to 16 bytes because of the fast CVR qualifiers. Doing an fsyntax-only on all of Boost with -print-stats I get that by far the most commonly used Types have size 40 or 48 (on a 64 bits arch). That is 5 or 6 pointers. This would be 20 or 24 bytes on 32 bits archs if we assume that every member of the most commonly used types are pointers and that we shrink the bitfields to 32 bits. But since Types are aligned to 16 bytes we do not gain anything.

Types aren't the only thing allocated in the ASTContext, so allocating a 16-byte-aligned
chunk with a non-multiple-of-16 size doesn't mean that any difference between the size
and the next 16-byte boundary is necessarily wasted. But you're probably right that the
difference between Type being 12 and 16 bytes is probably not a big deal.

John.

I actually did exactly this. My approach was to extend what is already done,
that is add nested classes SomethingBitfields in Type and add an instance of
each to the anonymous union. The classes which I have found so far benefiting
from this are FunctionProtoType, TemplateSpecializationType, PackExpansionType,
DependentTemplateSpecializationType and SubstTemplateTypeParmPackType.

For example the patch dealing with TemplateSpecializationType is
here https://reviews.llvm.org/D50643.

rsmith added a subscriber: rsmith.Aug 13 2018, 4:00 PM

(Just writing up my archaeology and research so no-one else needs to do it...)

We used to intentionally keep the Type bitfields 32 bits long. However, these commits (accidentally, as far as I can tell) took us past 32 bits for the type bitfields: rL270834 rL285849 rL301535 rL327768

For a target with 32-bit-aligned pointers, these changes will have made all Type objects 4 bytes larger.

For a target with 64-bit-aligned pointers, most Type subclasses would not be made any smaller by reducing the bitfields from 64 bits to 32 bits. The following types could be made smaller with some additional effort:

  • DependentAddressSpaceType, DependentSizedExtVectorType, and DependentVectorType could each be made 8 bytes smaller by moving their SourceLocation member to the start and reordering the bases so FoldingSetNode is first. (But we should also remove the ASTContext& members if we care about how big these nodes are.)
  • UnaryTransformType could be made 8 bytes smaller by moving its UTTKind member to the start. But it could be made 8 bytes smaller today by just removing that member, because its value is always zero. Or moving that member to the Type bitfields.
  • SubstTemplateTypeParmPackType, TemplateSpecializationType, DependentTemplateSpecializationType, and PackExpansionType could be made 8 bytes smaller by reordering the bases and moving an unsigned to the start. Or by moving the member into the Type bitfields.
  • ObjCTypeParamType could be made 8 bytes smaller by reordering bases, but we could get the same benefit by just moving its bitfield member into the Type bitfields.
  • PipeType could similarly be made 8 bytes smaller, but again it'd be easiest to just move its bool member into the type bitfields if we care.

(End archaeology.)

Does anyone still care about the memory uasge of 32-bit clang? (How much effort should we go to to pack our data efficiently for such targets?) Given that there are no Type nodes that could be made smaller on a 64-bit system by packing into the tail-padding of Type that could not more easily be made smaller in other ways, if we're past the point where 32-bit memory usage is a concern, I think we should just accept that the Type bit-fields are 64 bits long now.

I actually did exactly this. My approach was to extend what is already done,
that is add nested classes SomethingBitfields in Type and add an instance of
each to the anonymous union. The classes which I have found so far benefiting
from this are FunctionProtoType, TemplateSpecializationType, PackExpansionType,
DependentTemplateSpecializationType and SubstTemplateTypeParmPackType.

For example the patch dealing with TemplateSpecializationType is
here https://reviews.llvm.org/D50643.

I see. This is one of those rare cases where your changes probably would've been
clearer not broken up into multiple patches. :) I only got CC'ed on two of them.

Anyway, I think we're all in agreement that this is the right thing to do now.

John.

I actually have patches for TemplateSpecializationType, PackExpansionType,
DependentTemplateSpecializationType and SubstTemplateTypeParmPackType which move
the unsigned to the bit-fields in Type. The same could be done for the others I suppose.
This assume that the bit-fields of Type have 64 bits available though.

An other one is in ElaboratedType: the member "TagDecl *OwnedTagDecl" is very
commonly null (IIRC only non-null for about 600 of the 66k ElaboratedTypes when
parsing all of Boost). Therefore we can simply put it in a trailing object
and stash a bit in the bit-fields of Type indicating when this pointer is null.

Bruno

Sure, that seems like a reasonable optimization.