Fix __has_unique_object_representations based on rsmith's input

Authored by erichkeane on Oct 26 2017, 3:14 PM.

Diff Detail

erichkeane created this revision.Oct 26 2017, 3:14 PM

Added all of @rsmith's comments here, I believe I have them all and have properly responded to them.


Moved because @rsmith said:

I think this would make more sense as a member of ASTContext. The Type object generally doesn't know or care about its representation.


@rsmith said: Hmm, are there any cases in which we might want to guarantee the vptr is identical across all instances?

I believe the answer to that is 'no', but if you have a case in mind, I can change this.


@rsmith said: This seems really fragile to me. Empty bases may or may not occupy storage. But that's beside the point -- empty bases are just a special case of bases with tail padding, which are not properly handled here.

I really think you should be walking the record layout rather than trying to determine this from sizes alone.

I think I did that properly in this patch here. I re-did a bunch of the code around this, so hopefully it is to your liking.


@rsmith said: Stray semicolon?

Yep, don't know how that got in there...


@rsmith said: I don't think this is correct. As a weird GNU behaviour, we can have, say, a type with size 3 and alignment 4 (via an alignment attribute on a typedef). An array of 1 such element has size 4, and has padding even if its element type does not.

I think I added a test that covers that here. Is there a way to mark non-record types with an alignment? If not, this cause problems, because the Layout.getSize() checks in the struct handling will take care of this, since 'getSize' includes alignment.


@rsmith said: The second half of this comment doesn't seem right to me. They may be represented as sequences of bits, but that doesn't make them integral.

I think you're right, I removed this part.


@rsmith said: Why?

Thats a good question that I lack the answer to... I decided to remove this and let the 'struct' handling take care of it, which seems to make sense, and even better, seems to work.


@rsmith said: Making these members of QualType seems counterproductive. You already have the Record here; it'd be better to make these file-static and pass that in.


Also, I should probably validate bitfields. The following SHOULD?! be unique?

struct S {
unsigned a : 1;
unsigned b : 2;
unsigned c : 3;
unsigned d : 2;

static_assert(__has_unique_object_representations(S), "error");

But without 'd', it should not be (tail padding).

Corrected version of the previous example:
struct S {
char a : 1;
char b : 2;
char c : 3;
char d : 2;
static_assert(sizeof(S) == 1, "size");
static_assert(__has_unique_object_representations(S), "error");

I haven't tested this patch agaainst this repro, but it should likely be in the tests.

Fixed for bitfields. Review by anyone greatly appreciated :)

It'd be good to have tests which have alignment directives on bitfields.

Added a test for aligned bitfield, as @majnemer requested. Is this sufficient?

Ping? This fixes a number of bugs with the previous implementation, so I'd like to see if we can get this in.

rsmith added inline comments.Nov 27 2017, 2:23 PM

Remove this and change the getAs below to castAs.


Remove the .getTypePtr() here; QualType has an overloaded operator-> that does this for you.


No braces needed here, and use auto * when initializing from dyn_cast.


I think you mean !RD->isUnion(), right? You shouldn't assert if given an interface.


The LLVM coding style does not permit this form of initialization. Use an initializer that expresses your intent instead, such as CharUnits::Zero().


The getNumVBases check here is redundant; isDynamicClass checks that.


Maybe just omit empty classes from the Bases list entirely? That'd remove the need to recompute isStructEmpty below.


What should happen for fields of reference type?


If you want to do something with this case, the thing to do is to assert; it's a programming error to pass a null type here.


As noted on the prior version of the patch, I don't think that's quite true. Here's an example:

typedef __attribute__((aligned(4))) char bad[3]; // type with size 3, align 4 (!!!)

Here, ch has unique object representations. But bad[2] does not, because forming the array type inserts one byte of padding after each element to keep the ch subobjects 4-byte aligned.

Now, this is clearly a very problematic "extension", but we're being intentionally compatible with older versions of GCC here. At some point we should start rejecting this (as recent versions of GCC do), but until we do so, we should treat it properly, which means you can't assume that array types don't insert padding between elements. Sorry this is so horrible ;-(


This is redundant; functions are not trivially copyable.


This is not correct: member pointer representations can have padding bits. In the MS ABI, a pointer to member function is a pointer followed by 0-3 ints, and if there's an odd number of ints, I expect there'll be 4 bytes of padding at the end of the representation on 64-bit targets.

I think you'll need to either add a clang::CXXABI entry point for determining whether a MemberPointerType has padding, or perhaps extend the existing getMemberPointerWidthAndAlign to also provide this information.

erichkeane marked 10 inline comments as done.Nov 27 2017, 4:19 PM

Patch incoming, I think I got everything. I'll do the error on array of aligned items in a separate patch, hopefully tomorrow AM.


References are not trivially copyable, so they will prevent the struct from having a unique object representation.


As suggested on IRC, I'll be shortly putting together another patch to simply remove this case (as alluded to here).


Based on looking through the two, I think I can just do this as Width%Align == 0, right? I've got an incoming patch that does that, so hopefully that is sufficient.

erichkeane marked an inline comment as done.
rsmith added inline comments.Nov 27 2017, 6:46 PM

I don't think that will work; the MS implementation *sometimes* rounds the size up to the alignment. (... which sounds pretty dodgy to me; shouldn't the returned size always be a multiple of the returned alignment?)

erichkeane added inline comments.Nov 28 2017, 9:24 AM

Ah, right... I missed this part of the MicrosoftCXXABI:

if (Target.getTriple().isArch64Bit())
  Width = llvm::alignTo(Width, Align);

So likely I can just do the same math as I implemented here, except before the width is reset in the alignTo here.

Thanks for getting back so quickly! I'll get back on this once I get a patch for the array align error.

Add has padding to CXXABI getMemberPointerWidthAndAlign to correct padding behavior here.

Please add tests for the member pointer cases across various different targets.


This isn't quite right; you want min(bitfield length, bit size of type) here. Eg, a bool b : 8 or int n : 1000 bitfield has padding bits.


That sounds like the wrong behavior to me. If two structs have references that bind to the same object, then they have the same object representation, so the struct does have unique object representations.

253 ↗(On Diff #124609)

This seems to be backwards?

Also, I'm not sure whether this is correct. In the strange case where Width is not a multiple of Align (because we don't round up the width), there is no padding. We should only set HasPadding to true in the alignTo case below.

erichkeane added inline comments.Nov 28 2017, 3:11 PM

I guess I'm missing something with this comment... why would a bitfield with bool b: 8 have padding?

Additionally, isn't int n : 1000 illegal, since 1000 > 32 (# bits in an int)?
Both clang and GCC reject the 2nd case. Curiously, GCC rejects bool b: 9, but Clang seems to allow 'bool' to have ANY size. Is that an error itself?

Finally: Should we consider ANY bool to be not a unique object representation? It has 1 bit of data, but 8 bits of storage. GCC's implementation of this trait accepts them, but I'm second guessing that at the moment...


I didn't think of it that way... I WILL note that GCC rejects references in their implementation, but that could be a bug on their part.

253 ↗(On Diff #124609)

I think you're right about it being backwards.

However, doesn't the struct with a single Ptr and either 1 or 3 Ints have tail padding as well? I do believe you're right about the alignTo situation below, but only if Width changes, right?

rsmith added inline comments.Nov 28 2017, 3:26 PM

The wording you quote below is defective in this regard (it doesn't discuss what "same value" means for reference members) -- that's at least partly my wording, sorry about that! But at least my intent when we were working on the rules was that "same value" would have the obvious structurally-recursive meaning, which for references means we're considering whether they have the same referent.

So I think references, like pointers, should always be considered to have unique object representations when considered as members of objects of class type. (But __has_unique_object_representations(T&) should still return false because T& is not a trivially-copyable type, even though a class containing a T& might be.)

253 ↗(On Diff #124609)

I think the idea is that a struct with one pointer and an odd number of ints, on 32-bit Windows, will have no padding per se, but its size will simply not be a multiple of its alignment. (So struct layout can put things in the four bytes after it, but will insert padding before it to place it at an 8 byte boundary.)

rsmith added inline comments.Nov 28 2017, 3:38 PM
253 ↗(On Diff #124609)
erichkeane added inline comments.Nov 28 2017, 4:04 PM

That makes sense to me, I can change that.

253 ↗(On Diff #124609)

Well... oh dear.

That would mean that:

auto p = &A::N;
static_assert(!has_unique_object_representations_v<decltype(p)>); // not unique, since it has padding, right?
struct S {
   decltype(p) s;
static_assert(!has_unique_object_representations_v<S>); // also not unique, since 's' has padding.

struct S2 {
   decltype(p) s;
   int a;
static_assert(has_unique_object_representations_v<S2>); // Actually unique again, since the padding is filled.

Do I have this right?

rsmith added inline comments.Nov 28 2017, 4:15 PM
253 ↗(On Diff #124609)

The first static_assert looks wrong. Should be:

static_assert(has_unique_object_representations_v<decltype(p)>); // unique, has no padding

Note that sizeof(decltype(p)) is the size without any padding. The type does not have padding itself, but might induce lead padding (due to alignment) and tail padding (due to size not being divisible by alignment) in objects it's placed within. But I think your current approach will get this all right, so long as MPI.HasPadding is only set to true in the cases where there actually is padding in the MPI.Width bytes of the representation (which only happens when Width is rounded up for alignment).

erichkeane added inline comments.Nov 28 2017, 6:03 PM

I did a bit more looking into this... In the 'TYPE b: X' case, 'X' is always the size of the field, so :

struct Bitfield {
  bool b: 16;
static_assert(sizeof(Bitfield) == 2);

The rest of my padding detection works correctly.

thanks for the comments rsmith, think this is all caught up to your comments. Also added the test for member ptr differences.

Did ore looking into the bitfield shennangins. Will update this patch tomorrow. @rsmith if you get a chance, do you perhaps have an opinion on 'bool' fields? It seems to me that a bool could either be considered an 8 bit value, or a 1 bit value with 7 bits of padding. So:

bool b;
struct S { bool b; };
 // Patch currently lists these as unique.  Should they be: 

Umm... so holy crap. We actually reserve the object spece for the whole thing, but ONLY the first sizeof(field) fields are actually stored to! Everything else is truncated!

I think the correct solution is to do:

// too large of a bitfield size causes truncation, and thus 'padding' bits.
if (FieldSizeInBits > Context.getTypeSizeInBits(Field->getType()) return false;
rsmith added inline comments.Nov 28 2017, 7:36 PM

Sounds like you've found more GCC bugs. In C++ (unlike in C), there is no restriction on the maximum length of a bit-field. Specifically, [class.bit]p1 says: "The value of the integral constant expression may be larger than the number of bits in the object
representation (6.7) of the bit-field’s type; in such cases the extra bits are padding bits (6.7)."

For non-bool bit-fields, or bool bit-fields with more than 8 bits, the extra bits are padding, and the type does not have unique object representations. You'll need to check for this.

For bool bit-fields of length 1, we obviously have unique object representations for that one bit.

The interesting case is bool bit-fields with length between 2 and 8 inclusive. It would seem reasonable to assume they have unique representations, as we do for plain bool values, but this is really an ABI question (as, actually, is the plain bool case). The x86-64 psABI is not very clear on the bit-field question, but it does say "Booleans, when stored in a memory object, are stored as single byte objects the value of which is always 0 (false) or 1 (true).", so for at least x86-64, bools of up to 8 bits should be considered to have unique object representations. The PPC64 psABI appears to say the same thing, but is also somewhat unclear about the bit-field case.


More cases to handle here:

  • vectors (careful about, eg, vector of 3 foo)
  • _Complex int and friends
  • _Atomic T
  • Obj-C block pointers
  • Obj-C object pointers
  • and perhaps OpenCL's various builtin types (pipe, sampler_t, event_t, clk_event_t, queue_t, reserve_id_t)

It's fine to leave these for another change, but we shouldn't forget about them. (There're also Obj-C class types and the Obj-C selector type, but I think it makes sense for those to return false here.)

erichkeane added inline comments.Nov 29 2017, 9:29 AM

I suspect that the 'bool' 2-8 condition is a case where we can let the sleeping dogs lie here :) Incoming patch has the other condition, where the bitfield size is greater than the type size.


I'm not sure what the correct answer is in any of those cases, since i'm not particularly familiar with any of them. I'll put a 'fixme' with the contents of your comment in the code, and revisit it (or hope someone more knowledgable can).

Woops, missed an 'svn add' and lost the member ptr test. Back now!

rsmith added inline comments.Nov 29 2017, 3:19 PM

On reflection, I don't think this is right, and likewise I don't think the check for unique object representations of the base class above is quite right.

A class can be standard-layout but not POD for the purposes of layout (eg, by declaring a special member function). If so, the derived class can reuse the base class's tail padding, and if it does, the derived class can have unique object representations even when the base class does not. Example:

struct A { ~A(); int n; char a; }; struct B : A { char b, c, d; };

Here, under the Itanium C++ ABI, sizeof(B) == 8, and B has unique object representations even though its base class A does not.

This should be fairly easy to fix. One approach would be to change the recursive call to hasUniqueObjectRepresentations above to return the actual size occupied by the struct and its fields (rather than checking for tail padding), and add that size here instead of using the base size. Or you could query the DataSize in the record layout rather than getting the (complete object) type size (but you'd then still need to check that there's no bits of tail padding before the end of the dsize, since we still pad out trailing bit-fields in the dsize computation).


What >= comparison? I think the comment has got out of date relative to the code.

erichkeane marked an inline comment as done.Nov 29 2017, 3:49 PM
erichkeane added inline comments.

According to the standard, the above case isn't, because it is non-trivial, right? 9.1 requires that "T" (B in your case) be trivially copyable, which it isn't, right?

The predicate condition for a template specialization
has_unique_object_representations<T> shall be
satisfied if and only if:
(9.1) - T is trivially copyable, and
(9.2) - any two objects of type T with the same value have the same
object representation

rsmith added inline comments.Nov 29 2017, 3:56 PM

Fixed example:

struct A { int &r; char a; }; struct B : A { char b[7]; };
erichkeane added inline comments.Nov 29 2017, 4:01 PM

AH! I got caught up by the destructor as the reason it wasn't unique, but the actual thing you meant is that "A" has tail padding, so it is NOT unique. However, on Itanium, the tail padding gets used when inheriting from it.

Do I have that correct? I just have to fix the behavior of inheriting-removing-tail-padding?

Fixed the 'tail padding' mechanism to work correctly as suggested by @rsmith

rsmith added inline comments.Nov 29 2017, 5:14 PM

You have fallen into a trap :) There can be (up to 7 bits of) padding between the end of the members and the end of the data size, specifically if the struct ends in a bit-field. You could check for that before you return CurOffsetInBits at the end of this function, but I think it'd be better to store the size in Bases and just use that down here rather than grabbing and using the data size.

erichkeane added inline comments.Nov 29 2017, 5:31 PM

Gah! You're right! Fix incoming.

Fix for trailing padding BITS, which obviously won't be merged.

rsmith accepted this revision.Nov 29 2017, 5:37 PM

Thanks for your patience, this turned out to be surprisingly complicated.

This revision is now accepted and ready to land.Nov 29 2017, 5:37 PM

Thanks for your patience, this turned out to be surprisingly complicated.

Of course! Thank you as well for your input, I was surprised at how complicated this ended up being as well!

This revision was automatically updated to reflect the committed changes.