This is an archive of the discontinued LLVM Phabricator instance.


Differential D85191

[AST] Get field size in chars rather than bits in RecordLayoutBuilder.
Closed · Public

Authored by ebevhan on Aug 4 2020, 3:59 AM.

Details

Reviewers

jasonliu
efriedma

Commits

rG1e7ec4842c1a: [AST] Get field size in chars rather than bits in RecordLayoutBuilder.

Summary

In D79719, LayoutField was refactored to fetch the size of field
types in bits and then convert to chars, rather than fetching
them in chars directly. This is not ideal, since it makes the
calculations char size dependent, and breaks for sizes that
are not a multiple of the char size.

This patch changes it to use getTypeInfoInChars instead of
getTypeInfo.
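
To make the failure mode concrete, here is a minimal standalone sketch of the two ways to arrive at a field size in chars. It is not Clang code and does not reproduce how `getTypeInfo` or `getTypeInfoInChars` are implemented. The names `kCharWidth`, `kTypeWidthInBits`, `bitsToCharsTruncating` and `sizeInChars` are assumptions made up for illustration, and the 16-bit char with a 24-bit type mirrors the downstream case described further down in this review.

```cpp
#include <cstdint>

// Hypothetical target parameters: 16-bit chars and a 24-bit type.
constexpr std::int64_t kCharWidth = 16;
constexpr std::int64_t kTypeWidthInBits = 24;

// Bits-then-convert: plain integer division truncates, so any width that is
// not a multiple of the char width loses its last partial char.
constexpr std::int64_t bitsToCharsTruncating(std::int64_t Bits) {
  return Bits / kCharWidth;
}

// Size expressed in chars from the start: the width rounded up to a whole
// number of chars, so the partial char is counted as storage.
constexpr std::int64_t sizeInChars(std::int64_t Bits) {
  return (Bits + kCharWidth - 1) / kCharWidth;
}

static_assert(bitsToCharsTruncating(kTypeWidthInBits) == 1,
              "truncation reports one char, which cannot hold 24 bits");
static_assert(sizeInChars(kTypeWidthInBits) == 2,
              "two 16-bit chars are needed to hold 24 bits");
```

For widths that are already a multiple of the char width, both functions agree, which is why the difference would not show up on targets where every type size is a whole number of chars.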

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ebevhan created this revision. Aug 4 2020, 3:59 AM

Herald added a project: Restricted Project. Aug 4 2020, 3:59 AM

Herald added a subscriber: cfe-commits.

ebevhan requested review of this revision. Aug 4 2020, 3:59 AM

ebevhan mentioned this in D79719: [AIX] Implement AIX special alignment rule about double/long double. Aug 4 2020, 4:01 AM

bjope added a subscriber: bjope. Aug 4 2020, 4:25 AM

Harbormaster completed remote builds in B66912: Diff 282859. Aug 4 2020, 5:02 AM

Xiangling_L added a subscriber: Xiangling_L. Aug 4 2020, 7:16 AM

Xiangling_L added inline comments.

clang/lib/AST/RecordLayoutBuilder.cpp
1841–1842	In most cases, `getTypeInfoInChars` invokes `getTypeInfo` underneath. To make sure people are careful about this, I would suggest leaving a comment explaining that we have to call `getTypeInfoInChars` here. Adding a test case to guard the scenario you described would also help prevent someone from switching back to `getTypeInfo` here in the future.

ebevhan added inline comments. Aug 4 2020, 8:43 AM

clang/lib/AST/RecordLayoutBuilder.cpp
1841–1842	I can do that. I honestly don't think it would be a bad idea to add an assertion to toCharUnitsFromBits that checks for non-bytesize-multiple amounts. I wonder how much would fail if I did that, though.

This is not ideal, since it makes the calculations char size dependent, and breaks for sizes that are not a multiple of the char size.

How can we have a non-bitfield member whose size is not a multiple of the size of a char?

In D85191#2193645, @rsmith wrote:

How can we have a non-bitfield member whose size is not a multiple of the size of a char?

Downstream, we have fixed-point types that are 24 bits large, but where the char size is 16. The type then takes up 2 chars, where 8 of the bits are padding. The only way in Clang to express that the width of the bit representation of a type should be smaller than the number of chars it takes up in memory -- and consequently, produce an i24 in IR -- is to return a non-charsize multiple from getTypeInfo.

We did it this way because it was possible. If the intent is for getTypeInfo to always return sizes that are multiples of the char size, then the design should be inverted and getTypeInfo should simply be calling getTypeInfoInChars and multiply that result by the char size. But that isn't how it works.
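
The "inverted" design mentioned here can be sketched as follows. This is a toy model under stated assumptions, not Clang's `ASTContext`: `ToyType`, `ToyContext`, `getSizeInChars` and `getSizeInBits` are hypothetical names, and the sizes are hard-coded for an imagined target with 16-bit chars.

```cpp
#include <cstdint>

// Hypothetical toy types for illustration; none of these are Clang's
// actual classes or signatures.
enum class ToyType { Char, Fixed24 };

struct ToyContext {
  std::int64_t CharWidth; // bits per char on the imagined target

  // Primary query: storage size in whole chars, padding included.
  std::int64_t getSizeInChars(ToyType T) const {
    switch (T) {
    case ToyType::Char:
      return 1;
    case ToyType::Fixed24:
      return 2; // 24 value bits need two chars when chars are 16 bits wide
    }
    return 0; // not reached for the enumerators above
  }

  // Derived query: a multiple of CharWidth by construction.
  std::int64_t getSizeInBits(ToyType T) const {
    return getSizeInChars(T) * CharWidth;
  }
};
```

With `ToyContext{16}`, `getSizeInBits(ToyType::Fixed24)` yields 32 rather than 24, so in a design like this the number of value bits would have to come from a separate query.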

ebevhan added inline comments. Aug 5 2020, 4:22 AM

clang/lib/AST/RecordLayoutBuilder.cpp
1841–1842	Oh, I guess I only really replied to the first part about adding a comment here... I'm not sure I can make a test case for this, since I don't think there's anything that triggers this upstream.

In D85191#2195923, @ebevhan wrote:

Downstream, we have fixed-point types that are 24 bits large, but where the char size is 16. The type then takes up 2 chars, where 8 of the bits are padding. The only way in Clang to express that the width of the bit representation of a type should be smaller than the number of chars it takes up in memory -- and consequently, produce an i24 in IR -- is to return a non-charsize multiple from getTypeInfo.

This violates the C and C++ language rules, which require the size of every type to be a multiple of the size of char. A type with 24 value bits and 8 padding bits should report a type size of 32 bits, just like bool reports a size of CHAR_BIT bits despite having only 1 value bit, and x86_64 long double reports a type size of 128 bits despite having only 80 value bits.
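
The distinction between value bits and reported size can be checked in ordinary C++. The sketch below is a small standalone illustration; `storageBits` is a hypothetical helper name, and the concrete numbers for `long double` depend on the platform, so they are not asserted.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Storage size of T in bits as reported by sizeof. This is always a
// multiple of CHAR_BIT because sizeof counts whole chars.
template <typename T>
constexpr std::size_t storageBits() {
  return sizeof(T) * CHAR_BIT;
}

// bool carries a single value bit but still occupies at least one char.
static_assert(std::numeric_limits<bool>::digits == 1,
              "bool has one value bit");
static_assert(storageBits<bool>() >= CHAR_BIT,
              "bool occupies at least one whole char");
static_assert(storageBits<bool>() % CHAR_BIT == 0,
              "reported size is a whole number of chars");

// long double: the reported size is a whole number of chars regardless of
// how many bits of the representation carry value (platform dependent).
static_assert(storageBits<long double>() % CHAR_BIT == 0,
              "reported size is a whole number of chars");
```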

In D85191#2196863, @rsmith wrote:

This violates the C and C++ language rules, which require the size of every type to be a multiple of the size of char. A type with 24 value bits and 8 padding bits should report a type size of 32 bits, just like bool reports a size of CHAR_BIT bits despite having only 1 value bit, and x86_64 long double reports a type size of 128 bits despite having only 80 value bits.

I don't see that it breaks the language rules. The sizeof result for the 24 bit type should be 2 in the target described by @ebevhan (two 16-bit bytes). But I imagine that without this patch it is reported as 24/16=1, right?

So isn't the problem that toCharUnitsFromBits is rounding down when given a bitsize that isn't a multiple of CHAR_BIT? Would it perhaps make sense to let it round up instead?

If the intent is for getTypeInfo to always return sizes that are multiples of the char size, then the design should be inverted and getTypeInfo should simply be calling getTypeInfoInChars and multiply that result by the char size. But that isn't how it works.

The reason it doesn't work this way is just that someone made the wrong choice a decade ago, and nobody has spent the time to rewrite it since. Patch welcome.

In D85191#2197550, @bjope wrote:

I don't see that it breaks the language rules. The sizeof result for the 24 bit type should be 2 in the target described by @ebevhan (two 16-bit bytes). But I imagine that without this patch it is reported as 24/16=1, right?

Yes, this is what's happening. The sizeof should be reported as 2 (32 bits), but isn't, because toCharUnitsFromBits always rounds down.

So isn't the problem that toCharUnitsFromBits is rounding down when given a bitsize that isn't a multiple of CHAR_BIT? Would it perhaps make sense to let it round up instead?

The issue with toCharUnitsFromBits is that it's an inherently dangerous API. There could be cases where you want to round down, and cases where you want to round up. The function cannot know.

It could be better if toCharUnitsFromBits took an extra parameter that explicitly specifies the rounding, and if that parameter is set to a default (for unspecified rounding) and the amount passed is not a multiple of the char size, it asserts. This would make a lot of tests fail until all of the uses are corrected, though.
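
A rough sketch of the explicit-rounding idea described above, written as a free function so it stands alone. This is a hypothetical API: Clang's `toCharUnitsFromBits` takes no such parameter, and the names `Rounding` and `toCharsFromBits` are invented for illustration. The sketch only handles non-negative sizes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical rounding modes; Exact is the default and means "the caller
// claims the amount is already a whole number of chars".
enum class Rounding { Exact, Down, Up };

inline std::int64_t toCharsFromBits(std::int64_t Bits, std::int64_t CharWidth,
                                    Rounding R = Rounding::Exact) {
  assert(CharWidth > 0 && Bits >= 0 && "sketch handles non-negative sizes only");
  switch (R) {
  case Rounding::Down:
    return Bits / CharWidth;
  case Rounding::Up:
    return (Bits + CharWidth - 1) / CharWidth;
  case Rounding::Exact:
    break;
  }
  // Unspecified rounding: an inexact amount is a caller bug, so assert.
  assert(Bits % CharWidth == 0 && "inexact conversion needs explicit rounding");
  return Bits / CharWidth;
}
```

With a 16-bit char, `toCharsFromBits(24, 16, Rounding::Down)` gives 1 and `toCharsFromBits(24, 16, Rounding::Up)` gives 2, while `toCharsFromBits(24, 16)` trips the assertion in builds where assertions are enabled. That assertion is what would make existing callers fail until each one states its intent.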

In D85191#2196863, @rsmith wrote:

This violates the C and C++ language rules, which require the size of every type to be a multiple of the size of char. A type with 24 value bits and 8 padding bits should report a type size of 32 bits, just like bool reports a size of CHAR_BIT bits despite having only 1 value bit, and x86_64 long double reports a type size of 128 bits despite having only 80 value bits.

But this is the crux of the matter; if you aren't allowed to return non-char-sizes from getTypeInfo, then there's no way to specify via TargetInfo/getTypeInfo that the number of value bits of a type is less than the size in chars. That is, that a type is padded. And as your examples show, C does not disallow that.

In D85191#2197663, @efriedma wrote:

The reason it doesn't work this way is just that someone made the wrong choice a decade ago, and nobody has spent the time to rewrite it since. Patch welcome.

This does sound like a good thing to do, but it would be problematic downstream since it would completely prohibit the design that we're trying to use.

I don't know whether the name of your downstream target is a secret. Wouldn't it help you to add a fake 16-bit-per-char target to clang and add unit tests to prevent regressions?

In D85191#2199574, @ebevhan wrote:

This does sound like a good thing to do, but it would be problematic downstream since it would completely prohibit the design that we're trying to use.

That would be a good thing, as the design you're trying to use is not one that we intend to support. The size returned by getTypeInfo is intended to include padding.

It doesn't feel like this patch got a very positive reception, but I'd still like to try a bit more to get it in.

Even though it's difficult to test this particular change upstream, would it still be acceptable to take the patch since it reverts the behavior to what it was previously? If there are worries that things may break in the future due to other changes, we do catch these things in our downstream testing and are fairly diligent about reporting back about them.

I'm still concerned your approach to the computation of getTypeSize() is a ticking time bomb, but I'll take the cleanup even if the underlying motivation doesn't really make sense.

clang/lib/AST/RecordLayoutBuilder.cpp
1847	Can we fix getTypeInfoInChars so it returns all the necessary info, so we don't look up the typeinfo twice?

ebevhan added inline comments. Aug 19 2020, 3:49 AM

clang/lib/AST/RecordLayoutBuilder.cpp
1847	That feels like a hefty change since it would require changing every caller of getTypeInfoInChars. Do you want me to do that in this patch or in a separate one?

LGTM

clang/lib/AST/RecordLayoutBuilder.cpp
1847	Separate patch makes sense. If you want to do it after this one, that's fine.
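
The follow-up discussed in this inline thread, returning everything from one lookup, could look roughly like the sketch below. It is illustrative only: the actual struct introduced by the follow-up patch is not shown on this page, so `CharCount`, `CharsInfo`, `PairInfo` and `toyFieldInfo` are hypothetical names with hard-coded toy values.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a size or alignment measured in chars.
struct CharCount {
  std::int64_t Quantity;
};

// Pair-style result: callers read .first and .second, and there is no slot
// for a third piece of information, so it needs a second lookup.
using PairInfo = std::pair<CharCount, CharCount>;

// Struct-style result (illustrative): named fields, with room for the flag.
struct CharsInfo {
  CharCount Width;
  CharCount Align;
  bool AlignIsRequired;
};

// Toy producer standing in for a single type-info lookup.
inline CharsInfo toyFieldInfo() {
  return CharsInfo{CharCount{2}, CharCount{1}, false};
}
```

A caller would then read `Width`, `Align` and `AlignIsRequired` from one result instead of combining a pair with a second query.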

This revision is now accepted and ready to land. Aug 19 2020, 1:52 PM

This revision was landed with ongoing or failed builds. Aug 20 2020, 1:31 AM

Closed by commit rG1e7ec4842c1a: [AST] Get field size in chars rather than bits in RecordLayoutBuilder. (authored by ebevhan).

This revision was automatically updated to reflect the committed changes.

ebevhan added a commit: rG1e7ec4842c1a: [AST] Get field size in chars rather than bits in RecordLayoutBuilder.

ebevhan mentioned this in D86447: [AST] Change return type of getTypeInfoInChars to a proper struct instead of std::pair. Aug 24 2020, 5:12 AM

ebevhan mentioned this in rG101309fe048e: [AST] Change return type of getTypeInfoInChars to a proper struct instead of…. Oct 13 2020, 4:44 AM

Revision Contents

Path: clang/lib/AST/RecordLayoutBuilder.cpp
Size: 9 lines

Diff 286735

clang/lib/AST/RecordLayoutBuilder.cpp

Hunk context: void ItaniumRecordLayoutBuilder::LayoutField(const FieldDecl *D,

   CharUnits FieldSize;
   CharUnits FieldAlign;
   // The amount of this class's dsize occupied by the field.
   // This is equal to FieldSize unless we're permitted to pack
   // into the field's tail padding.
   CharUnits EffectiveFieldSize;

   auto setDeclInfo = [&](bool IsIncompleteArrayType) {
-    TypeInfo TI = Context.getTypeInfo(D->getType());
-    FieldAlign = Context.toCharUnitsFromBits(TI.Align);
+    auto TI = Context.getTypeInfoInChars(D->getType());
+    FieldAlign = TI.second;
     // Flexible array members don't have any size, but they have to be
     // aligned appropriately for their element type.
     EffectiveFieldSize = FieldSize =
-        IsIncompleteArrayType ? CharUnits::Zero()
-                              : Context.toCharUnitsFromBits(TI.Width);
-    AlignIsRequired = TI.AlignIsRequired;
+        IsIncompleteArrayType ? CharUnits::Zero() : TI.first;
+    AlignIsRequired = Context.getTypeInfo(D->getType()).AlignIsRequired;
   };

   if (D->getType()->isIncompleteArrayType()) {
     setDeclInfo(true /* IsIncompleteArrayType */);
   } else if (const ReferenceType *RT = D->getType()->getAs<ReferenceType>()) {
     unsigned AS = Context.getTargetAddressSpace(RT->getPointeeType());
     EffectiveFieldSize = FieldSize = Context.toCharUnitsFromBits(
         Context.getTargetInfo().getPointerWidth(AS));