This is an archive of the discontinued LLVM Phabricator instance.

[AST] Get field size in chars rather than bits in RecordLayoutBuilder.
ClosedPublic

Authored by ebevhan on Aug 4 2020, 3:59 AM.

Details

Summary

In D79719, LayoutField was refactored to fetch the size of field
types in bits and then convert to chars, rather than fetching
them in chars directly. This is not ideal, since it makes the
calculations char size dependent, and breaks for sizes that
are not a multiple of the char size.

This patch changes it to use getTypeInfoInChars instead of
getTypeInfo.

Diff Detail

Event Timeline

ebevhan created this revision.Aug 4 2020, 3:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2020, 3:59 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ebevhan requested review of this revision.Aug 4 2020, 3:59 AM
bjope added a subscriber: bjope.Aug 4 2020, 4:25 AM
Xiangling_L added inline comments.
clang/lib/AST/RecordLayoutBuilder.cpp
1841–1842

In most cases, getTypeInfoInChars invokes getTypeInfo underneath. So to make people be careful about this, I would suggest to leave a comment explaining/claiming we have to call getTypeInfoInChars here. And also maybe adding a testcase to guard the scenario you were talking about would be helpful to prevent someone to use getTypeInfo here in the future.

ebevhan added inline comments.Aug 4 2020, 8:43 AM
clang/lib/AST/RecordLayoutBuilder.cpp
1841–1842

I can do that.

I honestly don't think it would be a bad idea to add an assertion to toCharUnitsFromBits that checks for non-bytesize-multiple amounts. I wonder how much would fail if I did that, though.

rsmith added a subscriber: rsmith.EditedAug 4 2020, 9:13 AM

This is not ideal, since it makes the calculations char size dependent, and breaks for sizes that are not a multiple of the char size.

How can we have a non-bitfield member whose size is not a multiple of the size of a char?

This is not ideal, since it makes the calculations char size dependent, and breaks for sizes that are not a multiple of the char size.

How can we have a non-bitfield member whose size is not a multiple of the size of a char?

Downstream, we have fixed-point types that are 24 bits large, but where the char size is 16. The type then takes up 2 chars, where 8 of the bits are padding. The only way in Clang to express that the width of the bit representation of a type should be smaller than the number of chars it takes up in memory -- and consequently, produce an i24 in IR -- is to return a non-charsize multiple from getTypeInfo.

We did it this way because it was possible. If the intent is for getTypeInfo to always return sizes that are multiples of the char size, then the design should be inverted and getTypeInfo should simply be calling getTypeInfoInChars and multiply that result by the char size. But that isn't how it works.

ebevhan added inline comments.Aug 5 2020, 4:22 AM
clang/lib/AST/RecordLayoutBuilder.cpp
1841–1842

Oh, I guess I only really replied to the first part about adding a comment here... I'm not sure I can make a test case for this, since I don't think there's anything that triggers this upstream.

This is not ideal, since it makes the calculations char size dependent, and breaks for sizes that are not a multiple of the char size.

How can we have a non-bitfield member whose size is not a multiple of the size of a char?

Downstream, we have fixed-point types that are 24 bits large, but where the char size is 16. The type then takes up 2 chars, where 8 of the bits are padding. The only way in Clang to express that the width of the bit representation of a type should be smaller than the number of chars it takes up in memory -- and consequently, produce an i24 in IR -- is to return a non-charsize multiple from getTypeInfo.

This violates the C and C++ language rules, which require the size of every type to be a multiple of the size of char. A type with 24 value bits and 8 padding bits should report a type size of 32 bits, just like bool reports a size of CHAR_BIT bits despite having only 1 value bit, and x86_64 long double reports a type size of 128 bits despite having only 80 value bits.

bjope added a comment.Aug 5 2020, 1:09 PM

This is not ideal, since it makes the calculations char size dependent, and breaks for sizes that are not a multiple of the char size.

How can we have a non-bitfield member whose size is not a multiple of the size of a char?

Downstream, we have fixed-point types that are 24 bits large, but where the char size is 16. The type then takes up 2 chars, where 8 of the bits are padding. The only way in Clang to express that the width of the bit representation of a type should be smaller than the number of chars it takes up in memory -- and consequently, produce an i24 in IR -- is to return a non-charsize multiple from getTypeInfo.

This violates the C and C++ language rules, which require the size of every type to be a multiple of the size of char. A type with 24 value bits and 8 padding bits should report a type size of 32 bits, just like bool reports a size of CHAR_BIT bits despite having only 1 value bit, and x86_64 long double reports a type size of 128 bits despite having only 80 value bits.

I don't see that it breaks the language rules. The sizeof result for the 24 bit type should be 2 in the target described by @ebevhan (two 16-bit bytes). But I imagine that without this patch it is reported as 24/16=1, right?

So isn't the problem that toCharUnitsFromBits is rounding down when given a bitsize that isn't a multiple of CHAR_BIT? Would it perhaps make sense to let it round up instead?

If the intent is for getTypeInfo to always return sizes that are multiples of the char size, then the design should be inverted and getTypeInfo should simply be calling getTypeInfoInChars and multiply that result by the char size. But that isn't how it works.

The reason it doesn't work this way is just that someone made the wrong choice a decade ago, and nobody has spent the time to rewrite it since. Patch welcome.

This is not ideal, since it makes the calculations char size dependent, and breaks for sizes that are not a multiple of the char size.

How can we have a non-bitfield member whose size is not a multiple of the size of a char?

Downstream, we have fixed-point types that are 24 bits large, but where the char size is 16. The type then takes up 2 chars, where 8 of the bits are padding. The only way in Clang to express that the width of the bit representation of a type should be smaller than the number of chars it takes up in memory -- and consequently, produce an i24 in IR -- is to return a non-charsize multiple from getTypeInfo.

This violates the C and C++ language rules, which require the size of every type to be a multiple of the size of char. A type with 24 value bits and 8 padding bits should report a type size of 32 bits, just like bool reports a size of CHAR_BIT bits despite having only 1 value bit, and x86_64 long double reports a type size of 128 bits despite having only 80 value bits.

I don't see that it breaks the language rules. The sizeof result for the 24 bit type should be 2 in the target described by @ebevhan (two 16-bit bytes). But I imagine that without this patch it is reported as 24/16=1, right?

Yes, this is what's happening. The sizeof should be reported as 2 (32 bits), but isn't, because toCharUnitsFromBits always rounds down.

So isn't the problem that toCharUnitsFromBits is rounding down when given a bitsize that isn't a multiple of CHAR_BIT? Would it perhaps make sense to let it round up instead?

The issue with toCharUnitsFromBits is that it's an inherently dangerous API. There could be cases where you want to round down, and cases where you want to round up. The function cannot know.

It could be better if toCharUnitsFromBits took an extra parameter that explicitly specifies the rounding, and if that parameter is set to a default (for unspecified rounding) and the amount passed is not a multiple of the char size, it asserts. This would make a lot of tests fail until all of the uses are corrected, though.

This is not ideal, since it makes the calculations char size dependent, and breaks for sizes that are not a multiple of the char size.

How can we have a non-bitfield member whose size is not a multiple of the size of a char?

Downstream, we have fixed-point types that are 24 bits large, but where the char size is 16. The type then takes up 2 chars, where 8 of the bits are padding. The only way in Clang to express that the width of the bit representation of a type should be smaller than the number of chars it takes up in memory -- and consequently, produce an i24 in IR -- is to return a non-charsize multiple from getTypeInfo.

This violates the C and C++ language rules, which require the size of every type to be a multiple of the size of char. A type with 24 value bits and 8 padding bits should report a type size of 32 bits, just like bool reports a size of CHAR_BIT bits despite having only 1 value bit, and x86_64 long double reports a type size of 128 bits despite having only 80 value bits.

But this is the crux of the matter; if you aren't allowed to return non-char-sizes from getTypeInfo, then there's no way to specify via TargetInfo/getTypeInfo that the number of value bits of a type is less than the size in chars. That is, that a type is padded. And as your examples show, C does not disallow that.

If the intent is for getTypeInfo to always return sizes that are multiples of the char size, then the design should be inverted and getTypeInfo should simply be calling getTypeInfoInChars and multiply that result by the char size. But that isn't how it works.

The reason it doesn't work this way is just that someone made the wrong choice a decade ago, and nobody has spent the time to rewrite it since. Patch welcome.

This does sound like a good thing to do, but it would be problematic downstream since it would completely prohibit the design that we're trying to use.

tschuett added a subscriber: tschuett.EditedAug 6 2020, 6:10 AM

I don't know whether the name of your downstream target is a secret. Wouldn't it help you to add a fake 16bit per char target to clang and add units tests to prevent regressions?

rsmith added a comment.Aug 6 2020, 4:01 PM

If the intent is for getTypeInfo to always return sizes that are multiples of the char size, then the design should be inverted and getTypeInfo should simply be calling getTypeInfoInChars and multiply that result by the char size. But that isn't how it works.

The reason it doesn't work this way is just that someone made the wrong choice a decade ago, and nobody has spent the time to rewrite it since. Patch welcome.

This does sound like a good thing to do, but it would be problematic downstream since it would completely prohibit the design that we're trying to use.

That would be a good thing, as the design you're trying to use is not one that we intend to support. The size returned by getTypeInfo is intended to include padding.

It doesn't feel like this patch got a very positive reception, but I'd still like to try a bit more to get it in.

Even though it's difficult to test this particular change upstream, would it still be acceptable to take the patch since it reverts the behavior to what it was previously? If there are worries that things may break in the future due to other changes, we do catch these things in our downstream testing and are fairly diligent about reporting back about them.

I'm still concerned your approach to the computation of getTypeSize() is a ticking time bomb, but I'll take the cleanup even if the underlying motivation doesn't really make sense.

clang/lib/AST/RecordLayoutBuilder.cpp
1847

Can we fix getTypeInfoInChars so it returns all the necessary info, so we don't look up the typeinfo twice?

ebevhan added inline comments.Aug 19 2020, 3:49 AM
clang/lib/AST/RecordLayoutBuilder.cpp
1847

That feels like a hefty change since it would require changing every caller of getTypeInfoInChars. Do you want me to do that in this patch or in a separate one?

efriedma accepted this revision.Aug 19 2020, 1:52 PM

LGTM

clang/lib/AST/RecordLayoutBuilder.cpp
1847

Separate patch makes sense.

If you want to do it after this one, that's fine.

This revision is now accepted and ready to land.Aug 19 2020, 1:52 PM
This revision was landed with ongoing or failed builds.Aug 20 2020, 1:31 AM
This revision was automatically updated to reflect the committed changes.