This is an archive of the discontinued LLVM Phabricator instance.

[format] follow up: Use unsigned char as the base of all enums in FormatStyle
AbandonedPublic

Authored by kwk on Feb 23 2022, 2:22 AM.

Details

Summary

Analogous to https://reviews.llvm.org/D93758, this patch uses
unsigned char for all enums in FormatStyle.

Diff Detail

Event Timeline

kwk requested review of this revision.Feb 23 2022, 2:22 AM
kwk created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2022, 2:22 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
kwk updated this revision to Diff 411105.Feb 24 2022, 6:10 AM

Rebased on upstream/main

Is there a clang-tidy check for this? I mean if we are doing this because it helps reduce the size of FormatStyle and its recommended practice for small enums, then we should have a clang-tidy check that catches us on this. (no?)

MyDeveloperDay accepted this revision.Feb 27 2022, 4:43 AM
This revision is now accepted and ready to land.Feb 27 2022, 4:43 AM
MyDeveloperDay added a project: Restricted Project.

Looks innocent but I have a few questions.
What's the size difference that you observe?
Any runtime penalty? (Usually compilers choose a faster underlying type for enums, not smaller)
Also, is this change very relevant to clang-format? We should not copy FormatStyle a lot, no?

The decision is made in the header (so I think it will always be int until this doesn't fit). To decide what is faster the compiler would need to know all the usages. Which it can't.

We have multiple times bit fields in clang-format to reduce the object size. I think it's only consistent to decrease the FormatStyle size.

The only reason I didn't do it for the enums introduced by me is consistency.

Is there a clang-tidy check for this? I mean if we are doing this because it helps reduce the size of FormatStyle and its recommended practice for small enums, then we should have a clang-tidy check that catches us on this. (no?)

None that I'm aware of.

Any runtime penalty? (Usually compilers choose a faster underlying type for enums, not smaller)

The decision is made in the header (so I think it will always be int until this doesn't fit). To decide what is faster the compiler would need to know all the usages. Which it can't.

Yes, Clang chooses int by default. In fact, C++11 and later mandate that a new-style scoped enum enum class Foo { means the same thing as enum class Foo : int { — see https://timsong-cpp.github.io/cppwp/n3337/dcl.enum#5 . For unscoped old-style enums, Clang/GCC/MSVC all choose int by default, unless -fshort-enums is passed on the command line and/or the enum is annotated with __attribute__((packed)), in which case it chooses the smallest possible type. -fshort-enums and __attribute__((packed)) have no effect on new-style scoped enums nor on enums with explicit underlying types.

Before this lands can we have a discussion about what clarity this gives us?, because I think it makes the code more unreadable, but surely we are saving just 7x(3 bytes) (the difference between and int and a unsigned char for 7 enums)

Is saving 21 bytes valuable?

Before this lands can we have a discussion about what clarity this gives us?, because I think it makes the code more unreadable, but surely we are saving just 7x(3 bytes) (the difference between and int and a unsigned char for 7 enums)

Is saving 21 bytes valuable?

Peanut gallery says: I think giving every enum a concrete underlying type is a good practice, so I don't think this makes anything less readable. (That is, I'd even favor adding : int to each of these enums, over the status quo where the type is merely implied — we've seen that people can be unsure of the exact behavior.)
I can imagine that someone might object that hard-coding one byte for each enum might one day cause problems if we want more than 256 possible settings for any given enum, but that's not likely, is it?
Re space savings, OP says that they're saving 21 bytes per object and that they have "lots" of objects in their particular use-case, even if clang-format itself doesn't.

In my own area I gave FormatStyle a copy constructor and made it private, it hardly caused any issues, this isn't something that we tend copy about very much, we tend copy construct it only when we do getLLVMStyle() to make another style say like google style sure we'll do that a handful of times but I very much doubt it would make an impact.. can we measure?

I have no problems adding the base class if that's the LLVM coding style recommendation, otherwise I don't see it gives us anything.

Before this lands can we have a discussion about what clarity this gives us?, because I think it makes the code more unreadable, but surely we are saving just 7x(3 bytes) (the difference between and int and a unsigned char for 7 enums)

Is saving 21 bytes valuable?

Peanut gallery says: I think giving every enum a concrete underlying type is a good practice, so I don't think this makes anything less readable. (That is, I'd even favor adding : int to each of these enums, over the status quo where the type is merely implied — we've seen that people can be unsure of the exact behavior.)

I also like a defined type for enums (I even have multiple enums based on bool).

I can imagine that someone might object that hard-coding one byte for each enum might one day cause problems if we want more than 256 possible settings for any given enum, but that's not likely, is it?

I think an option with more than 256 variants would be a horror. But even if we need that, we can just increase the type, since we have no API or ABI stability in clang-format. We regulary change a bool to an enum.

That being said, I don't care that much that we declare the enums with a type. But I care about consistency. Either land this, or revert D93758.

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 12:08 PM
kwk added a comment.Mar 1 2022, 12:48 PM

That being said, I don't care that much that we declare the enums with a type. But I care about consistency. Either land this, or revert D93758.

Consistency was more or less the only reason why I made this change in the first place and I didn't expect so many comments on a change that is almost identical to D93758. Sure I didn't have a good argument for or against it so far but... consistency... I'm glad this discussion came up anyways. Will look at each comment tomorrow.

owenpan added a subscriber: owenpan.Mar 1 2022, 3:24 PM

Can we use int8_t instead of unsigned char to keep the enum types signed (and update the changes made in D93758)?

Can we use int8_t instead of unsigned char to keep the enum types signed (and update the changes made in D93758)?

+1

kwk abandoned this revision.Mar 3 2022, 2:59 AM

I abandon this revision in favor of https://reviews.llvm.org/D120884.