This is an archive of the discontinued LLVM Phabricator instance.

[clang-format][NFC] Be more careful about the layout of FormatToken.
ClosedPublic

Authored by riccibruno on Jul 22 2020, 4:04 AM.

Details

Summary

The underlying ABI forces FormatToken to have a lot of padding. Currently (on x86-64 linux) sizeof(FormatToken) == 288. After this patch sizeof(FormatToken) == 232.

No functional changes.

Diff Detail

Event Timeline

riccibruno created this revision.Jul 22 2020, 4:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 22 2020, 4:04 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MyDeveloperDay added inline comments.Jul 23 2020, 2:27 AM
clang/lib/Format/FormatToken.h
177

educate me, why

unsigned IsFirst : 1;

here and not

bool IsFirst : 1;

is that equivalent? (I'm literally not sure myself), I wrote a little test just to remind myself how this stuff works.

#include <iostream>

class Foo
{
public:
    Foo()
      : A(true)
      , B(false)
      , C(true)
    {
    }
    bool A : 1;
    bool B : 1;
    bool C : 1;
};

class Bar
{
public:
    Bar()
      : A(true)
      , B(false)
      , C(true)
    {
    }
    unsigned A : 1;
    unsigned B : 1;
    unsigned C : 1;
};

class Fuz
{
public:
    Fuz()
      : A(true)
      , B(false)
      , C(true)
    {
    }
    bool A;
    bool B;
    bool C;
};

class Baz
{
public:
    Baz()
      : A(true)
      , B(false)
      , C(true)
    {
    }
    unsigned A;
    unsigned B;
    unsigned C;
};

int
main(int argc, char *argv[])
{
    std::cerr << "Foo " << sizeof(Foo) << "\n";
    std::cerr << "Bar " <<sizeof(Bar) << "\n";
    std::cerr << "Fuz " <<sizeof(Fuz) << "\n";
    std::cerr << "Baz " <<sizeof(Baz) << "\n";

    return 0;
}

When run gives the following:

Foo 1
Bar 4
Fuz 3
Baz 12

So I guess my question is could there be more space savings if using bool IsFirst:1 (and the others), I'd also think that would help clarity a little (or did I misunderstand?)

MyDeveloperDay marked an inline comment as done.EditedJul 23 2020, 2:37 AM

Thank you for the patch, I'm generally in agreement but I think we should let some others comment

clang/lib/Format/ContinuationIndenter.cpp
652

I think this is better in that its now easier perhaps to see when the block kind gets checked:

I wonder if it would read even better as if we added is(BraceBlockKind) isNot(BraceBlockKind)

e.g.

Previous.is(BK_BraceInit)

clang/lib/Format/FormatToken.h
151

I much prefer putting the initialization here, I think it makes it MUCH clearer

riccibruno marked 3 inline comments as done.Jul 23 2020, 8:03 AM
riccibruno added inline comments.
clang/lib/Format/ContinuationIndenter.cpp
652

Yes I think I agree with you; adding is and isNot would be an improvement. I will still leave getBlockKind if someone wants to switch on it.

clang/lib/Format/FormatToken.h
151

It is necessary anyway since a bit-field cannot have a default member initializer pre-C++20.

177

It has to do with the ABI since as per [class.bit]p1:

[...] Allocation of bit-fields within a class object is implementation-defined. Alignment of bit fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ]

Now the two relevant ABIs are the Itanium ABI (https://github.com/itanium-cxx-abi/cxx-abi/blob/master/abi-layout.html for the details) and the MS ABI (say on x86-64). Happily both are supported by clang so we can use -fdump-record-layouts and compare them.

Consider S0 in https://godbolt.org/z/orYv5j (itanium on the left/MS on the right):

Both ABIs will not let a bit-field cross a boundary. Therefore c0 and c1 will use 10 bits. If the type had been short instead of char then only 9 bits would have been used. The size of S0 would still have been 2 in both cases.

Consider now S1. The MS ABI, unlike the Itanium ABI, will not put bit-fields together if their types have a different size. Therefore sizeof(S1) is 2 under the Itanium ABI and 4 under the MS ABI.

Using an unsigned systematically avoids having a boundary every 8 bits, and avoids the issue with the MS ABI.

Use is and isNot.

riccibruno marked 2 inline comments as done.Jul 26 2020, 9:13 AM

a new minor Nits but looks good.

clang/lib/Format/ContinuationIndenter.cpp
1488

Current.is(PPK_OnePerLine)

1490

Current.is(PPK_Inconclusive)

riccibruno marked 2 inline comments as done.

Remove a few missed getPackingKind.

MyDeveloperDay accepted this revision.Jul 28 2020, 2:01 AM

Thank you for the patch, this LGTM, I think this kind of change should help reduce memory usage and I feel improves the readability.

This revision is now accepted and ready to land.Jul 28 2020, 2:01 AM

Thank you for the patch, this LGTM, I think this kind of change should help reduce memory usage and I feel improves the readability.

Thanks for your review!

klimek added a subscriber: klimek.Sep 25 2020, 4:43 AM

FWIW, finding this due to seeing the added complexity. Do you have data on what the difference is on clang-format for either overall memory use or performance? Thanks!

MyDeveloperDay added a comment.EditedSep 25 2020, 7:41 AM

Which bit do you find more complex? adding something to the FormatToken or the use of the is() calls?

Which bit do you find more complex? adding something to the FormatToken or the use of the is() calls?

I think mainly that
a) the language doesn't support bitfields well yet, as evidenced by having to pull the default values into the constructor, where it's harder to maintain imo (but that will change)
b) we'll need to continuously take care that the bitfields stay bundled

The setters / getters are generally fine, I think; I believe the decision to treat this as a struct from the very beginning of the project was bad, but refactoring this into something more cohesive seems like a daunting task.

To be clear, I don't think it's a lot of complexity, I was mainly wondering whether we have some measurement what it gets us.