This is an archive of the discontinued LLVM Phabricator instance.

[clang][CodeGen]Fix clang crash and add bitfield support in __builtin_dump_struct
ClosedPublic

Authored by yihanaa on Mar 22 2022, 11:51 AM.

Details

Summary

Fix clang crash and add bitfield support in __builtin_dump_struct.

In clang13.0.x, a struct with three or more members and a bitfield at the same time will cause a crash. In clang15.x, as long as the struct has one bitfield, it will cause a crash in clang.

Open issue: https://github.com/llvm/llvm-project/issues/54462

Diff Detail

Event Timeline

yihanaa created this revision.Mar 22 2022, 11:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 11:51 AM
Herald added a subscriber: StephenFan. · View Herald Transcript
yihanaa requested review of this revision.Mar 22 2022, 11:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 11:51 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
yihanaa edited the summary of this revision. (Show Details)Mar 22 2022, 11:54 AM
yihanaa added reviewers: dexonsmith, skan, StephenFan.

the remaining changes are code formatting

Please remove code formatting changes.

yihanaa updated this revision to Diff 417435.Mar 22 2022, 4:47 PM
yihanaa added a reviewer: xbolva00.

Remove code formatting changes.

the remaining changes are code formatting

Please remove code formatting changes.

Okay, i have removed.

yihanaa updated this revision to Diff 417439.Mar 22 2022, 4:56 PM

Remove useless comments.

yihanaa edited the summary of this revision. (Show Details)Mar 23 2022, 12:59 AM

Thanks for the fix! Can you please add a release note as well?

clang/test/CodeGen/dump-struct-builtin.c
737

Can you add some test coverage for zero-width bit-fields and unnamed bit-fields just to demonstrate how they behave?

Seems reasonable to me. Obviously needs release notes + the tests that Aaron did, but I think this looks right.

Thanks for the fix! Can you please add a release note as well?

Okay, i try to add some test case for zero-width bitfield and unnamed bitfield, and add a release notes.

yihanaa updated this revision to Diff 417687.EditedMar 23 2022, 10:47 AM
  1. Support zero-width bitfield, named bitfield and unnamed bitfield.
  2. Add a release notes.

The builtin function __builtin_dump_struct behaves for zero-width bitfield and unnamed bitfield as follows

int printf(const char *fmt, ...);

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

int main() {
  foo();
  return 0;
}

Output:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int b : 2022
}
yihanaa marked an inline comment as done.

Seems reasonable to me. Obviously needs release notes + the tests that Aaron did, but I think this looks right.

Thank you for reminding,i have add a release notes entry and a test case for zero-width bitfield & unnamed bitfield.

  1. Support zero-width bitfield, named bitfield and unnamed bitfield.
  2. Add a release notes.

The builtin function __builtin_dump_struct behaves for zero-width bitfield and unnamed bitfield as follows

int printf(const char *fmt, ...);

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

int main() {
  foo();
  return 0;
}

Output:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int b : 2022
}

Thank you for the release note and additional test coverage. I'm wondering why we handle the zero-width bit-field differently from the anonymous one (e.g., why do we not have unsigned int : 3 before the unsigned int : 0? It seems a bit odd to drop that from the output.

yihanaa added a comment.EditedMar 23 2022, 11:50 AM
  1. Support zero-width bitfield, named bitfield and unnamed bitfield.
  2. Add a release notes.

The builtin function __builtin_dump_struct behaves for zero-width bitfield and unnamed bitfield as follows

int printf(const char *fmt, ...);

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

int main() {
  foo();
  return 0;
}

Output:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int b : 2022
}

Thank you for the release note and additional test coverage. I'm wondering why we handle the zero-width bit-field differently from the anonymous one (e.g., why do we not have unsigned int : 3 before the unsigned int : 0? It seems a bit odd to drop that from the output.

Thanks, I don't know what the value of this zero-width bitfield should output, can it be a empty value as same as unnamed-bitfield’ she field name?

for example:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int  :
unsigned int b : 2022
}
  1. Support zero-width bitfield, named bitfield and unnamed bitfield.
  2. Add a release notes.

The builtin function __builtin_dump_struct behaves for zero-width bitfield and unnamed bitfield as follows

int printf(const char *fmt, ...);

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

int main() {
  foo();
  return 0;
}

Output:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int b : 2022
}

Thank you for the release note and additional test coverage. I'm wondering why we handle the zero-width bit-field differently from the anonymous one (e.g., why do we not have unsigned int : 3 before the unsigned int : 0? It seems a bit odd to drop that from the output.

Thanks, I don't know what the value of this zero-width bitfield should output, can it be a empty value as same as unnamed-bitfield’ she field name?

for example:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int  :
unsigned int b : 2022
}

I would definitely expect this, yes.

  1. Support zero-width bitfield, named bitfield and unnamed bitfield.
  2. Add a release notes.

The builtin function __builtin_dump_struct behaves for zero-width bitfield and unnamed bitfield as follows

int printf(const char *fmt, ...);

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

int main() {
  foo();
  return 0;
}

Output:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int b : 2022
}

Thank you for the release note and additional test coverage. I'm wondering why we handle the zero-width bit-field differently from the anonymous one (e.g., why do we not have unsigned int : 3 before the unsigned int : 0? It seems a bit odd to drop that from the output.

Thanks, I don't know what the value of this zero-width bitfield should output, can it be a empty value as same as unnamed-bitfield’ she field name?

for example:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int  :
unsigned int b : 2022
}

I would definitely expect this, yes.

I will try to modify it according to this rule.

  1. Support zero-width bitfield, named bitfield and unnamed bitfield.
  2. Add a release notes.

The builtin function __builtin_dump_struct behaves for zero-width bitfield and unnamed bitfield as follows

int printf(const char *fmt, ...);

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

int main() {
  foo();
  return 0;
}

Output:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int b : 2022
}

Thank you for the release note and additional test coverage. I'm wondering why we handle the zero-width bit-field differently from the anonymous one (e.g., why do we not have unsigned int : 3 before the unsigned int : 0? It seems a bit odd to drop that from the output.

Thanks, I don't know what the value of this zero-width bitfield should output, can it be a empty value as same as unnamed-bitfield’ she field name?

for example:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int  :
unsigned int b : 2022
}

I would definitely expect this, yes.

Oh wow... I was seeing the : in the output and thinking that was showing me the bit width after the colon (given that we show the type information), not the value. I wonder how many other folks are tripped up by that sort of thing? In the meantime, now that I understand the printing format better, I would expect that output as well.

  1. Support zero-width bitfield, named bitfield and unnamed bitfield.
  2. Add a release notes.

The builtin function __builtin_dump_struct behaves for zero-width bitfield and unnamed bitfield as follows

int printf(const char *fmt, ...);

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

int main() {
  foo();
  return 0;
}

Output:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int b : 2022
}

Thank you for the release note and additional test coverage. I'm wondering why we handle the zero-width bit-field differently from the anonymous one (e.g., why do we not have unsigned int : 3 before the unsigned int : 0? It seems a bit odd to drop that from the output.

Thanks, I don't know what the value of this zero-width bitfield should output, can it be a empty value as same as unnamed-bitfield’ she field name?

for example:

struct Bar {
unsigned int c : 1
unsigned int  : 0
unsigned int  :
unsigned int b : 2022
}

I would definitely expect this, yes.

Oh wow... I was seeing the : in the output and thinking that was showing me the bit width after the colon (given that we show the type information), not the value. I wonder how many other folks are tripped up by that sort of thing? In the meantime, now that I understand the printing format better, I would expect that output as well.

Now that you mention it: SHOULD we emit the size of the bitfield on that line? The 'colon' is quite unfortunate there for exactly that reason (the way you misread it).

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

I think that's a good idea for clarity. For the case where we have no value, I wonder if we want to do something like: unsigned int : 0 = <uninitialized> (or something else to make it exceptionally clear that there's nothing missing after the =)?

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

good idea!

yihanaa added a comment.EditedMar 23 2022, 1:15 PM

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

I think that's a good idea for clarity. For the case where we have no value, I wonder if we want to do something like: unsigned int : 0 = <uninitialized> (or something else to make it exceptionally clear that there's nothing missing after the =)?

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

I think that's a good idea for clarity. For the case where we have no value, I wonder if we want to do something like: unsigned int : 0 = <uninitialized> (or something else to make it exceptionally clear that there's nothing missing after the =)?

How to judge whether this field is initialized? Maybe this memory has been initialized by memset.

Does it just need to be marked with an zero-width bitfield?because we cannot get its value.

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

I think that's a good idea for clarity. For the case where we have no value, I wonder if we want to do something like: unsigned int : 0 = <uninitialized> (or something else to make it exceptionally clear that there's nothing missing after the =)?

how

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

I think that's a good idea for clarity. For the case where we have no value, I wonder if we want to do something like: unsigned int : 0 = <uninitialized> (or something else to make it exceptionally clear that there's nothing missing after the =)?

How to judge whether this field is initialized? Maybe this memory has been initialized by memset

He means a special-case for the 0-size bitfield, which HAS no value (actually, wonder if this is a problem with the no-unique-address types as well?). I might suggest N/A instead of uninitialized, but am open to bikeshedding.

yihanaa added a comment.EditedMar 23 2022, 1:30 PM

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

I think that's a good idea for clarity. For the case where we have no value, I wonder if we want to do something like: unsigned int : 0 = <uninitialized> (or something else to make it exceptionally clear that there's nothing missing after the =)?

how

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

I think that's a good idea for clarity. For the case where we have no value, I wonder if we want to do something like: unsigned int : 0 = <uninitialized> (or something else to make it exceptionally clear that there's nothing missing after the =)?

How to judge whether this field is initialized? Maybe this memory has been initialized by memset

He means a special-case for the 0-size bitfield, which HAS no value (actually, wonder if this is a problem with the no-unique-address types as well?). I might suggest N/A instead of uninitialized, but am open to bikeshedding.

I'm sorry, I misunderstood @aaron.ballman.

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

I think that's a good idea for clarity. For the case where we have no value, I wonder if we want to do something like: unsigned int : 0 = <uninitialized> (or something else to make it exceptionally clear that there's nothing missing after the =)?

how

If it is ok, I think we should probably change the format of the 'dump' for fields. Using the colon to split up the field from the value is unfortunate, may I suggest replacing it with '=' instead? As well as printing the size after a colon. So for:

void foo(void) {
  struct Bar {
    unsigned c : 1;
    unsigned : 3;
    unsigned : 0;
    unsigned b;
  };

  struct Bar a = {
    .c = 1,
    .b = 2022,
  };

  __builtin_dump_struct(&a, &printf);
}

Output:

struct Bar {
unsigned int c : 1 = 1
unsigned int : 3  = 0
unsigned int : 0 = 
unsigned int b = 2022
}

What do you all think?

I think that's a good idea for clarity. For the case where we have no value, I wonder if we want to do something like: unsigned int : 0 = <uninitialized> (or something else to make it exceptionally clear that there's nothing missing after the =)?

How to judge whether this field is initialized? Maybe this memory has been initialized by memset

He means a special-case for the 0-size bitfield, which HAS no value (actually, wonder if this is a problem with the no-unique-address types as well?). I might suggest N/A instead of uninitialized, but am open to bikeshedding.

I'm sorry I misunderstood what you meant @aaron.ballman.

Can we follow the lead of LLVM IR?it use 'undef'
for example:

struct T6A {
    unsigned a : 1;
    unsigned  : 0;
    unsigned c : 1;
};

@__const.foo.a = private unnamed_addr constant %struct.T6A { i8 1, [3 x i8] undef, i8 1, [3 x i8] undef }, align 4

i'm open to bikeshedding too.

I'm sorry I misunderstood what you meant @aaron.ballman.

Can we follow the lead of LLVM IR?it use 'undef'
for example:

struct T6A {
    unsigned a : 1;
    unsigned  : 0;
    unsigned c : 1;
};

@__const.foo.a = private unnamed_addr constant %struct.T6A { i8 1, [3 x i8] undef, i8 1, [3 x i8] undef }, align 4

I misunderstood him too, he told me off line :)

I guess I would be 'ok' with undef, though that has a different meaning (it means, this has an arbitrary value). In this case, it has NO value, which is somewhat different.

yihanaa added a comment.EditedMar 23 2022, 1:43 PM

I'm sorry I misunderstood what you meant @aaron.ballman.

Can we follow the lead of LLVM IR?it use 'undef'
for example:

struct T6A {
    unsigned a : 1;
    unsigned  : 0;
    unsigned c : 1;
};

@__const.foo.a = private unnamed_addr constant %struct.T6A { i8 1, [3 x i8] undef, i8 1, [3 x i8] undef }, align 4

I misunderstood him too, he told me off line :)

I guess I would be 'ok' with undef, though that has a different meaning (it means, this has an arbitrary value). In this case, it has NO value, which is somewhat different.

What if we don't emit '=' for zero-width bitfield, like this:

struct Bar {
unsigned c : 1;
unsigned : 3;
unsigned : 0;
unsigned b;
};

struct Bar {
unsigned int c : 1 = 0
unsigned int   : 3 = 0
unsigned int   : 0
unsigned int b = 0
}

What do you all think?

What if we don't emit '=' for zero-width bitfield, like this:

struct Bar {
unsigned c : 1;
unsigned : 3;
unsigned : 0;
unsigned b;
};

struct Bar {
unsigned int c : 1 = 0
unsigned int   : 3 = 0
unsigned int   : 0
unsigned int b = 0
}

What do you all think?

I like this idea best of all!

What if we don't emit '=' for zero-width bitfield, like this:

struct Bar {
unsigned c : 1;
unsigned : 3;
unsigned : 0;
unsigned b;
};

struct Bar {
unsigned int c : 1 = 0
unsigned int   : 3 = 0
unsigned int   : 0
unsigned int b = 0
}

What do you all think?

I like this idea best of all!

Agreed!

yihanaa added a comment.EditedMar 24 2022, 9:14 AM

What if we don't emit '=' for zero-width bitfield, like this:

struct Bar {
unsigned c : 1;
unsigned : 3;
unsigned : 0;
unsigned b;
};

struct Bar {
unsigned int c : 1 = 0
unsigned int   : 3 = 0
unsigned int   : 0
unsigned int b = 0
}

What do you all think?

I like this idea best of all!

Agreed!

@erichkeane @aaron.ballman
Previously we used FieldDecl->getNameAsString to get the field name, but the comments for this function indicate that it is Deprecated,and suggestion move clients to getName().

The FieldDecl->getName() return the anonymous inner struct's name like:

struct T3A {
  union {
    int a;
    char b[4];
  };
};
struct T3A {
union T3A::(anonymous at ./builtin_dump_struct.c:77:5) {
    int a = 42
    char[4] b = 0x2a
    }
}

what do you all think?

What if we don't emit '=' for zero-width bitfield, like this:

struct Bar {
unsigned c : 1;
unsigned : 3;
unsigned : 0;
unsigned b;
};

struct Bar {
unsigned int c : 1 = 0
unsigned int   : 3 = 0
unsigned int   : 0
unsigned int b = 0
}

What do you all think?

I like this idea best of all!

Agreed!

@erichkeane @aaron.ballman
Previously we used FieldDecl->getNameAsString to get the field name, but the comments for this function indicate that it is Deprecated,and suggestion move clients to getName().

The FieldDecl->getName() return the anonymous inner struct's name like:

struct T3A {
  union {
    int a;
    char b[4];
  };
};
struct T3A {
union T3A::(anonymous at ./builtin_dump_struct.c:77:5) {
    int a = 42
    char[4] b = 0x2a
    }
}

what do you all think?

I would probably keep using getNameAsString() as I don't think the extra information helps overly much. That interface was marked deprecated 13 years ago, so don't feel bad for making use of it.

Agreed. DeclarationName::getAsString is in no way marked deprecated though, so you could call that?

FieldDecl->getDeclName()->getAsString() or the DeclarationName operator<< would both be equivalent with no threat of deprecation.

yihanaa updated this revision to Diff 417990.Mar 24 2022, 10:58 AM

Support dump bitwidth of bitfields, and unnamed bitfields.
for example:

struct Bar {
unsigned c : 1;
unsigned : 3;
unsigned : 0;
unsigned b;
};

struct Bar {
unsigned int c : 1 = 0
unsigned int   : 3 = 0
unsigned int   : 0
unsigned int b = 0
}

What if we don't emit '=' for zero-width bitfield, like this:

struct Bar {
unsigned c : 1;
unsigned : 3;
unsigned : 0;
unsigned b;
};

struct Bar {
unsigned int c : 1 = 0
unsigned int   : 3 = 0
unsigned int   : 0
unsigned int b = 0
}

What do you all think?

I like this idea best of all!

Agreed!

@erichkeane @aaron.ballman
Previously we used FieldDecl->getNameAsString to get the field name, but the comments for this function indicate that it is Deprecated,and suggestion move clients to getName().

The FieldDecl->getName() return the anonymous inner struct's name like:

struct T3A {
  union {
    int a;
    char b[4];
  };
};
struct T3A {
union T3A::(anonymous at ./builtin_dump_struct.c:77:5) {
    int a = 42
    char[4] b = 0x2a
    }
}

what do you all think?

I would probably keep using getNameAsString() as I don't think the extra information helps overly much. That interface was marked deprecated 13 years ago, so don't feel bad for making use of it.

Okay,

Agreed. DeclarationName::getAsString is in no way marked deprecated though, so you could call that?

FieldDecl->getDeclName()->getAsString() or the DeclarationName operator<< would both be equivalent with no threat of deprecation.

Thanks @erichkeane @aaron.ballman , i have updated the diff.

erichkeane added inline comments.Mar 24 2022, 11:02 AM
clang/docs/ReleaseNotes.rst
81

Perhaps worth making a separate release note for the change in format?

clang/lib/CodeGen/CGBuiltin.cpp
2093

This seems like a pretty expensive way to do this... Can you do this as FD->getDeclName().empty()?

yihanaa marked an inline comment as done.Mar 24 2022, 11:18 AM

Maybe add the changes under "Attribute Changes in Clang"?

clang/docs/ReleaseNotes.rst
81

Okay, Under which ReleaseNotes tag should I add? i can't think of a good idea 😂

clang/lib/CodeGen/CGBuiltin.cpp
2093

Okay

erichkeane added inline comments.Mar 24 2022, 11:20 AM
clang/docs/ReleaseNotes.rst
81

I think attribute changes would be a good idea (in addition to this one staying where it is).

yihanaa updated this revision to Diff 417999.Mar 24 2022, 11:45 AM
yihanaa marked an inline comment as done.

Use `FD->getDeclName().empty() instead of FD->getNameAsString().empty()`
Add a the format changes to release notes

Waitting for CI...

erichkeane accepted this revision.Mar 24 2022, 11:48 AM

LGTM! Please give Aaron a few hours (perhaps until tomorrow?) to take 1 last look before committing.

Also, if you lack commit rights and need someone to commit for you, please provide the name + email address you'd like it committed under.

This revision is now accepted and ready to land.Mar 24 2022, 11:48 AM
yihanaa added a comment.EditedMar 24 2022, 11:57 AM

LGTM! Please give Aaron a few hours (perhaps until tomorrow?) to take 1 last look before committing.

Also, if you lack commit rights and need someone to commit for you, please provide the name + email address you'd like it committed under.

Thanks for your help and review @erichkeane, @aaron.ballman 😉 . I don’t have commit access, (if Aaron are no objections after a few hours) Please use “wangyihan 1135831309@qq.com” to commit the change.

aaron.ballman accepted this revision.Mar 24 2022, 11:59 AM

LGTM aside from a tiny nit with one of the release notes.

clang/docs/ReleaseNotes.rst
140–144

Er, this is a builtin and not an attribute, so I'd put it under the "Non-comprehensive list of changes in this release" heading instead.

LGTM aside from a tiny nit with one of the release notes.

I'd be happy to fix it

LGTM aside from a tiny nit with one of the release notes.

LGTM aside from a tiny nit with one of the release notes.

I'd be happy to fix it

LGTM aside from a tiny nit with one of the release notes.

Don't worry about it, I'll make the change as part of committing.

yihanaa updated this revision to Diff 418008.Mar 24 2022, 12:10 PM

Put the dump format changes under the "Non-comprehensive list of changes in this release" heading instead.

LGTM aside from a tiny nit with one of the release notes.

I'd be happy to fix it

LGTM aside from a tiny nit with one of the release notes.

Don't worry about it, I'll make the change as part of committing.

Thanks, i edited it, if there is anything wrong, please help me correct it😊

Thank you for the fix and the improvements here, @yihanaa!

This revision was landed with ongoing or failed builds.Mar 24 2022, 12:23 PM
This revision was automatically updated to reflect the committed changes.
tambre added a subscriber: tambre.Mar 24 2022, 1:23 PM
tambre added inline comments.
clang/docs/ReleaseNotes.rst
95–96

Though I'd split the note about supporting dumping of bitfield widths into a separate point.

144
rsmith added a subscriber: rsmith.Mar 31 2022, 3:48 PM
rsmith added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
2113–2117

After this patch, this case no longer prints the field name. Eg: https://godbolt.org/z/o7vcbWaEf

#include <stdio.h>

struct A {};

struct B {
    A a;
};


int main() {
    B x;
    __builtin_dump_struct(&x, &printf);
}

now prints:

B {
    A {
    }
}

This seems like an important regression; please can you take a look? This also suggests to me that there's a hole in our test coverage.

yihanaa marked 3 inline comments as done.Apr 1 2022, 8:26 AM
yihanaa added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
2113–2117

Thanks for taking the time to review my patch and writing the Compiler Explorer examples, I'll take a look at this problem

yihanaa added inline comments.Apr 1 2022, 9:00 AM
clang/lib/CodeGen/CGBuiltin.cpp
2113–2117

After this patch, this case no longer prints the field name. Eg: https://godbolt.org/z/o7vcbWaEf

#include <stdio.h>

struct A {};

struct B {
    A a;
};


int main() {
    B x;
    __builtin_dump_struct(&x, &printf);
}

now prints:

B {
    A {
    }
}

This seems like an important regression; please can you take a look? This also suggests to me that there's a hole in our test coverage.

I'm sorry for introducing this bug. 😔 Do you think the following is correct behavior? If yes I will try to fix it.

struct B {
    struct A a = {
    }
}
yihanaa marked an inline comment as done.Apr 3 2022, 7:57 AM
yihanaa added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
2113–2117

I have submitted a patch to fix this. https://reviews.llvm.org/D122920

yihanaa added inline comments.Apr 7 2022, 3:06 AM
clang/lib/CodeGen/CGBuiltin.cpp
2113–2117

I have submitted a patch to fix this. https://reviews.llvm.org/D122920

Please can you take a review?@rsmith