This is an archive of the discontinued LLVM Phabricator instance.

[Clang][CodeGen]Add constant array support for __builtin_dump_sturct
AbandonedPublic

Authored by yihanaa on Mar 31 2022, 7:47 AM.

Details

Summary

Add constant array support for __builtin_dump_sturct. For N-dimensional arrays, the new dumpArray function dumps the elements of the array by generating N layers of 'for' loops.
for example:
The struct:

struct Foo {
  int x;
};

struct S {
  int a[3];
  int b[2][2];
  struct Foo c[2];
};

The dump result:

struct S {
    int[3] a = [
        [0] = 1
        [1] = 2
        [2] = 3
    ]
    int[2][2] b = [
        [0] = [
            [0] = 1
            [1] = 2
        ]
        [1] = [
            [0] = 3
            [1] = 4
        ]
    ]
    struct Foo[2] c = [
        [0] = {
            int x = 1
        }
        [1] = {
            int x = 2
        }
    ]
}

Diff Detail

Event Timeline

yihanaa created this revision.Mar 31 2022, 7:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2022, 7:47 AM
Herald added a subscriber: StephenFan. · View Herald Transcript
yihanaa requested review of this revision.Mar 31 2022, 7:47 AM
erichkeane added inline comments.Mar 31 2022, 8:29 AM
clang/lib/CodeGen/CGBuiltin.cpp
2048

Instead of this initialization, can we make this a constexpr constructor/collection of some sort and instantiate it inline?

2049

I wonder if instead of this whole business, this function should just be replaced with a 'switch' on the types.

2071

When can we hit this case? Does this mean that any types we dont understand we are just treating as void ptrs?

2119

This branch doesn't seem right? We are in a ConstantArrayType, which means we DEFINITELY know the length. Additionally, Length is constructed directly from a llvm::ConstantInt::get, so we already know this answer.

2138

We already know if this is a 'zero' length array, we should probably just not emit IR at all in that case.

2204

instead of inArray you probably should have something a little more descriptive here, it seems what you really need is something like, includeTypeName, preferably as an enum to make it more clear at the caller side.

2257

Do we want to do anything for the OTHER array types?

2827

Typically we do a comment to explain what bool parms mean. Alternatively, if instead you could create an enum of some sort that would better describe the 'false', it wouldlikely be more helpful. (same on 2252).

Thanks for your suggestion @erichkeane !

clang/lib/CodeGen/CGBuiltin.cpp
2048

Instead of this initialization, can we make this a constexpr constructor/collection of some sort and instantiate it inline?

I don't have a good idea because it relies on Context to get some types like const char *

2071

The previous version treated it as a void *ptr for unsupported types, such as arrays (before this patch) and __int128. This is the case i saw in an issue. I also think that for unsupported types, we should generate a clearer message saying "This type is temporarily not supported", do you have any good ideas?

2119

I agree with you

2138

Good idea!

2204

I think that's a good idea for clarity.

2257

I understand struct fields must have a constant size, and I see" 'variable length array in structure' extension will never be supported" from clang's diagnostics

2827

I agree!

erichkeane added inline comments.Mar 31 2022, 9:55 AM
clang/lib/CodeGen/CGBuiltin.cpp
2048

We could at minimum do a in inline-initializer instead of the branch below.

2071

A diagnostic about ''TYPE' not yet supported by __bultin...' probably isn't a bad idea if we have the ability.

2257

Ah, right, good point. I think this + the diagnostic discussed above would make that alright.

There is ALSO the 'matrix' type that we could emit, but that might be worth a separate commit.

yihanaa added inline comments.Mar 31 2022, 10:32 AM
clang/lib/CodeGen/CGBuiltin.cpp
2071

I think maybe we can emit an error(or warning? in this way, it may be necessary to handle those unsupported types) diag msg, but that might be worth a separate commit. What do you think?

erichkeane added inline comments.Mar 31 2022, 10:35 AM
clang/lib/CodeGen/CGBuiltin.cpp
2071

As long as we are good about 'notes' to tell the user how we GOT to there, I'd think the warning would be useful. I'd probably just do something like, "FunnyType Foo = <unsupported>" rather than do something about it.

I'd be OK with that being a separate commit.

There is another question, which way should I print for an empty array?

#1: Same as non empty array

int[0] a = [
]

#2:

int[0] a = []

The #1's format is more uniform, and the #2 maybe looks a little clearer. What do you think? @erichkeane

There is another question, which way should I print for an empty array?

#1: Same as non empty array

int[0] a = [
]

#2:

int[0] a = []

The #1's format is more uniform, and the #2 maybe looks a little clearer. What do you think? @erichkeane

I'm not motivated one way or another, I can see the logic to each. I would do whichever doesn't require 'special' handling in any way.

yihanaa added inline comments.Mar 31 2022, 11:09 AM
clang/lib/CodeGen/CGBuiltin.cpp
2048

I still haven't come up with a good idea to do it😔

erichkeane added inline comments.Mar 31 2022, 11:14 AM
clang/lib/CodeGen/CGBuiltin.cpp
2048

Hrmph, DenseMap doesn't seem to have an init list ctor?!

Since this is a finite, known at compile time list of low 'N', I'd suggest just making it static std::array<pair<QualType, StringRef>> Types = {{Context.CharTy, "%c"},....

Then just doing a llvm::find_if. The list is small enough I suspect the DenseMap overhead is more than the use of llvm::find_if.

yihanaa added inline comments.Mar 31 2022, 11:23 AM
clang/lib/CodeGen/CGBuiltin.cpp
2048

Haha,DenseMap also has no constexpr ctor. I think the time complexity of these two is the same.

I'm concerned about the direction of this patch given @rjmccall's comments on https://reviews.llvm.org/D112626 -- presumably the way we'd want to address those comments would be to convert a __builtin_dump_struct(a, f) call into a single f("...", a.x, a.y, a.z) call in Sema, and that approach doesn't seem compatible with generating code to loop over array elements.

I'm also concerned that this builtin is making a lot of design decisions on behalf of the programmer, meaning that either it does exactly what you want or it's not suitable for your use and there's not much you can do about it. A different design that let the programmer choose whether to recurse into structs and arrays and similarly let the programmer choose how to format the fields would seem preferable, but I'm not confident there's a good way to expose that in C (though in C++ there seem to be good designs to choose from). As an example of how that manifests here, the choice to print each element as a separate line for a struct member of type const char str[256]; is probably going to make the output very unreadable, but choosing to print every array of char using "%s" will be equally bad for arrays that don't contain text -- the "right" answer for a dumping facility probably involves checking whether the array contains only printable characters and trimming a trailing sequence of zeroes, but that seems like vastly more complexity than we'd want in code generated by a builtin.

yihanaa added inline comments.Mar 31 2022, 11:29 AM
clang/lib/CodeGen/CGBuiltin.cpp
2048

The Context.CharTy... etc. are not static, So we might need a static value instead of it, and compare it with QualType

erichkeane added inline comments.Mar 31 2022, 11:31 AM
clang/lib/CodeGen/CGBuiltin.cpp
2048

They don't have to be static? This should do 'magic static' initialization.

yihanaa added inline comments.Mar 31 2022, 11:51 AM
clang/lib/CodeGen/CGBuiltin.cpp
2048

I can't agree more. it might be a type identifier, such as getAsString()'s return value

erichkeane added inline comments.Mar 31 2022, 11:57 AM
clang/lib/CodeGen/CGBuiltin.cpp
2048

Actually, I don't think the collection can be static! When we re-use the same process for multiple TUs, we get a new ASTContext. SO this would result in us having types that don't necessarily 'match'. I think this collection/checks need to either be stored in ASTContext, or in an 'if-else' tree here.

yihanaa added a comment.EditedMar 31 2022, 11:58 AM

I'm concerned about the direction of this patch given @rjmccall's comments on https://reviews.llvm.org/D112626 -- presumably the way we'd want to address those comments would be to convert a __builtin_dump_struct(a, f) call into a single f("...", a.x, a.y, a.z) call in Sema, and that approach doesn't seem compatible with generating code to loop over array elements.

I'm also concerned that this builtin is making a lot of design decisions on behalf of the programmer, meaning that either it does exactly what you want or it's not suitable for your use and there's not much you can do about it. A different design that let the programmer choose whether to recurse into structs and arrays and similarly let the programmer choose how to format the fields would seem preferable, but I'm not confident there's a good way to expose that in C (though in C++ there seem to be good designs to choose from). As an example of how that manifests here, the choice to print each element as a separate line for a struct member of type const char str[256]; is probably going to make the output very unreadable, but choosing to print every array of char using "%s" will be equally bad for arrays that don't contain text -- the "right" answer for a dumping facility probably involves checking whether the array contains only printable characters and trimming a trailing sequence of zeroes, but that seems like vastly more complexity than we'd want in code generated by a builtin.

I'm also a newbie and only gradually integrated into the community with the help of @erichkeane , @aaron.ballman and others many times, this problem may be a bit difficult for me at the moment, but I'd be like to do a bigger overhaul of the whole builtin if that would make it more great.😉

I'm concerned about the direction of this patch given @rjmccall's comments on https://reviews.llvm.org/D112626 -- presumably the way we'd want to address those comments would be to convert a __builtin_dump_struct(a, f) call into a single f("...", a.x, a.y, a.z) call in Sema, and that approach doesn't seem compatible with generating code to loop over array elements.

Well, we could generate a whole helper function that takes the struct by reference. That would also drastically improve the code-size impact of this builtin, since it's likely that the builtin will be used multiple times with the same struct.

I'm also concerned that this builtin is making a lot of design decisions on behalf of the programmer, meaning that either it does exactly what you want or it's not suitable for your use and there's not much you can do about it.

Yeah, I agree with this point, too; it's a very opinionated builtin.

yihanaa added inline comments.Mar 31 2022, 12:02 PM
clang/lib/CodeGen/CGBuiltin.cpp
2048

I also don't think it should be maintained in this place. Shouldn't this be better maintained together with the printf parameter checking part, what do you think?

erichkeane added inline comments.Mar 31 2022, 12:08 PM
clang/lib/CodeGen/CGBuiltin.cpp
2048

I'm unfamiliar with what you mean.

HOWEVER, the comments by @rsmith and @rjmccall I think are blocking on this feature/functionality anyway. So we might still wish to refactor this builtin significantly based on what they come up with.

One thing we could do would be to build a descriptor for the type (as data) -- a type name string, a type "kind", plus (for a record type) a list of (field type descriptor, field name, offset) tuples -- and pass it to a user-specified function. We could provide a library implementation of a dumping function that takes that descriptor and a function with printf's signature and provides the "normal" formatting per the current __builtin_dump_struct behavior -- and indeed we could then implement __builtin_dump_struct(p, f) as a call to normal_dumping_function(__builtin_type_descriptor(T), p, f). I'm not sure how comfortable we'd be putting that library implementation in compiler-rt versus somewhere else, though it presumably wouldn't be very large.

One thing we could do would be to build a descriptor for the type (as data) -- a type name string, a type "kind", plus (for a record type) a list of (field type descriptor, field name, offset) tuples -- and pass it to a user-specified function. We could provide a library implementation of a dumping function that takes that descriptor and a function with printf's signature and provides the "normal" formatting per the current __builtin_dump_struct behavior -- and indeed we could then implement __builtin_dump_struct(p, f) as a call to normal_dumping_function(__builtin_type_descriptor(T), p, f). I'm not sure how comfortable we'd be putting that library implementation in compiler-rt versus somewhere else, though it presumably wouldn't be very large.

This seems to be quickly decaying into 'implement Reflection', doesn't it? I see this __builtin_dump_struct to be a useful debugging builtin. As much as I see 'Reflection' as incredibly useful, I would think this is intended to be something much simpler, right?

I'm also concerned that this builtin is making a lot of design decisions on behalf of the programmer, meaning that either it does exactly what you want or it's not suitable for your use and there's not much you can do about it.

FWIW, I'm not concerned by this. My understanding of this builtin is that it exists solely as a developer debugging aid (https://reviews.llvm.org/D44093) predominately for kernel folks when a debugger may not be available. As such, I view this the same as our other dump() methods -- we try to dump useful information, but ultimately, nobody should be relying on this output or building anything around it.

A different design that let the programmer choose whether to recurse into structs and arrays and similarly let the programmer choose how to format the fields would seem preferable, but I'm not confident there's a good way to expose that in C (though in C++ there seem to be good designs to choose from). As an example of how that manifests here, the choice to print each element as a separate line for a struct member of type const char str[256]; is probably going to make the output very unreadable, but choosing to print every array of char using "%s" will be equally bad for arrays that don't contain text -- the "right" answer for a dumping facility probably involves checking whether the array contains only printable characters and trimming a trailing sequence of zeroes, but that seems like vastly more complexity than we'd want in code generated by a builtin.

While this is interesting, I am not certain there's a lot of value to it as a debugging aid for times when a debugger is not available to you. It sounds like a fair amount of complexity, so I'd like to know whether there's a strong need for this much generality.

yihanaa added inline comments.Mar 31 2022, 8:34 PM
clang/lib/CodeGen/CGBuiltin.cpp
2048

Ah, I think the way I described it is not correct, my thought is, can this getDumpStructFormatSpecifier similar function be maintained in somewhere like FormatString.h😂

I also agree that this is a useful debugging builtin, and no program logic should be forced to depend on the output of this builtin. It's just used to print out some useful debugging information

yihanaa added inline comments.Apr 1 2022, 7:59 AM
clang/lib/CodeGen/CGBuiltin.cpp
2052

I think this types should have been promoted according to the integer promotions.

I also agree that this is a useful debugging builtin, and no program logic should be forced to depend on the output of this builtin. It's just used to print out some useful debugging information

FWIW, @rsmith is the clang-codeowner, so he's the one to convince (and @rjmccall is also a very senior reviewer). So I wouldn't be able to approve this patch without their consent/agreement.

I also agree that this is a useful debugging builtin, and no program logic should be forced to depend on the output of this builtin. It's just used to print out some useful debugging information

FWIW, @rsmith is the clang-codeowner, so he's the one to convince (and @rjmccall is also a very senior reviewer). So I wouldn't be able to approve this patch without their consent/agreement.

Thank you for your information, @erichkeane I really hope we can find a solution and reach a consensus😊

yihanaa planned changes to this revision.Apr 1 2022, 6:11 PM
yihanaa abandoned this revision.