This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
docs/
15/27
LanguageExtensions.rst
2/2
ReleaseNotes.rst
-
include/clang/Basic/
-
clang/
-
Basic/
-
Builtins.def
2/3
DiagnosticSemaKinds.td
-
lib/
-
AST/
2/2
Expr.cpp
-
Sema/
27/48
SemaChecking.cpp
-
SemaStmt.cpp
-
test/
-
CodeGenCXX/
1/3
builtin-reflect-struct.cpp
-
SemaCXX/
2
builtin-reflect-struct.cpp

Differential D124221

Reimplement `__builtin_dump_struct` in Sema.
ClosedPublic

Authored by rsmith on Apr 21 2022, 5:52 PM.

Download Raw Diff

Details

Reviewers

aaron.ballman
erichkeane
yihanaa

Commits

rGc4f95ef86a22: Reimplement `__builtin_dump_struct` in Sema.

Summary

Compared to the old implementation:

In C++, we only recurse into aggregate classes.
Unnamed bit-fields are not printed.
Constant evaluation is supported.
Proper conversion is done when passing arguments through ....
Additional arguments are supported and are injected prior to the format string; this directly supports use with fprintf, for example.
An arbitrary callable can be passed rather than only a function pointer. In particular, in C++, a function template or overload set is acceptable.
All text generated by Clang is printed via %s rather than directly; this avoids issues where Clang's pretty-printing output might itself contain a % character.
Fields of types that we don't know how to print are printed with a "*%p" format and passed by address to the print function.
No return value is produced.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rsmith created this revision.Apr 21 2022, 5:52 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 5:52 PM

rsmith requested review of this revision.Apr 21 2022, 5:52 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 5:52 PM

Herald added a subscriber: cfe-commits. · View Herald Transcript

rsmith edited the summary of this revision. (Show Details)Apr 21 2022, 5:52 PM

I'm not entirely sure whether I want to pursue this -- I'd prefer to have only a single mechanism that works well in both C and C++, rather than this (which is flexible but not really usable in C) and __builtin_dump_struct (which is inflexible but works well for C) -- but I've not found a good way to expose this functionality that provides good support for both use cases yet. Thoughts on that are welcome.

This came from wanting to write up how I would have like to have seen __builtin_dump_struct be implemented. Note in particular that all of the work is done in Sema with proper semantic checking, rather than in CodeGen, which avoids the problems that __builtin_dump_struct has with converting arguments to the right type, provides a lot more flexibility in the function argument, and supports constant evaluation for free.

This is somewhat WIP: in particular, access checking is not done consistently. I'm not sure whether we should ignore access control or properly enforce it; the derived-to-base conversions currently enforce it and the field accesses ignore it, which is certainly not the right answer.

Some improvements.

Harbormaster completed remote builds in B160765: Diff 424350.Apr 21 2022, 6:24 PM

Fix some code that was making incorrect assumptions that PseudoObjectExpr is only used for ObjC properties.

I generally just question the usefulness of this. Despite its shortcomings, the 'dump struct' has the capability of runtime introspection into a type, whereas this seems to require that the user 'know' a significant amount about the type that they are introspecting (number of bases, number of fields, field TYPES). If you knew the type of your struct, THIS well, I would expect you wouldn't NEED a builtin like this.

clang/docs/LanguageExtensions.rst
2434	I'm curious as to the need for the 'arg1' and 'arg2'? Also, what is the type of the 'fields' array that works in C?
2437	Hmm... what is the type of f.bitwidth? A zero-length bitwidth is a valid thing, so having it just be an 'int' doesn't seem workable.
2460	While I get the value of a generic reflection 'thing', this is once again getting awful close to 'just implement reflection capabilities', at which point this seems insufficient.
2462	Is the purpose of the overload set/functor so you can support multiple types (seeing how the 'base' below works)? How does this all work in C? (or do we just do fields in C, since they cannot have the rest? I wonder if we would be better if we treated this function as a 'callback' or a 'callback set' that returned a discriminated union of a pre-defined type that describes what is being returned. That way it could handle member functions, static variables, etc.
2464	Same question about init-list, how is this usable in C? It becomes a BIT more usable in C if instead we return a single object that represents a single AST node.
2471	I find myself wishing this was a more complete list.
2473	I wonder if bases should include their virtualness or access specifier.
clang/lib/Sema/SemaChecking.cpp
420	Losing the unnamed bitfield is unfortunate, and I the 'dump struct' handles. As is losing the anonymous struct/union. Also, how does this handle 'record type' members? Does the user have to know the inner type already? If they already know that much information about the type, they wouldn't really need this, right?
clang/test/CodeGenCXX/builtin-reflect-struct.cpp
25	OH! I see now. This is unfortunately quite limiting by doing it this way. At this point the user of this builtin is required to know the layout of the struct before calling the builtin, which seems unfortunate. Despite the downsides of being just a 'dump function', the other builtin had the benefit of working on an arbitrary type.

In D124221#3467467, @erichkeane wrote:

I generally just question the usefulness of this. Despite its shortcomings, the 'dump struct' has the capability of runtime introspection into a type, whereas this seems to require that the user 'know' a significant amount about the type that they are introspecting (number of bases, number of fields, field TYPES).

I think you've misunderstood; that is not required. Though with the design as-is, it might make sense to restrict this to being C++-only, given that there's not really a way to effectively use this from C.

clang/docs/LanguageExtensions.rst
2434	This is aimed to fix a major deficiency in `__builtin_dump_struct`, that there's no way to pass information into its callback (other than stashing it in TLS). As noted in the patch description, this builtin is not especially usable in C. I'm not sure whether that's fixable, given the very limited metaprogramming support in C.
2437	It's a `size_t`. There are no zero-size bitfield members. (Unnamed bitfield aren't members, so are excluded here.) In any case, you can tell the difference between a bitfield and a regular member by the length of the initialiser list; we don't pass a bit width for non-bit-field members.
2462	This doesn't really work well in C, as noted in the patch description. A set of callbacks might work a bit better across both languages, but I don't think it solves the larger problem that there's no good way to pass type information into a callback in C.
2471	What else do you want? My goal here was to be able to do what `__builtin_dump_struct` does but without its many arbitrary limitations. From C++, this is enough for that.
2473	I mean, maybe, but I don't have a use case for either. For access in general my leaning is to say that the built-in doesn't get special access rights -- either it can access everything or you'll get an access error if you use it -- in which case including the access for bases and fields doesn't seem likely to be super useful.
clang/lib/Sema/SemaChecking.cpp
420	If `__builtin_dump_struct` includes unnamed bitfields, that's a bug in that builtin. In general, the user function gets given the bases and members with the right types, so they can use an overload set or a template to dispatch based on that type. See the SemaCXX test for a basic example of that.
clang/test/CodeGenCXX/builtin-reflect-struct.cpp
25	Well, no, this is set up this way to test code generation. See the SemaCXX test for an example of using this to dump an arbitrary type.

I think you've misunderstood; that is not required. Though with the design as-is, it might make sense to restrict this to being C++-only, given that there's not really a way to effectively use this from C.

Ah, I see the example in SemaCXX. its sad we can't do this from C, I would consider the use case of __builtin_dump_struct to be very handy in C.

I guess I think I would want to start off with, "what do we see the 'usecase' of these builtins" being. __builtin_dump_struct to me seems to have a pretty clear and semantically enforced one: its useful to printf it for debugging purposes.

This seems to be more useful-enough that it leaves me feeling like users would very quickly fall into, "man, I wish this did more" because of it. Its just a shame we can't get SG7 to work quicker :)

clang/docs/LanguageExtensions.rst
2434	Presumably one could also put it in the functor for the callback, or even as a capture in the lambda. Though, I'm not positive I see the VALUE to passing this information, but was just curious as to the intent.
2437	There are no zero-size bitfield members. (Unnamed bitfield aren't members, so are excluded here.) Hrm... Semantically YES, lexically I'm not sure folks make that differentiation. One issue with skipping 'unnamed bitfields' is it makes something like: struct Foo { int : 31; // Take up some space so something else could have been put there. int field : 1; int field2 : 2; }; and comparing `sizeof` the type with the output to be... jarring at least. you can tell the difference between a bitfield and a regular member by the length of the initialiser list; I see now how you've done that with ctors in C++. It feels that the way of doing this in C is unfortunately quite clunky (in addition, not being able to use arbitrary types).
2462	Well, if the callback itself took the discriminated union for each thing, and we did 1 callback per base/field/etc, perhaps that would be useful? I just am having a hard time seeing this being all that useful in C, which is a shame.
2471	I think this is again falling down the "this is close enough to reflection, I want reflection" hole :) I find myself wanting to be able to introspect member functions, static variables/functions/etc.
2473	I guess it depends on the use case for these builtins. I saw the 'dump-struct' builtin's purpose was for quick & dirty printf-debugging. At which point the format/features were less important than the action. So from that perspective, I would see this having 'special access rights' as sort of a necessity. Though again, this is a significantly more general builtin, which comes with additional potential use cases.
clang/lib/Sema/SemaChecking.cpp
420	I thought it did? For the use case I see `__builtin_dump_struct` having, I would think printing the existence of unnamed bitfields to be quite useful.
clang/test/CodeGenCXX/builtin-reflect-struct.cpp
25	Yep, thanks for that. I see now how it can be used that way.
clang/test/SemaCXX/builtin-reflect-struct.cpp
53	Ah, this is perhaps the example I was looking for.
83	What would this look like if you needed to look into a union? Perhaps it would be handled similarly to "__is_class"? In both builtins, its a shame we don't have a way to properly handle the 'active member'.

rsmith added inline comments.Apr 22 2022, 10:07 AM

clang/docs/LanguageExtensions.rst
2434	Suppose you want to dump a struct to a log file, or you want the dump in a string. Or you want to carry along an indentation level. My first attempt at using `__builtin_dump_struct` essentially failed because it doesn't have anything like this; even if we don't want this patch, this is functionality we should add to that builtin.
2462	I think that would work. I'm not sure it would be better, though, given that it loses the ability to build a type whose shape depends on the number of bases and fields, because you're not given them all at once. (Eg, if you want to build a static array of field names.)
2473	The use cases are the things that `__builtin_dump_struct` gets close to but fails at, such as: I have an `EXPECT_EQ` macro that I'd like to log the difference in input and output values where possible, or I want to build a simple print-to-string utility for structs. Or I want a dump to stout but I want it formatted differently from what `__builtin_dump_struct` thinks is best. The fact that the dump built-in gets us into this area but can't solve these problems is frustrating.
clang/lib/Sema/SemaChecking.cpp
420	Why? They're not part of the value, they're just padding.

rsmith added inline comments.Apr 22 2022, 10:14 AM

clang/lib/Sema/SemaChecking.cpp
420	Looks like `__builtin_dump_struct` currently includes them and prints a value (whatever garbage happens to be in those padding bits). That's obviously a bug.

erichkeane added inline comments.Apr 22 2022, 10:17 AM

clang/docs/LanguageExtensions.rst
2434	Ah, I see! Sure, I can see how those are useful, though again, mostly to the C implementation (as those should all be part of the functor's state), and this doesn't work particularly well in C.
2462	Yep, thats definitely true. There are ways to DO that (either a 2-pass at this to get the sizes, allocate something, then go through again filling in your struct OR a constant 'append' thing with a `vector` like type), but I see what you mean. It seems that either case we are limiting a couple of useful use cases.
2473	The use cases are the things that __builtin_dump_struct gets close to but fails at, such as: I have an EXPECT_EQ macro that I'd like to log the difference in input and output values where possible THAT is an interesting use case, I see value to that too. The fact that the dump built-in gets us into this area but can't solve these problems is frustrating. I definitely see that. FWIW, I don't think I was the one to approve/review the initial version of `__builtin_dump_struct` (and perhaps would have been equally frustrated if I thought that through), but was looking into the patches after the fact to help out. I guess I find myself equally/additionally/etc frustrated at THIS patch because it gets us so close to everything SG7 promises they can invent.
clang/lib/Sema/SemaChecking.cpp
420	They are lexically part of the type and are part of what makes up the 'size' of the thing. I would expect something dumping the type to be as consistent lexically as possible.
420	O.o! Yeah, printing the value is nonsense.

Thank you for looking into this! I've also thought that the __builtin_dump_struct implementation hasn't been quite as robust as it could be. However, the primary use case for that builtin has been for kernel folks to debug when they have no access to an actual debugger -- it being a super limited interface is actually a feature rather than a bug, in some ways.

I think what you're designing here is straight-up reflection, which is a different use case. If we're going to get something that's basically only usable in C++, I'm not certain there's much value in adding builtins for reflection until WG21 decides what reflection looks like, and then we can design around that. (Personally, I think designing usable reflection interfaces for C would be considerably harder but would provide considerably more benefit to users given the lack of reflection capabilities. There's almost no chance WG14 and WG21 would come up with the same interfaces for reflection, so I think we've got some opportunity for exploration here.)

In D124221#3468706, @aaron.ballman wrote:

Thank you for looking into this! I've also thought that the __builtin_dump_struct implementation hasn't been quite as robust as it could be. However, the primary use case for that builtin has been for kernel folks to debug when they have no access to an actual debugger -- it being a super limited interface is actually a feature rather than a bug, in some ways.

I'm more concerned about the design of __builtin_dump_struct than the implementation -- we can always fix the implementation. Right now the design is hostile to any use other than the very simplest one where the function you give it is essentially printf. You can't even dump to a log file or to a string without having to fight the design of the builtin. And if you want anything other than the exact recursion and formatting choices that we've arbitrarily made, you're out of luck. And of course, if you're in C++, you'll want to be able to print out std::string and std::optional<int> and similar types too, which are not supported by the current design at all. Perhaps this is precisely what a specific set of kernel folks want for a specific use case, but if the only purpose is to address that one use case, then I don't really see how we can justify keeping this builtin in-tree. So I think we should either expand the scope beyond the specific kernel folks or we should remove it.

I think what you're designing here is straight-up reflection, which is a different use case. If we're going to get something that's basically only usable in C++, I'm not certain there's much value in adding builtins for reflection until WG21 decides what reflection looks like, and then we can design around that. (Personally, I think designing usable reflection interfaces for C would be considerably harder but would provide considerably more benefit to users given the lack of reflection capabilities. There's almost no chance WG14 and WG21 would come up with the same interfaces for reflection, so I think we've got some opportunity for exploration here.)

What I set out to build is something that lets you implement struct dumping without all the shortcomings I see in __builtin_dump_struct. I think it's inevitable that that ends up being substantially a struct reflection system, and indeed __builtin_dump_struct is a reflection system, too, just a really awkward one -- people have already started parsing its format strings to extract struct field names, for example. (The usage I've seen actually *is* struct dumping, but that usage can't use __builtin_dump_struct directly because of its limitations, so __builtin_dump_struct is just used to extract the field names, and the field values and types are extracted via structured bindings instead.)

I do agree that we shouldn't be providing a full reflection mechanism here, given both that one is coming anyway, and that whatever we design, people will inevitably ask for more, and we don't want to be maintaining our own reflection technology.

So, I think we should either roll back __builtin_dump_struct or fix forward. This patch attempted to do the latter, but maybe it's gone too far in the direction of reflection. I think we can address most of my concerns with __builtin_dump_struct without a new builtin, by incorporating things from this implementation as follows:

Desugar it in Sema rather than in CodeGen -- this is necessary to enable following things as well as for constant evaluation
Take any callable as the printing function and pass it the fields with their correct types, so that a C++ implementation can dispatch based on the type and print the values of types that we don't hard-code (eg, we'd generate calls like user_dump_function("%s = %p", "field_name", &p->field_name);, and a smart C++ formatting function can preserve the field type and do something suitable when formatting a corresponding %p).
Take additional arguments to be passed onto the provided function, so that C users can easily format to a different file or to a string or whatever else

The main concern that I think we can't address this way is that __builtin_dump_struct decides for itself how to format the struct. But if we keep that property, that might actually help to keep us in the bounds of a builtin that's good for dumping but nothing else?

yihanaa added inline comments.Apr 23 2022, 8:33 AM

clang/lib/Sema/SemaChecking.cpp
420	I think the user should be informed somehow that there is an unnamed bitfield here

In D124221#3469251, @rsmith wrote:

In D124221#3468706, @aaron.ballman wrote:

Thank you for looking into this! I've also thought that the __builtin_dump_struct implementation hasn't been quite as robust as it could be. However, the primary use case for that builtin has been for kernel folks to debug when they have no access to an actual debugger -- it being a super limited interface is actually a feature rather than a bug, in some ways.

I'm more concerned about the design of __builtin_dump_struct than the implementation -- we can always fix the implementation. Right now the design is hostile to any use other than the very simplest one where the function you give it is essentially printf. You can't even dump to a log file or to a string without having to fight the design of the builtin. And if you want anything other than the exact recursion and formatting choices that we've arbitrarily made, you're out of luck. And of course, if you're in C++, you'll want to be able to print out std::string and std::optional<int> and similar types too, which are not supported by the current design at all. Perhaps this is precisely what a specific set of kernel folks want for a specific use case, but if the only purpose is to address that one use case, then I don't really see how we can justify keeping this builtin in-tree. So I think we should either expand the scope beyond the specific kernel folks or we should remove it.

Yes, I agree that the current builtin is limited and not particularly well designed for general use. I would be fine removing support for it. I'd be happier augmenting it so it works for more use cases. However, I don't think what you've got here so far "expands the scope" so much as "moves the goalposts" -- if the replacement can't be used by the kernel folks, then we're losing the primary use case we have for that builtin. However, it sounds like you may have some ideas on that so we can make it more usable in C.

I think what you're designing here is straight-up reflection, which is a different use case. If we're going to get something that's basically only usable in C++, I'm not certain there's much value in adding builtins for reflection until WG21 decides what reflection looks like, and then we can design around that. (Personally, I think designing usable reflection interfaces for C would be considerably harder but would provide considerably more benefit to users given the lack of reflection capabilities. There's almost no chance WG14 and WG21 would come up with the same interfaces for reflection, so I think we've got some opportunity for exploration here.)

What I set out to build is something that lets you implement struct dumping without all the shortcomings I see in __builtin_dump_struct. I think it's inevitable that that ends up being substantially a struct reflection system, and indeed __builtin_dump_struct is a reflection system, too, just a really awkward one -- people have already started parsing its format strings to extract struct field names, for example. (The usage I've seen actually *is* struct dumping, but that usage can't use __builtin_dump_struct directly because of its limitations, so __builtin_dump_struct is just used to extract the field names, and the field values and types are extracted via structured bindings instead.)

Agreed that the current builtin is effectively (really bad) reflection as well, and I'm sad to hear people are trying to parse its output to get structure field names (we have a JSON AST dump for exactly that kind of situation, so I'm not super sympathetic to people using builtin dump struct in that way unless they also need the runtime value information of the fields).

I do agree that we shouldn't be providing a full reflection mechanism here, given both that one is coming anyway, and that whatever we design, people will inevitably ask for more, and we don't want to be maintaining our own reflection technology.

Agreed.

So, I think we should either roll back __builtin_dump_struct or fix forward. This patch attempted to do the latter, but maybe it's gone too far in the direction of reflection. I think we can address most of my concerns with __builtin_dump_struct without a new builtin, by incorporating things from this implementation as follows:

Desugar it in Sema rather than in CodeGen -- this is necessary to enable following things as well as for constant evaluation

Take any callable as the printing function and pass it the fields with their correct types, so that a C++ implementation can dispatch based on the type and print the values of types that we don't hard-code (eg, we'd generate calls like user_dump_function("%s = %p", "field_name", &p->field_name);, and a smart C++ formatting function can preserve the field type and do something suitable when formatting a corresponding %p).

+1, though I'm curious what we want to do about things like bit-fields (do we want to alert the caller that it's a bit-field and what the width is?), anonymous bit-fields, base classes vs fields of class type, etc.

Take additional arguments to be passed onto the provided function, so that C users can easily format to a different file or to a string or whatever else

+1, that seems pretty sensible for what is effectively a callback function.

The main concern that I think we can't address this way is that __builtin_dump_struct decides for itself how to format the struct. But if we keep that property, that might actually help to keep us in the bounds of a builtin that's good for dumping but nothing else?

I think that could be workable. Given that it was intended as a debugging interface, I don't think the format of the struct is critical so long as the default format isn't too awful (for some definition of "too awful") and we provide at least the same amount of information as before.

In D124221#3470550, @aaron.ballman wrote:

In D124221#3469251, @rsmith wrote:

In D124221#3468706, @aaron.ballman wrote:

Thank you for looking into this! I've also thought that the __builtin_dump_struct implementation hasn't been quite as robust as it could be. However, the primary use case for that builtin has been for kernel folks to debug when they have no access to an actual debugger -- it being a super limited interface is actually a feature rather than a bug, in some ways.

I'm more concerned about the design of __builtin_dump_struct than the implementation -- we can always fix the implementation. Right now the design is hostile to any use other than the very simplest one where the function you give it is essentially printf. You can't even dump to a log file or to a string without having to fight the design of the builtin. And if you want anything other than the exact recursion and formatting choices that we've arbitrarily made, you're out of luck. And of course, if you're in C++, you'll want to be able to print out std::string and std::optional<int> and similar types too, which are not supported by the current design at all. Perhaps this is precisely what a specific set of kernel folks want for a specific use case, but if the only purpose is to address that one use case, then I don't really see how we can justify keeping this builtin in-tree. So I think we should either expand the scope beyond the specific kernel folks or we should remove it.

Yes, I agree that the current builtin is limited and not particularly well designed for general use. I would be fine removing support for it. I'd be happier augmenting it so it works for more use cases. However, I don't think what you've got here so far "expands the scope" so much as "moves the goalposts" -- if the replacement can't be used by the kernel folks, then we're losing the primary use case we have for that builtin. However, it sounds like you may have some ideas on that so we can make it more usable in C.

In general I've not been a fan of __builtin_dump_struct (I wasn't paying attention when it got merged), but I was assured in the meantime that it was JUST for printf-debugging in places without a debugger. I found its inflexibility, changing format, and inconsistent functionality to be a 'feature' at that point, as it would discourage non-printf-debugging use cases. BUT I guess you're seeing Hyrum's law in action :D It would be a shame to LOSE the builtin, since it is apparently REALLY useful in the debugging case, but I definitely don't think I would have +1'ed it if I were to see/understand it as a new addition.

I think what you're designing here is straight-up reflection, which is a different use case. If we're going to get something that's basically only usable in C++, I'm not certain there's much value in adding builtins for reflection until WG21 decides what reflection looks like, and then we can design around that. (Personally, I think designing usable reflection interfaces for C would be considerably harder but would provide considerably more benefit to users given the lack of reflection capabilities. There's almost no chance WG14 and WG21 would come up with the same interfaces for reflection, so I think we've got some opportunity for exploration here.)

What I set out to build is something that lets you implement struct dumping without all the shortcomings I see in __builtin_dump_struct. I think it's inevitable that that ends up being substantially a struct reflection system, and indeed __builtin_dump_struct is a reflection system, too, just a really awkward one -- people have already started parsing its format strings to extract struct field names, for example. (The usage I've seen actually *is* struct dumping, but that usage can't use __builtin_dump_struct directly because of its limitations, so __builtin_dump_struct is just used to extract the field names, and the field values and types are extracted via structured bindings instead.)

Agreed that the current builtin is effectively (really bad) reflection as well, and I'm sad to hear people are trying to parse its output to get structure field names (we have a JSON AST dump for exactly that kind of situation, so I'm not super sympathetic to people using builtin dump struct in that way unless they also need the runtime value information of the fields).

I agree completely, over the weekend I also came to the conclusion that any attempt to do ANYTHING in this space is likely going to be used as a poor-man's reflection. At that point, I consider the awkwardness to be a bit of a 'feature' here, as I'd hope anyone trying to use it for its not-intended-purpose would at one point pause thinking whether they COULD, and think briefly whether they SHOULD.

I do agree that we shouldn't be providing a full reflection mechanism here, given both that one is coming anyway, and that whatever we design, people will inevitably ask for more, and we don't want to be maintaining our own reflection technology.

Agreed.

I VERY much don't want to be spending cycles maintaining our own reflection, part of why I am/was so concerned with __builtin_reflect_struct. Though, I'm coming to terms with the fact that __builtin_dump_struct _IS_ 'crappy reflection' already.

So, I think we should either roll back __builtin_dump_struct or fix forward. This patch attempted to do the latter, but maybe it's gone too far in the direction of reflection. I think we can address most of my concerns with __builtin_dump_struct without a new builtin, by incorporating things from this implementation as follows:

Desugar it in Sema rather than in CodeGen -- this is necessary to enable following things as well as for constant evaluation

+1

I agree here, I found the implemented-in-codegen bizarre and limiting, so even this would be an improvement without the others.

Take any callable as the printing function and pass it the fields with their correct types, so that a C++ implementation can dispatch based on the type and print the values of types that we don't hard-code (eg, we'd generate calls like user_dump_function("%s = %p", "field_name", &p->field_name);, and a smart C++ formatting function can preserve the field type and do something suitable when formatting a corresponding %p).

+1, though I'm curious what we want to do about things like bit-fields (do we want to alert the caller that it's a bit-field and what the width is?), anonymous bit-fields, base classes vs fields of class type, etc.

I suspect we'd want some sort of 'depth' parameter to that as well so we can handle the 'tabbing' correctly, and am also concerned with the things Aaron brought up (particularly the anonymous bit-fields/zero length bitfields), as they seem to be things that make current users persnickety.

Take additional arguments to be passed onto the provided function, so that C users can easily format to a different file or to a string or whatever else

+1, that seems pretty sensible for what is effectively a callback function.

I am somewhat concerned with these as they get confusing/obtuse to use sometimes (particularly with variadic functions as the callback, where they cause strange non-compile-time errors with minor, seemingly innocuous changes), but am reasonably sure we could come up with some level of design that would make it not-awful. If this was JUST C++, I'd say "we should not, and the user should use a functor/lambda for state", but, of course, this is C :/

The main concern that I think we can't address this way is that __builtin_dump_struct decides for itself how to format the struct. But if we keep that property, that might actually help to keep us in the bounds of a builtin that's good for dumping but nothing else?

I think that could be workable. Given that it was intended as a debugging interface, I don't think the format of the struct is critical so long as the default format isn't too awful (for some definition of "too awful") and we provide at least the same amount of information as before.

Yep, agreed. The guarantee that we give is "human readable", anything we do beyond that encourages folks trying to see if it is "computer readable", which is immediately "bad reflection". The BOFH-compiler-guy part of me (B-Compiler-Dev-FH ?) thinks we should start randomizing the output to discourage this behavior, but I suspect that would be an arms-race we would ultimately lose.

erichkeane mentioned this in D122920: [Clang][CodeGen]Fix __builtin_dump_struct missing record type field name.Apr 25 2022, 6:26 AM

Thank you for inviting me to a discussion on this topic. I have something very confusing as follows:

If the developer already knows all fields info in some struct, wyh doesn't he write a simple function to do the printing(like: print_struct_foo(const struct foo *); )? It might be easier to write a printing function.
If the struct has many levels of nesting, and the developers just want to output the details of the struct for debugging purposes, then I think they might just need something like the 'p' command in LLDB.
If we want to support these printing functions, and make this builtin flexible and general, I think reflection is a good idea. But we may want to make this builtin support C/C++ or other languages supported by Clang at the same time. I have a not-so-sophisticated (perhaps silly-sounding) idea: this builtin might be one or more C-style functions in order to be usable in many different languages. This family of functions supports getting the number of struct fields, as well as their details (and possibly getting the value of the field by getting the subscript). like:

int __builtin_reflect_struct_member_count(const void *struct_addr, unsigned *count);
int __builtin_reflect_struct_member_detail(const void *struct_addr, unsigned index, char *name, unsigned *bitfield, int *is_unnamed);
int __builtin_reflect_struct_member_ptr(const void *struct_addr, unsigned index, void **ptr);

Maybe it also can support nested struct?

emmmmm, my idea looks really stupid, don't laugh at me, hahahaha~~~😂😂😂😂
I'm really looking forward to hearing your idea💡

In D124221#3471645, @erichkeane wrote:

In D124221#3470550, @aaron.ballman wrote:

In D124221#3469251, @rsmith wrote:

Take any callable as the printing function and pass it the fields with their correct types, so that a C++ implementation can dispatch based on the type and print the values of types that we don't hard-code (eg, we'd generate calls like user_dump_function("%s = %p", "field_name", &p->field_name);, and a smart C++ formatting function can preserve the field type and do something suitable when formatting a corresponding %p).

+1, though I'm curious what we want to do about things like bit-fields (do we want to alert the caller that it's a bit-field and what the width is?), anonymous bit-fields, base classes vs fields of class type, etc.

I suspect we'd want some sort of 'depth' parameter to that as well so we can handle the 'tabbing' correctly,

I was thinking that we'd pass hard-coded " " strings to the user_dump_function for indentation (like we do now), rather than trying to make it customizable, so that, in particular, you can still pass in printf as the user_dump_function and have it "just work".

and am also concerned with the things Aaron brought up (particularly the anonymous bit-fields/zero length bitfields), as they seem to be things that make current users persnickety.

Yeah. User concerns about the output format not being what they wanted really reinforce for me the fundamental problems with this builtin. But we can still make reasonable and principled choices here (eg, show the same information that would be in a corresponding compound literal).

Take additional arguments to be passed onto the provided function, so that C users can easily format to a different file or to a string or whatever else

+1, that seems pretty sensible for what is effectively a callback function.

I am somewhat concerned with these as they get confusing/obtuse to use sometimes (particularly with variadic functions as the callback, where they cause strange non-compile-time errors with minor, seemingly innocuous changes), but am reasonably sure we could come up with some level of design that would make it not-awful. If this was JUST C++, I'd say "we should not, and the user should use a functor/lambda for state", but, of course, this is C :/

I definitely understand that concern. I think it's important to allow at least one parameter -- I'd be OK with saying that you can pass either the signature of printf or that signature with a leading void*, but I think it's nicer to say we'll take whatever extra arguments are given and blindly pass them onto the given function, so that we can directly support fprintf and dprintf. (You'd still need a wrapper in order to be able to call snprintf, though, in order to move the pointer forward and correspondingly decrease the remaining buffer length.) But as long as we provide *some* way to pass arbitrary information to the callback, I think that satisfies the use case.

The BOFH-compiler-guy part of me (B-Compiler-Dev-FH ?) thinks we should start randomizing the output to discourage this behavior, but I suspect that would be an arms-race we would ultimately lose.

:-) I think the best thing we can do to discourage people from using this dumping facility for reflection is to provide builtins to implement the C++ reflection functionality as soon as we can after we think it's ready -- and ideally to make sure that those builtins support at least basic use cases in C. (The latter might require that we wait until C's constant evaluation functionality improves, but at least some in WG14 are pushing hard for that.)

rsmith added inline comments.Apr 25 2022, 1:23 PM

clang/lib/Sema/SemaChecking.cpp
420	It seems to me that `__builtin_dump_struct` is displaying the value of a struct object, not the representation of the object, and unnamed bitfields are not a part of the value, so should not be included. And I think it makes sense for that builtin to be value-oriented: someone using it presumably already knows whatever they want to know about the representation, to the extent that it matters, or can look it up; what they don't know is the information that varies from one instance to another. If we want a builtin that's more oriented around showing the struct's memory representation, I think we'd want quite a different output format, including offsets for fields -- and even then I don't think it makes sense to include unnamed bit-fields because they're not different from any other kind of padding in the object. If we still want to print unnamed bit-fields for some reason, presumably we should also print alignment and packing attributes and pragmas, because they too can change the padding within a struct.

aaron.ballman added inline comments.Apr 27 2022, 12:22 PM

clang/lib/Sema/SemaChecking.cpp
420	I tend to agree that `__builtin_dump_struct` should be focused on the values in the structure and not the representation. However, keeping in mind that the original builtin was for kernel folks and the kernel tends to do some pretty interesting things (that aren't always supported by the standard), there's a part of me that wonders if they have a use case for unnamed bit-fields where they expect data to be squirreled away in those bits (through `memcpy()` or some other means) that they'd like to be able to read.

Reimplement __builtin_dump_struct in Sema.

rsmith retitled this revision from Add new builtin __builtin_reflect_struct. to Reimplement `__builtin_dump_struct` in Sema..May 2 2022, 12:47 AM

rsmith edited the summary of this revision. (Show Details)

In D124221#3469251, @rsmith wrote:

I think we can address most of my concerns with __builtin_dump_struct without a new builtin, by incorporating things from this implementation

Done; this patch now reimplements __builtin_dump_struct using the Sema desugaring approach rather than providing a new builtin. As in the prior patch, we still support passing an arbitrary callable and additional arguments, and in C++ we support constant-evaluation of the dump as well as generic printing of field values even for types that the builtin doesn't natively understand. But the interface remains simple enough that you can just call __builtin_dump_struct(&s, printf) or __builtin_dump_struct(&s, fprintf, stderr) and get a dump to stdout / stderr.

Fix example in documentation: the dump is terminated by a newline.

Minor doc correction

Update release notes

yihanaa added inline comments.May 2 2022, 1:19 AM

clang/lib/Sema/SemaChecking.cpp
427	What do you think of PrintingPolicy.Indentation here?
433	I think this is better maintained in "clang/AST/FormatString.h". For example analyze_printf::PrintfSpecifier can get format specifier, what do you all think about?

Harbormaster completed remote builds in B162204: Diff 426349.May 2 2022, 2:11 AM

yihanaa added inline comments.May 2 2022, 3:03 AM

clang/docs/ReleaseNotes.rst

179

I tried it, D124221 will still generate anonymous tag locations, this place needs to be corrected, or removed anonymous tag locations.

int main() {

struct U20A {
  _Bool a;
  unsigned : 32;
  struct {
    int x;
    int y;
  } c;
};

struct U20A a = {
    .a = 1,
    .c.x = 1998
};
  __builtin_dump_struct(&a, printf);

}

struct U20A {

_Bool a = 1
struct (unnamed struct at ./builtin_dump_struct.c:10:5) c = {
    int x = 1998
    int y = 0
}

}

Thanks for working on this, I like the direction it's going!

clang/include/clang/Basic/DiagnosticSemaKinds.td
8470–8473	Should we combine these into: def err_builtin_dump_struct_expects : Error< "expected %select{pointer to struct\|a callable expression}0 as %ordinal1 argument to %2, found %3">; to reduce the number of diagnostics we have to tablegen?
clang/lib/AST/Expr.cpp
2725	I don't think the changes here are incorrect, but I'm wondering if they should be split into a different patch as they seem to be largely unrelated to the builtin?
clang/lib/Sema/SemaChecking.cpp
390	Haha, neat trick. Perhaps someday we should allow `getPredefinedStringLiteralFromCache()` to specify a `SourceLocation`?
397	Can you add an assertion that the call has at least two args?
433	+1 to this suggestion -- my hope is that we could generalize it more then as I notice there are missing specifiers for things like intmax_t, size_t, ptrdiff_t, _Decimal types, etc. Plus, that will hopefully make it easier for __builtin_dump_struct to benefit when new format specifiers are added, such as ones for printing a _BitInt.
500	Can you spell out the type since it's not in the initialization?
507
572–574	Should we handle array types specifically, or should we continue to punt on those for now?
625–626	ObjC block?

This is a pretty nice seeming interface that I think does a fairly good job at maintaining backwards compat while improving the functionality. A few questions/comments.

clang/docs/LanguageExtensions.rst
2427	I don't know if anyone would be using this value, but I wonder if there is value to making this a 'sum' of the results of `f`? Less useful for the purposes of printf, but perhaps to a sprintf-type-function? Not sure if we have the use cases to support this however.
clang/lib/Sema/SemaChecking.cpp
558
595	What is happening here? I would expect the opposite to need to happen here (wrapper be set to match the call, not the other way around?).
603	This is unfortunate. Do we really not have a `checkAtLeastArgCount` type function here? I know we have something similar for attributes.
614	Is this sufficient? Couldn't this be a pointer to a union? Do you mean the suggestion?

This implementation looks good to me😁

yihanaa added inline comments.May 2 2022, 11:08 AM

clang/lib/Sema/SemaChecking.cpp
573	Can we generate a warning diagnostic message when we see an unsupported type (perhaps temporarily not supported), what do you all think about?

Do we need to do argument promotion here，ahh...maybe i understood wrong

Respond to review comments.

Use printing policy more, and turn off anonymous tag locations.

Add requested assert.

clang/docs/LanguageExtensions.rst
2427	I don't think you can use this directly with `sprintf` -- you'd need a wrapper to update the write position as part of each call to `f`, and that wrapper can handle accumulating the value returned from `sprintf`. In order to do that, you'd want to provide an additional input argument that can be modified as the dump progresses, and that can be used to return the overall (desired) size too. So I don't think even the `sprintf` case provides motivation for this, and it would add complexity (as well as making the builtin less usable in the case where the printing function doesn't return something addable). I could see this being at least a little useful for a formatting function that returns `std::string` or similar, but ... I think it's probably better to let the formatting function deal with that itself through argument passing instead of assuming that we can apply `+` to the results. WDYT?
clang/docs/ReleaseNotes.rst
179	Thanks, fixed.
clang/include/clang/Basic/DiagnosticSemaKinds.td
8470–8473	I'm indifferent. That kind of refactoring will probably decrease the chance of these being reused for any other purpose, which might end up costing us more diagnostics, and will need a magic number in the source code to select between alternatives. Happy to do it if you think it's worthwhile, but I don't have any measurements for how much each diagnostic ID costs.
clang/lib/AST/Expr.cpp
2725	Yeah, will do. I think these changes may actually turn out to not be necessary for `__builtin_dump_struct`, because it returns `void` (they were necessary for __builtin_reflect_struct`).
clang/lib/Sema/SemaChecking.cpp
390	The desire for caching and the desire for a source location are a bit at odds here, unfortunately. Also unfortunate: if you actually put a source location on a `StringLiteral`, other parts of Clang (especially relevant for this patch: format string checking!) will assume that there's a string literal token that they can re-lex from that location, and will get very unhappy if there isn't. Perhaps a no-op `LocatedExpr` that just wraps an expression and gives it a different location would work, but of course we'd need to teach various things to look through it.
427	It looks like nothing sets that outside of libclang, so I don't think that'd be especially useful, but it seems fine to me to use it here. We're not exactly pretty-printing source code, but that doesn't seem like a problem. Done.
433	I am somewhat uncertain: every one of these is making arbitrary choices about how to format the value, so it's not clear to me that this is general logic rather than something specific to `__builtin_dump_struct`. For example, using `%f` rather than one of the myriad other ways of formatting `double` is somewhat arbitrary. Using `%s` for any `const char` is extremely* arbitrary and will be wrong and will cause crashes in some cases, but it may be the pragmatically correct choice for a dumping utility. A general-purpose mechanism would use `%p` for all kinds of pointer. We could perhaps factor out the formatting for cases where there is a clear canonical default formatting, such as for integer types and probably `%p` for pointers, then call that from here with some special-casing, but without a second consumer for that functionality it's really not clear to me what form it should take.
572–574	I'd prefer to punt for now, if that's OK. I don't think this is hard to add, but it's probably clearer to do it in a separate patch. There are also some nontrivial questions there, eg: should we dump all elements or stop after some limit? Should we dump trailing zeroes in an integer array? How should we dump arrays of `char`? I'd prefer to not deal with those questions in this patch :)
573	We could, but I'm leaning towards thinking that we shouldn't warn by default at least. That would get in the way of the C++ use case where the type actually is supported by the printing function; I'm not sure that we can easily tell those cases apart. And it's probably not all that useful to someone who just wants whatever information we can give them while debugging. I'd be OK with adding a warning that's off by default, though. What would be the intended use case here? Would an off-by-default warning satisfy it?
595	This is the normal way we handle builtins with custom type checking: the `SemaChecking` logic analyzes the builtin and sets the result type of the call as appropriate, because the type we get from `Builtins.def` is meaningless. As it happens, this is a degenerate case (compared to `__builtin_reflect_struct` where this did something more interesting), because the type of the `PseudoObjectExpr` is always `void` here; I can just set that directly if you think that's clearer. It shouldn't matter much anyway, given that the call is only used as the syntactic form of the `PseudoObjectExpr`, so its type doesn't really affect anything. but following the type and value kind of the `PseudoObjectExpr` seems "right" to me.
603	We do now. :)
614	Allowing this is what the existing builtin did, and there were tests for it. I'm not sure what the best thing to do with unions is -- especially when a union appears as a struct member. Perhaps the least bad approach is to print out all the union members and hope that nothing goes wrong. That'll crash in some cases (eg, `union { const char *p; int n; };` when the `int` is the active member), but that seems to be the nature of this builtin. I've added a FIXME to `dumpRecordValue`.

Harbormaster completed remote builds in B162333: Diff 426525.May 2 2022, 4:21 PM

aaron.ballman added inline comments.May 3 2022, 6:00 AM

clang/include/clang/Basic/DiagnosticSemaKinds.td
8470–8473	Eh, let's skip it for now. I bring it up on occasion because we do run out of diagnostic IDs on occasion, but that's more a systemic concern.
clang/lib/Sema/SemaChecking.cpp
572–574	Absolutely fine to punt on it, because I agree there's a bunch of questions to answer there. I was also thinking about what to do with flexible array members or other trailing array objects.
573	So long as we don't trigger UB, I think it's fine to not diagnose for unsupported types. But the UB cases do strike me as something we might want to warn about if we think they can be hit (specifically, these kinds of UB: https://eel.is/c++draft/cstdarg.syn#1). As an example, do we properly handle dumping this case: struct S { int &i; S(int &ref) : i(ref) {} }; where the vararg argument is of reference type? I suppose another interesting question is: struct S { int i; }; void func() { S s; [=]() { __builtin_dump_struct(&s, ::printf); }; } (is S.i "an entity resulting from a lambda capture"?)

erichkeane added inline comments.May 3 2022, 6:06 AM

clang/docs/LanguageExtensions.rst
2427	I agree that we're probably better off just making the user write a wrapper function/functor to do so. I was thinking about C folks, but you're right, they could pass a state object into their extended args. I think we probably should just hold off on any attempts to do this until we find someone with a bonafide need.
clang/lib/Sema/SemaChecking.cpp
595	Ah, I see some other examples now, thanks for the explanation. I don't think setting to `void` is a good idea, in case we find a good reason to change the type; this ends up being just one more place we need to mess with it.
614	SGTM! We DO document a few times IIRC that we accept a 'struct' param(in fact, the comment on line 637 n ow says we do!), so I wanted to make sure we are consistent there.

Rebase.
Reuse Clang's existing "guess a format specifier" logic.
Format signed/unsigned/plain char as numbers.
Add test for reference members.

rsmith added inline comments.May 3 2022, 1:33 PM

clang/lib/Sema/SemaChecking.cpp
433	I went ahead and did this, mostly to match concurrent changes to the old implementation. There are a few cases where our existing "guess a format specifier" logic does the wrong thing for dumping purposes, which I've explicitly handled -- things like wanting to dump a `char` / `signed char` / `unsigned char` member as a number rather than as a (potentially non-printable or whitespace) character.
573	The only things we pass to the formatting function are: The small number of builtin types for which we have known formatting specifiers. Pointers to `void`, using `%p`. Pointers to `[const] char`, using `%s`. Pointers to user-defined types, using `%p`. All of these can be passed through a `...` without problems. We can potentially introduce the following sources of UB: Indirection through a struct pointer that doesn't point to a struct object. Read of an inactive union member. Read of an uninitialized member. Printing something other than a pointer to a nul-terminated byte sequence with `%s`. I'm not sure how much we should be worrying about those. It'd be nice to avoid them, though I don't think we can fix the fourth issue without giving up the `%s` formatting entirely. Reference types aren't on our list of types with known formatting specifiers, so in your first example we'd call `printf("%s%s = %p", "int &", "i", &p->i)`, producing output like `int & i = 0x1234abcd`. We could special-case that to also dereference the reference and include the value (`int & i = 0x1234abcd (5)`, but for now I'm treating it like any other type for which we don't have a special formatting rule. I've added test coverage for this case. In your second example, we rewrite into `::printf("%s%s = %s", "int", "i", (&s)->i)`, which is fine. The `parmN` restriction is about calls to `va_start`: void f(int a, ...) { [&] { va_list va; va_start(va, a); // error, `a` is captured here // ... }; }

Harbormaster completed remote builds in B162544: Diff 426819.May 3 2022, 3:03 PM

yihanaa added inline comments.May 4 2022, 4:32 AM

clang/lib/Sema/SemaChecking.cpp
433	When I was patching that old implementation, I found that for uint8_t, int8_t, Clang's existing "guess a format specifier" logic would treat the value as an integer, but for unsigned char, signed char, char types, it would Treat it as a character, please look at this example ( https://godbolt.org/z/ooqn4468T ), I guess this existing logic may have made some special judgment.

rsmith added inline comments.May 4 2022, 11:10 AM

clang/lib/Sema/SemaChecking.cpp
433	Yeah. I think in the case where we see some random call to `printf`, `%c` is probably the right thing to guess here, but it doesn't seem appropriate to me to use this in a dumping routine. If we could dump as `'x'` for printable characters and as `'\xAB'` for everything else, that'd be great, but `printf` can't do that itself and I'm not sure we want to be injecting calls to `isprint` or whatever to make that work. Dumping as an integer always seems like it's probably the least-bad option. Somewhat related: I wonder if we should use `"\"%s\""` instead of simply `"%s"` when dumping a `const char*`. That's not ideal but probably clearer than the current dump output.

erichkeane added inline comments.May 4 2022, 11:11 AM

clang/lib/Sema/SemaChecking.cpp
433	I see value to having strings with SOME level of delimiter, if at least to handle cases when the string itself has a newline in it.

yihanaa added inline comments.May 4 2022, 10:02 PM

clang/lib/Sema/SemaChecking.cpp
433	This looks good, but the string may need to be escaped (if there are some special characters in the original string)

yihanaa added inline comments.May 4 2022, 10:25 PM

clang/lib/Sema/SemaChecking.cpp
433	Maybe users should wrap the printf function, and then pass the wrapper function as a parameter, so that they can do some custom things

Aside from a minor nit, I think this LGTM. Thank you for working on this!

clang/lib/Sema/SemaChecking.cpp
433	Somewhat related: I wonder if we should use "\"%s\"" instead of simply "%s" when dumping a const char*. That's not ideal but probably clearer than the current dump output. I'd be in favor of that. (I sort of wonder whether we do want to inject calls to `isprint()` though -- it seems like escaping nonprintable characters would be the kindest option for both characters and strings.)
462	Note for future reference: I don't think `fixType()` does anything special to help us out with arrays.
493	My suggestion is to print the value of all fields and treat any string-like types as `%p` rather than `%s`. I'm fine with this being done later.
528
573	Thank you for the explanation, it sounds like we're doing about as good as we can get here in terms of safety. FWIW, I like the idea of outputting the value for a reference type, but am fine punting on that for now (it could get awkward to format when the referenced type is a structure, for example).

This revision is now accepted and ready to land.May 5 2022, 4:44 AM

yihanaa accepted this revision.May 5 2022, 5:25 AM

FWIW, I'm in favor of the patch as it sits.

As a followup: So I was thinking about the "%s" specifier for string types. Assuming char-ptr types are all strings is a LITTLE dangerous, but more so the way we're doing it. Its a shame we don't have some way of setting a 'max' limit to the number of characters we have for 2 reasons:

1- For safety: If the char-ptr points to non-null-terminated memory, it'll stop us from just arbitrarily printing into space by limiting at least the NUMBER of characters we print into nonsense.
2- For readability: printing a 'long' string likely makes this output look like nonsense and breaks everything up. Limiting us to only a few characters is likely a good idea.
3- <Bonus #3 from @aaron.ballman >: It might discourage SOME level of attempts at using this for reflection, or at least make it a little harder.

What I would love would be for something like a 10 char max:

struct S {
   char *C;
 };
 S s { "The Rest of this string is cut off"};
 print as:
 struct U20A a = {
   .c = 0x1234 "The Rest o"
 };

Sadly, I don't see something like that in printf specifiers? Unless someone smarter than me can come up with some trickery. PERHAPS have the max-limit compile-time configurable, but I don't feel strongly.

In D124221#3493792, @erichkeane wrote:
FWIW, I'm in favor of the patch as it sits.

As a followup: So I was thinking about the "%s" specifier for string types. Assuming char-ptr types are all strings is a LITTLE dangerous, but more so the way we're doing it. Its a shame we don't have some way of setting a 'max' limit to the number of characters we have for 2 reasons:

1- For safety: If the char-ptr points to non-null-terminated memory, it'll stop us from just arbitrarily printing into space by limiting at least the NUMBER of characters we print into nonsense.
2- For readability: printing a 'long' string likely makes this output look like nonsense and breaks everything up. Limiting us to only a few characters is likely a good idea.
3- <Bonus #3 from @aaron.ballman >: It might discourage SOME level of attempts at using this for reflection, or at least make it a little harder.

What I would love would be for something like a 10 char max:
struct S {
   char *C;
 };
 S s { "The Rest of this string is cut off"};
 print as:
 struct U20A a = {
   .c = 0x1234 "The Rest o"
 };
Sadly, I don't see something like that in printf specifiers? Unless someone smarter than me can come up with some trickery. PERHAPS have the max-limit compile-time configurable, but I don't feel strongly.

The C Standard has this in the specification of the %s format specifier:

If no l length modifier is present, the argument shall be a pointer to storage of character
type. Characters from the storage are written up to (but not including) the terminating
null character. If the precision is specified, no more than that many bytes are written. If
the precision is not specified or is greater than the size of the storage, the storage shall
contain a null character.

So you can use the precision modifier on %s to limit the length to a particular number of bytes. The only downside I can think of to picking a limit is, what happens when the user stores valid UTF-8 data in their string and prints it via %.10s (will we then potentially be splitting a codepoint in half and that does something bad?

In D124221#3494000, @aaron.ballman wrote:
In D124221#3493792, @erichkeane wrote:
FWIW, I'm in favor of the patch as it sits.

As a followup: So I was thinking about the "%s" specifier for string types. Assuming char-ptr types are all strings is a LITTLE dangerous, but more so the way we're doing it. Its a shame we don't have some way of setting a 'max' limit to the number of characters we have for 2 reasons:

1- For safety: If the char-ptr points to non-null-terminated memory, it'll stop us from just arbitrarily printing into space by limiting at least the NUMBER of characters we print into nonsense.
2- For readability: printing a 'long' string likely makes this output look like nonsense and breaks everything up. Limiting us to only a few characters is likely a good idea.
3- <Bonus #3 from @aaron.ballman >: It might discourage SOME level of attempts at using this for reflection, or at least make it a little harder.

What I would love would be for something like a 10 char max:
struct S {
   char *C;
 };
 S s { "The Rest of this string is cut off"};
 print as:
 struct U20A a = {
   .c = 0x1234 "The Rest o"
 };
Sadly, I don't see something like that in printf specifiers? Unless someone smarter than me can come up with some trickery. PERHAPS have the max-limit compile-time configurable, but I don't feel strongly.
The C Standard has this in the specification of the %s format specifier:
If no l length modifier is present, the argument shall be a pointer to storage of character
type. Characters from the storage are written up to (but not including) the terminating
null character. If the precision is specified, no more than that many bytes are written. If
the precision is not specified or is greater than the size of the storage, the storage shall
contain a null character.
So you can use the precision modifier on %s to limit the length to a particular number of bytes. The only downside I can think of to picking a limit is, what happens when the user stores valid UTF-8 data in their string and prints it via %.10s (will we then potentially be splitting a codepoint in half and that does something bad?

Ah! TIL! That I think would be an excellent improvement to this builtin. I suspect we could just choose a fixed value to start (maybe even in this patch @rsmith ?), then add a compile-time option in the future if folks care to extend it. I would posit that 10-15 would be a good start? Any shorter would be prohibitive I think? I could be talked into a little longer...

I found LLDB have this feature. LLDB uses "..." to indicate that only part of the string is now printed, and it also have a command to modify length limit. This approach is similar to the idea of @erichkeane, can we do the same？what do you all think?

typedef signed char int8_t;
typedef unsigned char uint8_t;

int printf(const char *, ...);

int main() {
  struct U20A {
    const char *str;
  };

  struct U20A a = {
      .str = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
  };
    __builtin_dump_struct(&a, &printf);
}

(lldb) p a.str
(const char *) $1 = 0x000000010000329a "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"...

I was thinking of that, but I don't have a good way using just format strings to get us to print the '...' after. I think if someone wants that part of the feature, they'll have to wrap printf/etc in a wrapper that does that.

This is unfortunately true, unless we generate some code that calculates the length of the string at runtime. I find it difficult to draw the line between defining this built-in functionality and developers developing printf-like functionality, if built-in does too much, flexibility will decrease, if built-in does less, (features like length limits) then develop personnel may need to do some work. o(≧口≦)o

Put ""s around %s formats and add a length limit.

This revision was landed with ongoing or failed builds.May 5 2022, 2:56 PM

Closed by commit rGc4f95ef86a22: Reimplement `__builtin_dump_struct` in Sema. (authored by rsmith). · Explain Why

This revision was automatically updated to reflect the committed changes.

rsmith added a commit: rGc4f95ef86a22: Reimplement `__builtin_dump_struct` in Sema..

Harbormaster completed remote builds in B163020: Diff 427463.May 5 2022, 4:08 PM

erichkeane added inline comments.May 6 2022, 6:00 AM

clang/docs/LanguageExtensions.rst
2421	For post-commit: perhaps the string quoting and limit should be documented?

https://github.com/llvm/llvm-project/issues/63169

Revision Contents

Path

Size

clang/

docs/

LanguageExtensions.rst

75 lines

ReleaseNotes.rst

4 lines

include/

clang/

Basic/

Builtins.def

3 lines

DiagnosticSemaKinds.td

2 lines

lib/

AST/

Expr.cpp

21 lines

Sema/

SemaChecking.cpp

115 lines

SemaStmt.cpp

2 lines

test/

CodeGenCXX/

builtin-reflect-struct.cpp

36 lines

SemaCXX/

builtin-reflect-struct.cpp

104 lines

Diff 424347

clang/docs/LanguageExtensions.rst

	Show First 20 Lines • Show All 2,401 Lines • ▼ Show 20 Lines
	Description:			Description:

	The '``__builtin_dump_struct``' function is used to print the fields of a simple			The '``__builtin_dump_struct``' function is used to print the fields of a simple
	structure and their values for debugging purposes. The builtin accepts a pointer			structure and their values for debugging purposes. The builtin accepts a pointer
	to a structure to dump the fields of, and a pointer to a formatted output			to a structure to dump the fields of, and a pointer to a formatted output
	function whose signature must be: ``int ()(const char , ...)`` and must			function whose signature must be: ``int ()(const char , ...)`` and must
	support the format specifiers used by ``printf()``.			support the format specifiers used by ``printf()``.

				``__builtin_reflect_struct``
				----------------------------

				Syntax:

				.. code-block:: c++

				__builtin_reflect_struct(&some_struct, some_func, args...);

				Examples:

				.. code-block:: c++
				erichkeaneUnsubmitted Not Done Reply Inline Actions For post-commit: perhaps the string quoting and limit should be documented? erichkeane: For post-commit: perhaps the string quoting and limit should be documented?

				struct S {
				int x, y;
				int z : 4;
				};

				erichkeaneUnsubmitted Not Done Reply Inline Actions I don't know if anyone would be using this value, but I wonder if there is value to making this a 'sum' of the results of `f`? Less useful for the purposes of printf, but perhaps to a sprintf-type-function? Not sure if we have the use cases to support this however. erichkeane: I don't know if anyone would be using this value, but I wonder if there is value to making this…
				rsmithAuthorUnsubmitted Not Done Reply Inline Actions I don't think you can use this directly with `sprintf` -- you'd need a wrapper to update the write position as part of each call to `f`, and that wrapper can handle accumulating the value returned from `sprintf`. In order to do that, you'd want to provide an additional input argument that can be modified as the dump progresses, and that can be used to return the overall (desired) size too. So I don't think even the `sprintf` case provides motivation for this, and it would add complexity (as well as making the builtin less usable in the case where the printing function doesn't return something addable). I could see this being at least a little useful for a formatting function that returns `std::string` or similar, but ... I think it's probably better to let the formatting function deal with that itself through argument passing instead of assuming that we can apply `+` to the results. WDYT? rsmith: I don't think you can use this directly with `sprintf` -- you'd need a wrapper to update the…
				erichkeaneUnsubmitted Done Reply Inline Actions I agree that we're probably better off just making the user write a wrapper function/functor to do so. I was thinking about C folks, but you're right, they could pass a state object into their extended args. I think we probably should just hold off on any attempts to do this until we find someone with a bonafide need. erichkeane: I agree that we're probably better off just making the user write a wrapper function/functor to…
				struct Field {
				const char *name;
				int value;
				int bitwidth;
				};

				bool print(int arg1, int arg2, std::initializer_list<Field> fields) {
				erichkeaneUnsubmitted Not Done Reply Inline Actions I'm curious as to the need for the 'arg1' and 'arg2'? Also, what is the type of the 'fields' array that works in C? erichkeane: I'm curious as to the need for the 'arg1' and 'arg2'? Also, what is the type of the 'fields'…
				rsmithAuthorUnsubmitted Done Reply Inline Actions This is aimed to fix a major deficiency in `__builtin_dump_struct`, that there's no way to pass information into its callback (other than stashing it in TLS). As noted in the patch description, this builtin is not especially usable in C. I'm not sure whether that's fixable, given the very limited metaprogramming support in C. rsmith: This is aimed to fix a major deficiency in `__builtin_dump_struct`, that there's no way to pass…
				erichkeaneUnsubmitted Not Done Reply Inline Actions Presumably one could also put it in the functor for the callback, or even as a capture in the lambda. Though, I'm not positive I see the VALUE to passing this information, but was just curious as to the intent. erichkeane: Presumably one could also put it in the functor for the callback, or even as a capture in the…
				rsmithAuthorUnsubmitted Done Reply Inline Actions Suppose you want to dump a struct to a log file, or you want the dump in a string. Or you want to carry along an indentation level. My first attempt at using `__builtin_dump_struct` essentially failed because it doesn't have anything like this; even if we don't want this patch, this is functionality we should add to that builtin. rsmith: Suppose you want to dump a struct to a log file, or you want the dump in a string. Or you want…
				erichkeaneUnsubmitted Done Reply Inline Actions Ah, I see! Sure, I can see how those are useful, though again, mostly to the C implementation (as those should all be part of the functor's state), and this doesn't work particularly well in C. erichkeane: Ah, I see! Sure, I can see how those are useful, though again, mostly to the C implementation…
				for (auto f : fields) {
				std::cout << f.name;
				if (f.bitwidth)
				erichkeaneUnsubmitted Not Done Reply Inline Actions Hmm... what is the type of f.bitwidth? A zero-length bitwidth is a valid thing, so having it just be an 'int' doesn't seem workable. erichkeane: Hmm... what is the type of f.bitwidth? A zero-length bitwidth is a valid thing, so having it…
				rsmithAuthorUnsubmitted Done Reply Inline Actions It's a `size_t`. There are no zero-size bitfield members. (Unnamed bitfield aren't members, so are excluded here.) In any case, you can tell the difference between a bitfield and a regular member by the length of the initialiser list; we don't pass a bit width for non-bit-field members. rsmith: It's a `size_t`. There are no zero-size bitfield members. (Unnamed bitfield aren't members, so…
				erichkeaneUnsubmitted Done Reply Inline Actions There are no zero-size bitfield members. (Unnamed bitfield aren't members, so are excluded here.) Hrm... Semantically YES, lexically I'm not sure folks make that differentiation. One issue with skipping 'unnamed bitfields' is it makes something like: struct Foo { int : 31; // Take up some space so something else could have been put there. int field : 1; int field2 : 2; }; and comparing `sizeof` the type with the output to be... jarring at least. you can tell the difference between a bitfield and a regular member by the length of the initialiser list; I see now how you've done that with ctors in C++. It feels that the way of doing this in C is unfortunately quite clunky (in addition, not being able to use arbitrary types). erichkeane: >>There are no zero-size bitfield members. (Unnamed bitfield aren't members, so are excluded…
				std::cout << " : " << f.bitwidth;
				std::cout << " = " << f.value << std::endl;
				}
				return true;
				}

				bool func(int arg1, int arg2) {
				S s = {1, 2, 3};
				return __builtin_reflect_struct(&s, print, arg1, arg2);
				}

				Example output:

				.. code-block:: none

				x = 1
				y = 2
				z : 4 = 3

				Description:

				The ``__builtin_reflect_struct`` function provides simple reflection for a
				class, struct, or union. The first argument of the builtin should be a pointer
				erichkeaneUnsubmitted Not Done Reply Inline Actions While I get the value of a generic reflection 'thing', this is once again getting awful close to 'just implement reflection capabilities', at which point this seems insufficient. erichkeane: While I get the value of a generic reflection 'thing', this is once again getting awful close…
				to the type to reflect. The second argument ``f`` should be some callable
				expression, and can be a function object or an overload set. The builtin calls
				erichkeaneUnsubmitted Not Done Reply Inline Actions Is the purpose of the overload set/functor so you can support multiple types (seeing how the 'base' below works)? How does this all work in C? (or do we just do fields in C, since they cannot have the rest? I wonder if we would be better if we treated this function as a 'callback' or a 'callback set' that returned a discriminated union of a pre-defined type that describes what is being returned. That way it could handle member functions, static variables, etc. erichkeane: Is the purpose of the overload set/functor so you can support multiple types (seeing how the…
				rsmithAuthorUnsubmitted Done Reply Inline Actions This doesn't really work well in C, as noted in the patch description. A set of callbacks might work a bit better across both languages, but I don't think it solves the larger problem that there's no good way to pass type information into a callback in C. rsmith: This doesn't really work well in C, as noted in the patch description. A set of callbacks might…
				erichkeaneUnsubmitted Not Done Reply Inline Actions Well, if the callback itself took the discriminated union for each thing, and we did 1 callback per base/field/etc, perhaps that would be useful? I just am having a hard time seeing this being all that useful in C, which is a shame. erichkeane: Well, if the callback itself took the discriminated union for each thing, and we did 1 callback…
				rsmithAuthorUnsubmitted Done Reply Inline Actions I think that would work. I'm not sure it would be better, though, given that it loses the ability to build a type whose shape depends on the number of bases and fields, because you're not given them all at once. (Eg, if you want to build a static array of field names.) rsmith: I think that would work. I'm not sure it would be better, though, given that it loses the…
				erichkeaneUnsubmitted Done Reply Inline Actions Yep, thats definitely true. There are ways to DO that (either a 2-pass at this to get the sizes, allocate something, then go through again filling in your struct OR a constant 'append' thing with a `vector` like type), but I see what you mean. It seems that either case we are limiting a couple of useful use cases. erichkeane: Yep, thats definitely true. There are ways to DO that (either a 2-pass at this to get the sizes…
				``f``, passing any further arguments ``args...`` followed by an initializer
				list describing the struct value.
				erichkeaneUnsubmitted Done Reply Inline Actions Same question about init-list, how is this usable in C? It becomes a BIT more usable in C if instead we return a single object that represents a single AST node. erichkeane: Same question about init-list, how is this usable in C? It becomes a BIT more usable in C if…

				.. code-block:: none

				// '__builtin_reflect_struct(&s, print, arg1, arg2)' calls:
				print(arg1, arg2, {{"x", 1}, {"y", 2}, {"z", 3, 4}})

				The initializer list contains the following components:
				erichkeaneUnsubmitted Not Done Reply Inline Actions I find myself wishing this was a more complete list. erichkeane: I find myself wishing this was a more complete list.
				rsmithAuthorUnsubmitted Done Reply Inline Actions What else do you want? My goal here was to be able to do what `__builtin_dump_struct` does but without its many arbitrary limitations. From C++, this is enough for that. rsmith: What else do you want? My goal here was to be able to do what `__builtin_dump_struct` does but…
				erichkeaneUnsubmitted Done Reply Inline Actions I think this is again falling down the "this is close enough to reflection, I want reflection" hole :) I find myself wanting to be able to introspect member functions, static variables/functions/etc. erichkeane: I think this is again falling down the "this is close enough to reflection, I want reflection"…

				* For each direct base class, the address of the base class, in declaration
				erichkeaneUnsubmitted Not Done Reply Inline Actions I wonder if bases should include their virtualness or access specifier. erichkeane: I wonder if bases should include their virtualness or access specifier.
				rsmithAuthorUnsubmitted Done Reply Inline Actions I mean, maybe, but I don't have a use case for either. For access in general my leaning is to say that the built-in doesn't get special access rights -- either it can access everything or you'll get an access error if you use it -- in which case including the access for bases and fields doesn't seem likely to be super useful. rsmith: I mean, maybe, but I don't have a use case for either. For access in general my leaning is to…
				erichkeaneUnsubmitted Not Done Reply Inline Actions I guess it depends on the use case for these builtins. I saw the 'dump-struct' builtin's purpose was for quick & dirty printf-debugging. At which point the format/features were less important than the action. So from that perspective, I would see this having 'special access rights' as sort of a necessity. Though again, this is a significantly more general builtin, which comes with additional potential use cases. erichkeane: I guess it depends on the use case for these builtins. I saw the 'dump-struct' builtin's…
				rsmithAuthorUnsubmitted Done Reply Inline Actions The use cases are the things that `__builtin_dump_struct` gets close to but fails at, such as: I have an `EXPECT_EQ` macro that I'd like to log the difference in input and output values where possible, or I want to build a simple print-to-string utility for structs. Or I want a dump to stout but I want it formatted differently from what `__builtin_dump_struct` thinks is best. The fact that the dump built-in gets us into this area but can't solve these problems is frustrating. rsmith: The use cases are the things that `__builtin_dump_struct` gets close to but fails at, such as…
				erichkeaneUnsubmitted Done Reply Inline Actions The use cases are the things that __builtin_dump_struct gets close to but fails at, such as: I have an EXPECT_EQ macro that I'd like to log the difference in input and output values where possible THAT is an interesting use case, I see value to that too. The fact that the dump built-in gets us into this area but can't solve these problems is frustrating. I definitely see that. FWIW, I don't think I was the one to approve/review the initial version of `__builtin_dump_struct` (and perhaps would have been equally frustrated if I thought that through), but was looking into the patches after the fact to help out. I guess I find myself equally/additionally/etc frustrated at THIS patch because it gets us so close to everything SG7 promises they can invent. erichkeane: >>The use cases are the things that __builtin_dump_struct gets close to but fails at, such as…
				order.
				* For each field, ``{"field_name", p->field_name}`` for a regular field, and
				``{"field_name", p->field_name, bit_width}`` for a bit-field, in declaration
				order.

				The result of the builtin is the result of the call to function ``f``.

				This builtin can be used in constant expressions.

				Query for this feature with ``__has_builtin(__builtin_reflect_struct)``

	.. _langext-__builtin_shufflevector:			.. _langext-__builtin_shufflevector:

	``__builtin_shufflevector``			``__builtin_shufflevector``
	---------------------------			---------------------------

	``__builtin_shufflevector`` is used to express generic vector			``__builtin_shufflevector`` is used to express generic vector
	permutation/shuffle/swizzle operations. This builtin is also very important			permutation/shuffle/swizzle operations. This builtin is also very important
	for the implementation of various target-specific header files like			for the implementation of various target-specific header files like
	▲ Show 20 Lines • Show All 2,188 Lines • Show Last 20 Lines

clang/docs/ReleaseNotes.rst

Show First 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	- Clang now supports randomizing structure layout in C. This feature is a
A seed value is required to enable randomization, and is deterministic based		A seed value is required to enable randomization, and is deterministic based
on a seed value. Use the ``-frandomize-layout-seed=`` or		on a seed value. Use the ``-frandomize-layout-seed=`` or
``-frandomize-layout-seed-file=`` flags.		``-frandomize-layout-seed-file=`` flags.

.. note::		.. note::

Randomizing structure layout is a C-only feature.		Randomizing structure layout is a C-only feature.

		- Clang now supports a ``__builtin_reflect_struct`` builtin. This builtin
		provides a simple and flexible way to reflect on the fields and base classes
		of an object of struct or class type.

Bug Fixes		Bug Fixes
------------------		------------------
- ``CXXNewExpr::getArraySize()`` previously returned a ``llvm::Optional``		- ``CXXNewExpr::getArraySize()`` previously returned a ``llvm::Optional``
wrapping a ``nullptr`` when the ``CXXNewExpr`` did not have an array		wrapping a ``nullptr`` when the ``CXXNewExpr`` did not have an array
size expression. This was fixed and ``::getArraySize()`` will now always		size expression. This was fixed and ``::getArraySize()`` will now always
either return ``None`` or a ``llvm::Optional`` wrapping a valid ``Expr*``.		either return ``None`` or a ``llvm::Optional`` wrapping a valid ``Expr*``.
This fixes `Issue 53742 <https://github.com/llvm/llvm-project/issues/53742>`_.		This fixes `Issue 53742 <https://github.com/llvm/llvm-project/issues/53742>`_.
- We now ignore full expressions when traversing cast subexpressions. This		- We now ignore full expressions when traversing cast subexpressions. This
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	- ``-Wmisexpect`` warns when the branch weights collected during profiling
conflict with those added by ``llvm.expect``.		conflict with those added by ``llvm.expect``.

Non-comprehensive list of changes in this release		Non-comprehensive list of changes in this release
-------------------------------------------------		-------------------------------------------------
- Improve __builtin_dump_struct:		- Improve __builtin_dump_struct:
- Support bitfields in struct and union.		- Support bitfields in struct and union.
- Improve the dump format, dump both bitwidth(if its a bitfield) and field value.		- Improve the dump format, dump both bitwidth(if its a bitfield) and field value.
- Remove anonymous tag locations.		- Remove anonymous tag locations.
- Beautify dump format, add indent for nested struct and struct members.		- Beautify dump format, add indent for nested struct and struct members.
		yihanaaUnsubmitted Done Reply Inline Actions I tried it, D124221 will still generate anonymous tag locations, this place needs to be corrected, or removed anonymous tag locations. int main() { struct U20A { _Bool a; unsigned : 32; struct { int x; int y; } c; }; struct U20A a = { .a = 1, .c.x = 1998 }; __builtin_dump_struct(&a, printf); } struct U20A { _Bool a = 1 struct (unnamed struct at ./builtin_dump_struct.c:10:5) c = { int x = 1998 int y = 0 } } yihanaa: I tried it, D124221 will still generate anonymous tag locations, this place needs to be…
		rsmithAuthorUnsubmitted Done Reply Inline Actions Thanks, fixed. rsmith: Thanks, fixed.

New Compiler Flags		New Compiler Flags
------------------		------------------
- Added the ``-fno-knr-functions`` flag to allow users to opt into the C2x		- Added the ``-fno-knr-functions`` flag to allow users to opt into the C2x
behavior where a function with an empty parameter list is treated as though		behavior where a function with an empty parameter list is treated as though
the parameter list were ``void``. There is no ``-fknr-functions`` or		the parameter list were ``void``. There is no ``-fknr-functions`` or
``-fno-no-knr-functions`` flag; this feature cannot be disabled in language		``-fno-no-knr-functions`` flag; this feature cannot be disabled in language
modes where it is required, such as C++ or C2x.		modes where it is required, such as C++ or C2x.
▲ Show 20 Lines • Show All 219 Lines • Show Last 20 Lines

clang/include/clang/Basic/Builtins.def

	Show First 20 Lines • Show All 1,596 Lines • ▼ Show 20 Lines
	BUILTIN(__builtin_smulll_overflow, "bSLLiCSLLiCSLLi*", "n")			BUILTIN(__builtin_smulll_overflow, "bSLLiCSLLiCSLLi*", "n")

	// Clang builtins (not available in GCC).			// Clang builtins (not available in GCC).
	BUILTIN(__builtin_addressof, "v*v&", "nct")			BUILTIN(__builtin_addressof, "v*v&", "nct")
	BUILTIN(__builtin_function_start, "v*v&", "nct")			BUILTIN(__builtin_function_start, "v*v&", "nct")
	BUILTIN(__builtin_operator_new, "v*z", "tc")			BUILTIN(__builtin_operator_new, "v*z", "tc")
	BUILTIN(__builtin_operator_delete, "vv*", "tn")			BUILTIN(__builtin_operator_delete, "vv*", "tn")
	BUILTIN(__builtin_char_memchr, "ccCiz", "n")			BUILTIN(__builtin_char_memchr, "ccCiz", "n")
	BUILTIN(__builtin_dump_struct, "ivCv", "tn")			BUILTIN(__builtin_dump_struct, "ivCv", "t")
	BUILTIN(__builtin_preserve_access_index, "v.", "t")			BUILTIN(__builtin_preserve_access_index, "v.", "t")
				BUILTIN(__builtin_reflect_struct, "v.", "t")

	// Alignment builtins (uses custom parsing to support pointers and integers)			// Alignment builtins (uses custom parsing to support pointers and integers)
	BUILTIN(__builtin_is_aligned, "bvC*z", "nct")			BUILTIN(__builtin_is_aligned, "bvC*z", "nct")
	BUILTIN(__builtin_align_up, "vvCz", "nct")			BUILTIN(__builtin_align_up, "vvCz", "nct")
	BUILTIN(__builtin_align_down, "vvCz", "nct")			BUILTIN(__builtin_align_down, "vvCz", "nct")

	// Safestack builtins			// Safestack builtins
	BUILTIN(__builtin___get_unsafe_stack_start, "v*", "Fn")			BUILTIN(__builtin___get_unsafe_stack_start, "v*", "Fn")
	▲ Show 20 Lines • Show All 94 Lines • Show Last 20 Lines

clang/include/clang/Basic/DiagnosticSemaKinds.td

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 8,461 Lines • ▼ Show 20 Lines
	def err_overflow_builtin_must_be_int : Error<			def err_overflow_builtin_must_be_int : Error<
	"operand argument to overflow builtin must be an integer (%0 invalid)">;			"operand argument to overflow builtin must be an integer (%0 invalid)">;
	def err_overflow_builtin_must_be_ptr_int : Error<			def err_overflow_builtin_must_be_ptr_int : Error<
	"result argument to overflow builtin must be a pointer "			"result argument to overflow builtin must be a pointer "
	"to a non-const integer (%0 invalid)">;			"to a non-const integer (%0 invalid)">;
	def err_overflow_builtin_bit_int_max_size : Error<			def err_overflow_builtin_bit_int_max_size : Error<
	"__builtin_mul_overflow does not support 'signed _BitInt' operands of more "			"__builtin_mul_overflow does not support 'signed _BitInt' operands of more "
	"than %0 bits">;			"than %0 bits">;
				def err_expected_struct_pointer_argument : Error<
				"expected pointer to struct as %ordinal0 argument to %1, found %2">;

	def err_atomic_load_store_uses_lib : Error<			def err_atomic_load_store_uses_lib : Error<
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions Should we combine these into: def err_builtin_dump_struct_expects : Error< "expected %select{pointer to struct\|a callable expression}0 as %ordinal1 argument to %2, found %3">; to reduce the number of diagnostics we have to tablegen? aaron.ballman: Should we combine these into: ``` def err_builtin_dump_struct_expects : Error< "expected…
				rsmithAuthorUnsubmitted Done Reply Inline Actions I'm indifferent. That kind of refactoring will probably decrease the chance of these being reused for any other purpose, which might end up costing us more diagnostics, and will need a magic number in the source code to select between alternatives. Happy to do it if you think it's worthwhile, but I don't have any measurements for how much each diagnostic ID costs. rsmith: I'm indifferent. That kind of refactoring will probably decrease the chance of these being…
				aaron.ballmanUnsubmitted Done Reply Inline Actions Eh, let's skip it for now. I bring it up on occasion because we do run out of diagnostic IDs on occasion, but that's more a systemic concern. aaron.ballman: Eh, let's skip it for now. I bring it up on occasion because we do run out of diagnostic IDs on…
	"atomic %select{load\|store}0 requires runtime support that is not "			"atomic %select{load\|store}0 requires runtime support that is not "
	"available for this target">;			"available for this target">;

	def err_nontemporal_builtin_must_be_pointer : Error<			def err_nontemporal_builtin_must_be_pointer : Error<
	"address argument to nontemporal builtin must be a pointer (%0 invalid)">;			"address argument to nontemporal builtin must be a pointer (%0 invalid)">;
	def err_nontemporal_builtin_must_be_pointer_intfltptr_or_vector : Error<			def err_nontemporal_builtin_must_be_pointer_intfltptr_or_vector : Error<
	"address argument to nontemporal builtin must be a pointer to integer, float, "			"address argument to nontemporal builtin must be a pointer to integer, float, "
	"pointer, or a vector of such types (%0 invalid)">;			"pointer, or a vector of such types (%0 invalid)">;
	▲ Show 20 Lines • Show All 3,142 Lines • Show Last 20 Lines

clang/lib/AST/Expr.cpp

Show First 20 Lines • Show All 2,702 Lines • ▼ Show 20 Lines	if (const ObjCMethodDecl *MD = ME->getMethodDecl())
Loc = getExprLoc();		Loc = getExprLoc();
return true;		return true;
}		}

return false;		return false;
}		}

case ObjCPropertyRefExprClass:		case ObjCPropertyRefExprClass:
		case ObjCSubscriptRefExprClass:
WarnE = this;		WarnE = this;
Loc = getExprLoc();		Loc = getExprLoc();
R1 = getSourceRange();		R1 = getSourceRange();
return true;		return true;

case PseudoObjectExprClass: {		case PseudoObjectExprClass:
const PseudoObjectExpr *PO = cast<PseudoObjectExpr>(this);		// Ask the syntactic form whether to warn.
		// FIXME: Consider looking at the result expression in the semantic form
// Only complain about things that have the form of a getter.		// instead.
if (isa<UnaryOperator>(PO->getSyntacticForm()) \|\|		return cast<PseudoObjectExpr>(this)
isa<BinaryOperator>(PO->getSyntacticForm()))		->getSyntacticForm()
return false;		->isUnusedResultAWarning(WarnE, Loc, R1, R2, Ctx);

WarnE = this;
Loc = getExprLoc();
R1 = getSourceRange();
return true;
}

case StmtExprClass: {		case StmtExprClass: {
		aaron.ballmanUnsubmitted Done Reply Inline Actions I don't think the changes here are incorrect, but I'm wondering if they should be split into a different patch as they seem to be largely unrelated to the builtin? aaron.ballman: I don't think the changes here are incorrect, but I'm wondering if they should be split into a…
		rsmithAuthorUnsubmitted Done Reply Inline Actions Yeah, will do. I think these changes may actually turn out to not be necessary for `__builtin_dump_struct`, because it returns `void` (they were necessary for __builtin_reflect_struct`). rsmith: Yeah, will do. I think these changes may actually turn out to not be necessary for…
// Statement exprs don't logically have side effects themselves, but are		// Statement exprs don't logically have side effects themselves, but are
// sometimes used in macros in ways that give them a type that is unused.		// sometimes used in macros in ways that give them a type that is unused.
// For example ({ blah; foo(); }) will end up with a type if foo has a type.		// For example ({ blah; foo(); }) will end up with a type if foo has a type.
// however, if the result of the stmt expr is dead, we don't want to emit a		// however, if the result of the stmt expr is dead, we don't want to emit a
// warning.		// warning.
const CompoundStmt *CS = cast<StmtExpr>(this)->getSubStmt();		const CompoundStmt *CS = cast<StmtExpr>(this)->getSubStmt();
if (!CS->body_empty()) {		if (!CS->body_empty()) {
if (const Expr *E = dyn_cast<Expr>(CS->body_back()))		if (const Expr *E = dyn_cast<Expr>(CS->body_back()))
▲ Show 20 Lines • Show All 2,270 Lines • Show Last 20 Lines

clang/lib/Sema/SemaChecking.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 360 Lines • ▼ Show 20 Lines for (unsigned I = 0; I < 3; ++I) {

diag::err_overflow_builtin_bit_int_max_size) diag::err_overflow_builtin_bit_int_max_size)

<< 128; << 128;

} }

return false; return false;

} }

static ExprResult SemaBuiltinReflectStruct(Sema &S, CallExpr *TheCall) {

if (TheCall->getNumArgs() < 2 && checkArgCount(S, TheCall, 2))

return ExprError();

SourceLocation Loc = TheCall->getBeginLoc();

// First argument should be a pointer to a struct.

ExprResult PtrArgResult = S.DefaultLvalueConversion(TheCall->getArg(0));

if (PtrArgResult.isInvalid())

return ExprError();

Expr *PtrArg = PtrArgResult.get();

QualType PtrArgType = PtrArg->getType();

if (!PtrArgType->isPointerType() ||

!PtrArgType->getPointeeType()->isRecordType()) {

S.Diag(PtrArg->getBeginLoc(), diag::err_expected_struct_pointer_argument)

<< 1 << TheCall->getDirectCallee() << PtrArgType;

return ExprError();

}

const RecordDecl *RD = PtrArgType->getPointeeType()->getAsRecordDecl();

// Build an OpaqueValueExpr so we can refer to the argument more than once.

auto *ArgOVE = new (S.Context)

aaron.ballmanUnsubmitted

Not Done

Haha, neat trick. Perhaps someday we should allow getPredefinedStringLiteralFromCache() to specify a SourceLocation?

aaron.ballman: Haha, neat trick. Perhaps someday we should allow `getPredefinedStringLiteralFromCache()` to…

rsmithAuthorUnsubmitted

Done

The desire for caching and the desire for a source location are a bit at odds here, unfortunately. Also unfortunate: if you actually put a source location on a StringLiteral, other parts of Clang (especially relevant for this patch: format string checking!) will assume that there's a string literal token that they can re-lex from that location, and will get very unhappy if there isn't. Perhaps a no-op LocatedExpr that just wraps an expression and gives it a different location would work, but of course we'd need to teach various things to look through it.

rsmith: The desire for caching and the desire for a source location are a bit at odds here…

OpaqueValueExpr(Loc, PtrArg->getType(), PtrArg->getValueKind(),

PtrArg->getObjectKind(), PtrArg);

// Start with all the provided arguments after the callee.

SmallVector<Expr *, 8> Args(TheCall->arg_begin() + 2, TheCall->arg_end());

SmallVector<Expr *, 32> Elts;

aaron.ballmanUnsubmitted

Done

Can you add an assertion that the call has at least two args?

aaron.ballman: Can you add an assertion that the call has at least two args?

// Add a 'static_cast<Base*>(arg)' argument for each base class.

if (const CXXRecordDecl *CXXRD = dyn_cast<CXXRecordDecl>(RD)) {

for (const auto &Base : CXXRD->bases()) {

QualType BaseType = S.Context.getPointerType(S.Context.getQualifiedType(

Base.getType(), PtrArgType->getPointeeType().getQualifiers()));

ExprResult BasePtr = S.BuildCXXNamedCast(

Loc, tok::kw_static_cast,

S.Context.getTrivialTypeSourceInfo(BaseType, Loc), ArgOVE,

SourceRange(Loc, Loc), SourceRange(Loc, Loc));

if (BasePtr.isInvalid())

return ExprError();

Elts.push_back(BasePtr.get());

}

// Add a '{"field_name", arg->field_name}' argument for each field.

// Add a '{"field_name", arg->field_name, bitwidth}' argument for each

// bit-field.

for (auto *D : RD->decls()) {

auto *IFD = dyn_cast<IndirectFieldDecl>(D);

auto *FD = IFD ? IFD->getAnonField() : dyn_cast<FieldDecl>(D);

if (!FD || (FD->isUnnamedBitfield() || FD->isAnonymousStructOrUnion()))

erichkeaneUnsubmitted

Not Done

Losing the unnamed bitfield is unfortunate, and I the 'dump struct' handles. As is losing the anonymous struct/union.

Also, how does this handle 'record type' members? Does the user have to know the inner type already? If they already know that much information about the type, they wouldn't really need this, right?

erichkeane: Losing the unnamed bitfield is unfortunate, and I the 'dump struct' handles. As is losing the…

rsmithAuthorUnsubmitted

Done

If __builtin_dump_struct includes unnamed bitfields, that's a bug in that builtin.

In general, the user function gets given the bases and members with the right types, so they can use an overload set or a template to dispatch based on that type. See the SemaCXX test for a basic example of that.

rsmith: If `__builtin_dump_struct` includes unnamed bitfields, that's a bug in that builtin. In…

erichkeaneUnsubmitted

Not Done

I thought it did? For the use case I see __builtin_dump_struct having, I would think printing the existence of unnamed bitfields to be quite useful.

erichkeane: I thought it did? For the use case I see `__builtin_dump_struct` having, I would think…

rsmithAuthorUnsubmitted

Done

Why? They're not part of the value, they're just padding.

rsmith: Why? They're not part of the value, they're just padding.

erichkeaneUnsubmitted

Not Done

They are lexically part of the type and are part of what makes up the 'size' of the thing. I would expect something dumping the type to be as consistent lexically as possible.

erichkeane: They are lexically part of the type and are part of what makes up the 'size' of the thing. I…

rsmithAuthorUnsubmitted

Done

Looks like __builtin_dump_struct currently includes them *and* prints a value (whatever garbage happens to be in those padding bits). That's obviously a bug.

rsmith: Looks like `__builtin_dump_struct` currently includes them *and* prints a value (whatever…

erichkeaneUnsubmitted

Not Done

O.o! Yeah, printing the value is nonsense.

erichkeane: O.o! Yeah, printing the value is nonsense.

yihanaaUnsubmitted

Not Done

I think the user should be informed somehow that there is an unnamed bitfield here

yihanaa: I think the user should be informed somehow that there is an unnamed bitfield here

rsmithAuthorUnsubmitted

Done

It seems to me that __builtin_dump_struct is displaying the value of a struct object, not the representation of the object, and unnamed bitfields are not a part of the value, so should not be included. And I think it makes sense for that builtin to be value-oriented: someone using it presumably already knows whatever they want to know about the representation, to the extent that it matters, or can look it up; what they don't know is the information that varies from one instance to another. If we want a builtin that's more oriented around showing the struct's memory representation, I think we'd want quite a different output format, including offsets for fields -- and even then I don't think it makes sense to include unnamed bit-fields because they're not different from any other kind of padding in the object. If we still want to print unnamed bit-fields for some reason, presumably we should also print alignment and packing attributes and pragmas, because they too can change the padding within a struct.

rsmith: It seems to me that `__builtin_dump_struct` is displaying the value of a struct object, not the…

aaron.ballmanUnsubmitted

Not Done

I tend to agree that __builtin_dump_struct should be focused on the values in the structure and not the representation.

However, keeping in mind that the original builtin was for kernel folks and the kernel tends to do some pretty interesting things (that aren't always supported by the standard), there's a part of me that wonders if they have a use case for unnamed bit-fields where they expect data to be squirreled away in those bits (through memcpy() or some other means) that they'd like to be able to read.

aaron.ballman: I tend to agree that `__builtin_dump_struct` should be focused on the values in the structure…

continue;

Expr *Name = S.Context.getPredefinedStringLiteralFromCache(FD->getName());

// FIXME: Access check?

ExprResult Field =

IFD ? S.BuildAnonymousStructUnionMemberReference(

yihanaaUnsubmitted

Done

What do you think of PrintingPolicy.Indentation here?

yihanaa: What do you think of PrintingPolicy.Indentation here?

rsmithAuthorUnsubmitted

Done

It looks like nothing sets that outside of libclang, so I don't think that'd be especially useful, but it seems fine to me to use it here. We're not exactly pretty-printing source code, but that doesn't seem like a problem. Done.

rsmith: It looks like nothing sets that outside of libclang, so I don't think that'd be especially…

CXXScopeSpec(), Loc, IFD,

DeclAccessPair::make(IFD, IFD->getAccess()), ArgOVE, Loc)

: S.BuildFieldReferenceExpr(

ArgOVE, /*IsArrow=*/true, Loc, CXXScopeSpec(), FD,

DeclAccessPair::make(FD, FD->getAccess()),

DeclarationNameInfo(FD->getDeclName(), Loc));

yihanaaUnsubmitted

Not Done

I think this is better maintained in "clang/AST/FormatString.h". For example analyze_printf::PrintfSpecifier can get format specifier, what do you all think about?

yihanaa: I think this is better maintained in "clang/AST/FormatString.h". For example analyze_printf…

aaron.ballmanUnsubmitted

Not Done

+1 to this suggestion -- my hope is that we could generalize it more then as I notice there are missing specifiers for things like intmax_t, size_t, ptrdiff_t, _Decimal types, etc. Plus, that will hopefully make it easier for __builtin_dump_struct to benefit when new format specifiers are added, such as ones for printing a _BitInt.

aaron.ballman: +1 to this suggestion -- my hope is that we could generalize it more then as I notice there are…

rsmithAuthorUnsubmitted

Done

I am somewhat uncertain: every one of these is making arbitrary choices about how to format the value, so it's not clear to me that this is general logic rather than something specific to __builtin_dump_struct. For example, using %f rather than one of the myriad other ways of formatting double is somewhat arbitrary. Using %s for any const char* is *extremely* arbitrary and will be wrong and will cause crashes in some cases, but it may be the pragmatically correct choice for a dumping utility. A general-purpose mechanism would use %p for all kinds of pointer.

We could perhaps factor out the formatting for cases where there is a clear canonical default formatting, such as for integer types and probably %p for pointers, then call that from here with some special-casing, but without a second consumer for that functionality it's really not clear to me what form it should take.

rsmith: I am somewhat uncertain: every one of these is making arbitrary choices about how to format the…

rsmithAuthorUnsubmitted

Done

I went ahead and did this, mostly to match concurrent changes to the old implementation. There are a few cases where our existing "guess a format specifier" logic does the wrong thing for dumping purposes, which I've explicitly handled -- things like wanting to dump a char / signed char / unsigned char member as a number rather than as a (potentially non-printable or whitespace) character.

rsmith: I went ahead and did this, mostly to match concurrent changes to the old implementation. There…

yihanaaUnsubmitted

Not Done

When I was patching that old implementation, I found that for uint8_t, int8_t, Clang's existing "guess a format specifier" logic would treat the value as an integer, but for unsigned char, signed char, char types, it would Treat it as a character, please look at this example ( https://godbolt.org/z/ooqn4468T ), I guess this existing logic may have made some special judgment.

yihanaa: When I was patching that old implementation, I found that for uint8_t, int8_t, Clang's…

rsmithAuthorUnsubmitted

Done

Yeah. I think in the case where we see some random call to printf, %c is probably the right thing to guess here, but it doesn't seem appropriate to me to use this in a dumping routine. If we could dump as 'x' for printable characters and as '\xAB' for everything else, that'd be great, but printf can't do that itself and I'm not sure we want to be injecting calls to isprint or whatever to make that work. Dumping as an integer always seems like it's probably the least-bad option.

Somewhat related: I wonder if we should use "\"%s\"" instead of simply "%s" when dumping a const char*. That's not ideal but probably clearer than the current dump output.

rsmith: Yeah. I think in the case where we see some random call to `printf`, `%c` is probably the right…

erichkeaneUnsubmitted

Not Done

I see value to having strings with SOME level of delimiter, if at least to handle cases when the string itself has a newline in it.

erichkeane: I see value to having strings with SOME level of delimiter, if at least to handle cases when…

yihanaaUnsubmitted

Not Done

This looks good, but the string may need to be escaped (if there are some special characters in the original string)

yihanaa: This looks good, but the string may need to be escaped (if there are some special characters in…

aaron.ballmanUnsubmitted

Done

Somewhat related: I wonder if we should use "\"%s\"" instead of simply "%s" when dumping a const char*. That's not ideal but probably clearer than the current dump output.

I'd be in favor of that. (I sort of wonder whether we do want to inject calls to isprint() though -- it seems like escaping nonprintable characters would be the kindest option for both characters and strings.)

aaron.ballman: > Somewhat related: I wonder if we should use "\"%s\"" instead of simply "%s" when dumping a…

yihanaaUnsubmitted

Not Done

Maybe users should wrap the printf function, and then pass the wrapper function as a parameter, so that they can do some custom things

yihanaa: Maybe users should wrap the printf function, and then pass the wrapper function as a parameter…

if (Field.isInvalid())

return ExprError();

Expr *Width = nullptr;

if (FD->isBitField()) {

QualType SizeT = S.Context.getSizeType();

llvm::APInt BitWidth(S.Context.getIntWidth(SizeT),

FD->getBitWidthValue(S.Context));

Width = IntegerLiteral::Create(S.Context, BitWidth, SizeT, Loc);

}

Expr *Parts[] = {Name, Field.get(), Width};

MultiExprArg PartList = Parts;

if (!Width)

PartList = PartList.drop_back(1);

ExprResult InitList = S.BuildInitList(Loc, PartList, Loc);

if (InitList.isInvalid())

return ExprError();

Elts.push_back(InitList.get());

}

// Wrap all the reflection data in '{'...'}'.

ExprResult ReflectionData = S.BuildInitList(Loc, Elts, Loc);

Args.push_back(ReflectionData.get());

// Call the second argument with the assembled argument list.

ExprResult RealCall =

aaron.ballmanUnsubmitted

Not Done

Note for future reference: I don't think fixType() does anything special to help us out with arrays.

aaron.ballman: Note for future reference: I don't think `fixType()` does anything special to help us out with…

S.BuildCallExpr(/*Scope=*/nullptr, TheCall->getArg(1),

TheCall->getBeginLoc(), Args, TheCall->getRParenLoc());

if (RealCall.isInvalid())

return ExprError();

// The result of the real call is the result of the builtin call.

TheCall->setType(RealCall.get()->getType());

TheCall->setValueKind(RealCall.get()->getValueKind());

Expr *SemanticForm[] = {ArgOVE, RealCall.get()};

return PseudoObjectExpr::Create(S.Context, TheCall, SemanticForm,

/*ResultIdx*/ 1);

}

static bool SemaBuiltinCallWithStaticChain(Sema &S, CallExpr *BuiltinCall) { static bool SemaBuiltinCallWithStaticChain(Sema &S, CallExpr *BuiltinCall) {

if (checkArgCount(S, BuiltinCall, 2)) if (checkArgCount(S, BuiltinCall, 2))

return true; return true;

SourceLocation BuiltinLoc = BuiltinCall->getBeginLoc(); SourceLocation BuiltinLoc = BuiltinCall->getBeginLoc();

Expr *Builtin = BuiltinCall->getCallee()->IgnoreImpCasts(); Expr *Builtin = BuiltinCall->getCallee()->IgnoreImpCasts();

Expr *Call = BuiltinCall->getArg(0); Expr *Call = BuiltinCall->getArg(0);

Expr *Chain = BuiltinCall->getArg(1); Expr *Chain = BuiltinCall->getArg(1);

if (Call->getStmtClass() != Stmt::CallExprClass) { if (Call->getStmtClass() != Stmt::CallExprClass) {

S.Diag(BuiltinLoc, diag::err_first_argument_to_cwsc_not_call) S.Diag(BuiltinLoc, diag::err_first_argument_to_cwsc_not_call)

<< Call->getSourceRange(); << Call->getSourceRange();

return true; return true;

} }

auto CE = cast<CallExpr>(Call); auto CE = cast<CallExpr>(Call);

if (CE->getCallee()->getType()->isBlockPointerType()) { if (CE->getCallee()->getType()->isBlockPointerType()) {

aaron.ballmanUnsubmitted

Not Done

My suggestion is to print the value of all fields and treat any string-like types as %p rather than %s. I'm fine with this being done later.

aaron.ballman: My suggestion is to print the value of all fields and treat any string-like types as `%p`…

S.Diag(BuiltinLoc, diag::err_first_argument_to_cwsc_block_call) S.Diag(BuiltinLoc, diag::err_first_argument_to_cwsc_block_call)

<< Call->getSourceRange(); << Call->getSourceRange();

return true; return true;

} }

const Decl *TargetDecl = CE->getCalleeDecl(); const Decl *TargetDecl = CE->getCalleeDecl();

if (const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(TargetDecl)) if (const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(TargetDecl))

aaron.ballmanUnsubmitted

Done

Can you spell out the type since it's not in the initialization?

aaron.ballman: Can you spell out the type since it's not in the initialization?

if (FD->getBuiltinID()) { if (FD->getBuiltinID()) {

S.Diag(BuiltinLoc, diag::err_first_argument_to_cwsc_builtin_call) S.Diag(BuiltinLoc, diag::err_first_argument_to_cwsc_builtin_call)

<< Call->getSourceRange(); << Call->getSourceRange();

return true; return true;

} }

if (isa<CXXPseudoDestructorExpr>(CE->getCallee()->IgnoreParens())) { if (isa<CXXPseudoDestructorExpr>(CE->getCallee()->IgnoreParens())) {

aaron.ballmanUnsubmitted

Done

// Dump each base class, regardless of whether they're aggregates.

- if (const CXXRecordDecl *CXXRD = dyn_cast<CXXRecordDecl>(RD)) {

+ if (const auto *CXXRD = dyn_cast<CXXRecordDecl>(RD)) {

for (const auto &Base : CXXRD->bases()) {

aaron.ballman:

S.Diag(BuiltinLoc, diag::err_first_argument_to_cwsc_pdtor_call) S.Diag(BuiltinLoc, diag::err_first_argument_to_cwsc_pdtor_call)

<< Call->getSourceRange(); << Call->getSourceRange();

return true; return true;

} }

ExprResult ChainResult = S.UsualUnaryConversions(Chain); ExprResult ChainResult = S.UsualUnaryConversions(Chain);

if (ChainResult.isInvalid()) if (ChainResult.isInvalid())

return true; return true;

if (!ChainResult.get()->getType()->isPointerType()) { if (!ChainResult.get()->getType()->isPointerType()) {

S.Diag(BuiltinLoc, diag::err_second_argument_to_cwsc_not_pointer) S.Diag(BuiltinLoc, diag::err_second_argument_to_cwsc_not_pointer)

<< Chain->getSourceRange(); << Chain->getSourceRange();

return true; return true;

} }

QualType ReturnTy = CE->getCallReturnType(S.Context); QualType ReturnTy = CE->getCallReturnType(S.Context);

QualType ArgTys[2] = { ReturnTy, ChainResult.get()->getType() }; QualType ArgTys[2] = { ReturnTy, ChainResult.get()->getType() };

QualType BuiltinTy = S.Context.getFunctionType( QualType BuiltinTy = S.Context.getFunctionType(

ReturnTy, ArgTys, FunctionProtoType::ExtProtoInfo()); ReturnTy, ArgTys, FunctionProtoType::ExtProtoInfo());

QualType BuiltinPtrTy = S.Context.getPointerType(BuiltinTy); QualType BuiltinPtrTy = S.Context.getPointerType(BuiltinTy);

Builtin = Builtin =

aaron.ballmanUnsubmitted

Done

auto *FD = IFD ? IFD->getAnonField() : dyn_cast<FieldDecl>(D);

- if (!FD || (FD->isUnnamedBitfield() || FD->isAnonymousStructOrUnion()))

+ if (!FD || FD->isUnnamedBitfield() || FD->isAnonymousStructOrUnion())

continue;

aaron.ballman:

S.ImpCastExprToType(Builtin, BuiltinPtrTy, CK_BuiltinFnToFnPtr).get(); S.ImpCastExprToType(Builtin, BuiltinPtrTy, CK_BuiltinFnToFnPtr).get();

BuiltinCall->setType(CE->getType()); BuiltinCall->setType(CE->getType());

BuiltinCall->setValueKind(CE->getValueKind()); BuiltinCall->setValueKind(CE->getValueKind());

BuiltinCall->setObjectKind(CE->getObjectKind()); BuiltinCall->setObjectKind(CE->getObjectKind());

BuiltinCall->setCallee(Builtin); BuiltinCall->setCallee(Builtin);

BuiltinCall->setArg(1, ChainResult.get()); BuiltinCall->setArg(1, ChainResult.get());

Show All 13 Lines class ScanfDiagnosticFormatHandler

// destination size, and the source size). // destination size, and the source size).

using DiagnoseFunction = using DiagnoseFunction =

llvm::function_ref<void(unsigned, unsigned, unsigned)>; llvm::function_ref<void(unsigned, unsigned, unsigned)>;

ComputeSizeFunction ComputeSizeArgument; ComputeSizeFunction ComputeSizeArgument;

DiagnoseFunction Diagnose; DiagnoseFunction Diagnose;

public: public:

ScanfDiagnosticFormatHandler(ComputeSizeFunction ComputeSizeArgument, ScanfDiagnosticFormatHandler(ComputeSizeFunction ComputeSizeArgument,

erichkeaneUnsubmitted

Done

auto *InnerRD = FD->getType()->getAsRecordDecl();

- auto *InnerCXXRD = InnerRD ? dyn_cast<CXXRecordDecl>(InnerRD) : nullptr;

+ auto *InnerCXXRD = dyn_cast_or_null<CXXRecordDecl>(InnerRD);

if (InnerRD && (!InnerCXXRD || InnerCXXRD->isAggregate())) {

erichkeane:

DiagnoseFunction Diagnose) DiagnoseFunction Diagnose)

: ComputeSizeArgument(ComputeSizeArgument), Diagnose(Diagnose) {} : ComputeSizeArgument(ComputeSizeArgument), Diagnose(Diagnose) {}

bool HandleScanfSpecifier(const analyze_scanf::ScanfSpecifier &FS, bool HandleScanfSpecifier(const analyze_scanf::ScanfSpecifier &FS,

const char *StartSpecifier, const char *StartSpecifier,

unsigned specifierLen) override { unsigned specifierLen) override {

if (!FS.consumesDataArgument()) if (!FS.consumesDataArgument())

return true; return true;

unsigned NulByte = 0; unsigned NulByte = 0;

switch ((FS.getConversionSpecifier().getKind())) { switch ((FS.getConversionSpecifier().getKind())) {

default: default:

return true; return true;

case analyze_format_string::ConversionSpecifier::sArg: case analyze_format_string::ConversionSpecifier::sArg:

case analyze_format_string::ConversionSpecifier::ScanListArg: case analyze_format_string::ConversionSpecifier::ScanListArg:

yihanaaUnsubmitted

Not Done

Can we generate a warning diagnostic message when we see an unsupported type (perhaps temporarily not supported), what do you all think about?

yihanaa: Can we generate a warning diagnostic message when we see an unsupported type (perhaps…

rsmithAuthorUnsubmitted

Done

We could, but I'm leaning towards thinking that we shouldn't warn by default at least. That would get in the way of the C++ use case where the type actually is supported by the printing function; I'm not sure that we can easily tell those cases apart. And it's probably not all that useful to someone who just wants whatever information we can give them while debugging. I'd be OK with adding a warning that's off by default, though. What would be the intended use case here? Would an off-by-default warning satisfy it?

rsmith: We could, but I'm leaning towards thinking that we shouldn't warn by default at least. That…

aaron.ballmanUnsubmitted

Not Done

So long as we don't trigger UB, I think it's fine to not diagnose for unsupported types. But the UB cases do strike me as something we might want to warn about if we think they can be hit (specifically, these kinds of UB: https://eel.is/c++draft/cstdarg.syn#1).

As an example, do we properly handle dumping this case:

struct S {
  int &i;
  S(int &ref) : i(ref) {}
};

where the vararg argument is of reference type? I suppose another interesting question is:

struct S {
  int i;
};

void func() {
  S s;
  [=]() {
    __builtin_dump_struct(&s, ::printf);
  };
}

(is S.i "an entity resulting from a lambda capture"?)

aaron.ballman: So long as we don't trigger UB, I think it's fine to not diagnose for unsupported types. But…

rsmithAuthorUnsubmitted

Done

The only things we pass to the formatting function are:

The small number of builtin types for which we have known formatting specifiers.
Pointers to void, using %p.
Pointers to [const] char, using %s.
Pointers to user-defined types, using *%p.

All of these can be passed through a ... without problems.

We can potentially introduce the following sources of UB:

Indirection through a struct pointer that doesn't point to a struct object.
Read of an inactive union member.
Read of an uninitialized member.
Printing something other than a pointer to a nul-terminated byte sequence with %s.

I'm not sure how much we should be worrying about those. It'd be nice to avoid them, though I don't think we can fix the fourth issue without giving up the %s formatting entirely.

Reference types aren't on our list of types with known formatting specifiers, so in your first example we'd call printf("%s%s = *%p", "int &", "i", &p->i), producing output like int & i = *0x1234abcd. We could special-case that to also dereference the reference and include the value (int & i = *0x1234abcd (5), but for now I'm treating it like any other type for which we don't have a special formatting rule. I've added test coverage for this case.

In your second example, we rewrite into ::printf("%s%s = %s", "int", "i", (&s)->i), which is fine. The parmN restriction is about calls to va_start:

void f(int a, ...) {
  [&] {
    va_list va;
    va_start(va, a); // error, `a` is captured here
    // ...
  };
}

rsmith: The only things we pass to the formatting function are: * The small number of builtin types for…

aaron.ballmanUnsubmitted

Done

Thank you for the explanation, it sounds like we're doing about as good as we can get here in terms of safety.

FWIW, I like the idea of outputting the value for a reference type, but am fine punting on that for now (it could get awkward to format when the referenced type is a structure, for example).

aaron.ballman: Thank you for the explanation, it sounds like we're doing about as good as we can get here in…

NulByte = 1; NulByte = 1;

aaron.ballmanUnsubmitted

Not Done

Should we handle array types specifically, or should we continue to punt on those for now?

aaron.ballman: Should we handle array types specifically, or should we continue to punt on those for now?

rsmithAuthorUnsubmitted

Done

I'd prefer to punt for now, if that's OK. I don't think this is hard to add, but it's probably clearer to do it in a separate patch. There are also some nontrivial questions there, eg: should we dump all elements or stop after some limit? Should we dump trailing zeroes in an integer array? How should we dump arrays of char? I'd prefer to not deal with those questions in this patch :)

rsmith: I'd prefer to punt for now, if that's OK. I don't think this is hard to add, but it's probably…

aaron.ballmanUnsubmitted

Done

Absolutely fine to punt on it, because I agree there's a bunch of questions to answer there. I was also thinking about what to do with flexible array members or other trailing array objects.

aaron.ballman: Absolutely fine to punt on it, because I agree there's a bunch of questions to answer there. I…

break; break;

case analyze_format_string::ConversionSpecifier::cArg: case analyze_format_string::ConversionSpecifier::cArg:

break; break;

} }

analyze_format_string::OptionalAmount FW = FS.getFieldWidth(); analyze_format_string::OptionalAmount FW = FS.getFieldWidth();

if (FW.getHowSpecified() != if (FW.getHowSpecified() !=

analyze_format_string::OptionalAmount::HowSpecified::Constant) analyze_format_string::OptionalAmount::HowSpecified::Constant)

return true; return true;

unsigned SourceSize = FW.getConstantAmount() + NulByte; unsigned SourceSize = FW.getConstantAmount() + NulByte;

Optional<llvm::APSInt> DestSizeAPS = ComputeSizeArgument(FS.getArgIndex()); Optional<llvm::APSInt> DestSizeAPS = ComputeSizeArgument(FS.getArgIndex());

if (!DestSizeAPS) if (!DestSizeAPS)

return true; return true;

unsigned DestSize = DestSizeAPS->getZExtValue(); unsigned DestSize = DestSizeAPS->getZExtValue();

if (DestSize < SourceSize) if (DestSize < SourceSize)

Diagnose(FS.getArgIndex(), DestSize, SourceSize); Diagnose(FS.getArgIndex(), DestSize, SourceSize);

erichkeaneUnsubmitted

Not Done

What is happening here? I would expect the opposite to need to happen here (wrapper be set to match the call, not the other way around?).

erichkeane: What is happening here? I would expect the opposite to need to happen here (wrapper be set to…

rsmithAuthorUnsubmitted

Done

This is the normal way we handle builtins with custom type checking: the SemaChecking logic analyzes the builtin and sets the result type of the call as appropriate, because the type we get from Builtins.def is meaningless. As it happens, this is a degenerate case (compared to __builtin_reflect_struct where this did something more interesting), because the type of the PseudoObjectExpr is always void here; I can just set that directly if you think that's clearer. It shouldn't matter much anyway, given that the call is only used as the syntactic form of the PseudoObjectExpr, so its type doesn't really affect anything. but following the type and value kind of the PseudoObjectExpr seems "right" to me.

rsmith: This is the normal way we handle builtins with custom type checking: the `SemaChecking` logic…

erichkeaneUnsubmitted

Done

Ah, I see some other examples now, thanks for the explanation. I don't think setting to void is a good idea, in case we find a good reason to change the type; this ends up being just one more place we need to mess with it.

erichkeane: Ah, I see some other examples now, thanks for the explanation. I don't think setting to `void`…

return true; return true;

} }

}; };

class EstimateSizeFormatHandler class EstimateSizeFormatHandler

: public analyze_format_string::FormatStringHandler { : public analyze_format_string::FormatStringHandler {

size_t Size; size_t Size;

erichkeaneUnsubmitted

Done

This is unfortunate. Do we really not have a checkAtLeastArgCount type function here? I know we have something similar for attributes.

erichkeane: This is unfortunate. Do we really not have a `checkAtLeastArgCount` type function here? I…

rsmithAuthorUnsubmitted

Done

We do now. :)

rsmith: We do now. :)

public: public:

EstimateSizeFormatHandler(StringRef Format) EstimateSizeFormatHandler(StringRef Format)

: Size(std::min(Format.find(0), Format.size()) + : Size(std::min(Format.find(0), Format.size()) +

1 /* null byte always written by sprintf */) {} 1 /* null byte always written by sprintf */) {}

bool HandlePrintfSpecifier(const analyze_printf::PrintfSpecifier &FS, bool HandlePrintfSpecifier(const analyze_printf::PrintfSpecifier &FS,

const char *, unsigned SpecifierLen, const char *, unsigned SpecifierLen,

const TargetInfo &) override { const TargetInfo &) override {

const size_t FieldWidth = computeFieldWidth(FS); const size_t FieldWidth = computeFieldWidth(FS);

const size_t Precision = computePrecision(FS); const size_t Precision = computePrecision(FS);

erichkeaneUnsubmitted

Not Done

if (!PtrArgType->isPointerType() ||

- !PtrArgType->getPointeeType()->isRecordType()) {

+ !PtrArgType->getPointeeType()->isStructureOrClassType()) {

S.Diag(PtrArgResult.get()->getBeginLoc(),

Is this sufficient? Couldn't this be a pointer to a union? Do you mean the suggestion?

erichkeane: Is this sufficient? Couldn't this be a pointer to a union? Do you mean the suggestion?

rsmithAuthorUnsubmitted

Done

Allowing this is what the existing builtin did, and there were tests for it.

I'm not sure what the best thing to do with unions is -- especially when a union appears as a struct member. Perhaps the least bad approach is to print out all the union members and hope that nothing goes wrong. That'll crash in some cases (eg, union { const char *p; int n; }; when the int is the active member), but that seems to be the nature of this builtin.

I've added a FIXME to dumpRecordValue.

rsmith: Allowing this is what the existing builtin did, and there were tests for it. I'm not sure what…

erichkeaneUnsubmitted

Not Done

SGTM! We DO document a few times IIRC that we accept a 'struct' param(in fact, the comment on line 637 n ow says we do!), so I wanted to make sure we are consistent there.

erichkeane: SGTM! We DO document a few times IIRC that we accept a 'struct' param(in fact, the comment on…

// The actual format. // The actual format.

switch (FS.getConversionSpecifier().getKind()) { switch (FS.getConversionSpecifier().getKind()) {

// Just a char. // Just a char.

case analyze_format_string::ConversionSpecifier::cArg: case analyze_format_string::ConversionSpecifier::cArg:

case analyze_format_string::ConversionSpecifier::CArg: case analyze_format_string::ConversionSpecifier::CArg:

Size += std::max(FieldWidth, (size_t)1); Size += std::max(FieldWidth, (size_t)1);

break; break;

// Just an integer. // Just an integer.

case analyze_format_string::ConversionSpecifier::dArg: case analyze_format_string::ConversionSpecifier::dArg:

case analyze_format_string::ConversionSpecifier::DArg: case analyze_format_string::ConversionSpecifier::DArg:

case analyze_format_string::ConversionSpecifier::iArg: case analyze_format_string::ConversionSpecifier::iArg:

aaron.ballmanUnsubmitted

Done

ObjC block?

aaron.ballman: ObjC block?

case analyze_format_string::ConversionSpecifier::oArg: case analyze_format_string::ConversionSpecifier::oArg:

case analyze_format_string::ConversionSpecifier::OArg: case analyze_format_string::ConversionSpecifier::OArg:

case analyze_format_string::ConversionSpecifier::uArg: case analyze_format_string::ConversionSpecifier::uArg:

case analyze_format_string::ConversionSpecifier::UArg: case analyze_format_string::ConversionSpecifier::UArg:

case analyze_format_string::ConversionSpecifier::xArg: case analyze_format_string::ConversionSpecifier::xArg:

case analyze_format_string::ConversionSpecifier::XArg: case analyze_format_string::ConversionSpecifier::XArg:

Size += std::max(FieldWidth, Precision); Size += std::max(FieldWidth, Precision);

break; break;

▲ Show 20 Lines • Show All 1,490 Lines • ▼ Show 20 Lines case Builtin::BI__builtin_dump_struct: {

// We first want to ensure we are called with 2 arguments // We first want to ensure we are called with 2 arguments

if (checkArgCount(*this, TheCall, 2)) if (checkArgCount(*this, TheCall, 2))

return ExprError(); return ExprError();

// Ensure that the first argument is of type 'struct XX *' // Ensure that the first argument is of type 'struct XX *'

const Expr *PtrArg = TheCall->getArg(0)->IgnoreParenImpCasts(); const Expr *PtrArg = TheCall->getArg(0)->IgnoreParenImpCasts();

const QualType PtrArgType = PtrArg->getType(); const QualType PtrArgType = PtrArg->getType();

if (!PtrArgType->isPointerType() || if (!PtrArgType->isPointerType() ||

!PtrArgType->getPointeeType()->isRecordType()) { !PtrArgType->getPointeeType()->isRecordType()) {

Diag(PtrArg->getBeginLoc(), diag::err_typecheck_convert_incompatible) Diag(PtrArg->getBeginLoc(), diag::err_expected_struct_pointer_argument)

<< PtrArgType << "structure pointer" << 1 << 0 << 3 << 1 << PtrArgType << 1 << FDecl << TheCall->getArg(0)->getType();

<< "structure pointer";

return ExprError(); return ExprError();

} }

// Ensure that the second argument is of type 'FunctionType' // Ensure that the second argument is of type 'FunctionType'

const Expr *FnPtrArg = TheCall->getArg(1)->IgnoreImpCasts(); const Expr *FnPtrArg = TheCall->getArg(1)->IgnoreImpCasts();

const QualType FnPtrArgType = FnPtrArg->getType(); const QualType FnPtrArgType = FnPtrArg->getType();

if (!FnPtrArgType->isPointerType()) { if (!FnPtrArgType->isPointerType()) {

Diag(FnPtrArg->getBeginLoc(), diag::err_typecheck_convert_incompatible) Diag(FnPtrArg->getBeginLoc(), diag::err_typecheck_convert_incompatible)

Show All 28 Lines if (const auto *FT = dyn_cast<FunctionProtoType>(FuncType)) {

<< 2 << FnPtrArgType << "'int (*)(const char *, ...)'"; << 2 << FnPtrArgType << "'int (*)(const char *, ...)'";

return ExprError(); return ExprError();

} }

TheCall->setType(Context.IntTy); TheCall->setType(Context.IntTy);

break; break;

} }

case Builtin::BI__builtin_reflect_struct:

return SemaBuiltinReflectStruct(*this, TheCall);

case Builtin::BI__builtin_expect_with_probability: { case Builtin::BI__builtin_expect_with_probability: {

// We first want to ensure we are called with 3 arguments // We first want to ensure we are called with 3 arguments

if (checkArgCount(*this, TheCall, 3)) if (checkArgCount(*this, TheCall, 3))

return ExprError(); return ExprError();

// then check probability is constant float in range [0.0, 1.0] // then check probability is constant float in range [0.0, 1.0]

const Expr *ProbArg = TheCall->getArg(2); const Expr *ProbArg = TheCall->getArg(2);

SmallVector<PartialDiagnosticAt, 8> Notes; SmallVector<PartialDiagnosticAt, 8> Notes;

Expr::EvalResult Eval; Expr::EvalResult Eval;

▲ Show 20 Lines • Show All 15,395 Lines • Show Last 20 Lines

clang/lib/Sema/SemaStmt.cpp

Show First 20 Lines • Show All 335 Lines • ▼ Show 20 Lines	if (const ObjCMessageExpr *ME = dyn_cast<ObjCMessageExpr>(E)) {
const Expr *Source = POE->getSyntacticForm();		const Expr *Source = POE->getSyntacticForm();
// Handle the actually selected call of an OpenMP specialized call.		// Handle the actually selected call of an OpenMP specialized call.
if (LangOpts.OpenMP && isa<CallExpr>(Source) &&		if (LangOpts.OpenMP && isa<CallExpr>(Source) &&
POE->getNumSemanticExprs() == 1 &&		POE->getNumSemanticExprs() == 1 &&
isa<CallExpr>(POE->getSemanticExpr(0)))		isa<CallExpr>(POE->getSemanticExpr(0)))
return DiagnoseUnusedExprResult(POE->getSemanticExpr(0), DiagID);		return DiagnoseUnusedExprResult(POE->getSemanticExpr(0), DiagID);
if (isa<ObjCSubscriptRefExpr>(Source))		if (isa<ObjCSubscriptRefExpr>(Source))
DiagID = diag::warn_unused_container_subscript_expr;		DiagID = diag::warn_unused_container_subscript_expr;
else		else if (isa<ObjCPropertyRefExpr>(Source))
DiagID = diag::warn_unused_property_expr;		DiagID = diag::warn_unused_property_expr;
} else if (const CXXFunctionalCastExpr *FC		} else if (const CXXFunctionalCastExpr *FC
= dyn_cast<CXXFunctionalCastExpr>(E)) {		= dyn_cast<CXXFunctionalCastExpr>(E)) {
const Expr *E = FC->getSubExpr();		const Expr *E = FC->getSubExpr();
if (const CXXBindTemporaryExpr *TE = dyn_cast<CXXBindTemporaryExpr>(E))		if (const CXXBindTemporaryExpr *TE = dyn_cast<CXXBindTemporaryExpr>(E))
E = TE->getSubExpr();		E = TE->getSubExpr();
if (isa<CXXTemporaryObjectExpr>(E))		if (isa<CXXTemporaryObjectExpr>(E))
return;		return;
▲ Show 20 Lines • Show All 4,479 Lines • Show Last 20 Lines

clang/test/CodeGenCXX/builtin-reflect-struct.cpp

This file was added.

				// RUN: %clang_cc1 -std=c++20 -triple x86_64-linux-gnu %s -emit-llvm -o - \| FileCheck %s

				// CHECK: @[[F_STR:.]] = {{.}} constant [2 x i8] c"f\00",
				// CHECK: @[[G_STR:.]] = {{.}} constant [2 x i8] c"g\00",

				struct A { int n; };
				struct B { int n; };
				struct C : A, B { int f, g; };

				struct Base {
				Base(void*);
				void *p;
				};
				struct Field {
				Field(const char*, int&);
				int n;
				};

				struct Expected {
				Base a, b;
				Field f, g;
				};

				int consume(int, Expected);

				erichkeaneUnsubmitted Not Done Reply Inline Actions OH! I see now. This is unfortunately quite limiting by doing it this way. At this point the user of this builtin is required to know the layout of the struct before calling the builtin, which seems unfortunate. Despite the downsides of being just a 'dump function', the other builtin had the benefit of working on an arbitrary type. erichkeane: OH! I see now. This is unfortunately quite limiting by doing it this way. At this point the…
				rsmithAuthorUnsubmitted Done Reply Inline Actions Well, no, this is set up this way to test code generation. See the SemaCXX test for an example of using this to dump an arbitrary type. rsmith: Well, no, this is set up this way to test code generation. See the SemaCXX test for an example…
				erichkeaneUnsubmitted Not Done Reply Inline Actions Yep, thanks for that. I see now how it can be used that way. erichkeane: Yep, thanks for that. I see now how it can be used that way.
				C c;

				// CHECK: %[[VAL:.]] = alloca %[[STRUCT_EXPECTED:.]],
				// CHECK: call void @_ZN4BaseC1EPv(ptr {{.}}, ptr {{.}} @c)
				// CHECK: %[[BASE_B:.]] = {{.}} getelementptr inbounds (i8, ptr @c, i64 4)
				// CHECK: call void @_ZN4BaseC1EPv(ptr {{.}}, ptr {{.}} %[[BASE_B]])
				// CHECK: call void @_ZN5FieldC1EPKcRi(ptr {{.}}, ptr {{.}} @[[F_STR]], ptr {{.*}} getelementptr inbounds (%struct.C, ptr @c, i32 0, i32 2))
				// CHECK: call void @_ZN5FieldC1EPKcRi(ptr {{.}}, ptr {{.}} @[[G_STR]], ptr {{.*}} getelementptr inbounds (%struct.C, ptr @c, i32 0, i32 3))
				// CHECK: %[[RET:.]] = call {{.}} i32 @_Z7consumei8Expected(i32 {{.}} 42, ptr noundef byval(%[[STRUCT_EXPECTED]]) {{.}} %[[VAL]])
				// CHECK: store i32 %[[RET]], ptr @k
				int k = __builtin_reflect_struct(&c, consume, 42);

clang/test/SemaCXX/builtin-reflect-struct.cpp

This file was added.

				// RUN: %clang_cc1 -std=c++20 -verify %s
				// expected-no-diagnostics

				namespace std {
				typedef decltype(sizeof(int)) size_t;

				template <class E> struct initializer_list {
				const E *data;
				size_t size;

				constexpr initializer_list(const E *data, size_t size)
				: data(data), size(size) {}
				constexpr initializer_list() : data(), size() {}

				constexpr const E *begin() const { return data; }
				constexpr const E *end() const { return data + size; }
				};
				}

				struct ConstexprString {
				constexpr ConstexprString(const char *p) : data(nullptr) {
				auto size = __builtin_strlen(p) + 1;
				data = new char[size];
				__builtin_memcpy(data, p, size);
				}
				constexpr explicit ConstexprString(const char p, const char q) : data(nullptr) {
				auto p_size = __builtin_strlen(p);
				auto q_size = __builtin_strlen(q);
				data = new char[p_size + q_size + 1];
				__builtin_memcpy(data, p, p_size);
				__builtin_memcpy(data + p_size, q, q_size + 1);
				}
				constexpr ConstexprString(ConstexprString &&o) : data(o.data) { o.data = nullptr; }
				constexpr ConstexprString &operator=(ConstexprString &&o) {
				delete[] data;
				data = o.data;
				o.data = nullptr;
				return *this;
				}
				constexpr ~ConstexprString() { delete[] data; }
				char *data;

				friend constexpr ConstexprString operator+(const ConstexprString &a, const ConstexprString &b) {
				return ConstexprString(a.data, b.data);
				}
				friend constexpr bool operator==(const ConstexprString &a, const ConstexprString &b) {
				return __builtin_strcmp(a.data, b.data) == 0;
				}
				};

				template<typename T> constexpr ConstexprString ToString(const T &t);

				struct FieldOrBase {
				erichkeaneUnsubmitted Not Done Reply Inline Actions Ah, this is perhaps the example I was looking for. erichkeane: Ah, this is perhaps the example I was looking for.
				template <typename Base>
				constexpr FieldOrBase(const Base base) : repr(ToString(base)) {}
				template <typename T>
				constexpr FieldOrBase(const char *name, const T &field)
				: repr(name + ConstexprString(" = ") + ToString(field)) {}
				template <typename T>
				constexpr FieldOrBase(const char *name, T field, int bitwidth)
				: repr(name + ConstexprString(" : ") + ToString(bitwidth) +
				ConstexprString(" = ") + ToString(field)) {}
				ConstexprString repr;
				};

				constexpr ConstexprString FieldsToString(std::initializer_list<FieldOrBase> elements) {
				ConstexprString result = "{";
				bool first = true;
				for (const auto &element : elements) {
				if (!first)
				result = result + ", ";
				result = result + element.repr;
				first = false;
				}
				return result + "}";
				}

				template<typename T> constexpr ConstexprString ToString(const T &t) {
				if constexpr (__is_class(T)) {
				return __builtin_reflect_struct(&t, FieldsToString);
				} else {
				// Assume it's an integer.
				T value = t;
				erichkeaneUnsubmitted Not Done Reply Inline Actions What would this look like if you needed to look into a union? Perhaps it would be handled similarly to "__is_class"? In both builtins, its a shame we don't have a way to properly handle the 'active member'. erichkeane: What would this look like if you needed to look into a union? Perhaps it would be handled…
				ConstexprString s = "";
				bool negative = false;
				if (value < 0) {
				value = -value;
				negative = true;
				}
				while (value > 0) {
				char str[2] = {char('0' + value % 10), '\0'};
				s = ConstexprString(str) + s;
				value /= 10;
				}
				if (negative)
				s = "-" + s;
				return s;
				}
				}

				struct A { int x, y, z : 3; int : 4; };
				struct B : A { int p, q; struct { int anon1, anon2; }; union { int anon3; }; };

				static_assert(ToString(B{1, 2, 3, 4, 5, 6, 7, 8}) == "{{x = 1, y = 2, z : 3 = 3}, p = 4, q = 5, anon1 = 6, anon2 = 7, anon3 = 8}");

This is an archive of the discontinued LLVM Phabricator instance.

Reimplement `__builtin_dump_struct` in Sema.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 424347

clang/docs/LanguageExtensions.rst

clang/docs/ReleaseNotes.rst

clang/include/clang/Basic/Builtins.def

clang/include/clang/Basic/DiagnosticSemaKinds.td

clang/lib/AST/Expr.cpp

clang/lib/Sema/SemaChecking.cpp

clang/lib/Sema/SemaStmt.cpp

clang/test/CodeGenCXX/builtin-reflect-struct.cpp

clang/test/SemaCXX/builtin-reflect-struct.cpp

Reimplement `__builtin_dump_struct` in Sema.
ClosedPublic