Page MenuHomePhabricator

[libc] Header generation scheme.

Authored by sivachandra on Oct 24 2019, 10:06 PM.



This patch has been prepared to showcase the header generation proposal posted on libc-dev.

It is incomplete and will not be submitted or sent for formal review.

Event Timeline

sivachandra created this revision.Oct 24 2019, 10:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2019, 10:06 PM
MaskRay added inline comments.

Can you explain a bit more why we need a header generator?

If you want to make some declarations condition on feature macros, I think the usual approach is:

int foo(int);
int bar(int);

#if defined(_BSD_SOURCE) || defined(_XOPEN_SOURCE)
int qux(int);
sivachandra marked an inline comment as done.Oct 25 2019, 3:02 PM
sivachandra added inline comments.

Yes, that is how current implementations do it. While it is not necessarily bad, we want to avoid such patterns as much as possible so that we do not end up with the macro mess that current libcs ended up having. If you always generate, neither the sources, nor the generated headers have the macro mess. More importantly though, generating headers serves the use case of being able to selectively pick and choose what one wants to expose on their platform. Down stream configs become easier to manage.

abrachet marked an inline comment as done.Oct 25 2019, 3:23 PM
abrachet added a subscriber: abrachet.
abrachet added inline comments.

FWIW I don’t think conditionally exposing function definitions with the *_SOURCE macros is really a macro mess, it’s a pretty common practice among most (all?) libcs.

How far does this picking and choosing go? Would I be able to choose not to have the qux symbol in my at all? If so then it would be pretty bad to have it in the header at all even if it’s behind _GNU_SOURCE or similar I think.

sivachandra marked an inline comment as done.Oct 25 2019, 3:28 PM
sivachandra added inline comments.

Yes, that is point. It will neither be in the header nor in the library files like libc.a.

abrachet accepted this revision.Oct 25 2019, 4:30 PM

LGTM I see no problems from what is here. I'd also like to say that the emphasis on writing docs for all of this is something I'm really happy to see.
I look forward to seeing how the generator works. I am also cautiously optimistic about how well this mechanism will work to create more exotic functions and typedefs. signal(3) and jmp_buf will surely be interesting interesting :)


Nit: Might be worth it to add strcat and acosl because they are in

This revision is now accepted and ready to land.Oct 25 2019, 4:30 PM
theraven added inline comments.

I think that it will still be important to have the ability to subset dynamically based on macros. Consider a project that is 99% portable POSIX code but has a platform abstraction layer that uses per-platform extensions. When building this project on Linux, for example, I would like to be able to expose everything in the platform abstraction layer, but restrict every other compilation unit to only POSIX functions. This is a fairly common idiom and helps catch portability issues early. Requiring every project with this structure to add a tablegen pass to its build system to generate new libc headers (and handle all of the -isystem flags correctly so that they override the default ones) seems a lot to ask.

That said, I'm fairly confident that this is something that can be added later in the TableGen back end, if the sources contain the information about the set of standards that a function conforms to. It should be easy to generate a .h that exposes everything that will work on a given platform inside #ifdefs for the relevant standards. Generating this from TableGen is a lot less horrible than maintaining it manually.


A minor issue, but it would be nice if the argument types were TableGen types and not strings, so that we can avoid accidental header pollution. For example, there are a number of libc headers that define interfaces that use size_t, but that do not export size_t. To make these work, implementations typically define __size_t in an internal header and then typedef __size_t size_t (guarded against multiple definition) in the headers that are expected by the standard to export it.


math.h is expected to define float_t and double_t. The values of these (float and double, double and double, or something else) depends on the value of the FLT_EVAL_METHOD macro. Please can you at least sketch out how this would be implemented? I presume that we'd want to pick a single value for FLT_EVAL_METHOD (most likely 1) and support only that, but we'd still need to export the two types and the macro from this header.

theraven added inline comments.Oct 28 2019, 2:32 AM

Additionally, it would be nice to decouple attributes here. For example, restrict and _Nonnull can be specified inline (though the GCC equivalent can't), but things like the printf-like attributes are properties of the arguments that need to be specified as function attributes.

In an ideal world, we'd also have SAL annotations, but these are currently lacking a clang implementation.

sivachandra marked 7 inline comments as done.

Add more examples.

sivachandra added inline comments.Oct 29 2019, 3:41 PM

I agree that we will end up with a mix of things. Where exactly do we draw a line, I am not sure at this point; Learn as we go may be.

And yes, we can adjust the backend and add additional attributes to get the header generator to put in the excluding/including macros.


About "decouple attributes", Can you give some examples of what you are describing here? I probably understand vaguely, but I want to see an example to make sure.


[Side note: We have decided, on another patch, that items like size_t come from freestanding headers and so we will make use of them instead of making LLVM-libc re-implement/re-define them.]

I understand and agree that it would be good to have TableGen types for arg types But, how deep/far do we go? For example, should there be a table gen type size_t and another one for size_t *?

What exact problems do you foresee with using strings instead of TableGen types? One obvious problem is typos. Are there any other?


I have added something to show how items like float_t/double_t/size_t would be handled.

sivachandra marked an inline comment as done.

Use TableGen types instead of strings.


This is not to say we should not do TableGen types. I have now uploaded a new diff with only TableGen types.

theraven added inline comments.Oct 30 2019, 4:38 AM

As a concrete example, this is FreeBSD libc's declaration of snprintf:

int snprintf(char * __restrict, size_t, const char * __restrict, ...) __printflike(3, 4);

The third argument has two attributes:

  1. It is a restrict-qualified pointer, so the caller guarantees that it does not alias any other arguments.
  2. It is a printf-compatible format string, with the formatted values starting in argument 4.

The Windows version of snprintf has the following declaration with a much richer set of metadata:

_Success_(return >= 0)
    _Out_writes_opt_(_BufferCount) _Always_(_Post_z_) char*       const _Buffer,
    _In_                                              size_t      const _BufferCount,
    _In_z_ _Printf_format_string_                     char const* const _Format,

Beyond the C standard definitions, these indicate:

  1. That the return value does not need to be checked.
  2. That the calling convention is the one used for the standard library (not necessarily the default calling convention).
  3. That the first argument is either a null pointer or a pointer that is written through and must be (at least) as long as the _BufferCount bytes and, on output, will always be null terminated.
  4. That the second argument is an input parameter (redundant: it's a value type)
  5. That the third argument is an input parameter (not written through) and is a null-terminated string.
  6. That the third argument is a printf-compatible format string.

It would be nice if those were attributes could be expressed uniformly. Visual Studio, clang, and gcc all have subtly different ways of expressing some of these (as an aside: I'd love to see clang gain support for SAL annotations).


Using the definition of size_t from another header makes avoiding header pollution difficult. How do you expect to be able to implement headers that are expected to expose interfaces that use size_t but not provide size_t? Other libc implementations define something like __size_t for use in these headers. Clang's freestanding headers don't have to address this problem, but a full libc does.

I would expect size_t* to be something like Pointer<size_t>, which would be size_t* in the implementation but potentially __size_t* or size_t* in the header, depending on whether the header is supposed to export size_t. This is very hard to do with strings (it's possible, but it involves a load of parsing). Ideally, I'd like it to be something like Pointer<size_t>,In,NotNull.


I'm not sure how this expresses their conditional declaration or the relationship between them and the macro definition. Are these simply informal contracts that we write in comments, or do you imagine a scheme for expressing them directly in the TableGen?


What is the definition of NULL? This is one of the most horrible macros to define in a C standard library. Depending on the language dialect and supported extensions, it is either:

  • __null
  • nullptr
  • 0
  • (void*)0
sivachandra marked 4 inline comments as done.Oct 30 2019, 8:51 AM
sivachandra added inline comments.

GCC and Clang's free standing headers give us a way to expose the type size_t and the macro NULL without causing header pollution. See the file config/linux/ in this patch. It shows how size_t and NULL are "implemented" without causing header pollution.


Thanks for the detailed reply. This file is a list of items the standard says should be provided. Things like function attributes and argument annotations are implementation detail and should be captured in the platform config file like the file config/linux/ in this patch.

I will soon upload a diff with an example of how this can be done.


This file is merely a list of what the standard says should be provided. How it is implemented is an implementation detail captured in the platform config file. I have an example of the Linux config file in this patch at config/linux/ The relationship between float_t, double_t and the FLT_EVAL_METHOD macro are captured in that platform config file.


Again, see config/linux/ for the definition of NULL: It is not really defined, but just pulled out from the stddef.h freestanding header.

Add example of function attrbutes and argument annotations.

My latest diff showcases function attributes and also argument annotations.

About size_t like items which standard says that multiple headers files should provide: As theraven@ suggests, we will in general have an internal header file. For size_t itself, the freestanding headers provide us a convenient way to do it without causing header pollution.

sivachandra marked an inline comment as done.Nov 3 2019, 11:03 PM
sivachandra added inline comments.

I thought about this a little more and may be what you are saying is that, we should "invent" a way to specify argument annotations in a platform independent fashion and have the header generator generate the annotations suitable for the target platform. I agree that such a mechanism would be ideal and also be much superior to what I have proposed.

For the sake of progress now, I am OK to leave a place holder for argument annotations, and in a later round add annotations wherever suitable.

theraven added inline comments.Nov 4 2019, 1:16 AM

I am happy as long as we don't do anything that precludes adding this later. I think the current scheme is sufficient to be refactored into the right thing later.

Ideally, I would like to see a SAL implementation for clang, but that will require a nontrivial amount of work and shouldn't be a blocker for this.

sivachandra abandoned this revision.Nov 22 2019, 1:37 PM

Abandoning this change now as the patch implementing the ideas showcased here has landed.