This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Begin implementing Plan 9 C extensions
Needs Review · Public

Authored by ksaunders on Jun 9 2022, 5:55 PM.

Details

Summary

This patch enables the addition of extensions supported by the Plan 9 C compilers by adding the -fplan9-extensions flag.

This flag currently enables one extension: allowing typedefs to be redeclared, as in C11. Once this patch is merged, I plan to implement the rest of the Plan 9 C compiler behavior covered below.
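Here is a minimal sketch of the behavior this flag enables, written as plain C11 so it compiles as-is under -std=c11 or later. The typedef name `Lock` is declared twice for the same type, which C11 permits. Per the summary, -fplan9-extensions would allow the same redeclaration in earlier language modes as well. The helper `lock_is_held` is hypothetical and only exercises the type.

```c
/* Redeclaring a typedef name for the same type is valid in C11 and is
   accepted by the Plan 9 compilers. Before C11 the second declaration
   violates a constraint, and compilers typically diagnose it. */
typedef struct Lock Lock;
typedef struct Lock Lock; /* same type again: fine in C11 */

struct Lock {
    int hold;
};

/* Hypothetical helper, only here to use the typedef. */
int lock_is_held(const Lock *l)
{
    return l->hold != 0;
}
```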

The Plan 9 C compilers can be summarized as C89 compilers with the following non-standard extensions enabled:

  • Embedded structures, like Microsoft C (already implemented in Clang with -fms-extensions)
  • Automatic embedded structure type decay
  • Embedded structure type name accesses

As well as the following standardized C extensions enabled:

  • C99 compound literals
  • C99 designated initializers for arrays and structures
  • C11 redeclaration of typedefs
  • C11 anonymous structures and unions
  • C2x omitting the parameter name in a function definition
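For comparison, three of the standardized features listed above look like this in ordinary C11. This is only an illustrative sketch: `Point`, `Value`, `primes` and the `demo_*` functions are hypothetical names. Omitted parameter names are left out because they need a C2x language mode.

```c
typedef struct Point {
    int x, y;
} Point;

/* C11 anonymous union: its members are accessed as v.i / v.d
   rather than through a named union member. */
typedef struct Value {
    int kind;
    union {
        int i;
        double d;
    };
} Value;

/* C99 designated initializers for an array; primes[2] and primes[3]
   are implicitly zero. */
const int primes[5] = { [0] = 2, [1] = 3, [4] = 11 };

int sum_coords(const Point *p)
{
    return p->x + p->y;
}

int demo_compound_literal(void)
{
    /* C99 compound literal built with designated initializers for a
       structure; the order of the designators does not matter. */
    return sum_coords(&(Point){ .y = 4, .x = 3 });
}

double demo_anonymous_union(void)
{
    /* A designator can name a member of the anonymous union directly. */
    Value v = { .kind = 1, .d = 2.5 };
    return v.d;
}
```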

These extensions are described in Rob Pike's paper "How to Use the Plan 9 C Compiler": https://9p.io/sys/doc/comp.html. However, there are no plans to implement the "extern register" feature, which is used in the kernel.

The motivation for this patch and the patches that follow is to enable compiling the Plan 9 kernel and Plan 9 userspace applications with Clang, for better code optimization and sanitizer instrumentation. GCC support is inadequate for this purpose, as it does not support the member resolution algorithm the Plan 9 C compilers use to reconcile overlapping declaration names in a record.

This patch is largely based on the following obsolete patch by @pcc: https://reviews.llvm.org/D3853.

Event Timeline

ksaunders created this revision. Jun 9 2022, 5:55 PM
Herald added a project: Restricted Project. Jun 9 2022, 5:55 PM
ksaunders requested review of this revision. Jun 9 2022, 5:55 PM
ksaunders retitled this revision from "Begin implementing Plan 9 C extensions [V2]" to "[Clang] Begin implementing Plan 9 C extensions". Jun 9 2022, 5:57 PM
ychen added a subscriber: ychen. Jun 9 2022, 6:17 PM

Two questions for Clang developers as I work on my next patches:

  1. What is the Clang policy on warning about extension usage? For example, this diff permits redeclaration of typedefs, which is a Plan 9 and Microsoft C extension. Earlier in the file, this extension is also enabled for Microsoft C, yet no warning is emitted. In contrast, Clang does emit an extension warning when an anonymous embedded record is used in Microsoft C mode. So it's unclear what should trigger a warning and what shouldn't.
  2. MSVC Compatibility has its own dedicated page on the Clang documentation site. Should I add a Plan 9 Compatibility page that details the current status of each feature (and can be updated in subsequent diffs)?

Gentle ping. I have 2 other patches ready, and I'd like to get the discussion started on this if possible.

aaron.ballman added a subscriber: aaron.ballman.

Ping.

Sorry for not noticing this patch was languishing! As far as the patch itself goes, I don't see any major concerns.

Because this is the first of quite a few patches adding various extensions to Clang, I think the idea as a whole should get an RFC on Discourse (https://discourse.llvm.org/c/clang/6) to ensure that the community agrees we should adopt and maintain Plan 9 compatibility. It'd be helpful to focus on the criteria we have for adding extensions to the project (https://clang.llvm.org/get_involved.html#criteria) to the extent you can, but having details like you've got in the summary is also very helpful. That said, I have some questions:

I'm wondering if you could go into a bit more detail about what Automatic embedded structure type decay and Embedded structure type name accesses mean in practice (some code examples would be helpful).

Are you planning to add a new driver for Clang to emulate the Plan 9 compiler driver (similar to how we have a driver for MSVC compatibility)?
Are there other extensions planned, or is the list in the summary pretty complete?
Do I understand correctly that Plan 9 compatibility mode forces C89 as the language standard mode, or is it expected that users can do things like -std=c2x -fplan9-extensions?

You can feel free to answer my questions in the Discourse RFC if you don't want to fork the discussion.

Thanks for your response. I am working on an RFC now which addresses your feedback and questions.

Hi Aaron. Unfortunately, I don't feel I can make a great case for why these extensions should be in Clang. Although there are users of Plan 9 C extensions, I don't see these features being adopted widely enough to warrant their inclusion in Clang, so adding them would violate the inclusion policy.

To that end, I tried using libTooling to rewrite Plan 9 C into standard C that Clang can compile correctly, but because building the AST requires running semantic analysis, the resulting AST is left in a state of disrepair (it can parse Plan 9 C, but the analyzer gets confused by duplicate fields and so on).

I'll have to decide if I am going to keep these changes in a Clang fork or modify another C compiler for LLVM. Regardless, I believe my diffs for adding the Plan 9 calling convention to LLVM still apply (they are simple), so I will send them upstream when I feel they are ready.


I think it also makes sense to address your questions here for the sake of completeness.

I'm wondering if you could go into a bit more detail about what Automatic embedded structure type decay and Embedded structure type name accesses mean in practice (some code examples would be helpful).

Absolutely. "Automatic embedded structure type decay" and "Embedded structure type name accesses" are features best described by example:

typedef struct Lock Lock;
typedef struct Rc Rc;
typedef struct Resource Resource;

struct Lock
{
  int hold;
};

struct Rc
{
  int references;
};

struct Resource
{
  Rc;
  Lock;
  void *buffer;
  size_t size;
};

Now, with "Embedded structure type name accesses" enabled, given a value like Resource *r, we can write r->Lock. This simply returns the field as if Lock; had been declared as Lock Lock;, but this special declaration also brings all of the embedded structure's member names into scope (like an anonymous struct), so we can also write r->hold. This does NOT work if the field is declared with the tag, as struct Lock;: it must be a typedef name.
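One way to picture this in standard C (an assumption about a hand rewrite, not a description of how the Plan 9 compilers or the proposed Clang patch implement it) is to give each embedded member the name of its typedef. Then `Lock;` becomes `Lock Lock;` and the Plan 9 shorthand `r->hold` must be spelled `r->Lock.hold`. The helper `resource_is_locked` is hypothetical.

```c
#include <stddef.h>

typedef struct Lock Lock;
typedef struct Rc Rc;
typedef struct Resource Resource;

struct Lock { int hold; };
struct Rc { int references; };

/* Standard C rendering: each unnamed "Rc;" / "Lock;" member becomes a
   member named after its typedef. Member names live in their own
   namespace, so "Lock Lock;" is valid C. */
struct Resource {
    Rc Rc;       /* Plan 9: Rc;   */
    Lock Lock;   /* Plan 9: Lock; */
    void *buffer;
    size_t size;
};

/* Plan 9 would let this body be "return r->hold;". */
int resource_is_locked(const Resource *r)
{
    return r->Lock.hold;
}
```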

Further, with "Automatic embedded structure type decay", a structure pointer is automatically converted into a pointer to an embedded structure of a compatible type. So given a function like void lock(Lock *);, we can call it as lock(r);, and the compiler will automatically search all unnamed structures in Resource recursively until it finds a matching type. Note that Lock; is declared after Rc; on purpose. In standard C, a pointer to a struct can be converted (with a cast) to a pointer to its first member; that is completely separate from this extension.
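Here is a rough standard C sketch of what the call lock(r) amounts to, assuming each embedded member is given the name of its typedef (`Lock Lock;`). The Plan 9 compilers find the embedded Lock and pass its address, which standard C requires you to write explicitly. The cast to `Rc *` illustrates the unrelated standard first-member rule mentioned above and works only because `Rc` is the first member. `incref` and `acquire` are hypothetical helpers.

```c
#include <stddef.h>

typedef struct Lock Lock;
typedef struct Rc Rc;
typedef struct Resource Resource;

struct Lock { int hold; };
struct Rc { int references; };

struct Resource {
    Rc Rc;       /* first member */
    Lock Lock;   /* deliberately not first */
    void *buffer;
    size_t size;
};

void lock(Lock *l) { l->hold = 1; }
void incref(Rc *rc) { rc->references++; }

void acquire(Resource *r)
{
    /* Plan 9: lock(r); the compiler locates the embedded Lock.
       Standard C: pass the member's address explicitly. */
    lock(&r->Lock);

    /* Standard C only: a pointer to a struct, suitably converted,
       points to its first member. This is valid because Rc comes
       first and has nothing to do with the Plan 9 extension. */
    incref((Rc *)r);
}
```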

If that was unclear, GCC also supports this functionality, and its documentation offers a different explanation: https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

Are you planning to add a new driver for Clang to emulate the Plan 9 compiler driver (similar to how we have a driver for MSVC compatibility)?

For now, no.

Adding the Plan 9 object format to LLD is out of scope for this project (this was discussed previously), so I don't think a new driver is necessary; we can just use the generic ELF driver.

Similarly, adding Plan 9 assembler syntax is not necessary either: most programs are written in C, so the assembly can be converted trivially, since the idea is that programs will be compiled with the Plan 9 calling convention and C ABI.

Are there other extensions planned, or is the list in the summary pretty complete?

No, the listing above is complete.

Do I understand correctly that Plan 9 compatibility mode forces C89 as the language standard mode, or is it expected that users can do things like -std=c2x -fplan9-extensions?

Plan 9 C extensions are not mutually exclusive with C2x, so I think you should be allowed to write C2x Plan 9 C. If we did have a Plan 9 driver, though, it would set -fplan9-extensions -std=c89 to be as close as possible to the Plan 9 compilers' behavior.

Cheers

Hi Aaron. Unfortunately, I don't feel I can make a great case for why these extensions should be in Clang. Although there are users of Plan 9 C extensions, I don't see these features being adopted widely enough to warrant their inclusion in Clang, so adding them would violate the inclusion policy.

Just to check -- do you think (some of) these features are something you wish to propose to WG14 for adoption into C? e.g., are you aiming to get multiple compilers to implement Plan 9 extensions to demonstrate to WG14 that this is existing practice in C compilers?

To that end, I tried using libTooling to rewrite Plan 9 C into standard C that Clang can compile correctly, but because building the AST requires running semantic analysis, the resulting AST is left in a state of disrepair (it can parse Plan 9 C, but the analyzer gets confused by duplicate fields and so on).

I'll have to decide if I am going to keep these changes in a Clang fork or modify another C compiler for LLVM. Regardless, I believe my diffs for adding the Plan 9 calling convention to LLVM still apply (they are simple), so I will send them upstream when I feel they are ready.

SGTM


I think it also makes sense to address your questions here for the sake of completeness.

Thank you, I appreciate the education. :-)

I'm wondering if you could go into a bit more detail about what Automatic embedded structure type decay and Embedded structure type name accesses mean in practice (some code examples would be helpful).

Absolutely. "Automatic embedded structure type decay" and "Embedded structure type name accesses" are features best described by example:

typedef struct Lock Lock;
typedef struct Rc Rc;
typedef struct Resource Resource;

struct Lock
{
  int hold;
};

struct Rc
{
  int references;
};

struct Resource
{
  Rc;
  Lock;
  void *buffer;
  size_t size;
};

Now, with "Embedded structure type name accesses" enabled, given a value like Resource *r, we can write r->Lock. This simply returns the field as if Lock; had been declared as Lock Lock;, but this special declaration also brings all of the embedded structure's member names into scope (like an anonymous struct), so we can also write r->hold. This does NOT work if the field is declared with the tag, as struct Lock;: it must be a typedef name.

What an interesting extension! What happens with something like this?

typedef struct Lock FirstLock;
typedef struct Lock SecondLock;
typedef struct Rc Rc;
typedef struct Resource Resource;

struct Lock
{
  int hold;
};
 
struct Rc
{
  int references;
};

struct Resource
{
  Rc;
  FirstLock;
  SecondLock;
  void *buffer;
  size_t size;
};

Does this work for accessing r->FirstLock but give an ambiguous lookup for r->hold? Or do you only allow one member of the underlying canonical type?

Also, why does it require a typedef name?

Further, with "Automatic embedded structure type decay", a structure pointer is automatically converted into a pointer to an embedded structure of a compatible type. So given a function like void lock(Lock *);, we can call it as lock(r);, and the compiler will automatically search all unnamed structures in Resource recursively until it finds a matching type. Note that Lock; is declared after Rc; on purpose. In standard C, a pointer to a struct can be converted (with a cast) to a pointer to its first member; that is completely separate from this extension.

Ah, interesting. So this is another case where multiple members of the same type would be a problem. Does this only find structure/union members, or does it also work for other members, e.g., void size(size_t *) being called with size(r)? And if it works for other members, what does it do for bit-field members that share an allocation unit?

Thanks for the extra details!

Just to check -- do you think (some of) these features are something you wish to propose to WG14 for adoption into C? e.g., are you aiming to get multiple compilers to implement Plan 9 extensions to demonstrate to WG14 that this is existing practice in C compilers?

A lot of the Plan 9 extensions were actually adopted by later C standards, like compound literals (C99) and anonymous structures (C11). Although I find these additional extensions interesting and useful, I don't think they belong in C; they should remain non-standard extensions. My interest lies in compiling existing code that uses these extensions with Clang, rather than encouraging new code to use them.

There was actually a proposal to add Plan 9 extensions into the Linux kernel, but Linus rejected it. I personally share his opinion that the silent type conversion that the Plan 9 compilers introduce can be problematic. But on the other hand, they are also very powerful when used judiciously. It's on the LKML here if you're interested: https://lkml.org/lkml/2019/1/9/1127

Does this work for accessing r->FirstLock but give an ambiguous lookup for r->hold? Or do you only allow one member of the underlying canonical type?

Good question. The compilers only allow one member of the underlying type. So you'll get an error saying you've declared Lock twice.
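The two declarations collide because a typedef only introduces an alias, not a new type: `FirstLock` and `SecondLock` both denote `struct Lock`. A small standard C sketch (the function `as_second` and the struct `TwoLocks` are hypothetical) shows the compiler accepting a `FirstLock *` where a `SecondLock *` is expected with no cast. It also shows that in standard C two locks simply need distinct member names.

```c
typedef struct Lock FirstLock;
typedef struct Lock SecondLock;

struct Lock {
    int hold;
};

/* Compiles without a cast: both typedefs name struct Lock. */
SecondLock *as_second(FirstLock *l)
{
    return l;
}

/* Standard C equivalent of embedding two locks: each needs a name,
   and r->hold would have no single meaning. */
struct TwoLocks {
    FirstLock first;
    SecondLock second;
};
```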

Also, why does it require a typedef name?

I am not sure, but I presume it is because of cases like this:

struct A
{
    int a;
};

struct B
{
    int b;
};

typedef struct B A;

struct Example
{
    struct A;
    A;
};

So it's not clear, when you write example->A, which member you are referring to. If you restrict it to typedef names, there is no ambiguity of this kind.
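For reference, the naming situation in this example is legal standard C once the members get explicit names. Struct tags and typedef names live in separate namespaces, so `struct A` and `A` are two different types here. Standard C keeps them apart because the tag must always be spelled `struct A`; a bare embedded `A;` alongside `struct A;` loses that distinction. The member names `tagged` and `aliased` and the function `example_sum` are hypothetical.

```c
struct A { int a; };
struct B { int b; };

/* Legal: the typedef name A does not clash with the tag A,
   and it names struct B, not struct A. */
typedef struct B A;

struct Example {
    struct A tagged;   /* type struct A: has member a */
    A        aliased;  /* type struct B: has member b */
};

int example_sum(const struct Example *e)
{
    return e->tagged.a + e->aliased.b;
}
```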

Ah, interesting. So this is another case where multiple members of the same type would be a problem. Does this only find structure/union members, or does it also work for other members, e.g., void size(size_t *) being called with size(r)? And if it works for other members, what does it do for bit-field members that share an allocation unit?

It only searches unnamed structure and union members, so non-record members such as bit-fields are not used for resolution. That's a good test case to add, too.
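To make the search concrete, here is a toy model in C of the lookup described in this thread. Starting from a record, it walks the unnamed record members depth-first until one of the wanted type is found, accumulating the byte offset that the pointer conversion would add. This is a sketch under stated assumptions, not Clang's or the Plan 9 compilers' actual implementation. The `Record`/`Embedded` types and `find_embedded` are hypothetical, the search returns the first match, and duplicates of the same type are assumed to have been rejected at declaration time. Named members and non-record members such as bit-fields are simply never listed, matching the answer above.

```c
#include <stddef.h>

/* Hypothetical toy description of a record type. Only unnamed
   struct/union members are listed; named and non-record members
   never take part in the conversion. */
typedef struct Record Record;

typedef struct {
    const Record *type;  /* type of the unnamed member */
    size_t offset;       /* byte offset inside the enclosing record */
} Embedded;

struct Record {
    const char *name;
    const Embedded *members;
    size_t nmembers;
};

/* Depth-first search for an unnamed member of type `want` inside
   `from`, recursing into unnamed members that do not match. On
   success, stores the cumulative offset (the adjustment applied to
   the pointer) in *offset and returns 1; otherwise returns 0. */
int find_embedded(const Record *from, const Record *want, size_t *offset)
{
    for (size_t i = 0; i < from->nmembers; i++) {
        const Embedded *m = &from->members[i];
        size_t inner = 0;

        if (m->type == want) {
            *offset = m->offset;
            return 1;
        }
        if (find_embedded(m->type, want, &inner)) {
            *offset = m->offset + inner;
            return 1;
        }
    }
    return 0;
}
```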

Thanks for the extra details!

Thank you for your interest :)

Thank you for the details (and the later technical explanations as well)! Then yes, I'm in agreement that we shouldn't add these extensions at this time. We can revisit should anything change in the future.