This is an archive of the discontinued LLVM Phabricator instance.

[Headers][Modules] Make separate headers for the stdarg.h and stddef.h pieces so that they can be modularized
ClosedPublic

Authored by iana on Aug 24 2023, 12:00 AM.

Details

Summary

stdarg.h and stddef.h have to be textual headers in their upcoming modules to support their __needs_xxx macros. That means that they won't get precompiled into their modules' pcm, and instead their declarations will go into every other pcm that uses them. For now that's ok since the type merger can handle the declarations in these headers, but it's suboptimal at best. Make separate headers for all of the pieces so that they can be properly modularized.

Diff Detail

Event Timeline

iana created this revision.Aug 24 2023, 12:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2023, 12:00 AM
iana requested review of this revision.Aug 24 2023, 12:00 AM
Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptAug 24 2023, 12:00 AM
Herald added subscribers: cfe-commits, llvm-commits, Restricted Project. · View Herald Transcript
iana updated this revision to Diff 553242.Aug 24 2023, 1:26 PM

Fixed the failing test

I'm a bit concerned about the impact on (non-modules) build time performance. This splits stddef.h and stdarg.h into needing to open 14 different files instead of 2. Hopefully that's not a lot of overhead, but given that stddef.h is included in approximately every TU and that sometimes external forces cause files to be slow to open (network drives, AV software, etc), I think we should be cautious and measure the impact. I also really do not like how poorly this scales -- the fact that this is only needed for these two files helps a bit, but this is a lot of C++-specific hoops to jump through for C standard library headers.

iana added a comment.EditedAug 25 2023, 11:11 AM

I'm a bit concerned about the impact on (non-modules) build time performance. This splits stddef.h and stdarg.h into needing to open 14 different files instead of 2. Hopefully that's not a lot of overhead, but given that stddef.h is included in approximately every TU and that sometimes external forces cause files to be slow to open (network drives, AV software, etc), I think we should be cautious and measure the impact. I also really do not like how poorly this scales -- the fact that this is only needed for these two files helps a bit, but this is a lot of C++-specific hoops to jump through for C standard library headers.

This is in support of clang modules, not C++ modules. The plan is to make modules like this.

module _Builtin_stddef [system] {
  textual header "stddef.h"

  explicit module max_align_t {
    header "__stddef_max_align_t.h"
    export *
  }

  explicit module null {
    header "__stddef_null.h"
    export *
  }
  ...
}

I do agree it's a bit of a file explosion, but I can't really think of any better alternatives. It's tough to measure build time performance impact. If you have a slow enough file system (and the fs on our builders at Apple is quite slow), and you aren't using modules or pch's, and you have quite a lot of source files, then possibly this would be noticeable? I still don't know what we could really do about it besides have #if !__has_feature(modules) and inline the contents in that case. That seems like a maintenance headache and a half though. I would hope that most sizable projects either use clang modules or pch's?

iana updated this revision to Diff 553677.Aug 25 2023, 5:37 PM

Modules don't support the #include #undef #include pattern to redefine macros, the second include has no effect. Move the #undef of NULL into __stddef_null.h so that the textual behavior is preserved, but the module behavior doesn't leave NULL undefined.

I'm a bit concerned about the impact on (non-modules) build time performance. This splits stddef.h and stdarg.h into needing to open 14 different files instead of 2. Hopefully that's not a lot of overhead, but given that stddef.h is included in approximately every TU and that sometimes external forces cause files to be slow to open (network drives, AV software, etc), I think we should be cautious and measure the impact. I also really do not like how poorly this scales -- the fact that this is only needed for these two files helps a bit, but this is a lot of C++-specific hoops to jump through for C standard library headers.

This is in support of clang modules, not C++ modules.

Ah, good to know, thank you!

The plan is to make modules like this.

module _Builtin_stddef [system] {
  textual header "stddef.h"

  explicit module max_align_t {
    header "__stddef_max_align_t.h"
    export *
  }

  explicit module null {
    header "__stddef_null.h"
    export *
  }
  ...
}

I do agree it's a bit of a file explosion, but I can't really think of any better alternatives. It's tough to measure build time performance impact. If you have a slow enough file system (and the fs on our builders at Apple is quite slow), and you aren't using modules or pch's, and you have quite a lot of source files, then possibly this would be noticeable? I still don't know what we could really do about it besides have #if !__has_feature(modules) and inline the contents in that case. That seems like a maintenance headache and a half though. I would hope that most sizable projects either use clang modules or pch's?

Yeah, measuring file system impacts is always tricky. :-( The reason I'm concerned is actually because a fair number of projects explicitly don't use PCH or Clang modules (or C++ modules), especially in C. Because of how fast compilations go in C, it's possible that the extra filesystem overhead really will be noticeable by users. However, I also can't think of a way around this in code...

At a high-level (not just for this patch, but for the entire series): is this need specific to your downstream or is this a more general need? e.g., should this complexity be kept downstream until we've got a second platform needing it?

iana added a comment.Aug 28 2023, 10:52 AM

At a high-level (not just for this patch, but for the entire series): is this need specific to your downstream or is this a more general need? e.g., should this complexity be kept downstream until we've got a second platform needing it?

I think it's a general need to properly modularize the clang headers, I don't think it's really Apple specific.

aaron.ballman accepted this revision.Aug 28 2023, 11:11 AM

At a high-level (not just for this patch, but for the entire series): is this need specific to your downstream or is this a more general need? e.g., should this complexity be kept downstream until we've got a second platform needing it?

I think it's a general need to properly modularize the clang headers, I don't think it's really Apple specific.

Good to know (that was my understanding as well). I'd say let's move forward with this approach given there's not a clearly better way to go about this. LGTM!

This revision is now accepted and ready to land.Aug 28 2023, 11:11 AM
ChuanqiXu accepted this revision.Aug 28 2023, 8:26 PM

LGTM too.

aheejin added a subscriber: aheejin.EditedAug 31 2023, 2:51 PM

The added space in ((void *)0) in __stddef_null.h broke Emscripten CI, which uses musl (https://www.musl-libc.org/). I added the space back in D159312.