This is an archive of the discontinued LLVM Phabricator instance.

[libc++][Modules] Recreate the top level `std` clang module
ClosedPublic

Authored by iana on Jul 24 2023, 3:36 PM.

Details

Summary

lldb needs the std clang module to make all of libc++ available in the debugger. Make a new header to include the rest of the public headers and use to build a std module that just re-exports the rest of libc++.

Diff Detail

Event Timeline

iana created this revision.Jul 24 2023, 3:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2023, 3:36 PM
Herald added a subscriber: ributzka. · View Herald Transcript
iana requested review of this revision.Jul 24 2023, 3:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2023, 3:36 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
iana updated this revision to Diff 543724.Jul 24 2023, 3:38 PM

Update the contribution notes

iana updated this revision to Diff 543726.Jul 24 2023, 3:40 PM

And add __std to CMakeLists

Would it be possible to have CMake generate the contents of the __std file?

iana added a comment.EditedJul 24 2023, 4:11 PM

Would it be possible to have CMake generate the contents of the __std file?

We've preferred manual setups in the past, although if people change their minds and go with D155141 instead, then maybe I could generate __std in that one.

JDevlieghere accepted this revision.Jul 24 2023, 4:15 PM

Would it be possible to have CMake generate the contents of the __std file?

We've preferred manual setups in the past, although if people change their minds and go with D155141 instead, then maybe I could generate __std in that one.

Gotcha. In that case this LGTM for LLDB.

I'm not entirely fond of this approach. So I first like to get a bit more information how this is uses to see whether there are alternative solutions.

@JDevlieghere can you give a bit more information how LLDB uses this module?

libcxx/include/CMakeLists.txt
656

Is there a reason we need to ship this header to all users of libc++?

libcxx/include/__std
10 ↗(On Diff #543726)

Where does this __building_module come from? I'm not happy with this line since will confuse people. C++23 adds the named module std. This has nothing to do with that module.

10 ↗(On Diff #543726)

I think this file name can benefit from something longer. This include is only needed in a very special case.

80 ↗(On Diff #543726)

When somebody implements the header text_encoding will there be a CI tests that fails?

We have the checklist, which you updated, but in general we like to have a red CI when somebody forgets it.

I'm not entirely fond of this approach. So I first like to get a bit more information how this is uses to see whether there are alternative solutions.

@JDevlieghere can you give a bit more information how LLDB uses this module?

LLDB imports the std module to make STL template declarations available in the expression evaluator, which enables it call methods on STL containers that weren't compiled into the debugged program. See this talk from the LLVM dev meeting 2019 for more details:

https://llvm.org/devmtg/2019-10/talk-abstracts.html#tech13
https://youtu.be/vuNZLlHhy0k

! In D156177#4532335, @Mordante wrote:
I'm not entirely fond of this approach.

Would you mind clarifying what your concerns are?

! In D156177#4532335, @Mordante wrote:
I'm not entirely fond of this approach.

Would you mind clarifying what your concerns are?

My main concern is shipping a header to all our users that includes everything without me understanding why. Especially since that was not needed before the changes to modules. Hence my question why this is needed by LLDB.

Smaller concerns, which are fixable

  • Something else we need to maintain. We don't often add new top-level headers and I don't see tooling to make sure we don't forget to add it.
  • Using a module named std which is a C++23 feature
iana added a comment.EditedJul 25 2023, 10:28 AM

! In D156177#4532335, @Mordante wrote:
I'm not entirely fond of this approach.

Would you mind clarifying what your concerns are?

My main concern is shipping a header to all our users that includes everything without me understanding why. Especially since that was not needed before the changes to modules. Hence my question why this is needed by LLDB.

Smaller concerns, which are fixable

  • Something else we need to maintain. We don't often add new top-level headers and I don't see tooling to make sure we don't forget to add it.
  • Using a module named std which is a C++23 feature

The goal is to have a single clang module that lldb can import to get all of libc++. And maybe for anyone else that's currently importing the shipping std clang module, although they're probably better off importing the individual modules they use like std_algorithm. clang modules require header files, they can't do anything without them. So for the std clang module to provide everything from libc++, it needs a header that includes everything from libc++.

We're shipping a new header that includes everything, but if anyone includes it they'll get an error. (That's what the __building_module check does - it makes the header error for everything _except_ doing import std.)

clang modules are getting to be kind of a pain to support. I'm in favor of scripting that pain away, but @ldionne felt that I was just exchanging manual header/module map editing pain for debugging Python pain.

The module name std is what the current (llvm 16) clang module name is for libc++. So that part isn't a change, it's just restoring the behavior from a few weeks ago.

libcxx/include/__std
10 ↗(On Diff #543726)

__building_module is a clang builtin. Happy to call the header anything, was just guessing on what the convention might be. Do you like __std_clang_module? Should it have .h? (I don't know why std_mbstate_t.h has .h and locale doesn't.)

80 ↗(On Diff #543726)

No, no test right now. If we're going to write a test to check what the contents of this header should be, why not just make that code write the header too?

! In D156177#4532335, @Mordante wrote:
I'm not entirely fond of this approach.

Would you mind clarifying what your concerns are?

My main concern is shipping a header to all our users that includes everything without me understanding why. Especially since that was not needed before the changes to modules. Hence my question why this is needed by LLDB.

Smaller concerns, which are fixable

  • Something else we need to maintain. We don't often add new top-level headers and I don't see tooling to make sure we don't forget to add it.
  • Using a module named std which is a C++23 feature

The goal is to have a single clang module that lldb can import to get all of libc++. And maybe for anyone else that's currently importing the shipping std clang module, although they're probably better off importing the individual modules they use like std_algorithm. clang modules require header files, they can't do anything without them. So for the std clang module to provide everything from libc++, it needs a header that includes everything from libc++.

Does this mean the current module changes may be a breaking change for clang module users?
I know this is used at Google.

How did this work before your change? IIRC we didn't have a header to include at that time.
I really would like to understand why we need a header now, when it could be done without a header before.

We're shipping a new header that includes everything, but if anyone includes it they'll get an error. (That's what the __building_module check does - it makes the header error for everything _except_ doing import std.)

Thanks for the info.

clang modules are getting to be kind of a pain to support.

+1

I'm in favor of scripting that pain away, but @ldionne felt that I was just exchanging manual header/module map editing pain for debugging Python pain.

Unlike D155141 a script to generate this header should be trivial. The file libcxx/utils/libcxx/test/header_information.py already has the logic to get all public_headers in libc++. This script is used for our testing so this script will get updated when the rules for our public headers change. I feel here scripting will make things less error prone. We already have a CMake step to build generated files. So I think this would be a natural fit to add there.

The module name std is what the current (llvm 16) clang module name is for libc++. So that part isn't a change, it's just restoring the behavior from a few weeks ago.

I see, then we can't change that.

libcxx/include/__std
10 ↗(On Diff #543726)

I think that name would help to avoid the confusion. Adding a short comment to this header would help too.
Typically we only use .h for C headers. There might be some inconsistencies.

80 ↗(On Diff #543726)

I'm in favor of a script, I've added more details below.

iana added a comment.Jul 25 2023, 12:22 PM

! In D156177#4532335, @Mordante wrote:
I'm not entirely fond of this approach.

Would you mind clarifying what your concerns are?

My main concern is shipping a header to all our users that includes everything without me understanding why. Especially since that was not needed before the changes to modules. Hence my question why this is needed by LLDB.

Smaller concerns, which are fixable

  • Something else we need to maintain. We don't often add new top-level headers and I don't see tooling to make sure we don't forget to add it.
  • Using a module named std which is a C++23 feature

The goal is to have a single clang module that lldb can import to get all of libc++. And maybe for anyone else that's currently importing the shipping std clang module, although they're probably better off importing the individual modules they use like std_algorithm. clang modules require header files, they can't do anything without them. So for the std clang module to provide everything from libc++, it needs a header that includes everything from libc++.

Does this mean the current module changes may be a breaking change for clang module users?
I know this is used at Google.

Only if they import std by name. If they just build with modules then no*

How did this work before your change? IIRC we didn't have a header to include at that time.
I really would like to understand why we need a header now, when it could be done without a header before.

Before the change, every libc++ header was in the std mega-module as submodules.

We're shipping a new header that includes everything, but if anyone includes it they'll get an error. (That's what the __building_module check does - it makes the header error for everything _except_ doing import std.)

Thanks for the info.

clang modules are getting to be kind of a pain to support.

+1

I'm in favor of scripting that pain away, but @ldionne felt that I was just exchanging manual header/module map editing pain for debugging Python pain.

Unlike D155141 a script to generate this header should be trivial. The file libcxx/utils/libcxx/test/header_information.py already has the logic to get all public_headers in libc++. This script is used for our testing so this script will get updated when the rules for our public headers change. I feel here scripting will make things less error prone. We already have a CMake step to build generated files. So I think this would be a natural fit to add there.

The module name std is what the current (llvm 16) clang module name is for libc++. So that part isn't a change, it's just restoring the behavior from a few weeks ago.

I see, then we can't change that.

iana updated this revision to Diff 544089.Jul 25 2023, 1:48 PM

Fix formatting and update generate_iwyu_mapping.py

JDevlieghere added a comment.EditedJul 25 2023, 1:54 PM

Given that the LLDB bots have been broken for 5 days now [1][2], could we land this to unblock the bots and deal with the remaining feedback (if any) post commit?

[1] https://green.lab.llvm.org/green/view/LLDB/job/as-lldb-cmake/
[2] https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/

iana updated this revision to Diff 544137.Jul 25 2023, 3:58 PM

forgot to add system_header

Changing the name of a module (clang or C++20) is a breaking change, we need to keep importing std by name working with Clang modules.

The Clang module std and the C++20 module std are different things in that the Clang module includes macros and the full content reachable from the top level headers, while the C++ 20 module does not include macros, and only guarantees a set list of names are visible.

Changing the name of a module (clang or C++20) is a breaking change, we need to keep importing std by name working with Clang modules.

That makes sense and sounds like you support landing this change?

The Clang module std and the C++20 module std are different things in that the Clang module includes macros and the full content reachable from the top level headers, while the C++ 20 module does not include macros, and only guarantees a set list of names are visible.

That sounds like a separate issue that could be resolved outside of this patch?

In any case, it would be awesome if we could get the bots running again soon...!

Changing the name of a module (clang or C++20) is a breaking change, we need to keep importing std by name working with Clang modules.

The Clang module std and the C++20 module std are different things in that the Clang module includes macros and the full content reachable from the top level headers, while the C++ 20 module does not include macros, and only guarantees a set list of names are visible.

I'm aware they are different. It's just not great to have two different things called the std module. But well that ship has sailed for a while. I was just not aware we use that name in the modular build. I've used the modular build in the past, but only imported single headers and not std.

In any case, it would be awesome if we could get the bots running again soon...!

Agreed. But since we need to backport this patch to LLVM 17 I want to make sure we have a proper patch.

I understand it's not great to have the LLDB build broken. It would be really great if the LLDB team could help libc++ to get LLDB working in the libc++ pre-commit CI to prevent breakage. Both @ldionne and I have reached out in the past to get assistance, but that did not result in a working LLDB in the CI. I don't recall the exact issues I ran into last time I tried, which I think is over a year ago.
We already have a job that bootstraps Clang and uses that Clang to test libc++ patches. It would be great to add LLDB to that job.

libcxx/include/__std
159 ↗(On Diff #544137)

Please move this on top.

I understand it's not great to have the LLDB build broken.

Not only is it not great, it's not in accordance with LLVM's developer policy: https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy. I could have reverted the original patch. Instead, we're trying work together here towards a constructive solution that works for both of our projects.

It would be really great if the LLDB team could help libc++ to get LLDB working in the libc++ pre-commit CI to prevent breakage.

We did, and D133621 was the result of that. I don't know why it was never merged?

Both @ldionne and I have reached out in the past to get assistance, but that did not result in a working LLDB in the CI. I don't recall the exact issues I ran into last time I tried, which I think is over a year ago.

Who did you reach out to? This is the first time I hear about that.

I understand it's not great to have the LLDB build broken.

Not only is it not great, it's not in accordance with LLVM's developer policy: https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy. I could have reverted the original patch. Instead, we're trying work together here towards a constructive solution that works for both of our projects.

I'm aware of the policy. However I feel somewhat pressured to approve patch that will not work for some libc++ users. Using a modular build with locales disabled will not work. So landing the patch in its current state can break other users. I've spoken with @iana and we will postpone the generation part, that can be done in a separate patch. That makes the finishing this patch a lot easier. (Since we don't add headers in the release this will not accidentally give issues.)

It would be really great if the LLDB team could help libc++ to get LLDB working in the libc++ pre-commit CI to prevent breakage.

We did, and D133621 was the result of that. I don't know why it was never merged?

Good question. @ldionne is out of the office, I'll ask him when he's back.

Both @ldionne and I have reached out in the past to get assistance, but that did not result in a working LLDB in the CI. I don't recall the exact issues I ran into last time I tried, which I think is over a year ago.

Who did you reach out to? This is the first time I hear about that.

I don't remember it has been over a year. I tried to find the patch in Phabricator but I can't find it.

We spend quite a bit of effort in libc++ to prevent breakage by running an extensive pre-commit CI (±60 jobs). So it's a bit frustrating that we haven't been able to get LLDB working and again discover breakage after a patch has landed. If we need assistance to get LLDB running in the CI can we reach out to you?

libcxx/docs/Contributing.rst
50 ↗(On Diff #544137)

Since we're going to add a script, so this can be removed.

libcxx/include/__std
49 ↗(On Diff #544137)

Please check which headers can be included in the libc++ build. Including this header will fail when locales are disabled.
libcxx/utils/libcxx/test/header_information.py has this information.

I'm aware of the policy. However I feel somewhat pressured to approve patch that will not work for some libc++ users.

Yeah, that's totally fair. I personally don't like rushing this either and I don't really have a preference between reverting (which @iana and @ldionne argued against in the original patch) and landing this. I just want our bots to generate signal again after what's now 6 days of being broken.

We spend quite a bit of effort in libc++ to prevent breakage by running an extensive pre-commit CI (±60 jobs). So it's a bit frustrating that we haven't been able to get LLDB working and again discover breakage after a patch has landed. If we need assistance to get LLDB running in the CI can we reach out to you?

Absolutely. Getting pre-commit testing will be a big win for both of our projects.

I'm aware of the policy. However I feel somewhat pressured to approve patch that will not work for some libc++ users.

Yeah, that's totally fair. I personally don't like rushing this either and I don't really have a preference between reverting (which @iana and @ldionne argued against in the original patch) and landing this. I just want our bots to generate signal again after what's now 6 days of being broken.

I understand, and the timing with the release obviously isn't great. I hope we can land a patch tomorrow.

We spend quite a bit of effort in libc++ to prevent breakage by running an extensive pre-commit CI (±60 jobs). So it's a bit frustrating that we haven't been able to get LLDB working and again discover breakage after a patch has landed. If we need assistance to get LLDB running in the CI can we reach out to you?

Absolutely. Getting pre-commit testing will be a big win for both of our projects.

Thanks a lot!

iana updated this revision to Diff 544528.Jul 26 2023, 2:51 PM
iana marked 9 inline comments as done.

Address review comments, fix __std_clang_module to handle all of the public header requirements.

libcxx/docs/Contributing.rst
50 ↗(On Diff #544137)

Discussed in Discord, I'll make a script in a separate review later today, I'll remove this comment then.

libcxx/include/CMakeLists.txt
656

We have to have a header to build the module, so unfortunately it is necessary. However, the __building_module should give an error to anyone who tries to #include it.

libcxx/include/__std
49 ↗(On Diff #544137)

Oooh yes, excellent point, sorry. I'd better make a test that @imports this in ObjC++ to make sure it works with all the parts disabled (even after the generated version)

10 ↗(On Diff #543726)

Sounds good, I'll use __std_clang_module and put a comment in it.

iana updated this revision to Diff 544597.Jul 26 2023, 9:50 PM

Fix formatting in __std_clang_module

iana updated this revision to Diff 544612.Jul 26 2023, 10:49 PM

C++11 is !_LIBCPP_CXX03_LANG && _LIBCPP_STD_VER <14, not _LIBCPP_STD_VER=11

iana updated this revision to Diff 544618.Jul 26 2023, 11:28 PM

The new unit test fails in C++26 and I have no idea why. XFAIL it for now.

iana updated this revision to Diff 544627.Jul 27 2023, 12:09 AM

The new unit test works in C++26 in generic-modules but fails in generic-cxx26. Change it to UNSUPPORTED.

iana updated this revision to Diff 544776.Jul 27 2023, 7:48 AM

Try making modules_include require a modules build.

iana updated this revision to Diff 544833.Jul 27 2023, 10:04 AM

Try moving the REQUIRES

Thanks for working on this! In general LGTM!
The 2 comments in libcxx/include/__std_clang_module can be done in the generator patch and are informative.
The 2 other comments should be addressed before landing this patch.

After you land this patch can you create a backport request and ping me on Discord. Then I can approve the request and hopefully get this in LLVM-17 before RC-1.

@aprantl @JDevlieghere thanks for your patience!

libcxx/docs/Contributing.rst
50 ↗(On Diff #544137)

I would like to remove it anyway. We want to backport this patch to LLVM-17 which then contains incorrect information. For main the information will be wrong for a few days; I don't expect new headers in that short time-span.

libcxx/include/__std_clang_module
155

Similar as below this header can always be included. For now keep it, just keep it in mind for the generator script.

186

Typically we don't use the language version guards. They also seem to be incomplete. For example <print> is C++23 only.
I don't mind it for this patch, it's more a remark to keep in mind for the generator patch.

Including <print> in C++03 does compile and not give diagnostics.
Including <locale> with _LIBCPP_HAS_NO_LOCALIZATION will trigger an #error "foo" in the pre-processor.
That's not clear from the annotations in the python file since that is used for the test where we indeed need to disable tests in older language versions.

libcxx/utils/libcxx/test/header_information.py
18–21 ↗(On Diff #544833)

When the CI is green can you commit this separately. I like to not have this in this patch. Normally I don't mind some additional polishing, but I want to keep the changes for LLVM-17 as small as possible.

Mordante accepted this revision.Jul 27 2023, 10:34 AM
This revision is now accepted and ready to land.Jul 27 2023, 10:34 AM
iana updated this revision to Diff 544869.Jul 27 2023, 11:25 AM

Try again on the unit test, review feedback

iana marked 2 inline comments as done.Jul 27 2023, 1:42 PM
iana added inline comments.
libcxx/docs/Contributing.rst
50 ↗(On Diff #544137)

Done!

libcxx/utils/libcxx/test/header_information.py
18–21 ↗(On Diff #544833)

Yep!

iana marked 2 inline comments as done.Jul 27 2023, 2:03 PM
iana added inline comments.
libcxx/include/__std_clang_module
186

I'll see if I can ignore the versions in

186

Ok, I guess we'll need to add something to header_information.py for codifying the include rules. Either that or the script could just try to include each header with each mode and see if it errors or not.

This revision was automatically updated to reflect the committed changes.