This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Granularize parts of <type_traits>
ClosedPublic

Authored by philnik on May 2 2022, 2:30 AM.

Details

Summary

<type_traits> is quite a large header, so I'll granularize it in a few steps.

Diff Detail

Event Timeline

philnik created this revision.May 2 2022, 2:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2022, 2:30 AM
Herald added a subscriber: mgorny. · View Herald Transcript
philnik requested review of this revision.May 2 2022, 2:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2022, 2:30 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
philnik updated this revision to Diff 426926.May 4 2022, 12:17 AM
  • Generate files
philnik updated this revision to Diff 427593.May 6 2022, 3:54 AM
  • Fix clang-tidy
ldionne requested changes to this revision.May 18 2022, 1:01 PM

The direction LGTM, but I'd like to see the final form with passing CI before shipping this.

libcxx/include/cstddef
64–66

I suspect this was used instead of std::enable_if<std::is_integral<T>> because of circular header dependency issues. I don't think we have those issues anymore, so we could use the simpler form instead and get rid of __libcpp_is_integral altogether. This assumes that all the compilers we support implement the __is_integral builtin, which is probably true.

I suggest doing that cleanup as a separate patch. Even if we can't remove __libcpp_is_integral in the end, it would still be good to use __enable_if instead of __enable_if_integral_imp here.

This revision now requires changes to proceed.May 18 2022, 1:01 PM
philnik updated this revision to Diff 430644.May 19 2022, 5:09 AM
  • Try to fix CI
philnik updated this revision to Diff 431057.May 20 2022, 2:32 PM
  • Generate files
ldionne accepted this revision.May 23 2022, 10:24 AM

LGTM, and it would be nice to do the cleanup suggested in https://reviews.llvm.org/D124755#inline-1207727.

This revision is now accepted and ready to land.May 23 2022, 10:24 AM
This revision was automatically updated to reflect the committed changes.
EricWF added a subscriber: EricWF.May 24 2022, 12:00 PM

You can safely ignore my comments below; I don't intend to block this change.

However, I want to again state my objections for the record.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I'm going to shout my normal speech about how:

  • I don't think this improves the quality of the code,
  • It really muddies the blame history (which has been really really really problematic for me recently,)
  • It makes our module maps more complicated,
  • It creates less understandable diagnostics by exposing what should be private implementation details.
  • I don't know hat it actually improves portability much; We are the only standard library to do this (at least to this extent), with the claim that exposing more names than necessary creates portability problems, but I don't think that's true for a number of reasons, one of which being that every other standard library includes all of <X> if they need one thing from <X>.

Have we done tests to see how many fewer names we actually expose to the end user for a typical real TU? I'm imaging we could do something like:

  1. Produce a pre-processed header.
  2. Run clang-query over it to produce a list of STL declarations.
  3. Repeat for both HEAD and older versions.
  4. Diff the outputs.

Further, as the complexity of our header structure increases, I find it increasingly difficult to reason about the header structure. I'm glad we have a tool to detect circular deps, because that's beyond my capabilities now. But how are we ensuring all necessary declarations have been included *directly*, rather than transitively. AKA how do we enforce "include what you use"?

And finally, given the goal of this is to remove unnecessary declarations leaking from headers, how are we tracking our progress doing this? We should be aware exactly what breaking changes we're shipping to users, and I don't know that we are.

P.S. Also, and I have been meaning to confirm this, but I expect this to be some amount slower for non-trivial real-world C++ builds. But only when there is enough code being compiled to frequently eject the STL headers from the linux kernels file cache. That said, until I actually produce a test, it's fair to say I'm only talking out my butt.

libcxx/include/module.modulemap
987

Should these have non-reserved names?

philnik marked an inline comment as done.May 24 2022, 1:04 PM
  • I don't think this improves the quality of the code,

Not sure what quality you are talking about, but I think the readability is increased a lot. You instantly see what is part of the implementation.

  • It really muddies the blame history (which has been really really really problematic for me recently,)

That's not ideal, I agree.

  • It makes our module maps more complicated,

Eh. I wouldn't consider a list of files "complicated".

  • It creates less understandable diagnostics by exposing what should be private implementation details.

Could you give an example?

  • I don't know hat it actually improves portability much; We are the only standard library to do this (at least to this extent), with the claim that exposing more names than necessary creates portability problems, but I don't think that's true for a number of reasons, one of which being that every other standard library includes all of <X> if they need one thing from <X>.

It improves the portability in the way that you are more likely to properly include all the headers you need.

Have we done tests to see how many fewer names we actually expose to the end user for a typical real TU? I'm imaging we could do something like:

  1. Produce a pre-processed header.
  2. Run clang-query over it to produce a list of STL declarations.
  3. Repeat for both HEAD and older versions.
  4. Diff the outputs.

I'm not aware of any test like this. I think the outcome would be interesting, but I doubt it will influence the direction very much.

Further, as the complexity of our header structure increases, I find it increasingly difficult to reason about the header structure. I'm glad we have a tool to detect circular deps, because that's beyond my capabilities now. But how are we ensuring all necessary declarations have been included *directly*, rather than transitively. AKA how do we enforce "include what you use"?

That's enforced very well by the modules build when the headers are granularized.

And finally, given the goal of this is to remove unnecessary declarations leaking from headers, how are we tracking our progress doing this? We should be aware exactly what breaking changes we're shipping to users, and I don't know that we are.

We have a few patches that remove top-level headers. The removed headers are listed under https://libcxx.llvm.org/ReleaseNotes.html#id4.

P.S. Also, and I have been meaning to confirm this, but I expect this to be some amount slower for non-trivial real-world C++ builds. But only when there is enough code being compiled to frequently eject the STL headers from the linux kernels file cache. That said, until I actually produce a test, it's fair to say I'm only talking out my butt.

Ever tried -fmodules? Makes the libc++ tests run at least 1.6x as fast on my machine.

I'm responding here, but I don't want this to block this PR. Could you ask on Discord if you want further clarification/justification?

libcxx/include/module.modulemap
987

They should be identical to the header name.

  • I don't think this improves the quality of the code,

Not sure what quality you are talking about, but I think the readability is increased a lot. You instantly see what is part of the implementation.

I've never had a problem with that.

  • It makes our module maps more complicated,

Eh. I wouldn't consider a list of files "complicated".

Fair.

  • It creates less understandable diagnostics by exposing what should be private implementation details.

Could you give an example?

When Clang produces a diagnostic about a missing declaration when using clang modules, it tells the user to include the private header. And users listen. Every week the number of instances of "#include <__private/foo>" in googles codebase grows as a result.
I can provide other examples as well.

  • I don't know hat it actually improves portability much; We are the only standard library to do this (at least to this extent), with the claim that exposing more names than necessary creates portability problems, but I don't think that's true for a number of reasons, one of which being that every other standard library includes all of <X> if they need one thing from <X>.

It improves the portability in the way that you are more likely to properly include all the headers you need.

Not if every STL implementation provides the same transitive includes. Also, it only improves portability one way. That is code written against libc++ is likely to port to other STL's without needing additional includes. The other direction is less true. Though I agree that decreasing our transitive includes is a good goal, I don't know if it actually solves a problem our users have in practice, or if it creates a new problem they didn't have before.

I ported all of Google's C++ codebase from libstdc++ to libc++, which is to say "I ported a lot of code", and in my experience, header hygiene didn't cause an major issue. Though it likely will be as we remove transitive declarations.

Have we done tests to see how many fewer names we actually expose to the end user for a typical real TU? I'm imaging we could do something like:

  1. Produce a pre-processed header.
  2. Run clang-query over it to produce a list of STL declarations.
  3. Repeat for both HEAD and older versions.
  4. Diff the outputs.

I'm not aware of any test like this. I think the outcome would be interesting, but I doubt it will influence the direction very much.

If we're not actually decreasing the surface area of our headers, which is the primary stated goal, it should influence the direction.

Further, as the complexity of our header structure increases, I find it increasingly difficult to reason about the header structure. I'm glad we have a tool to detect circular deps, because that's beyond my capabilities now. But how are we ensuring all necessary declarations have been included *directly*, rather than transitively. AKA how do we enforce "include what you use"?

That's enforced very well by the modules build when the headers are granularized.

Perfect. Don't know why I didn't think of that.

And finally, given the goal of this is to remove unnecessary declarations leaking from headers, how are we tracking our progress doing this? We should be aware exactly what breaking changes we're shipping to users, and I don't know that we are.

We have a few patches that remove top-level headers. The removed headers are listed under https://libcxx.llvm.org/ReleaseNotes.html#id4.

Cool, but do we know the actual effect of that? It's one thing to say "Removed #include <utility> from <foo>", but that doesn't tell us what declarations are no longer present as a result. Which is important. Did we remove 15 declarations or 150?
I have no problem making breaking changes. I do have a problem with doing it in the dark.

P.S. Also, and I have been meaning to confirm this, but I expect this to be some amount slower for non-trivial real-world C++ builds. But only when there is enough code being compiled to frequently eject the STL headers from the linux kernels file cache. That said, until I actually produce a test, it's fair to say I'm only talking out my butt.

Ever tried -fmodules? Makes the libc++ tests run at least 1.6x as fast on my machine.

Yes, I have tried modules. I've also tried porting large codebases over to modules, and it's non-trivial. So long as a significant portion of our users rely on textual includes, the build time of texually including libc++ matters. But again, until I produce data demonstrating an actual problem, we can assume I'm way off the mark.

I'm responding here, but I don't want this to block this PR. Could you ask on Discord if you want further clarification/justification?