This is an archive of the discontinued LLVM Phabricator instance.

[clang] protects users from relying on libc++ detail headers
AbandonedPublic

Authored by cjdb on Jul 5 2021, 10:10 AM.

Details

Summary

libc++ has started splicing standard library headers into much more
fine-grained content for maintainability. It's very likely that outdated
and naive tooling (some of which is outside of LLVM's scope) will
suggest users include things such as <__algorithm/find.h> instead of
<algorithm>, and Hyrum's law suggests that users will eventually begin
to rely on this without the help of tooling. As such, this commit
intends to protect users from themselves, by making it a hard error for
anyone outside of the standard library to include libc++ detail headers.

Diff Detail

Event Timeline

cjdb created this revision.Jul 5 2021, 10:10 AM
cjdb requested review of this revision.Jul 5 2021, 10:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 5 2021, 10:10 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
cjdb added a comment.Jul 5 2021, 10:11 AM

A few notes to reviewers:

  • The patch assumes everything is under a top-level __libcxx directory, because it seems there are other system headers that could be prefixed with __. libc++ currently has many top-level directories, following the trend of __${STD_LIB_HEADER}/. I see two solutions going forward:
    1. libc++ moves everything into a top-level __libcxx directory, which minimises the chances of clashing with pre-existing top-level directories that we don't own. Once this patch is committed, it probably won't need to be updated (sans bug fixes).
    2. This patch adopts a set of "blessed" directories that needs to be consistently updated as we splice more and more headers (and as WG21 adds more headers to C++).
  • The patch could probably do with more testing, but I'm coming up dry :-(
Quuxplusone requested changes to this revision.Jul 5 2021, 10:15 AM
Quuxplusone added a subscriber: Quuxplusone.

Step 1 should be to find out if this is even a problem at all. For example, try using one of these tools to compile a C++ program against GNU libstdc++, or against a library like range-v3, both of which already uses many small detail headers. Is the tool's output confusing or incorrect for those libraries? Only if it's actually a problem, should Clang itself try to solve problems with those tool/library combos by adding special-case code into the Clang codebase.

This revision now requires changes to proceed.Jul 5 2021, 10:15 AM

Why not solve this in the headers themselves?
Just #error in implementation header if some macro isn't defined, and define said macro in parent header before including implementation detail headers?

I should add that if any particular library really wants to enforce "thou shalt not deep-link detail headers" (instead of just documenting it and/or relying on the user-programmer's common sense) they can do it pretty cleanly via macros:

// toplevel.h
#pragma once
#define DONT_INCLUDE_DETAIL_HEADERS_DIRECTLY(x, y)
#include <__utility/foo.h>
#include <__various/bar.h>
#undef DONT_INCLUDE_DETAIL_HEADERS_DIRECTLY
// more code here

and then

// __utility/foo.h
#pragma once
DONT_INCLUDE_DETAIL_HEADERS_DIRECTLY(instead, include <utility>)
class Foo {};
void foo();

I don't know how you'd achieve a similar effect with C++20 Modules, but I certainly hope it's possible somehow.

cjdb added a comment.Jul 5 2021, 10:37 AM

Step 1 should be to find out if this is even a problem at all. For example, try using one of these tools to compile a C++ program against GNU libstdc++, or against a library like range-v3, both of which already uses many small detail headers. Is the tool's output confusing or incorrect for those libraries? Only if it's actually a problem, should Clang itself try to solve problems with those tool/library combos by adding special-case code into the Clang codebase.

I didn't say that any tool's output is "confusing or incorrect". I said that tooling will accidentally include libc++'s detail headers, which is undesirable, as it is non-portable. You've also disregarded my statement about Hyrum's law, which points out that users will come to rely on <__algorithm/find.h> with or without tooling. Tooling only accelerates the problem of including detail headers.

Why not solve this in the headers themselves?
Just #error in implementation header if some macro isn't defined, and define said macro in parent header before including implementation detail headers?

Do you mean like this?

#define _LIBCXX_DETAIL_HEADERS_ALLOWED

#include <__algorithm/find.h>
...

#undef _LIBCXX_DETAIL_HEADERS_ALLOWED

That would cover headers too (very good), but I'm not sure how it will play with Clang's modules.

cjdb added a comment.Jul 5 2021, 3:41 PM

I should note that I didn't use private headers in libc++'s modules.modulemap because it created a very noticeable impact to our test suite's run-time.

cjdb added a comment.Jul 7 2021, 6:10 PM

I tried the following, but it doesn't work with modules, so it looks like a compiler solution is necessary.

// in <algorithm>
#define _LIBCPP_PRIVATE_HEADER_ALLOWED

// in <__algorithm/find.h>
#ifndef _LIBCPP_PRIVATE_HEADER_ALLOWED
#error This is a libc++ detail header. Please instead include <algorithm>.
#endif

Given that the goal is to get include-what-you-use to stop suggesting that users #include <__detail/fooimpl.h> when what they want is spelled <foo.h>, I think what's desired here (if anything) would be a pragma targeted at include-what-you-use (and other tools). Basically

// in __detail/fooimpl.h
#pragma clang public_header "foo.h"

This pragma can mostly be ignored by Clang; it's just up to those other tools to consume these pragmas and report the correct public header.

cjdb added a comment.Jul 12 2021, 9:33 AM

Given that the goal is to get include-what-you-use to stop suggesting that users #include <__detail/fooimpl.h> when what they want is spelled <foo.h>, I think what's desired here (if anything) would be a pragma targeted at include-what-you-use (and other tools). Basically

// in __detail/fooimpl.h
#pragma clang public_header "foo.h"

This pragma can mostly be ignored by Clang; it's just up to those other tools to consume these pragmas and report the correct public header.

Thank you for your opinion. That is not the purpose of this patch.

I'm not entirely sure I understand the purpose of this patch. So the idea is that let's say a tool suggests including <__algorithm/find.h> to get the definition of std::find as a IWYU fix-it sort of suggestion, the user would naively do that, and then the compiler (with this patch) would error out saying "woops, you can't include that libc++ detail header". Is that the idea?

If that's it, then I would much rather fix the tools that incorrectly suggest including those implementation detail headers in the first place. Users will be less confused and we won't have to special-case a special directory name, which I can imagine could cause issues. I think it's great to try and give the proper diagnostic to users, but I think the correct place to do that is in the tool that suggests it in the first place. Thoughts?

I'm not entirely sure I understand the purpose of this patch. So the idea is that let's say a tool suggests including <__algorithm/find.h> to get the definition of std::find as a IWYU fix-it sort of suggestion, the user would naively do that, and then the compiler (with this patch) would error out saying "woops, you can't include that libc++ detail header". Is that the idea?

If that's it, then I would much rather fix the tools that incorrectly suggest including those implementation detail headers in the first place. Users will be less confused and we won't have to special-case a special directory name, which I can imagine could cause issues. I think it's great to try and give the proper diagnostic to users, but I think the correct place to do that is in the tool that suggests it in the first place. Thoughts?

Regardless of whether this is about enhacing IWYU QoL or completely preventing inclusion of separate sub-headers,
it seems like that should be done in libc++ itself, it already has the means for that.

cjdb added a comment.Jul 13 2021, 11:12 AM

I'm not entirely sure I understand the purpose of this patch. So the idea is that let's say a tool suggests including <__algorithm/find.h> to get the definition of std::find as a IWYU fix-it sort of suggestion, the user would naively do that, and then the compiler (with this patch) would error out saying "woops, you can't include that libc++ detail header". Is that the idea?

If that's it, then I would much rather fix the tools that incorrectly suggest including those implementation detail headers in the first place. ... I think it's great to try and give the proper diagnostic to users, but I think the correct place to do that is in the tool that suggests it in the first place. Thoughts?

There's way more reviewer focus on tooling in this patch than there should be. Fixing non-LLVM tools is not a feasible request: I barely have enough time to try and make sure that LLVM tooling is going to respond to our libc++ changes. The point of this patch is to help users in a variety of situations.

  1. They explicitly include a detail libc++ header. I've observed this quite frequently with the bits/ headers in libstdc++, and even Boost, and while it might seem like a good idea at the time, it's going to inevitably cause pain down the road (as I have experienced when trying to upgrade an employer's Boost library). This is akin to wearing a seatbelt.
  2. Tools that auto-include the detail thing and then get checked into a library's trunk without the author noticing (this happens to me more often than I'd care to admit). This can impact downstream users. I don't really have a good car analogy for this situation, but it lowers the chance for negligence.
    1. Yes, tool QoI is important and should be fixed too, but people have a habit of trusting their tools, and this is a good way to get tools to improve their QoI.
    2. Putting the burden on tooling also doesn't take into consideration that some people are neurodiverse, and it might be difficult for them to even remember to go back and check that the "IWYU fix-it" is correct. That's not complacency or naivety or inexperience; this is how some people's brains are hardwired.
  3. Hyrum's law is boss here. It doesn't matter how much we improve external tooling: if a user can include our details, some folks inevitably will.

Users will be less confused and we won't have to special-case a special directory name, which I can imagine could cause issues.

I'm not sure why users will be more confused if we do this?


I do plan to improve QoI for LLVM tooling in other areas, but this patch is essentially a catch-all handbrake. It's best if you put attention on people, and human behaviour: this is a human problem that I'm trying to solve, not a technical one.

cjdb added a comment.Jul 13 2021, 11:14 AM

I'm not entirely sure I understand the purpose of this patch. So the idea is that let's say a tool suggests including <__algorithm/find.h> to get the definition of std::find as a IWYU fix-it sort of suggestion, the user would naively do that, and then the compiler (with this patch) would error out saying "woops, you can't include that libc++ detail header". Is that the idea?

If that's it, then I would much rather fix the tools that incorrectly suggest including those implementation detail headers in the first place. Users will be less confused and we won't have to special-case a special directory name, which I can imagine could cause issues. I think it's great to try and give the proper diagnostic to users, but I think the correct place to do that is in the tool that suggests it in the first place. Thoughts?

Regardless of whether this is about enhacing IWYU QoL or completely preventing inclusion of separate sub-headers,
it seems like that should be done in libc++ itself, it already has the means for that.

I've already pointed out that it can't be done in libc++. The proposed header alternative doesn't translate to Clang modules, and private modules tank build times.

cjdb added a comment.Jul 13 2021, 12:01 PM

I'm not entirely sure I understand the purpose of this patch. So the idea is that let's say a tool suggests including <__algorithm/find.h> to get the definition of std::find as a IWYU fix-it sort of suggestion, the user would naively do that, and then the compiler (with this patch) would error out saying "woops, you can't include that libc++ detail header". Is that the idea?

If that's it, then I would much rather fix the tools that incorrectly suggest including those implementation detail headers in the first place. Users will be less confused and we won't have to special-case a special directory name, which I can imagine could cause issues. I think it's great to try and give the proper diagnostic to users, but I think the correct place to do that is in the tool that suggests it in the first place. Thoughts?

Regardless of whether this is about enhacing IWYU QoL or completely preventing inclusion of separate sub-headers,
it seems like that should be done in libc++ itself, it already has the means for that.

I've already pointed out that it can't be done in libc++. The proposed header alternative doesn't translate to Clang modules, and private modules tank build times.

After a bit more experimenting (I was benchmarking to provide evidence), it seems as though private headers impact the first run of llvm-lit on libc++ tests, but not subsequent ones. Since this patch is user-oriented, I'm now in agreement that it can be punted back to the libc++ space. Thanks @lebedev.ri and apologies for my initial misleading assertion!

cjdb abandoned this revision.Jul 13 2021, 12:02 PM