This is an archive of the discontinued LLVM Phabricator instance.

[clang][pp] adds '#pragma include_instead'
ClosedPublic

Authored by cjdb on Jul 20 2021, 1:14 PM.

Details

Summary

#pragma clang include_instead(<header>) is a pragma that can be used
by system headers (and only system headers) to indicate to a tool that
the file containing said pragma is an implementation-detail header and
should not be directly included by user code.

The library alternative is very messy code that can be seen in the first
diff of D106124, and we'd rather avoid that with something more
universal.

This patch takes the first step by warning a user when they include a
detail header in their code, and suggests alternative headers that the
user should include instead. Future work will involve adding a fixit to
automate the process, as well as cleaning up modules diagnostics to not
suggest said detail headers. Other tools, such as clangd can also take
advantage of this pragma to add the correct user headers.

Diff Detail

Event Timeline

cjdb created this revision.Jul 20 2021, 1:14 PM
cjdb requested review of this revision.Jul 20 2021, 1:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2021, 1:14 PM
cjdb added inline comments.Jul 20 2021, 1:35 PM
clang/lib/Lex/PPLexerChange.cpp
475

I wasn't able to work out how to escape braces inside diagnostic files. Seems neither {{ nor \{ work :(

Can't wait to start using this! (Note: this is not a full review, just some thoughts.)

Do you have a test case where

// private: header one
*exists*

// private: header two
#include<__header_one.h>

// public: header three
#include<__header_two.h>

I think right now you might see a warning that says "please include either __header_two.h or header_three.h" which is not right.

This begs the question, would it not be better to have a second argument that specifies what header should be used as a replacement? This would also allow the person writing the library to decide whether it's better to use angle brackets or quotes.

clang/include/clang/Lex/HeaderSearch.h
470

Is the logic here is just "if it's not already here add it?" If so, I think maybe you should just make Aliases a set (either a StringSet or SmallSet). Do you care about order?

clang/lib/Lex/PPLexerChange.cpp
467

Can we use llvm::interleaveComma or interleave?

cjdb added a comment.EditedJul 20 2021, 2:30 PM

Can't wait to start using this! (Note: this is not a full review, just some thoughts.)

Do you have a test case where

// private: header one
*exists*

// private: header two
#include<__header_one.h>

// public: header three
#include<__header_two.h>

I think right now you might see a warning that says "please include either __header_two.h or header_three.h" which is not right.

You seem to be right... I thought I already had a test case for this :S

This begs the question, would it not be better to have a second argument that specifies what header should be used as a replacement? This would also allow the person writing the library to decide whether it's better to use angle brackets or quotes.

This is the purpose of the first argument, so I'm not sure I follow.

clang/include/clang/Lex/HeaderSearch.h
470

Yes. I was going to use StringSet, but need operator[] for diagnostics. SetVector was my next choice, but it was giving me some serious complaints about a DeepMap or something. I guess I could copy the StringSet's contents into a SmallVector at the diagnostic site, but I really don't expect headers to have more than two aliases?

clang/lib/Lex/PPLexerChange.cpp
467

Thanks, I was looking for join!

cjdb updated this revision to Diff 360317.Jul 20 2021, 4:31 PM
cjdb marked an inline comment as done.
  • renames test files so they better describe intention
  • replaces SmallVector with SetVector
  • replaces std::accumulate with llvm::interleave

This is pretty interesting from a tooling perspective!

Some prior art here are the include-what-you-use "pragmas" (magic comments). Clangd supports // IWYU pragma: "private", include "public.h" which has similar semantics to that proposed here.

Having multiple alternatives makes this less useful to tools as it's ambiguous what the correct action/fix is. It's necessary for the standard library and the syntax doesn't really encourage it, so that seems OK. Worst case, tools can ignore the pragmas when several are present.

Eventually this seems like a reasonable thing to want for user code. What are your motivations for restricting it to system headers? (And do you think people in future will be tempted to lift them?)
FWIW in the short term in clangd it's not that useful for the standard libraries, as we can enumerate all the symbols and that will work with libstdc+, MS stdlib...

For user code it's worth thinking about how the alternatives are specified - in the standard library a literal string <utility> is correct, but in user code you may rather want a path to the correct *file* (relative to the current file) which can then be turned into a spelled #include according to the including file's header search path.

This is quite interesting, I'd love to be able to use this in libc++ (and suggest that my peers use this in Apple's libc)!

Eventually this seems like a reasonable thing to want for user code. What are your motivations for restricting it to system headers? (And do you think people in future will be tempted to lift them?)

The problem with extending this to non-system headers is that you need a way to tell which headers are allowed to include the detail headers and which ones are not. If a user-written library were able to use this pragma, it would need to somehow tell the compiler that its own headers are allowed to include the detail headers in the library. But that would require some notion of "header group", which, if you go down the rabbit hole, I think is roughly equivalent to implementing modules. By special-casing system headers, you sidestep this whole issue by saying "all system headers are special and they are allowed to include any other header", removing the need for something more clever. Chris can correct me if that's incorrect, but that's my understanding of the situation.

cjdb added a comment.Jul 21 2021, 9:15 AM

This is pretty interesting from a tooling perspective!

Some prior art here are the include-what-you-use "pragmas" (magic comments). Clangd supports // IWYU pragma: "private", include "public.h" which has similar semantics to that proposed here.

Having multiple alternatives makes this less useful to tools as it's ambiguous what the correct action/fix is. It's necessary for the standard library and the syntax doesn't really encourage it, so that seems OK. Worst case, tools can ignore the pragmas when several are present.

Hmm... I like prior art. That clangd supports it suggests that there's a section of code I can look at for inspiration if we were to replace this pragma with the IWYU comment-pragma (I wonder why they didn't just go with #pragma IWYU ...?).

Is it reasonable for a compiler to interpret comments and issue/adjust diagnostics based on those comments? I thought that was the purpose of a pp-pragma.

Eventually this seems like a reasonable thing to want for user code. What are your motivations for restricting it to system headers? (And do you think people in future will be tempted to lift them?)
FWIW in the short term in clangd it's not that useful for the standard libraries, as we can enumerate all the symbols and that will work with libstdc+, MS stdlib...

For user code it's worth thinking about how the alternatives are specified - in the standard library a literal string <utility> is correct, but in user code you may rather want a path to the correct *file* (relative to the current file) which can then be turned into a spelled #include according to the including file's header search path.

Louis' answer is wholesale correct, at least for now. If someone wants to extend the design to support "groups", then I'll happily consider the design; that's not something I've been able to provide a trivial solution to (it usually just comes back to redesigning -fmodules).

Hmm... I like prior art. That clangd supports it suggests that there's a section of code I can look at for inspiration if we were to replace this pragma with the IWYU comment-pragma (I wonder why they didn't just go with #pragma IWYU ...?). Is it reasonable for a compiler to interpret comments and issue/adjust diagnostics based on those comments? I thought that was the purpose of a pp-pragma.

I can answer that one! Pragmas tend to be for things with real in-game effects that require compiler support, such as #pragma once or #pragma thumb. If a TU contains a line of the form #pragma ... that the current compiler doesn't recognize, that's usually a bug. Compiler vendors reflect that fact by providing -Wunknown-pragmas (and turning it on pretty aggressively). https://stackoverflow.com/questions/132667/how-to-disable-pragma-warnings

For things that "don't really matter" — such as (1) source encoding, (2) switch case fallthroughs, (3) IWYU relationships — it's usually easier to just do a special kind of comment, rather than forcing all your users to worry about -Wunknown-pragmas warnings on older or third-party compilers. All compilers silently ignore "unknown comments," which is exactly what we want for these kinds of things. OTOH, special comments are hard to hide behind macros, whereas [[attributes]] and _Pragma("s") are easier.

If IWYU has already developed a de facto standard for this information, I would recommend libc++ just use it (perhaps with no changes needed in clang at all). This would have the benefit of getting libc++ to work with IWYU out-of-the-box, instead of having to go beg IWYU to implement support for a new #pragma include_instead. Is any IWYU developer aware of D106394 yet? You should definitely try to get them involved here.

cjdb updated this revision to Diff 360519.Jul 21 2021, 10:47 AM
  • renames files
  • makes it possible for #pragma GCC system_header to come after includes
cjdb added a comment.Jul 21 2021, 11:25 AM

If IWYU has already developed a de facto standard for this information, I would recommend libc++ just use it (perhaps with no changes needed in clang at all). This would have the benefit of getting libc++ to work with IWYU out-of-the-box, instead of having to go beg IWYU to implement support for a new #pragma include_instead. Is any IWYU developer aware of D106394 yet? You should definitely try to get them involved here.

This suggestion misses the fact that this change issues a warning to users who include detail system headers and the fact that Clang can use this to improve its diagnostics.

Eventually this seems like a reasonable thing to want for user code. What are your motivations for restricting it to system headers? (And do you think people in future will be tempted to lift them?)

The problem with extending this to non-system headers is that you need a way to tell which headers are allowed to include the detail headers and which ones are not.

Ah, of course. It's a shame if this is a fatal reason because it seems like a small thing.

IWYU defines a whole bunch of other "pragmas", and one is "IWYU pragma: friend" for this purpose. This works but adds complexity to the source code (I didn't see it being as heavy as defining modules, but I haven't thought about it much) and also to consuming tools (we don't support it in clangd for this reason).

Annotating the include site instead like #include "internals.h" // friend seems almost tempting but magic-comments issues aside doesn't solve all the questions tools have (like what include to insert if none exists yet).

Hmm... I like prior art. That clangd supports it suggests that there's a section of code I can look at for inspiration if we were to replace this pragma with the IWYU comment-pragma

Sure, there's nothing terribly interesting to see though. Clangd doesn't yet use it for warnings, but sometimes when we encounter an indexed symbol we consider inserting an #include for it (code completion, fixit for missing symbol). At indexing time the IWYU pragma is used to determine which insertable header we'll record for that symbol.

Indexing 1 - parse an IWYU comment. Note that it always produces a quoted string, which means a verbatim #include spelling, rather than a resolved file path, thus the FIXME.
Indexing 2 - determine the includable header for symbols from a FileID.
(The code completion and diagnostic fixit code isn't terribly interesting or readable...)

(I wonder why they didn't just go with #pragma IWYU ...?).

Historically, it may have been the principled reason @Quuxplusone mentioned, or it may simply have been that as an *out-of-tree* clang-based tool, it's easier to hook comments in PPCallbacks than it is to hook an unknown pragma. (Of course these reasons are connected)

If IWYU has already developed a de facto standard for this information, I would recommend libc++ just use it (perhaps with no changes needed in clang at all).

One hesitation I'd have about adopting this convention is that while the "IWYU pragma: friend" subset seems to coincide exactly with the design you want, the full set of IWYU pragmas are a rich/complicated set that I'm less confident in the design of.
Supporting only this subset will cause *some* confusion/pressure for more, but later choosing to extend the model will force a choice between adopting more IWYU pragmas (may be the wrong design) or doing something else with overlapping functionality (lots of confusion).

Pragmas and magic comments both have ergonomic issues, so it's really about the tradeoffs. Personally, I think it's more reasonable to support this as a pragma than as a magic-comment. Pragmas are a mystery to users, but they're less of a mystery than comments that carry semantic weight. I think it's somewhat more advantageous to use a pragma because if a system header uses this pragma in an unguarded way, they get told about their mistake (which is somewhat valuable), and code editors are far better about autocompleting actual syntax than they are with things like magic comments.

clang/include/clang/Basic/DiagnosticLexKinds.td
303

Design-wise, do we want this one to be an error instead of a warning? I don't have strong feelings one way or the other, but making it an error ensures no one finds creative ways to use it outside of a system header by accident (and we can relax the restriction later if there are reasons to do so).

304

I think these diagnostics should say '#pragma clang include_instead' (we're not super consistent about this, but the surrounding single quotes are intended to convey syntactic constructs and so it should probably be valid syntax).

306

O.o... why do we need to show this one in system headers if the diagnostic can only trigger outside of a system header?

307–310

No need for a new warning -- I think we can use existing diagnostics for this (like err_pp_expects_filename). Also, I think syntax issues should be diagnosed as an error instead of a warning -- we expect a particular syntactic form and it's reasonable for us to enforce that users get it right.

311–314

Do we need this one or can we use err_pp_file_not_found?

315

Do we want to give this pragma teeth by making this an error? Right now, an intrepid user can simply ignore the diagnostic, form the reliance on the header, and we've not really solved the problem we set out to solve. Then again, perhaps you'd like that as an escape hatch for some reason?

clang/include/clang/Lex/PreprocessorLexer.h
188–189

It's not wrong, but we don't usually do top-level const qualification of value types.

clang/lib/Lex/PPDirectives.cpp
2025
clang/lib/Lex/PPLexerChange.cpp
313

Please spell out the type instead of using auto or sink this into the for loop

315
316

Please spell out the type.

335

Can you use llvm::join() from StringExtras.h for this or does the insertion of the single quotes make that awkward?

337
clang/lib/Lex/Pragma.cpp
505

Please return llvm::None in all these cases to make it more clear what's going on.

556
565–567
571
587–589
1072
cjdb added a comment.Jul 22 2021, 1:28 PM

Thanks for the first pass @aaron.ballman! Some clarifying questions, but I think I have enough of your other comments to work on while you ponder these.

clang/include/clang/Basic/DiagnosticLexKinds.td
303

I'd prefer it to be an error, but I wasn't confident that would be accepted (same applies to all your comments re warn vs err). If you're game, I'll make the change.

306

Mostly because I'm scared that there's some toggle that would accidentally make the diagnostic observable in a system header and wanted to test the behaviour.

clang/lib/Lex/PPLexerChange.cpp
316

The type's currently a private type (that was by design). Do you want me to make it a public type or pull it out into namespace clang?

clang/lib/Lex/Pragma.cpp
558

I thought I deleted this.

571

Alternatively: remove spelling altogether.

cjdb marked 2 inline comments as done.Jul 22 2021, 1:42 PM
cjdb added inline comments.
clang/include/clang/Basic/DiagnosticLexKinds.td
307–310

Hmm... the syntax is #pragma clang include_instead(header). This is diagnosing the lack of ( or ), not a missing filename. Perhaps there's already a diagnostic for that?

cjdb updated this revision to Diff 361014.Jul 22 2021, 4:02 PM
cjdb marked 12 inline comments as done.
  • applies all of @aaron.ballman's suggested changes
  • turns warnings into errors and deletes warning group
  • replaces interleave with join
  • properly abandons LF transmuting

The only things that shouldn't be applied are comments with replies (and even some of those have been actioned).

aaron.ballman added inline comments.Jul 23 2021, 5:03 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
303

I would prefer errors here -- the goal of this feature is to prevent programmers from doing things we know are going to be a maintenance burden, so I think we should be restrictive here. If users find the restrictions too onerous and have a good reason for us to relax them, we can definitely consider it then.

304

Can you prefix the diagnostic identifiers with err_? We're not super consistent in this file, but it's helpful when reading the use of the diagnostic to know whether it's an error or not.

306

You can remove the ShowInSystemHeader bits now because these are all errors (at least, I'm 99% sure you can remove it).

307–310

There is: diag::err_expected (and some related ones in DiagnosticCommonKinds.td).

clang/lib/Lex/PPLexerChange.cpp
316

PreprocessorLexer already friends Preprocessor, so you can name it directly here (though please change the local variable name to be something different from the type name while you're at it).

cjdb updated this revision to Diff 361357.Jul 23 2021, 2:50 PM
cjdb marked 11 inline comments as done.

applies all of @aaron.ballman's remaining feedback

clang/include/clang/Basic/DiagnosticLexKinds.td
304

I've replaced pp with err since pragma implies "preprocessor" and don't really think it adds much more value. err on the other hand is useful?

aaron.ballman accepted this revision.Jul 26 2021, 7:00 AM

LGTM!

clang/include/clang/Basic/DiagnosticLexKinds.td
304

SGTM, thanks!

This revision is now accepted and ready to land.Jul 26 2021, 7:00 AM
This revision was landed with ongoing or failed builds.Jul 26 2021, 9:08 AM
This revision was automatically updated to reflect the committed changes.

Just a thought (nothing to hold up this patch/suggest a revert/suggest any immediate action), but:

The problem with extending this to non-system headers is that you need a way to tell which headers are allowed to include the detail headers and which ones are not.

What if the analysis was done on the entire #include path from the current primary source file - rather than from the immediate include location? Then the requirement could be "This header must be directly or indirectly included from one of the headers listed in include_instead, otherwise there will be a warning" - could that generalize to other use cases/not only system headers?

(not entirely sure how this'd interact with modules, though - but perhaps that could be handled a bit differently, by having private modular headers that can only be included from within a module, not by clients of a module)

hans added a subscriber: hans.Jul 27 2021, 8:30 AM

This is causing the compiler to crash in Chromium builds, see https://bugs.chromium.org/p/chromium/issues/detail?id=1195353

I verified that replacing the use of getLiteralData() with OriginalFileName in HandleHeaderIncludeOrImport() fixes the crash, but the the other getLiteralData() probably needs a fix too, and I don't have a test case, so reverted for now in 973de7185606a21fd5e9d5e8c014fbf898c0e72f

clang/lib/Lex/PPDirectives.cpp
2028

getLiteralData() can return null, which causes a crash here in Chromium.

I think getSpelling() should be used instead, and since it was already called above, perhaps you could just pass in OriginalFilename here.

clang/lib/Lex/Pragma.cpp
571

actually, get getSpelling() is probably necessary in case getLiteralData() returns null?

cjdb added a comment.Jul 28 2021, 9:07 AM

Just a thought (nothing to hold up this patch/suggest a revert/suggest any immediate action), but:

The problem with extending this to non-system headers is that you need a way to tell which headers are allowed to include the detail headers and which ones are not.

What if the analysis was done on the entire #include path from the current primary source file - rather than from the immediate include location? Then the requirement could be "This header must be directly or indirectly included from one of the headers listed in include_instead, otherwise there will be a warning" - could that generalize to other use cases/not only system headers?

I think this would mean that something like libc++'s <__ranges/concepts.h> couldn't directly include <__iterator/concepts.h>? Also, how does this play with #pragma GCC system_header detection?

Just a thought (nothing to hold up this patch/suggest a revert/suggest any immediate action), but:

The problem with extending this to non-system headers is that you need a way to tell which headers are allowed to include the detail headers and which ones are not.

What if the analysis was done on the entire #include path from the current primary source file - rather than from the immediate include location? Then the requirement could be "This header must be directly or indirectly included from one of the headers listed in include_instead, otherwise there will be a warning" - could that generalize to other use cases/not only system headers?

I think this would mean that something like libc++'s <__ranges/concepts.h> couldn't directly include <__iterator/concepts.h>?

Ah, I think I follow what you're getting at - I think you're suggesting that the include_insteads are not intended to be exhaustive? (they're not all the ways you might get these declarations by including public headers - they are only meant to enumerate the public/intended entry points to these declarations)

So despite the fact that ranges indirectly includes __iterator/concepts .h, you wouldn't want to tell the user that they could include ranges to get the declarations in __iterator/concepts.h ?

Because __iterator/concepts.h is only needed as an implementation detail of ranges, not as part of its public interface? Fair enough.

Pity - if the enumeration of public headers that can indirectly include an implementation header was complete, then that would be sufficient to implement a warning without needing the system header detection, I think? So one option would be to annotate the include_instead with "implementation detail" V "public interface" bit/distinction and that would work instead of the system header detection? But I guess that'd be left to another project/further work to generalize this feature.

Also, how does this play with #pragma GCC system_header detection?

I don't think my suggestion would have anything to do with system header detection, so far as I can think of - could you describe this concern in more detail?

kimgr added a subscriber: kimgr.Aug 3 2021, 3:44 AM

Hi, sorry I'm late to the game. IWYU maintainer here.

I saw this patch mentioned in the LLVM Weekly newsletter and immediately thought: "wow, great, I have to build support in IWYU for that!".

I too prefer pragmas to magic comments, and I don't think include_instead necessarily needs to cover all IWYU pragmas (https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUPragmas.md).

Took me a while to get my head around it, but I see now that this is only supported for system headers. I think that makes sense for the compiler, otherwise it will be hard to guess which headers are allowed to include what. IWYU usually doesn't have that problem, as we analyze source files individually, and usually not headers independently.

My only concern was spelling -- in IWYU we need some handholding to know whether to use angled or quoted includes, but I see the quoting is part of the pragma, so that should be nice and useful.

Does this support the macro-able _Pragma syntax, so that users can be portable using something like:

#ifdef __clang__
#define INCLUDE_INSTEAD(headername) _Pragma("clang include_instead " ## headername)
#else
#define INCLUDE_INSTEAD(headername)
#endif

? That might also make a nice testcase.

I'd love it if you could keep me in the loop if you want to extend this to something more general (using something like e.g. IWYU pragma: friend)

Took me a while to get my head around it, but I see now that this is only supported for system headers. I think that makes sense for the compiler, otherwise it will be hard to guess which headers are allowed to include what. IWYU usually doesn't have that problem, as we analyze source files individually, and usually not headers independently.

Not sure I follow - could you describe IWYU's approach in more detail? Certainly a clang warning can be powered by examining a source file and all the includes in totality, not only using local reasoning at the location of one #include from another file. So if there's a way to use more global (within the scope of a single translation unit) reasoning to get a more general purpose feature, that would be an option.

rsmith added a comment.EditedAug 3 2021, 10:47 AM

I'm not in love with the design of this feature. Basing the accept / warn decision on whether the inclusion is in a system header seems fragile and doesn't seem to enforce the intended semantics of the warning (that you can only reach the implementation detail header through one of the named headers), and limits the utility of the feature to system headers. As an alternative, have you considered checking whether any of the headers named in the pragma are in the include stack, and warning if not? That would seem to make this warning applicable to both user headers and system headers in an unsurprising way. We'd presumably need two lists of headers: the external headers that we encourage people to include to get at the implementation detail header, and the internal list of headers that are also allowed to use it but that we shouldn't list in the diagnostic.

clang/include/clang/Lex/HeaderSearch.h
116–122

This is a very heavyweight thing to include in HeaderFileInfo -- it looks like this member will add a couple of hundred bytes to the struct. Please consider a more efficient way of storing these, such as storing a map from header to aliases in HeaderSearch and only including a single bit here to say if there are any aliases for a header (so we can avoid checking the map in the common case).

cjdb added a comment.Aug 3 2021, 11:00 AM

I'm not in love with the design of this feature. Basing the accept / warn decision on whether the inclusion is in a system header seems fragile and doesn't seem to enforce the intended semantics of the warning (that you can only reach the implementation detail header through one of the named headers), and limits the utility of the feature to system headers. As an alternative, have you considered checking whether any of the headers named in the pragma are in the include stack, and warning if not? That would seem to make this warning applicable to both user headers and system headers in an unsurprising way. We'd presumably need two lists of headers: the external headers that we encourage people to include to get at the implementation detail header, and the internal list of headers that are also allowed to use it but that we shouldn't list in the diagnostic.

Wouldn't this suggestion mean that the following would be allowed?

#include <utility>
#include <__utility/move.h>

If not, then I'm probably misunderstanding your alternative design.

clang/include/clang/Lex/HeaderSearch.h
116–122

Good idea. I'll follow up with a patch some time this week.

clang/lib/Lex/Pragma.cpp
1996

@rsmith wrote:

I'm not in love with the design of this feature. Basing the accept / warn decision on whether the inclusion is in a system header seems fragile and doesn't seem to enforce the intended semantics of the warning (that you can only reach the implementation detail header through one of the named headers), and limits the utility of the feature to system headers. As an alternative, have you considered checking whether any of the headers named in the pragma are in the include stack, and warning if not?

@rsmith, I think you misunderstand part of the design goal here. You said "you can only reach the implementation detail header through one of the named headers" — this is true only for a maybe-surprising definition of "you." For example, if the user-programmer's "MyThing.hpp" tries to include <__utility/move.h> directly (or if IWYU wants to suggest how to include it), we expect the tool to suggest that you must include <utility>, not <__utility/move.h>. However, if the library's <algorithm> tries to include <__utility/move.h> directly, that's totally fine and expected. The entire point of these detail headers is to make sure that library headers don't have to #include <utility> when all they need is one little piece of it.

So we do need some way to distinguish user-programmer headers like "MyThing.hpp" from library headers like <algorithm>. At the moment this is done by limiting the usefulness of the feature to just system headers, and making every system header a "friend" of every other system header.

But one could imagine allowing a detail header to specify not only "here's how my non-friends should get to me" (e.g. by including <utility>) but also "here's a list of headers and/or directories which should be considered friends of mine" (and which could then include me directly, e.g. <algorithm> or <__iterator/iter_move.h>). We'd need a reliable syntax for that, though.

As an alternative, have you considered checking whether any of the headers named in the pragma are in the include stack, and warning if not? That would seem to make this warning applicable to both user headers and system headers in an unsurprising way. We'd presumably need two lists of headers: the external headers that we encourage people to include to get at the implementation detail header, and the internal list of headers that are also allowed to use it but that we shouldn't list in the diagnostic.

Wouldn't this suggestion mean that the following would be allowed?

#include <utility>
#include <__utility/move.h>

If not, then I'm probably misunderstanding your alternative design.

No: when we process the second #include, <utility> is not on the #include stack. (It's not the includer of the current file, nor the includer of that file, ...).

rsmith added inline comments.Aug 3 2021, 12:00 PM
clang/lib/Lex/Pragma.cpp
1996

I think what you're suggesting is exactly what I suggested later: "We'd presumably need two lists of headers: the external headers that we encourage people to include to get at the implementation detail header, and the internal list of headers that are also allowed to use it but that we shouldn't list in the diagnostic."

[That was an edit I made after my original comment, though; I realized I forgot to mention this and went back and added it. I guess your reply and my edit happened at around the same time?]

As for syntax, something like this would make some sense to me:

#pragma clang include_instead(<foo>, "bar"; "private.h")

(Putting all the headers in a single pragma removes the need to track and check this at the end of the file.)

... As an alternative, have you considered checking whether any of the headers named in the pragma are in the include stack, and warning if not? That would seem to make this warning applicable to both user headers and system headers in an unsurprising way. We'd presumably need two lists of headers: the external headers that we encourage people to include to get at the implementation detail header, and the internal list of headers that are also allowed to use it but that we shouldn't list in the diagnostic.

Yeah, @cjdb and I discussed this offline last week - a main concern with that direction is the maintenance burden that'd add to libc++ when using one of the internal implementation headers, having to update the "internal list of headers" every time a new include is added. Not impossible, but maybe impractical - especially for more common internal headers (like whatever provides std::move, maybe needing to be included in nearly every file.

(we speculated that maybe something akin to a modulemap could be used - found with a similar search path that would be able to describe once a single set of headers (but I guess that only works if all the headers are in a partitioned subdirectory - otherwise who gets to own the top-level file that has these lists of names - I guess maybe each file could say "I'm part of library foo" and then from the file search for foo.headerlist file and use that to describe which are the public and private headers to implement the warning) - but no doubt that's overkill/not likely feasible)

... As an alternative, have you considered checking whether any of the headers named in the pragma are in the include stack, and warning if not? That would seem to make this warning applicable to both user headers and system headers in an unsurprising way. We'd presumably need two lists of headers: the external headers that we encourage people to include to get at the implementation detail header, and the internal list of headers that are also allowed to use it but that we shouldn't list in the diagnostic.

Yeah, @cjdb and I discussed this offline last week - a main concern with that direction is the maintenance burden that'd add to libc++ when using one of the internal implementation headers, having to update the "internal list of headers" every time a new include is added.

It seems to me that this is at least only a linear factor more work: when you add a #include, you need to update the list of permitted includers in that file. (This is assuming that your permitted includers list contains only (a) the list of headers to tell the user about and (b) the list of *direct* includers in libc++.)

(we speculated that maybe something akin to a modulemap could be used - found with a similar search path that would be able to describe once a single set of headers (but I guess that only works if all the headers are in a partitioned subdirectory - otherwise who gets to own the top-level file that has these lists of names - I guess maybe each file could say "I'm part of library foo" and then from the file search for foo.headerlist file and use that to describe which are the public and private headers to implement the warning) - but no doubt that's overkill/not likely feasible)

Perhaps a much simplified version of this could work: permit an inclusion only if the file is found by relative path or if it's found in the same include path as an earlier file in the include stack. (So, once you enter /usr/include/c++/v1 you can include any header within it; it's only edges from outside -> inside that have this enforcement applied to them.)

Alternatively, perhaps we could use the existing module map infrastructure for this? Treating these headers as private textual headers would seem to have the right semantics without requiring any in-source-file annotations. For that, we'd presumably want to change Clang's default (modules disabled) mode from ignoring all module maps to parsing module maps but treating all headers in them as textual, or something along those lines.

kimgr added a comment.Aug 3 2021, 12:53 PM

@dblaikie I think @sammccall mentioned IWYU pragma: friend before, that's what I was alluding to. Much of IWYU's complexity comes from maintaining these library mappings both statically (using mapping tables) and dynamically (using in-source annotations).

I figured the compiler wouldn't want to maintain that level of fidelity for mappings, but judging from the subsequent discussion, maybe it could.