This is an archive of the discontinued LLVM Phabricator instance.

[Modules] Include builtins with #include instead of #import for ObjC
ClosedPublic

Authored by bruno on Nov 2 2016, 5:54 PM.

Details

Summary

After r284797 builins are treated like textual includes. When compiling for ObjC++, the in-memory header file generated for the modules is composed only of #import's instead of includes. That could block some textual includes to happen because #import's will not re-enter headers already handled by other #import's.

Use #include when the header in question is a built-in.

Diff Detail

Event Timeline

bruno updated this revision to Diff 76807.Nov 2 2016, 5:54 PM
bruno retitled this revision from to [Modules] Include builtins with #include instead of #import for ObjC.
bruno updated this object.
bruno added a reviewer: rsmith.
bruno added subscribers: cfe-commits, manmanren.
aprantl added a subscriber: aprantl.Nov 2 2016, 7:43 PM

Could you include more context when creating the diff eg. git diff -U9999, or equivalent.

include/clang/Lex/ModuleMap.h
283

It seems this is in the private section and it is accessed by FrontendActions.cpp:224.

bruno marked an inline comment as done.Nov 4 2016, 12:31 PM

Could you include more context when creating the diff eg. git diff -U9999, or equivalent.

I did, -U99999 actually, not sure why you're not getting it...

include/clang/Lex/ModuleMap.h
283

Oops!

bruno updated this revision to Diff 76931.Nov 4 2016, 12:31 PM
bruno marked an inline comment as done.

Update patch after Vassil's comments!

rsmith added inline comments.Nov 28 2016, 2:30 PM
lib/Frontend/FrontendActions.cpp
227 ↗(On Diff #76931)

I don't understand why this would be necessary. If these headers are in fact textual headers, they won't be in the HK_Normal or HK_Private lists at all (they'll be in the HK_Private or HK_PrivateTextual lists instead), so your IsBuiltin = true; line should be unreachable. (Checking for an absolute path also seems suspicious here.)

I suspect the problem is instead that #import just doesn't work properly with modules (specifically, either with local submodule visibility or with modules that do not export *), because it doesn't care whether the previous inclusion is visible or not, and as a result it assumes "I've heard about someone #including this ever" means the same thing as "the contents of this file are visible right now". I know that #pragma once has this issue, so I'd expect #import to also suffer from it.

bruno added inline comments.Nov 30 2016, 11:08 AM
lib/Frontend/FrontendActions.cpp
227 ↗(On Diff #76931)

In fact, I can achieve the same desired result by changing libcxx modulemap to:

@@ -161,11 +161,15 @@ module std [system] {
     module cstddef {
       header "cstddef"
       export *
+      // FIXME: #import on Darwin requires explicit re-export
+      export Darwin.POSIX.sys.types
     }
     module cstdint {
       header "cstdint"
       export depr.stdint_h
       export *
+      // FIXME: #import on Darwin requires explicit re-export
+      export Darwin.C.stdint
     }

But I would like to avoid adding darwin specific stuff there.

I don't understand why this would be necessary. If these headers are in fact textual headers,
they won't be in the HK_Normal or HK_Private lists at all (they'll be in the HK_Private or
HK_PrivateTextual lists instead), so your IsBuiltin = true; line should be unreachable.

AFAIU, builtins are added twice (1) with the full path with /<path_to_clang_install>/bin/../lib/clang/4.0.0/include/<builtin>.h, as NormalHeader, which maps to HK_Normal and (2) <builtin>.h as TextualHeader, mapping to HK_Textual. This happens in the snippet (lib/Lex/ModuleMap.cpp:1882) below:

if (BuiltinFile) {
  // FIXME: Taking the name from the FileEntry is unstable and can give
  // different results depending on how we've previously named that file
  // in this build.
  Module::Header H = { BuiltinFile->getName(), BuiltinFile };
  Map.addHeader(ActiveModule, H, Role); <- (1)

  // If we have both a builtin and system version of the file, the
  // builtin version may want to inject macros into the system header, so
  // force the system header to be treated as a textual header in this
  // case.
  Role = ModuleMap::ModuleHeaderRole(Role | ModuleMap::TextualHeader);
}

// Record this header.
Module::Header H = { RelativePathName.str(), File };
Map.addHeader(ActiveModule, H, Role); <- (2)

As an experiment, I changed collectModuleHeaderIncludes to skip addHeaderInclude for builtin headers with HK_Normal, but it caused regressions while compiling Darwin SDK even for non local submodule visibility.

(Checking for an absolute path also seems suspicious here.)

Right, this was done to speedup checking for built-ins, since they are expanded to absolute paths. But this isn't necessary for the logic to work and can be removed.

I suspect the problem is instead that #import just doesn't work properly with modules
(specifically, either with local submodule visibility or with modules that do not export *),
because it doesn't care whether the previous inclusion is visible or not, and as a result it assumes
"I've heard about someone #including this ever" means the same thing as "the contents of this file
are visible right now". I know that #pragma once has this issue, so I'd expect #import to also suffer from it.

Taking a look at this, any ideas on what's the right approach with the #import's here? So instead of entering the header, is there a way to make the contents for the module matching the built-in header to become available/visible at that point?

bruno added a comment.Dec 2 2016, 10:08 AM

The right thing to do would be to track the set of headers whose #import / #include-with-#pragma-once is
visible, and only skip inclusions if there is a *visible* #import / #include-with-#pragma-once of that header.

This makes sense, but the situation is a bit more weird with the builtins: at the time clang skips entering /Users/bruno/Dev/srcs/llvm/debug/bin/../lib/clang/4.0.0/include/stddef.h because there's already a #import associated with it, clang tries to get the module, in this case it finds "libc++.stddef", but makeModuleVisible does not export the contents
of /Users/bruno/Dev/srcs/llvm/debug/bin/../lib/clang/4.0.0/include/stddef.h:

// If we don't need to enter the file, stop now.
if (!ShouldEnter) {
  // If this is a module import, make it visible if needed.
  if (auto *M = SuggestedModule.getModule()) {
    makeModuleVisible(M, HashLoc);
    ...
  }
  return;
}

(lldb) p M->dump()
module stddef   [system] {
  header "/Users/bruno/Dev/srcs/llvm/debug/bin/../lib/clang/4.0.0/include/stddef.h"
  textual header "stddef.h"
  export *
}

Is this what it's supposed to happen with a (textual) builtin header? It seems to me that since the first #import was done after collecting the headers to build the module, the content should already been present/available in the AST for the module.

bruno added a comment.Dec 8 2016, 3:07 PM

Any thoughts on this @rsmith ?

rsmith edited edge metadata.Dec 8 2016, 3:51 PM

Looks like we might only be making macros visible, and failing to emit the annot_module_import token for Sema.

bruno updated this revision to Diff 81478.EditedDec 14 2016, 2:38 PM
bruno edited edge metadata.

Here is the Decl dump in the end of ReadDeclRecord for each Decl before it fails:

ParmVarDecl 0x7ffe23053b20 </Users/bruno/Dev/srcs/llvm/tools/clang/test/Modules/Inputs/libc-libcxx/sysroot/usr/include/math.h:1:9> col:12 in libc.math hidden 'int'
--
FunctionDecl 0x7ffe23053a48 </Users/bruno/Dev/srcs/llvm/tools/clang/test/Modules/Inputs/libc-libcxx/sysroot/usr/include/math.h:1:1, col:12> col:5 in libc.math hidden abs 'int (int)'
`-ParmVarDecl 0x7ffe23053b20 <col:9> col:12 in libc.math hidden 'int'
--
TypedefDecl 0x7ffe23053f90 </Users/bruno/Dev/srcs/llvm/build/bin/../lib/clang/4.0.0/include/stddef.h:51:1, col:26> col:26 in libc.stddef hidden ptrdiff_t 'long'
`-BuiltinType 0x7ffe2300f9f0 'long'
--
TypedefDecl 0x7ffe23054018 </Users/bruno/Dev/srcs/llvm/tools/clang/test/Modules/Inputs/libc-libcxx/sysroot/usr/include/sys/_types/_ptrdiff_t.h:3:1, col:15> col:15 in libc.POSIX.sys.types hidden ptrdiff_t 'int *'
`-PointerType 0x7ffe23054070 'int *' imported
  `-BuiltinType 0x7ffe2300f9d0 'int'
While building module 'libc++' imported from /Users/bruno/Dev/srcs/llvm/tools/clang/test/Modules/Inputs/libc-libcxx/sysroot/usr/include/c++/v1/stdio.h:4:
In file included from <module-includes>:1:
In file included from /Users/bruno/Dev/srcs/llvm/tools/clang/test/Modules/Inputs/libc-libcxx/sysroot/usr/include/c++/v1/math.h:7:
In file included from /Users/bruno/Dev/srcs/llvm/tools/clang/test/Modules/Inputs/libc-libcxx/sysroot/usr/include/c++/v1/type_traits:4:
/Users/bruno/Dev/srcs/llvm/tools/clang/test/Modules/Inputs/libc-libcxx/sysroot/usr/include/c++/v1/cstddef:7:9: error: unknown type name 'ptrdiff_t'
typedef ptrdiff_t my_ptrdiff_t;
        ^

So, both libc.stddef and libc++.stddef, answer for the builtin /Users/bruno/Dev/srcs/llvm/build/bin/../lib/clang/4.0.0/include/stddef.h. libc++.stddef is being currently built, that's why the definition isn't there yet. Looks like the right thing is to indeed re-enter /Users/bruno/Dev/srcs/llvm/build/bin/../lib/clang/4.0.0/include/stddef.h, so it can be put in libc++.stddef.

I've updated the patch to do what you suggested in the first place: track modules per header. Can you comment on the general approach? One consideration though; it's expensive to have the additional SmallVector per each HeaderFileInfo, since most headers (except builtins and textuals) aren't actually expected to have more than one associated Module. Maybe a map from HFIs to associated modules, but I'm not sure where to store it, suggestions? The Decl's loaded from the AST now make more sense (output resuming from the previously failing point):

TypedefDecl 0x7f81ed0f0840 </Users/bruno/Dev/srcs/llvm/build/bin/../lib/clang/4.0.0/include/stddef.h:62:1, col:23> col:23 in libc.stddef hidden size_t 'unsigned long'
`-BuiltinType 0x7f81ed0ab890 'unsigned long'
--
TypedefDecl 0x7f81ed0f0938 </Users/bruno/Dev/srcs/llvm/build/bin/../lib/clang/4.0.0/include/stddef.h:76:1, col:23> col:23 in libc.stddef hidden rsize_t 'unsigned long'
`-BuiltinType 0x7f81ed0ab890 'unsigned long'
--
TypedefDecl 0x7f81ed076778 </Users/bruno/Dev/srcs/llvm/build/bin/../lib/clang/4.0.0/include/stddef.h:51:1, col:26> col:26 in libc.stddef hidden ptrdiff_t 'long'
`-BuiltinType 0x7f81ed03a9f0 'long'
--
TypedefDecl 0x7f81ed076800 </Users/bruno/Dev/srcs/llvm/tools/clang/test/Modules/Inputs/libc-libcxx/sysroot/usr/include/sys/_types/_ptrdiff_t.h:3:1, col:15> col:15 in libc.POSIX.sys.types hidden ptrdiff_t 'int *'
`-PointerType 0x7f81ed076850 'int *' imported
  `-BuiltinType 0x7f81ed03a9d0 'int'
--
TypedefDecl 0x7f81ed0768b0 prev 0x7f81ed076778 </Users/bruno/Dev/srcs/llvm/build/bin/../lib/clang/4.0.0/include/stddef.h:51:1, col:26> col:26 in libc++.stddef hidden referenced ptrdiff_t 'long'
`-BuiltinType 0x7f81ed03a9f0 'long'
--
TypedefDecl 0x7f81ed0769d0 </Users/bruno/Dev/srcs/llvm/tools/clang/test/Modules/Inputs/libc-libcxx/sysroot/usr/include/c++/v1/cstddef:7:1, col:19> col:19 in libc++.cstddef hidden my_ptrdiff_t 'ptrdiff_t':'long'
`-TypedefType 0x7f81ed076920 'ptrdiff_t' sugar imported
  |-Typedef 0x7f81ed0768b0 'ptrdiff_t'
  `-BuiltinType 0x7f81ed03a9f0 'long'

Other options:

Do we need to perfectly preserve the functionality of #pragma once / #import? That is, would it be acceptable to you if we spuriously enter files that have been imported in that way, while building a module? If we view them as pure optimization, we can do something substantially simpler here, for instance by never importing "isImported" flags from modules and throwing away all our "isImported" flags whenever module visibility decreases.

Another possibility -- albeit slightly hacky -- would be to invent a controlling macro if the header in question turns out to not have one (say, form an identifier from the path by which the file was included and use that as the macro name) and is #imported / uses #pragma once. Or we could try simply rejecting textual headers that are #import'd / #pragma once'd and have no controlling macro in a modules build.

include/clang/Lex/HeaderSearch.h
98–101

I don't think this is true in general. For a textual header, there could be hundreds or thousands of loaded modules that use it. If all header files in a project start with

#include "local_config.h"

... which is configured to be a textual header for whatever reason and contains a #pragma once, we would get into that situation for at least that one header. While a vector might be OK (as the number of entries should at least grow only linearly with the size of the entire codebase), quadratic-time algorithms over it probably won't be. Perhaps a sorted vector instead?

lib/Lex/HeaderSearch.cpp
1132–1135

I think this would be clearer as:

[...], ignore it, unless doing so will lead to us importing a module that contains the file but didn't actually include it (because we're still building the corresponding module), and we've not already made the file visible by importing some other module.

... but I don't think that's actually right. If M is null (if the file is only ever included as a textual header), we still need to check whether some visible module has actually made it visible, and we should enter it if not.

lib/Serialization/ASTReader.cpp
1691–1692 ↗(On Diff #81478)

Doing this here will do the wrong thing while building a module with local submodule visibility enabled -- we need to know whether the visible module set contains a module that entered the header, even for modules that we've never serialized.

Also, this will not populate the vector for the cases where the header was not listed in the module map, but was simply a non-moduler header that was textually entered while building a module.

bruno marked 3 inline comments as done.Dec 20 2016, 12:11 AM

Do we need to perfectly preserve the functionality of #pragma once / #import? That is, would it be acceptable to
you if we spuriously enter files that have been imported in that way, while building a module?

Unfortunately we can find ObjC headers that don't have include guards and only rely on #import's. Relaxing this condition could lead to more redefinition errors with current code base than we desire.

Another possibility -- albeit slightly hacky -- would be to invent a controlling macro if the header in question turns out to not have one (say, form an identifier from the path by which the file was included and use that as the macro name) and is #imported / uses #pragma once. Or we could try simply rejecting textual headers that are #import'd / #pragma once'd and have no controlling macro in a modules build.

Right, it's not that hacky - it actually performs the job of actually guarding as we do with includes, but with more context. However, it's not clear to me how to determine the suggested 'identifier from the path'.

I have a third suggestion: seems like we only care here about two cases (1) modular import's from the same header - this will only happens with builtins AFAIU and (2) textual headers that must be imported multiple times for different modules. We can solve (1) by only using the information on HFI and fix (2) by only re-trying to enter the #imports where we find a controlling macro for the header and that macro isn't reachable at that point. I wrote a new patch that implements this, let me know what are your thoughts. Note that I intentionally left non-modular includes out because AFAIK they should be part of the first module they are founded, so it doesn't make any sense to re-enter them anyway.

include/clang/Lex/HeaderSearch.h
98–101

Right, I wasn't considering adding the textual headers at this point, this should have been filtered out by the ASTReader HFI merging. But handling the textual headers is necessary indeed. Hopefully my next patch fix it.

lib/Lex/HeaderSearch.cpp
1132–1135

Looks like what I actually wanted was to check if modules were supported at this point with LangOpts.Modules

lib/Serialization/ASTReader.cpp
1691–1692 ↗(On Diff #81478)

Ok

bruno updated this revision to Diff 82067.Dec 20 2016, 12:14 AM
bruno marked 3 inline comments as done.

Uploaded new patch with another approach to fix the issue.

rsmith accepted this revision.Jan 9 2017, 1:59 PM
rsmith edited edge metadata.
rsmith added inline comments.
lib/Lex/HeaderSearch.cpp
1082

Maybe add a FIXME here indicating that this is a workaround for the lack of proper modules-aware support for #import / #pragma once?

This revision is now accepted and ready to land.Jan 9 2017, 1:59 PM
bruno closed this revision.Jan 10 2017, 7:00 PM

Thanks for the review Richard.

Committed 291644

lib/Lex/HeaderSearch.cpp
1082

Done.