It's not really in the spirit of clang modules to make a top level module for every single header. That's necessary for the public headers, but the private detail headers could be grouped into bigger modules as long as they remain acyclic. Make a Python script to do the grouping.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
@ldionne is concerned about the maintainability of this patch. The module generation script is admittedly complicated.
- validate_exports needs to be updated if module naming conventions change.
- write_libcxx_includes needs to be updated when the subdirectories and files that need to be ignored are updated.
- build_include_tree needs to be updated when new C++ versions are added.
- build_include_tree needs to be updated when new macros control what gets included. _LIBCPP_ENABLE_EXPERIMENTAL and _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY were found while debugging module cycles that the script created.
It's probably not obvious that you might need to edit internal_module_attributes.json.in from time to time as includes change and new private detail headers are added.
While I don't personally think that doing any of these is particularly difficult or onerous, I must admit they aren't obvious. If you don't know you have to do them, the errors are going to be fairly difficult to figure out. Several of them you won't realize you've forgotten them until some time later. e.g. you might not realize that you need to add c++26 until years down the road when the modules become cyclic on that version because the script never knew it needed to take that into consideration.
Nevertheless, I'm posting this review because 1) it produces a much nicer module map than can be done by hand, and 2) it might be necessary down the road if build performance suffers for having too many modules.
Don't allocate a Header for every line from cxx -H, which is what was happening from the dict.setdefault call. Break that up into a get/test/set sequence.
[Github PR transition cleanup]
I've said it before but I think this is too complicated and we don't have a clear picture of the benefits. I would prefer abandoning this patch, but if you really want to move forward with it, please provide compelling evidence of why this complexity is worth adding.
I still think this results in a better module map; module-per-header solves our requirements but just feels very inelegant. But I think I need to split out some of the variables into the json file like which standards to run and which macros go with which standards. And probably convert that into a Python file akin to header_info.py. I'm also maybe overestimating how often the module map needs to change, and maybe it could be a regular generated file instead. I'll abandon this for now, and if we end up needing it after llvm 17 ships, or I get a bit of time to work on it and I'm still bummed about all the work I put in on this, I'll try it for review on GitHub.