This is an archive of the discontinued LLVM Phabricator instance.

[C++20] [Modules] [ClangScanDeps] Enable to print make-style dependency file within P1689 format (4/4)
ClosedPublic

Authored by ChuanqiXu on Dec 1 2022, 9:15 PM.

Details

Summary

Required in https://reviews.llvm.org/D137534.

The build systems needs the information to know that "header X changed, scanning may have changed, so please rerun scanning". Although it is possible to get the information by running clang-scan-deps for the second time with make format, it is not user friendly clearly.

Diff Detail

Event Timeline

ChuanqiXu created this revision.Dec 1 2022, 9:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 9:15 PM
ChuanqiXu requested review of this revision.Dec 1 2022, 9:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 1 2022, 9:15 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ben.boeckel added inline comments.Dec 2 2022, 5:25 AM
clang/test/ClangScanDeps/P1689.cppm
164

For scanning, this cannot be the object file. The output needs to be the P1689 output (or whatever the "main output" for the scanning rule is). This is the purpose behind the -MT <target> flag: to say what goes in this slot. I think it'll be necessary here (even if clang-scan-deps learns an -o flag because it is the build system that determines the "primary output").

I don't know if the -MMD and -MD differences are important or not; I don't think I particularly care which is default (I've used -MD FWIW), but it may matter for other situations.

ChuanqiXu added inline comments.Dec 4 2022, 6:15 PM
clang/test/ClangScanDeps/P1689.cppm
164

I am confused since the output [[PREFIX]]/impl_part.o is the same with P1689 output [[PREFIX]]/impl_part.o and the one in the compilation database and the one specified in the command line option --p1689-targeted-output. What's the expected output for you in this case? (and the related command line input.)

ben.boeckel added inline comments.Dec 4 2022, 7:38 PM
clang/test/ClangScanDeps/P1689.cppm
164

P1689 is about specifying dependencies of *another* rule found by the dynamic content of some source. -MF is about *discovered* dependencies of *this* rule.

ChuanqiXu added inline comments.Dec 4 2022, 7:45 PM
clang/test/ClangScanDeps/P1689.cppm
164

hmmm sorry, I don't understand it a lot. May you explain your expectation in the form of the input and the corresponding output?

ben.boeckel added inline comments.Dec 5 2022, 4:43 AM
clang/test/ClangScanDeps/P1689.cppm
164

*This* rule outputs foo.ddi (in CMake terms). We need to tell make or ninja what files, if they change, *this* rule needs rerun for. That is what -MF is for. What I need is spelled -MT "normally".

P1689, what this rule is *doing*, is writing dependencies for the *compile* rule, so it is hooked up by *its output*. Two rules cannot have the same output, so P1689 and -MF have completely different things to put for their "output".

You can see here: https://gitlab.kitware.com/cmake/cmake/-/blob/master/.gitlab/ci/cxx_modules_rules_gcc.cmake where -MT gets the <DYNDEP_FILE> which is the -fdep-file= argument. -fdep-file= tells GCC what rule the P1689 itself is for.

ben.boeckel added inline comments.Dec 5 2022, 4:44 AM
clang/test/ClangScanDeps/P1689.cppm
164

-fdep-output= tells GCC what rule the P1689 itself is for.

Fixed.

ChuanqiXu updated this revision to Diff 480329.Dec 5 2022, 9:26 PM

Address comments.

ChuanqiXu added inline comments.Dec 5 2022, 9:27 PM
clang/test/ClangScanDeps/P1689.cppm
164

Oh, I guess I get your point. Now we can set the output name with -MT [[outputname]] -MD in the input after --. See the above changes for example. Is this what you want?

This worked for CMake in a previous version, so is good enough on the functionality front. I'll retry with the current state to verify.

ping @Bigcheese @jansvoboda11 would you take a look at this if it will affect the existing users and the development? Thanks!

Rebase onto main.

Currently we will detect -MF in the command line and we will write the make-format dependency output to the corresponding file once we find -MF.

Currently we will detect -MF in the command line and we will write the make-format dependency output to the corresponding file once we find -MF.

Which is fine, but docs need to mention that some clang-looking flags are actually for clang-scan-deps and that, as such, these flags cannot just be grabbed blindly from a compilation database. I'll note that I'm becoming more and more convinced that compilation_database.json is being abused for this as we are quite contorting it from its intended use case: listing the command lines to compile sources. Instead, we are using it as a source of *related* information that differs from the *real* command line in some important ways. Namely:

  • dependency information extraction is re-used and therefore must be different from the *real* command
  • module information must be missing from the scanning command line as it cannot be known at this point as scanning is used to discover those flags in the first place (usually a response file)

Currently we will detect -MF in the command line and we will write the make-format dependency output to the corresponding file once we find -MF.

When using clang-scan-deps with a compdb with multiple command lines, which depfile path will it use? Or must all agree on the same path and options? Because there should be a single one for all scanning that is performed (e.g., in batch mode). Is this really simpler than just adding -MF and friends to clang-scan-deps directly?

Currently we will detect -MF in the command line and we will write the make-format dependency output to the corresponding file once we find -MF.

Which is fine, but docs need to mention that some clang-looking flags are actually for clang-scan-deps and that, as such, these flags cannot just be grabbed blindly from a compilation database.

Sure, agreed. I'll add docs for the series patches once they got accepted (otherwise we might do many useless works).

When using clang-scan-deps with a compdb with multiple command lines, which depfile path will it use? Or must all agree on the same path and options? Because there should be a single one for all scanning that is performed (e.g., in batch mode).

Currently, clang-scan-deps won't check for this. If we have multiple command lines with different -MF value, the make-style dependency information will be written to these different depfiles. If all the command lines use the same depfile, then the make-style dependency information will be written to the depfile. (BTW, We have a lock for this so we don't need to worry about data racing).

Is this really simpler than just adding -MF and friends to clang-scan-deps directly?

Personally, I feel the complexity of both methods are acceptable. I rewrite the patch since it looks like @jansvoboda11 don't like to add new command line options to clang-scan-deps.

I feel like we have 2 (or 3) options

  1. (The current way) Extract -MF in the command line of clang (from compilation database or from the args after --)
  2. (The original way) Specify -MF in the command line of clang-scan-deps.
  3. (Not good) Do nothing. I feel like it is possible for build systems to get the make-style dependency information by scanning twice. One for P1689 format and one for make-format. It may be workable but it sounds a little bit silly.

Personally, I feel the first (current) style is cleaner.

I'll note that I'm becoming more and more convinced that compilation_database.json is being abused for this as we are quite contorting it from its intended use case: listing the command lines to compile sources. Instead, we are using it as a source of *related* information that differs from the *real* command line in some important ways. Namely:

  • dependency information extraction is re-used and therefore must be different from the *real* command
  • module information must be missing from the scanning command line as it cannot be known at this point as scanning is used to discover those flags in the first place (usually a response file)

Agreed. But this may beyond the scope of the current patch.

ChuanqiXu updated this revision to Diff 486472.Jan 4 2023, 10:23 PM

Add the test to generate make-style dependency file with compilation database.

@ben.boeckel I updated the test with compilation db. BTW, I am curious that according to https://reviews.llvm.org/D137534, cmake may not use clang-scan-deps with compilation db for P1689. So this is a pure review comment and it is not related with the planned practice of CMake, right? Or did I misunderstand anything?


Also I am wondering if cmake will only use clang-scan-deps without compilation db for P1689, how can it generate multiple make-style dependency information into one file.

I mean, if we execute:

clang-scan-deps -format=p1689 -- clang++ -std=c++20 -MD -MT <target_name> -MF <dep_file> -c input_file <any_other_needed_opts> -o output_file

then we execute:

clang-scan-deps -format=p1689 -- clang++ -std=c++20 -MD -MT <another_target_name> -MF <dep_file> -c another_input_file <any_other_needed_opts> -o another_output_file

the make-style dependency information of input_file in <dep_file> will be overwritten. Or would cmake like to concat the results manually?

clang/test/ClangScanDeps/P1689.cppm
291–302

We can find the result to write make-style dependency information with compilation database here.

Currently, clang-scan-deps won't check for this. If we have multiple command lines with different -MF value, the make-style dependency information will be written to these different depfiles.

IMO, this is not suitable; there must be *one* depfile for Ninja to work (otherwise build tools will need to manually collate all of this into one for ninja.

I feel like we have 2 (or 3) options

  1. (The current way) Extract -MF in the command line of clang (from compilation database or from the args after --)

See above; it works for the file-by-file, but falls over with batch scanning.

  1. (The original way) Specify -MF in the command line of clang-scan-deps.

I feel this scales and communicates what is happening much better because it actually is a flag for clang-scan-deps itself.

  1. (Not good) Do nothing. I feel like it is possible for build systems to get the make-style dependency information by scanning twice. One for P1689 format and one for make-format. It may be workable but it sounds a little bit silly.

In what mode will clang output #include information without erroring about missing .pcm files for import statements? -E -fdirectives-only perhaps? This further exacerbates the need for "fake" command lines and costs on platforms with expensive process execution.

the make-style dependency information of input_file in <dep_file> will be overwritten. Or would cmake like to concat the results manually?

Every command gets its own unique <dep_file>. Scanning, compilation, etc. Overlapping is just asking for race conditions in the build.

IMO, this is not suitable; there must be *one* depfile for Ninja to work (otherwise build tools will need to manually collate all of this into one for ninja.

Yeah, but the producer of the compilation database is able to create a compilation db with the same -MF option. Do you mean you will feel much better that the clang-scan-deps will have a strong check?

  1. (The original way) Specify -MF in the command line of clang-scan-deps.

I feel this scales and communicates what is happening much better because it actually is a flag for clang-scan-deps itself.

There is a related problem that how do we set the target of the dependency? In per file mode, we can easily specify it in the command line of clang-scan-deps. But we have to extract them ("-MT") from the command line with compilation database. (Otherwise all the targets have the same name, which is incorrect). Do you think it is acceptable?

In what mode will clang output #include information without erroring about missing .pcm files for import statements? -E -fdirectives-only perhaps? This further exacerbates the need for "fake" command lines and costs on platforms with expensive process execution.

After we land D137526, we are able to do this by: -M -MF <dep_file> -E.

In what mode will clang output #include information without erroring about missing .pcm files for import statements? -E -fdirectives-only perhaps? This further exacerbates the need for "fake" command lines and costs on platforms with expensive process execution.

After we land D137526, we are able to do this by: -M -MF <dep_file> -E.

BTW, if you are interested, it should be possible to use clang-scan-deps to get the make-style format information.

Arthapz added a comment.EditedJan 6 2023, 3:03 PM

Hi, just wanted to say that i added support of these patch to XMake (i built a LLVM with your 4 patches) and it work pretty well :) (except for header units)

> xmake b -vD                                                                                                                           
[  0%]: generating.module.deps src/main.cpp
checking for clang-scan-deps ... /usr/bin/clang-scan-deps
checking for flags (-std=c++20) ... ok
> clang "-std=c++20"
checking for flags (-fmodules) ... ok
> clang "-fmodules"
/usr/bin/clang-scan-deps --format=p1689 -- /usr/bin/clang -x c++ -c src/main.cpp -o build/.objs/dependence/linux/x86_64/release/src/main.cpp.o -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps
[  0%]: generating.module.deps src/foo.mpp
/usr/bin/clang-scan-deps --format=p1689 -- /usr/bin/clang -x c++ -c src/foo.mpp -o build/.objs/dependence/linux/x86_64/release/src/foo.mpp.o -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps
[  0%]: generating.module.deps src/zoo.mpp
/usr/bin/clang-scan-deps --format=p1689 -- /usr/bin/clang -x c++ -c src/zoo.mpp -o build/.objs/dependence/linux/x86_64/release/src/zoo.mpp.o -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps
[  0%]: generating.module.deps src/cat.mpp
/usr/bin/clang-scan-deps --format=p1689 -- /usr/bin/clang -x c++ -c src/cat.mpp -o build/.objs/dependence/linux/x86_64/release/src/cat.mpp.o -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps
[  0%]: generating.module.deps src/bar.mpp
/usr/bin/clang-scan-deps --format=p1689 -- /usr/bin/clang -x c++ -c src/bar.mpp -o build/.objs/dependence/linux/x86_64/release/src/bar.mpp.o -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps
checking for flags (clang_modules_cache_path) ... ok
> clang "-fmodules-cache-path=/dev/shm/.xmake1000/230107"
[ 10%]: compiling.module.release zoo
/usr/bin/clang -c -x c++-module --precompile -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps -fmodules-cache-path=build/.gens/dependence/linux/x86_64/release/rules/modules/cache -o build/.gens/dependence/linux/x86_64/release/rules/modules/cache/zoo.pcm src/zoo.mpp
/usr/bin/clang -c -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps -fmodules-cache-path=build/.gens/dependence/linux/x86_64/release/rules/modules/cache -o build/.objs/dependence/linux/x86_64/release/src/zoo.mpp.o build/.gens/dependence/linux/x86_64/release/rules/modules/cache/zoo.pcm
checking for flags (-MMD -MF) ... ok
> clang "-MMD" "-MF" "/dev/null"
[ 10%]: compiling.module.release cat
/usr/bin/clang -c -x c++-module --precompile -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps -fmodules-cache-path=build/.gens/dependence/linux/x86_64/release/rules/modules/cache -o build/.gens/dependence/linux/x86_64/release/rules/modules/cache/cat.pcm src/cat.mpp
/usr/bin/clang -c -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps -fmodules-cache-path=build/.gens/dependence/linux/x86_64/release/rules/modules/cache -o build/.objs/dependence/linux/x86_64/release/src/cat.mpp.o build/.gens/dependence/linux/x86_64/release/rules/modules/cache/cat.pcm
checking for flags (-fdiagnostics-color=always) ... ok
> clang "-fdiagnostics-color=always"
checking for flags (clang_module_file) ... ok
> clang "-fmodule-file=/dev/shm/.xmake1000/230107/_11EA40624C464D10876A6DA3D0E41320.pcm"
[ 30%]: compiling.module.release bar
/usr/bin/clang -c -x c++-module --precompile -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps -fmodules-cache-path=build/.gens/dependence/linux/x86_64/release/rules/modules/cache -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/zoo.pcm -o build/.gens/dependence/linux/x86_64/release/rules/modules/cache/bar.pcm src/bar.mpp
/usr/bin/clang -c -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps -fmodules-cache-path=build/.gens/dependence/linux/x86_64/release/rules/modules/cache -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/zoo.pcm -o build/.objs/dependence/linux/x86_64/release/src/bar.mpp.o build/.gens/dependence/linux/x86_64/release/rules/modules/cache/bar.pcm
[ 40%]: compiling.module.release foo
/usr/bin/clang -c -x c++-module --precompile -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps -fmodules-cache-path=build/.gens/dependence/linux/x86_64/release/rules/modules/cache -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/bar.pcm -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/zoo.pcm -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/cat.pcm -o build/.gens/dependence/linux/x86_64/release/rules/modules/cache/foo.pcm src/foo.mpp
/usr/bin/clang -c -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps -fmodules-cache-path=build/.gens/dependence/linux/x86_64/release/rules/modules/cache -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/bar.pcm -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/zoo.pcm -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/cat.pcm -o build/.objs/dependence/linux/x86_64/release/src/foo.mpp.o build/.gens/dependence/linux/x86_64/release/rules/modules/cache/foo.pcm
[ 70%]: compiling.release src/main.cpp
/usr/bin/clang -c -Qunused-arguments -m64 -std=c++20 -fmodules -fno-implicit-module-maps -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/foo.pcm -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/bar.pcm -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/zoo.pcm -fmodule-file=build/.gens/dependence/linux/x86_64/release/rules/modules/cache/cat.pcm -o build/.objs/dependence/linux/x86_64/release/src/main.cpp.o src/main.cpp
checking for clang++ ... /usr/bin/clang++
checking for the linker (ld) ... clang++
checking for /usr/bin/clang++ ... ok
checking for flags (-fPIC) ... ok
> clang++ "-fPIC"
[ 80%]: linking.release dependence
/usr/bin/clang++ -o build/linux/x86_64/release/dependence build/.objs/dependence/linux/x86_64/release/src/main.cpp.o build/.objs/dependence/linux/x86_64/release/src/foo.mpp.o build/.objs/dependence/linux/x86_64/release/src/zoo.mpp.o build/.objs/dependence/linux/x86_64/release/src/cat.mpp.o build/.objs/dependence/linux/x86_64/release/src/bar.mpp.o -m64

build cache stats:
cache directory: build/.build_cache
cache hit rate: 0%
cache hit: 0
cache miss: 0
new cached files: 0
remote cache hit: 0
remote new cached files: 0
preprocess failed: 0
compile fallback count: 0

[100%]: build ok!

BTW, if you are interested, it should be possible to use clang-scan-deps to get the make-style format information.

I want a *single* depfile for *anything* scanned by the clang-scan-deps invocation, not one per TU scanned. Also note that the -MT is the output of clang-scan-deps; in this case, the .ddi file that contains P1689 content. I think it'd be *very* weird to have clang-scan-deps -p1689-output=out.ddi -- clang -MF blah.d -MT out.ddi.

Hi, just wanted to say that i added support of these patch to XMake (i built a LLVM with your 4 patches) and it work pretty well :) (except for header units)

@Arthapz Great to hear that! It is pretty important for us to know there are more build systems which are using these functionality. BTW, for header units, it is still under discussion that how should build system and compiler interact about header units. It is still unclear whether or not the header units should be transparent to build systems (and other tools).

BTW, if you are interested, it should be possible to use clang-scan-deps to get the make-style format information.

I want a *single* depfile for *anything* scanned by the clang-scan-deps invocation, not one per TU scanned. Also note that the -MT is the output of clang-scan-deps; in this case, the .ddi file that contains P1689 content. I think it'd be *very* weird to have clang-scan-deps -p1689-output=out.ddi -- clang -MF blah.d -MT out.ddi.

I mean, it is possible to get the make-style format information by clang-scan-deps -format=make --compilation-database=db.json. (Although it sounds not smart indeed)

Then how do you feel about the current strategy? clang-scan-deps --format=p1689 --compilation-database=db.json. Then if every commands in db.json has the same -MF value, it should be able to satisfy your requirement.

@Arthapz Great to hear that! It is pretty important for us to know there are more build systems which are using these functionality. BTW, for header units, it is still under discussion that how should build system and compiler interact about header units. It is still unclear whether or not the header units should be transparent to build systems (and other tools).

Header units have even more need to be involved in the build graph than named units. ODR violations and cache invalidation problems await anyone just winging it on header units (at least that's the understanding I've gotten from SG15 meetings).

I mean, it is possible to get the make-style format information by clang-scan-deps -format=make --compilation-database=db.json. (Although it sounds not smart indeed)

I don't see the benefit of having to run two commands to get this information.

Then how do you feel about the current strategy? clang-scan-deps --format=p1689 --compilation-database=db.json. Then if every commands in db.json has the same -MF value, it should be able to satisfy your requirement.

This seems…reasonable. No worse than the other lies we have to tell in a databse. FWIW, the same with should happen with -MT.

FWIW, the same with should happen with -MT.

Yeah, it should be. Since the patch reuses the previous logics.

BTW, for header units, it is still under discussion that how should build system and compiler interact about header units. It is still unclear whether or not the header units should be transparent to build systems (and other tools).

On XMake we built our support around MSVC then extended support to GCC and clang, so we handle header units separately from named modules (but in a similar way), i don't have problem to add some logic in XMake to support them for clang

Header units have even more need to be involved in the build graph than named units. ODR violations and cache invalidation problems await anyone just winging it on header units (at least that's the understanding I've gotten from SG15 meetings).

I think that latter claim applies equally to all module units. The ODR violation and cache invalidation concerns sometimes associated with header units occur in implicit module build systems in which a header unit might be built multiple times with different sets of options that result in an ODR violation. The same problem can occur with other kinds of module units if they are built multiple times with different options and then imported by distinct TUs that are then linked together. The general rule is, given a set of TUs that will be linked together, all imported module units should be built exactly one time.

jansvoboda11 added inline comments.Jan 9 2023, 10:25 AM
clang/tools/clang-scan-deps/ClangScanDeps.cpp
841

How will this work when a different process tries to write the same file? Could we write into a temporary file and then do atomic rename?

Header units have even more need to be involved in the build graph than named units. ODR violations and cache invalidation problems await anyone just winging it on header units (at least that's the understanding I've gotten from SG15 meetings).

I think that latter claim applies equally to all module units. The ODR violation and cache invalidation concerns sometimes associated with header units occur in implicit module build systems in which a header unit might be built multiple times with different sets of options that result in an ODR violation. The same problem can occur with other kinds of module units if they are built multiple times with different options and then imported by distinct TUs that are then linked together. The general rule is, given a set of TUs that will be linked together, all imported module units should be built exactly one time.

I think the option set will be naturally a unique set for a single module-unit since a module-unit is an actual translation unit. But the header units are synthesized units so it is not the same case.

I am pretty interested about the topic indeed but I guess we'd better to find another place to discuss this.

clang/tools/clang-scan-deps/ClangScanDeps.cpp
841

It may be problematic if another process tries to write the same file at the same time. But I feel it is an overkill to defend such cases otherwise many existing codes need to be refactored.

This revision is now accepted and ready to land.Jan 20 2023, 11:24 AM
This revision was landed with ongoing or failed builds.Feb 9 2023, 7:23 PM
This revision was automatically updated to reflect the committed changes.
TWeaver added a subscriber: TWeaver.EditedFeb 10 2023, 5:22 AM

@ChuanqiXu Hi there, the revert of this patch has caused the following buildbot to start failing:

https://lab.llvm.org/buildbot/#/builders/192/builds/267

It looks as though the removal of the // REQUIRES: !system-windows has caused the test P1689.cppm to start running, and therefore failing, on Windows builds.

Also, "// UNSUPPORTED: system-windows" would be a more canonical way to exclude the test from running on Windows machines.

The failure should be fixed by https://reviews.llvm.org/D143749, which disables the test in windows. (Because the inconsistent slash direction in linux and windows)