This is an archive of the discontinued LLVM Phabricator instance.

[docs] Document that the modules can improve the linking time
Abandoned · Public

Authored by ChuanqiXu on Nov 23 2022, 1:43 AM.

Details

Summary

We found that link time decreased significantly after adopting modules. This was a big surprise, and I think it is worth documenting so that people whose projects are currently slow to link have a stronger motivation to use modules.

Event Timeline

ChuanqiXu created this revision. Nov 23 2022, 1:43 AM
Herald added a project: Restricted Project. Nov 23 2022, 1:43 AM
Herald added a subscriber: StephenFan.
ChuanqiXu requested review of this revision. Nov 23 2022, 1:43 AM
Herald added a project: Restricted Project. Nov 23 2022, 1:43 AM
Herald added a subscriber: cfe-commits.

I'm not sure I follow this - the comparison doesn't seem equal. If you wanted the modules-like code in non-modules, you could get it by having notDirectlyUsed declared-but-not-defined in the header file, and defined in the corresponding .cpp file, yeah?
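A minimal single-file sketch of the non-modules refactoring suggested above, assuming the header originally defined `notDirectlyUsed` inline. `directlyUsed`, `lib.h`, `lib.cpp`, `checkExample`, and all function bodies are hypothetical; the two commented sections stand in for the header and its corresponding .cpp file.

```cpp
#include <cassert>

// ---- lib.h (hypothetical header) ----
// Only the declaration of notDirectlyUsed stays in the header, so TUs that
// include lib.h no longer compile its body or emit their own copy of it.
int notDirectlyUsed(int x);

// A function that callers use directly can stay inline in the header,
// so the optimizer can still inline it at each call site.
inline int directlyUsed(int x) { return notDirectlyUsed(x) + 1; }

// ---- lib.cpp (hypothetical implementation file) ----
// The single out-of-line definition. Without LTO, calls to it from other
// TUs cannot be inlined because its body is not visible to them.
int notDirectlyUsed(int x) { return x * 2; }

// Usage check: notDirectlyUsed(20) == 40, so directlyUsed(20) == 41.
inline void checkExample() { assert(directlyUsed(20) == 41); }
```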

This seems to be comparing different things. As mentioned, header files can be refactored to reduce the number of symbols as well.
I'd be curious to learn how linking time can be reduced significantly by deploying modules.

> I'd be curious to learn how linking time can be reduced significantly by deploying modules.

Oh, today I found that the conclusion "significant decrease in link time" was not reliable, because I hadn't realized that our project uses ThinLTO and the ThinLTO cache. There are too many factors involved, so I can't say that modules reduce link time directly. I'll try to do a more precise analysis today.

It would be great if you are interested in playing with it. To get started, apply the patch https://reviews.llvm.org/D138666 (this patch is for fun only) to Clang. Then find a sample project, ideally one with a long link time and large, repeatedly included headers. Our project is closed-source, so I can't share it.

Then you can put the headers into a module unit:

// m.cppm
module;
#include "headers"
// ...
export module m;

Then, in each .cc file, replace the includes of those headers with an import of the module:

import m;

Note that if the source files use macros from those headers, you need to extract the macros into a standalone header that contains only macro definitions, and have the source files include that header, because named modules can't export macros. (Alternatively, we could hack this into the compiler for simplicity.) Finally, remember to compile m.cppm to m.o and link it. (If you hit bugs, please file an issue.)

Then you should be able to analyze the changes.

(No worries if you don't want to go through all of this.)

> I'm not sure I follow this - the comparison doesn't seem equal. If you wanted the modules-like code in non-modules, you could get it by having notDirectlyUsed declared-but-not-defined in the header file, and defined in the corresponding .cpp file, yeah?

Yeah.

> This seems to be comparing different things. As mentioned, header files can be refactored to reduce the number of symbols as well.

That's true too.

It looks like my writing was misleading; sorry for the confusion. My point is that, with modules, the symbols that are not directly used don't show up in the importing translation unit, and header files usually contain many such not-directly-used symbols. Moving them into .cpp files requires a clean redesign and real effort, which is not trivial in real-world projects.

Another problem is that, without LTO, function definitions in other TUs can't be inlined. Currently, however, when optimizations are enabled, the function definitions in the module interface are imported into the importing TU with available_externally linkage. The available_externally linkage is a special linkage intended for interprocedural optimization (IPO), and available_externally entities are removed in the middle end after inlining. (I know there are arguments for removing function definitions from the module file.) So if we move the not-directly-used things into .cpp files, performance will suffer. That is not the case for modules (at least for now).

> Another problem is that, without LTO, function definitions in other TUs can't be inlined. Currently, however, when optimizations are enabled, the function definitions in the module interface are imported into the importing TU with available_externally linkage. The available_externally linkage is a special linkage intended for interprocedural optimization (IPO), and available_externally entities are removed in the middle end after inlining. (I know there are arguments for removing function definitions from the module file.) So if we move the not-directly-used things into .cpp files, performance will suffer. That is not the case for modules (at least for now).

Yeah, I think if we're planning to minimize PCMs to the point where they can't be used for code generation, we'll probably remove the indirect function definitions and stop using available_externally, and rely on (Thin)LTO to provide whole-program optimization. In that case, the guidance that modules reduce link time wouldn't hold.

I think it's probably best to abandon this patch and not document or suggest this as a benefit people should expect from modules, if it's one that would go away in the future anyway.

ChuanqiXu abandoned this revision. Nov 27 2022, 5:45 PM

> > Another problem is that, without LTO, function definitions in other TUs can't be inlined. Currently, however, when optimizations are enabled, the function definitions in the module interface are imported into the importing TU with available_externally linkage. The available_externally linkage is a special linkage intended for interprocedural optimization (IPO), and available_externally entities are removed in the middle end after inlining. (I know there are arguments for removing function definitions from the module file.) So if we move the not-directly-used things into .cpp files, performance will suffer. That is not the case for modules (at least for now).

> Yeah, I think if we're planning to minimize PCMs to the point where they can't be used for code generation, we'll probably remove the indirect function definitions and stop using available_externally, and rely on (Thin)LTO to provide whole-program optimization. In that case, the guidance that modules reduce link time wouldn't hold.

> I think it's probably best to abandon this patch and not document or suggest this as a benefit people should expect from modules, if it's one that would go away in the future anyway.

Got it. You're correct : )