Page MenuHomePhabricator

[ThinLTO] upgrade IR symtab in parallel ahead of time
Needs ReviewPublic

Authored by ychen on Dec 10 2019, 4:56 AM.

Details

Summary

Before parsing file, collect all bitcode files that need upgrade, and perform the actual upgrade in parallel, assuming the time taken to upgrade the symtab is proportion to the file size. Probably some of the upgrade is not necessary if the bitcode file is archive member and no symbol is defined there. After some experiments, it seems that the cost would be low because of parallelization.

Add an lld option(--cache-ir-symtab) to enable this behavior (default off). In the long run, I think it is beneficial to turn it on by default.

For symtab upgrade path, this reduces clang binary linking (all file hit lto cache) to ~5.5 (8 core, 16 threads) seconds from ~13s. Non-upgrade path takes about 3.8s.

Diff Detail

Event Timeline

ychen created this revision.Dec 10 2019, 4:56 AM
Herald added a project: Restricted Project. · View Herald Transcript
ychen edited reviewers, added: pcc, tejohnson, steven_wu; removed: espindola.Dec 10 2019, 5:21 AM

This sounds like a good idea. Have you thought about just write out upgraded bitcode file on disk (since you named it a "SymTabCache"), maybe into thinLTO cache? This will speed up the incremental build as well. I think most of the upgrade path are coming from linking a bitcode static library, which is probably not changing during incremental build so we just need to upgrade once.

This sounds like a good idea. Have you thought about just write out upgraded bitcode file on disk (since you named it a "SymTabCache"), maybe into thinLTO cache? This will speed up the incremental build as well. I think most of the upgrade path are coming from linking a bitcode static library, which is probably not changing during incremental build so we just need to upgrade once.

Thanks for the feedback, @steven_wu. I think there would be more improvement by caching the upgraded IR symbol table (the IR module self is in memory but not parsed). I would do some experiments to get rough numbers. I think it will be a separate patch for the caching because it helps with or without this patch.

steven_wu accepted this revision.Dec 18 2019, 12:14 PM

LLVM LTO part LGTM. I will get another pair of eyes on lld changes.

This revision is now accepted and ready to land.Dec 18 2019, 12:14 PM
ormris added a subscriber: ormris.Jan 13 2020, 11:40 AM
ychen updated this revision to Diff 261847.May 4 2020, 9:31 AM
  • rebase
ychen added a comment.Thu, May 14, 4:23 PM

gentle ping. Could reviewers help take a look at the linker part?

tejohnson added a subscriber: ruiu.Fri, May 15, 9:48 AM

@MaskRay and @ruiu are the best reviewers for the LLD changes.

One high level question and one request:

Does this affect the laziness of Lazy object/archive files? It looked to me as if those perhaps were going to be upgraded non-lazily after this change.

It would be good to add equivalent support to gold-plugin.cpp as well so that those remain in sync on functionality.

For symtab upgrade path, this reduces clang binary linking (all file hit lto cache) to ~5.5 (8 core, 16 threads) seconds from ~13s. Non-upgrade path takes about 3.8s.

This looks promising and I want to try out this patch for our internal targets. The patch does not apply at HEAD. Do you mind rebasing it?

lld/ELF/Driver.cpp
72–75

Apologies that I just touched namespace lld { lines which might require you to rebase.

lld/ELF/Options.td
106

Use the same name as the option name.

Use BB for strict two dashes. GNU ld's loose option parsing cause some parsing ambiguity. We want to avoid one-dash long options for newer options.

llvm/lib/Object/IRSymtab.cpp
312–315

Do we have a better testing this other than using an environment variable?

ychen updated this revision to Diff 264367.Fri, May 15, 3:09 PM
  • rebase
ychen added a comment.Fri, May 15, 4:00 PM

@MaskRay and @ruiu are the best reviewers for the LLD changes.

One high level question and one request:

Does this affect the laziness of Lazy object/archive files? It looked to me as if those perhaps were going to be upgraded non-lazily after this change.

Yes, it indeed pre-upgrades the IR symbol table of Lazy object/archive files. It is a little bit wasteful potentially. Good thing is that parallelization makes the cost trivial.

It would be good to add equivalent support to gold-plugin.cpp as well so that those remain in sync on functionality.

Understood. I will look into that.

ychen updated this revision to Diff 264379.Fri, May 15, 4:15 PM
  • Address comments
ychen marked 2 inline comments as done.Fri, May 15, 4:17 PM
ychen added inline comments.
llvm/lib/Object/IRSymtab.cpp
312–315

Not that I'm aware of. @tejohnson any suggestions?

MaskRay added a comment.EditedSun, May 17, 10:45 AM

Just to confirm. @ychen Do you have lots of bitcode files with different versions which require upgrade in LTO?

I tested with an internal program and there seems to be 1% improvement on time, minor cost on memory usage. I need to collect a bit more applications for benchmark.
Loading LazyObjFile and ArchiveFile eagerly may potentially accelerate "Parse input files". (Our Bazel builds use very file archives (we don't use --nostart_end_lib))
There is a proposal in 2019-04 http://lists.llvm.org/pipermail/llvm-dev/2019-April/131902.html making the scheme more generic.

@ruiu Is this proposal unfinished yet? I think we can at least add an option to opt in that mode (parsing LazyObjFile (and probably ArchiveFile) eagerly). This bitcode parsing may need a dependency on that mode.

lld/ELF/InputFiles.cpp
510

Why is the change?

1527

This logic requires a test.

ychen added a comment.Sun, May 17, 8:07 PM

Just to confirm. @ychen Do you have lots of bitcode files with different versions which require upgrade in LTO?

Yes. The number I collected was for cases for all files are bitcode and need upgrade.

I tested with an internal program and there seems to be 1% improvement on time, minor cost on memory usage. I need to collect a bit more applications for benchmark.
Loading LazyObjFile and ArchiveFile eagerly may potentially accelerate "Parse input files". (Our Bazel builds use very file archives (we don't use --nostart_end_lib))
There is a proposal in 2019-04 http://lists.llvm.org/pipermail/llvm-dev/2019-April/131902.html making the scheme more generic.

@ruiu Is this proposal unfinished yet? I think we can at least add an option to opt in that mode (parsing LazyObjFile (and probably ArchiveFile) eagerly). This bitcode parsing may need a dependency on that mode.

It would be good to add equivalent support to gold-plugin.cpp as well so that those remain in sync on functionality.

It seems that the gold plugin API does not work well with the pre-upgrade this patch is proposing. Without pre-upgrade, the upgrade happens during InputFile::create in claim_file_hook. Pre-upgrade needs to happen before claim_file_hook where something like vector<Input_path> is available. The plugin API does not seem to have this hook. Not quite sure how to proceed with this restriction.

ychen updated this revision to Diff 265060.Tue, May 19, 3:15 PM
  • Add linker test for the option and dependent libraries
ychen updated this revision to Diff 265062.Tue, May 19, 3:16 PM
  • reupload
ychen marked 2 inline comments as done.Tue, May 19, 3:21 PM
ychen added inline comments.
lld/ELF/InputFiles.cpp
510

To factor out the common code getPath to be used later.

1527

I added a test case to the deplib test file. It should cover this if statement.

Add an lld option(--cache-ir-symtab) to enable this behavior (default off). In the long run, I think it is beneficial to turn it on by default.

I tested this patch internally. There is some minor performance loss turning on --cache-ir-symtab. That may be because we don't have mismatching versions of bitcode files.
I guess many users do the same (same version of bitcode files), so I am unsure this should be on by default.

lld/ELF/Driver.cpp
506

Drop AheadOfTime. It is not part of the option name.

If this is an LTO specific option, --lto-* may be more suitable.

MaskRay added a comment.EditedTue, May 19, 8:55 PM

Also, I still want to check whether doing further work on http://lists.llvm.org/pipermail/llvm-dev/2019-April/131902.html can avoid the LTO specific option.

Request for changes (just in case). Sorry..

MaskRay requested changes to this revision.Tue, May 19, 8:55 PM
This revision now requires changes to proceed.Tue, May 19, 8:55 PM
ychen updated this revision to Diff 265133.Tue, May 19, 9:55 PM
  • replace cacheIRSymTabAheadOfTime with cacheIRSymTab
ychen added a comment.Tue, May 19, 9:56 PM

Also, I still want to check whether doing further work on http://lists.llvm.org/pipermail/llvm-dev/2019-April/131902.html can avoid the LTO specific option.

Request for changes (just in case). Sorry..

Not a problem. I'm curious to know too :-)

It would be good to add equivalent support to gold-plugin.cpp as well so that those remain in sync on functionality.

It seems that the gold plugin API does not work well with the pre-upgrade this patch is proposing. Without pre-upgrade, the upgrade happens during InputFile::create in claim_file_hook. Pre-upgrade needs to happen before claim_file_hook where something like vector<Input_path> is available. The plugin API does not seem to have this hook. Not quite sure how to proceed with this restriction.

That's true the claim_file_hook is called on files one by one from gold. But there is a second call to InputFile::create for each claimed module, done from addModule, which I assume will upgrade them a second time. In that case the caller has the whole list of input files that were claimed, so something similar could presumably be done there.

ychen marked 2 inline comments as done.Tue, May 19, 10:10 PM

Add an lld option(--cache-ir-symtab) to enable this behavior (default off). In the long run, I think it is beneficial to turn it on by default.

I tested this patch internally. There is some minor performance loss turning on --cache-ir-symtab. That may be because we don't have mismatching versions of bitcode files.
I guess many users do the same (same version of bitcode files), so I am unsure this should be on by default.

This mostly benefits ThinLTO incremental build. When there are a lot of bitcode files need to be upgraded, the performance increase is obvious; if not, I expected the performance loss is very minor. There are some heuristics could help performance loss. For example, skip pre-upgrade when the number of candidate files is smaller than some threshold. Anyhow, I think it is reasonable to make it opt-in to begin with.

It would be good to add equivalent support to gold-plugin.cpp as well so that those remain in sync on functionality.

It seems that the gold plugin API does not work well with the pre-upgrade this patch is proposing. Without pre-upgrade, the upgrade happens during InputFile::create in claim_file_hook. Pre-upgrade needs to happen before claim_file_hook where something like vector<Input_path> is available. The plugin API does not seem to have this hook. Not quite sure how to proceed with this restriction.

That's true the claim_file_hook is called on files one by one from gold. But there is a second call to InputFile::create for each claimed module, done from addModule, which I assume will upgrade them a second time. In that case the caller has the whole list of input files that were claimed, so something similar could presumably be done there.

Thanks for the pointer! I will look into that.