This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Add -flto-jobs=N to control backend parallelism
ClosedPublic

Authored by tejohnson on Sep 22 2016, 6:57 AM.

Details

Summary

Currently, a linker option must be used to control the backend
parallelism of ThinLTO or full LTO (where it only applies to
parallel code gen). The linker option varies depending on the
linker (e.g. gold vs ld64). Add a new clang option -flto-jobs=N
to control this.

I've added in the wiring to pass this to the gold plugin. I also
added in the logic to pass this down in the form I understand that
ld64 uses on MacOS, for the darwin target.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson updated this revision to Diff 72164.Sep 22 2016, 6:57 AM
tejohnson retitled this revision from to [LTO] Add -flto-jobs=N to control backend parallelism.
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
tejohnson added a subscriber: cfe-commits.
mehdi_amini edited edge metadata.Sep 22 2016, 9:29 AM
mehdi_amini added a subscriber: dexonsmith.

The Gold path looks fine.
On OSX, we would have the clang driver relying on a LLVM cl::opt, for which I don't think there is any precedent. CC Duncan for advice.

Also I don't think the same option should be used for the parallel LTO codegen: it actually does not generate the same binary, which should deserve a dedicated opt-in (What if I mix ThinLTO and LTO, and I don't want // codegen?)

The Gold path looks fine.
On OSX, we would have the clang driver relying on a LLVM cl::opt, for which I don't think there is any precedent. CC Duncan for advice.

I do see other uses of -mllvm in lib/Driver/Tools.cpp, but are you talking about something else?

Also I don't think the same option should be used for the parallel LTO codegen: it actually does not generate the same binary, which should deserve a dedicated opt-in (What if I mix ThinLTO and LTO, and I don't want // codegen?)

Ok good point. I can change this to -fthinlto_jobs. However, while the two parallel settings are separate in the LTO API, currently the gold-plugin jobs option controls both, so I will need to do a preparatory gold-plugin patch to split this into thinlto_jobs and lto_jobs. On the libLTO/ld64 path, looks like the current -mllvm -threads only affects ThinLTO so there is no work to do there.

tejohnson updated this revision to Diff 72301.Sep 23 2016, 9:39 AM
tejohnson edited edge metadata.

Update option description as per decision to split from parallel code gen.

I do see other uses of -mllvm in lib/Driver/Tools.cpp, but are you talking about something else?

I think this is okay, since clang is talking to the same version of libLTO.dylib. I feel like there might be another case where
clang talks to libLTO.dylib through ld64 using -mllvm... perhaps, -O0?

Let's ask around though to be sure.

Ok, let me know what you find out.

Ok good point. I can change this to -fthinlto_jobs. However, while the two parallel settings are separate in the LTO API, currently the gold-plugin jobs option controls both, so I will need to do a preparatory gold-plugin patch to split this into thinlto_jobs and lto_jobs. On the libLTO/ld64 path, looks like the current -mllvm -threads only affects ThinLTO so there is no work to do there.

I actually like -flto-jobs=N better for this. I expect "jobs" not to affect output at all.

I think the current parallel FullLTO CodeGen (where it *does* affect output) should have a special name that calls this out, perhaps -flto-partitions=N? -flto-slices=N? -flto-random-partitions=N? Is it urgent to add that flag now though?

Note that I imagine someone will parallelizing FullLTO the hard way in the future, which won't affect output. That implementation should use -flto-jobs=N.

Ok, sure that seems reasonable. I changed the option documentation to note that this is currently just for ThinLTO. See also D24873 where I split the gold-plugin options.

tejohnson added a subscriber: pcc.Sep 23 2016, 9:42 AM
mehdi_amini added inline comments.Sep 23 2016, 10:31 AM
include/clang/Driver/Options.td
818 ↗(On Diff #72301)

std::thread::hardware_concurrency is a bit implementation specific, can we find a better formulation?
Something like (default of 0 means the number of thread will be derived from the number of CPUs detected)?

tejohnson updated this revision to Diff 72317.Sep 23 2016, 11:53 AM

Update option help message per Mehdi's suggestion

mehdi_amini accepted this revision.Sep 23 2016, 12:50 PM
mehdi_amini edited edge metadata.

LGTM, Thanks!

This revision is now accepted and ready to land.Sep 23 2016, 12:50 PM
This revision was automatically updated to reflect the committed changes.

This is breaking our android lldb build, because it uses std::to_string. Looks like there is llvm::to_string, which should be preferred. Would someone mind changing it? I don't have commit access or I would myself. :)

Thanks in advance