This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Split the options for ThinLTO jobs and Regular LTO partitions
ClosedPublic

Authored by davide on Oct 10 2016, 1:51 PM.

Diff Detail

Event Timeline

davide updated this revision to Diff 74172.Oct 10 2016, 1:51 PM
davide retitled this revision from to [LTO] Split the options for ThinLTO jobs and Regular LTO partitions.
davide updated this object.
davide added reviewers: ruiu, pcc.
davide added a subscriber: llvm-commits.
ruiu added inline comments.Oct 10 2016, 1:56 PM
ELF/Driver.cpp
484–489

It is better to include the option name here.

"--lto-partitions: number of thread must be >0"
davide updated this revision to Diff 74177.Oct 10 2016, 2:53 PM
pcc added inline comments.Oct 10 2016, 3:01 PM
ELF/Driver.cpp
487

I think this would result in a default parallelism level of 1. Can we avoid passing a ThinBackend if the user did not specify a parallelism level?

davide updated this revision to Diff 74182.Oct 10 2016, 3:39 PM
pcc accepted this revision.Oct 10 2016, 4:13 PM
pcc edited edge metadata.

LGTM

ELF/Driver.cpp
487

Maybe -1u here and below?

This revision is now accepted and ready to land.Oct 10 2016, 4:13 PM
mehdi_amini added inline comments.Oct 10 2016, 4:27 PM
ELF/Driver.cpp
487

Doesn't it mean that if the user does not specify --thinlto-jobs on the command line, ThinLTO won't work? I think it would even lead to the LTO library crashing if the bitcode has been generated with ThinLTO? Is there a test that shows what happen in this case? (I don't see any, but I didn't look closely)
The default should be something like std::hardware_concurrency.

pcc added inline comments.Oct 10 2016, 4:39 PM
ELF/Driver.cpp
487

No, if the ThinBackend is not supplied, LTO will create one for you. See: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#229

I think we will eventually want to replace the thread::hardware_concurrency() default with something else that will lead to better utilisation of the hardware. In Chromium at least we've observed that hardware_concurrency leads to poor utilisation on machines with a large number of cores (https://bugs.chromium.org/p/chromium/issues/detail?id=645295#c20). So I'd prefer to keep any default out of the clients.

You're right, we should have a test that does not pass a --thinlto-jobs flag. @davide can you please add one?

test/ELF/lto/thinlto.ll
2

I think you need to use opt -module-summary here.

mehdi_amini added inline comments.Oct 10 2016, 5:00 PM
ELF/Driver.cpp
487

No, if the ThinBackend is not supplied, LTO will create one for you. See: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#229

OK! Great :)

I think we will eventually want to replace the thread::hardware_concurrency() default with something else that will lead to better utilisation of the hardware. In Chromium at least we've observed that hardware_concurrency leads to poor utilisation on machines with a large number of cores (https://bugs.chromium.org/p/chromium/issues/detail?id=645295#c20). So I'd prefer to keep any default out of the clients.

Yes, Apple's clang is using the number of physical cores because of that, but my rational was to limit the memory consumption at a time where debug info where even more costly than now for ThinLTO.
So indeed we plan to add support for this in LLVM.

On the other hand, David and Teresa benchmarks building Chromium on a 16-cores Xeon machine (32 hyper threaded cores) and they reported 2m18s for the ThinLTO plugin with 32 threads vs 3m38s with 32 threads. So your link is puzzling...

This revision was automatically updated to reflect the committed changes.