Page MenuHomePhabricator

Allow clang -Os and -Oz to work with -flto and lld
Needs ReviewPublic

Authored by bero on Jun 29 2019, 12:35 PM.

Details

Summary

Fix clang -Os/-Oz with LTO

$ clang -Os -fuse-ld=lld -flto test.c
ld.lld: error: -plugin-opt=Os: number expected, but got 's'
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
$ clang -Oz -fuse-ld=lld -flto test.c
ld.lld: error: -plugin-opt=Oz: number expected, but got 'z'
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

https://bugs.llvm.org/show_bug.cgi?id=42445

Diff Detail

Repository
rC Clang

Event Timeline

bero created this revision.Jun 29 2019, 12:35 PM

Good idea, I found this issue a few days ago too.

Please upload the patch with a full context.
Plesse try to add a testcase.

tejohnson added inline comments.Jun 29 2019, 3:16 PM
llvm-9.0.0-20190629/clang/lib/Driver/ToolChains/CommonArgs.cpp
395

Os/Oz are closer to O2 than O3 (which allows much more aggressive code size increasing optimizations).

Better though to add a size level to the LTO::Config, teach lld to pass it through properly, then using the LTO Config to set the SizeLevel in the old PM and the PassBuilder::OptimizationLevel in the new PM when setting up the LTO backend pipelines, similar to how CodeGenLevel.OptimizeSize is handled in clang (BackendUtil.cpp).

My concern is that silently mapping Os/Oz to do something different than in the non-LTO pipeline is going to end up more confusing than the current error (which isn't good either, but at least makes it clear that it isn't supported).

ormris added a subscriber: ormris.Jul 1 2019, 9:46 AM
ruiu added a comment.Jul 2 2019, 3:01 AM

I agree with Teresa. I don't think automatically setting O3 for Os and Oz is a good idea because these three options are different (that's why we have three different options in the first place). Adding an Os and Oz to lld's LTO option seems like a good idea, but they should be mapped to their corresponding features.

I agree with Teresa. I don't think automatically setting O3 for Os and Oz is a good idea because these three options are different (that's why we have three different options in the first place). Adding an Os and Oz to lld's LTO option seems like a good idea, but they should be mapped to their corresponding features.

It shouldn't Matt

llvm-9.0.0-20190629/clang/lib/Driver/ToolChains/CommonArgs.cpp
395

Using O2 makes sense to me.
Beyond this does it matter much? Isn't the important part of Os/Oz carried through a function attribute?

dmgreen added a subscriber: dmgreen.Jul 6 2019, 1:26 PM
tycho added a subscriber: tycho.Aug 6 2019, 7:26 AM

Two things:

  • I'm curious why we would want to force -O2/-O3 instead of just allowing -Os/-Oz to be used with LTO. Is optimizing for size combined with LTO really that unusual? I've built several projects using GCC with -Os -flto and the size reduction over plain -Os or -O2 -flto has been substantial enough to warrant that combination on release builds as well. I assume I might be missing something here, though, since someone mentioned this above (I don't understand the response to it though).
  • This change is missing a fix for the option parsing in tools/gold/gold-plugin.cpp (option::process_plugin_option), which also complains about the -Os/-Oz arguments.

I assume I might be missing something here, though, since someone mentioned this above (I don't understand the response to it though).

There are two invocations in LTO: the first phase where we compile from source to bitcode, and the second phase which is invoked by the linker.

Phase 1: compile to bitcode

clang++ -Os -flto foo.c -o foo.o
clang++ -O2 -flto bar.c -o bar.o

Phase 2: LTO and CodeGen

clang++ bar.o foo.o

When compiling foo.c, we use Os which add a function attribute to make each function in foo.o as such. Functions in bar.o won't have the same attributes.
During LTO we merge everything but functions keep their attributes, the optimization passes can adjust their heuristics to honor the fact that functions from foo.c will be "optimized for size" and the function from bar.c will be optimized "normally".

Specifying an optimization level for the LTO phase is a also bit awkward since the LTO optimization pipeline is already very different from the non-LTO one: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp#L1012

What remains are CodeGen heuristics, and there might still have some global flags in use: https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Support/CodeGen.h#L51 (note nothing specific to Os/Oz though), still in use for example for generating FMAs on AArch64 apparently: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/AArch64/AArch64SelectionDAGInfo.cpp#L55

This is why specifying Oz/Os during LTO can be very confusing for the user: it would change very few things in the process without actually making function from bar.c having the right function attributes (nothing would override the lack of attribute as far as I know): clang++ -flto -Oz bar.o foo.o would *not* add the Oz annotation on functions defined in bar.o.

I'm curious why we would want to force -O2/-O3 instead of just allowing -Os/-Oz to be used with LTO.

So I hope it is more clear that I don't think we're forcing O2/O3 on LTO users, but it isn't obvious to expose a consistent UI for these flags with respect to LTO without being surprising to the user in some way.
(do you know that LTO use to be -O4?)

tycho added a comment.Aug 6 2019, 11:17 AM

OK, that makes sense. So this change would not enforce -O2/-O3 for the bitcode emission, but would enforce one of the two for the LTO phase.

This may be a stupid question, but aren't there some optimization passes that can emit functions during the LTO phase that weren't in the initial bitcode compile? If so, wouldn't those miss out on getting the -Os/-Oz function attributes applied?

What would happen after D63976 is applied with a funny command line like clang -Oz -flto -o foo a.o b.c, where one (or all) of the input arguments is a source file? Would it invoke the bitcode compile of the source files with the requested -Oz and then invoke the LTO linker plugin with -O2/-O3?

with a funny command line like clang -Oz -flto -o foo a.o b.c, where one (or all) of the input arguments is a source file? Would it invoke the bitcode compile of the source files with the requested -Oz and then invoke the LTO linker plugin with -O2/-O3?

What you're invoking is the clang driver, it will drive the process and spawn multiple subprocess for each phase (this contributes to the UI challenge). Your "funny" command line will detect that it needs to spawn a clang instance for the compile phase of b.c to get an object file before invoking the linker on the temporary b.o and the a.o you provide on the input. To add some complexity, the exact behavior depends on the platform (Linux vs MacOS) and possible which linker is in use (-fuse-ld=lld).

But yes this patch changes the driver so that only the invocation of the linker is changed. Your previous "funny" command line would still yield Oz when compiling b.c to b.o before invoking the linker on b.o and a.o with O3.

(I don't know if this change is the best solution, I just provide my understanding of the situation :))

khchen added a subscriber: khchen.Sep 3 2019, 7:10 AM
troyj added a subscriber: troyj.Mon, Nov 18, 9:21 PM

I ran into this same problem and would like to see this patch or a similar one land. Note that there is also a -Og option to consider, which currently has the same problem.