This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Set CGOptLevel in LTO config.
ClosedPublic

Authored by sbc100 on Jan 29 2019, 3:39 PM.

Details

Summary

Previously we were never setting this which means it was always being
set to Default (-O2/-Os).

Diff Detail

Repository
rL LLVM

Event Timeline

sbc100 created this revision.Jan 29 2019, 3:39 PM

Perhaps this was intentional? But heejin noticed that --lto-O0 builds were still runing at O2 GC. This code is that same as used in gold/gold-plugin.cpp.

ruiu added a comment.Jan 29 2019, 3:45 PM

Is there any reason you don't want to set it to Aggressive by default? (e.g. hurts debuggability, makes code generation slow, etc.)

From looking at other code it looks like Aggressive corresponds to -O3, and that --lto-O3 here.

pcc added a comment.Jan 29 2019, 5:02 PM

I'd prefer that we not do this. The code gen level really ought be controlled by the frontend via function attributes. We shouldn't be setting it in gold-plugin.cpp either but the last time I tried to remove it it ended up getting reverted for causing a perf regression.

I see. So there should be enough information in each of the bitcode files to set the CGOptLevel accordingly for each function?

The fact that gold saw a performance regression is suspicious no? Does that mean there is performance gain we can get by adding it here?

Perhaps we should remove the CGOptLevel from the LTO completely once we remove it from gold? It looks like tools/llvm-lto2/llvm-lto2.cpp also does that same thing BTW.

pcc added a comment.Jan 29 2019, 5:17 PM

I see. So there should be enough information in each of the bitcode files to set the CGOptLevel accordingly for each function?

Not yet. It would need to be added and the cg passes taught to respect it.

The fact that gold saw a performance regression is suspicious no? Does that mean there is performance gain we can get by adding it here?

The user who saw the perf regression was using --lto-O3 IIRC. One of the proposed interim solutions was to use CGOptLevel = 2 for --lto-O[0-2] and CGOptLevel = 3 for --lto-O3, but I never got around to writing a patch for it.

Perhaps we should remove the CGOptLevel from the LTO completely once we remove it from gold? It looks like tools/llvm-lto2/llvm-lto2.cpp also does that same thing BTW.

Yes, that's the eventual plan.

BTW, it looks like -O0 gets added to the bitcode as optnone, but O2 and O3 produce identical bitcode.

In D57422#1376492, @pcc wrote:

I see. So there should be enough information in each of the bitcode files to set the CGOptLevel accordingly for each function?

Not yet. It would need to be added and the cg passes taught to respect it.

In that case would you envisage even clang eventually not setting CGOptLevel globally and let each function set it own level?

The fact that gold saw a performance regression is suspicious no? Does that mean there is performance gain we can get by adding it here?

The user who saw the perf regression was using --lto-O3 IIRC. One of the proposed interim solutions was to use CGOptLevel = 2 for --lto-O[0-2] and CGOptLevel = 3 for --lto-O3, but I never got around to writing a patch for it.

Perhaps we should remove the CGOptLevel from the LTO completely once we remove it from gold? It looks like tools/llvm-lto2/llvm-lto2.cpp also does that same thing BTW.

Yes, that's the eventual plan.

In that case would you mind if I landed this change for now with a TODO to remove this code once the opt level for each function is plumbed through from the frontend? (Today it looks like only O0 is plumbed through), It seems that being consistent with the other LTO tools would be good for the time being, and there would be an immediate benefit for users who are passing --lto-O3.

pcc added a comment.Jan 29 2019, 5:40 PM

BTW, it looks like -O0 gets added to the bitcode as optnone, but O2 and O3 produce identical bitcode.

In D57422#1376492, @pcc wrote:

I see. So there should be enough information in each of the bitcode files to set the CGOptLevel accordingly for each function?

Not yet. It would need to be added and the cg passes taught to respect it.

In that case would you envisage even clang eventually not setting CGOptLevel globally and let each function set it own level?

Yes.

The fact that gold saw a performance regression is suspicious no? Does that mean there is performance gain we can get by adding it here?

The user who saw the perf regression was using --lto-O3 IIRC. One of the proposed interim solutions was to use CGOptLevel = 2 for --lto-O[0-2] and CGOptLevel = 3 for --lto-O3, but I never got around to writing a patch for it.

Perhaps we should remove the CGOptLevel from the LTO completely once we remove it from gold? It looks like tools/llvm-lto2/llvm-lto2.cpp also does that same thing BTW.

Yes, that's the eventual plan.

In that case would you mind if I landed this change for now with a TODO to remove this code once the opt level for each function is plumbed through from the frontend? (Today it looks like only O0 is plumbed through), It seems that being consistent with the other LTO tools would be good for the time being, and there would be an immediate benefit for users who are passing --lto-O3.

This should be quite easy to do:

use CGOptLevel = 2 for --lto-O[0-2] and CGOptLevel = 3 for --lto-O3

I'd be happy if we changed gold and all the lld ports to do that.

sbc100 updated this revision to Diff 184214.Jan 29 2019, 5:50 PM
  • Use Default CGOptLevel for --lto-[0..2]

OK I updated this PR to use Default for 0-2 and Aggressive for 3.

I will followup with matching change to the other LTO using tools in llvm.

Can I ask why you don't want --lto-O0 to set CGOptLevel to None?

pcc added a comment.Jan 29 2019, 6:06 PM

Can I ask why you don't want --lto-O0 to set CGOptLevel to None?

The brief answer is: --lto-O0 means no cross-module optimizations, and just because you don't want cross-module optimizations doesn't mean that you also want poor code quality. For example, control flow integrity requires LTO, and users of CFI who don't want cross-module optimizations (whether for code size, link performance or other reasons) should not expect the code quality to get worse just because they enabled LTO by necessity.

Common/Args.cpp
20 ↗(On Diff #184214)

nit: CGOptLevel

29 ↗(On Diff #184214)

Why not:

if (OptLevelLTO == 3)
  return CodeGenOpt::Aggressive;
assert(OptLevelLTO < 3);
return CodeGenOpt::Default;

?

sbc100 updated this revision to Diff 184232.Jan 29 2019, 6:40 PM
  • feedback
pcc accepted this revision.Jan 30 2019, 11:05 AM

LGTM

This revision is now accepted and ready to land.Jan 30 2019, 11:05 AM
This revision was automatically updated to reflect the committed changes.
In D57422#1376600, @pcc wrote:

Can I ask why you don't want --lto-O0 to set CGOptLevel to None?

The brief answer is: --lto-O0 means no cross-module optimizations,

This is a poor convention IMO, because it conflict with the clang user exposed O0 which is for no optimizations.

and just because you don't want cross-module optimizations doesn't mean that you also want poor code quality.

In such case you shouldn't request something that is "O0"?

For example, control flow integrity requires LTO, and users of CFI who don't want cross-module optimizations (whether for code size, link performance or other reasons) should not expect the code quality to get worse just because they enabled LTO by necessity.

Why not exposing another option than "O0"? Coupling CFI to optimization even when O0 is requested does not seem in line with the usual practice: CFI is about behavior first.