Previously we were never setting this which means it was always being
set to Default (-O2/-Os).
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Perhaps this was intentional? But heejin noticed that --lto-O0 builds were still runing at O2 GC. This code is that same as used in gold/gold-plugin.cpp.
Is there any reason you don't want to set it to Aggressive by default? (e.g. hurts debuggability, makes code generation slow, etc.)
From looking at other code it looks like Aggressive corresponds to -O3, and that --lto-O3 here.
This behaviour seems to match what clang does: https://github.com/llvm-mirror/clang/blob/f7d44292db27680486c0f03fd8eba125276144c8/lib/CodeGen/BackendUtil.cpp#L369
I'd prefer that we not do this. The code gen level really ought be controlled by the frontend via function attributes. We shouldn't be setting it in gold-plugin.cpp either but the last time I tried to remove it it ended up getting reverted for causing a perf regression.
I see. So there should be enough information in each of the bitcode files to set the CGOptLevel accordingly for each function?
The fact that gold saw a performance regression is suspicious no? Does that mean there is performance gain we can get by adding it here?
Perhaps we should remove the CGOptLevel from the LTO completely once we remove it from gold? It looks like tools/llvm-lto2/llvm-lto2.cpp also does that same thing BTW.
Not yet. It would need to be added and the cg passes taught to respect it.
The fact that gold saw a performance regression is suspicious no? Does that mean there is performance gain we can get by adding it here?
The user who saw the perf regression was using --lto-O3 IIRC. One of the proposed interim solutions was to use CGOptLevel = 2 for --lto-O[0-2] and CGOptLevel = 3 for --lto-O3, but I never got around to writing a patch for it.
Perhaps we should remove the CGOptLevel from the LTO completely once we remove it from gold? It looks like tools/llvm-lto2/llvm-lto2.cpp also does that same thing BTW.
Yes, that's the eventual plan.
BTW, it looks like -O0 gets added to the bitcode as optnone, but O2 and O3 produce identical bitcode.
In that case would you envisage even clang eventually not setting CGOptLevel globally and let each function set it own level?
The fact that gold saw a performance regression is suspicious no? Does that mean there is performance gain we can get by adding it here?
The user who saw the perf regression was using --lto-O3 IIRC. One of the proposed interim solutions was to use CGOptLevel = 2 for --lto-O[0-2] and CGOptLevel = 3 for --lto-O3, but I never got around to writing a patch for it.
Perhaps we should remove the CGOptLevel from the LTO completely once we remove it from gold? It looks like tools/llvm-lto2/llvm-lto2.cpp also does that same thing BTW.
Yes, that's the eventual plan.
In that case would you mind if I landed this change for now with a TODO to remove this code once the opt level for each function is plumbed through from the frontend? (Today it looks like only O0 is plumbed through), It seems that being consistent with the other LTO tools would be good for the time being, and there would be an immediate benefit for users who are passing --lto-O3.
Yes.
The fact that gold saw a performance regression is suspicious no? Does that mean there is performance gain we can get by adding it here?
The user who saw the perf regression was using --lto-O3 IIRC. One of the proposed interim solutions was to use CGOptLevel = 2 for --lto-O[0-2] and CGOptLevel = 3 for --lto-O3, but I never got around to writing a patch for it.
Perhaps we should remove the CGOptLevel from the LTO completely once we remove it from gold? It looks like tools/llvm-lto2/llvm-lto2.cpp also does that same thing BTW.
Yes, that's the eventual plan.
In that case would you mind if I landed this change for now with a TODO to remove this code once the opt level for each function is plumbed through from the frontend? (Today it looks like only O0 is plumbed through), It seems that being consistent with the other LTO tools would be good for the time being, and there would be an immediate benefit for users who are passing --lto-O3.
This should be quite easy to do:
use CGOptLevel = 2 for --lto-O[0-2] and CGOptLevel = 3 for --lto-O3
I'd be happy if we changed gold and all the lld ports to do that.
OK I updated this PR to use Default for 0-2 and Aggressive for 3.
I will followup with matching change to the other LTO using tools in llvm.
Can I ask why you don't want --lto-O0 to set CGOptLevel to None?
The brief answer is: --lto-O0 means no cross-module optimizations, and just because you don't want cross-module optimizations doesn't mean that you also want poor code quality. For example, control flow integrity requires LTO, and users of CFI who don't want cross-module optimizations (whether for code size, link performance or other reasons) should not expect the code quality to get worse just because they enabled LTO by necessity.
Common/Args.cpp | ||
---|---|---|
20 ↗ | (On Diff #184214) | nit: CGOptLevel |
29 ↗ | (On Diff #184214) | Why not: if (OptLevelLTO == 3) return CodeGenOpt::Aggressive; assert(OptLevelLTO < 3); return CodeGenOpt::Default; ? |
This is a poor convention IMO, because it conflict with the clang user exposed O0 which is for no optimizations.
and just because you don't want cross-module optimizations doesn't mean that you also want poor code quality.
In such case you shouldn't request something that is "O0"?
For example, control flow integrity requires LTO, and users of CFI who don't want cross-module optimizations (whether for code size, link performance or other reasons) should not expect the code quality to get worse just because they enabled LTO by necessity.
Why not exposing another option than "O0"? Coupling CFI to optimization even when O0 is requested does not seem in line with the usual practice: CFI is about behavior first.