Address PR32155: Skip passing -Os and -Oz to the Gold plugin using
-plugin-opt.
Event Timeline
Interested in pcc's thoughts, as https://bugs.llvm.org/show_bug.cgi?id=32155 mentions you already discussed this with him. Note that some of the passes that check PassManagerBuilder::SizeLevel are added during the ThinLTO back end (e.g. populateModulePassManager, which checks SizeLevel, is invoked by populateThinLTOPassManager). Until everything is converted to using size attributes, it seems like the correct fix for the bug is to accept these options in the gold plugin and pass them through the LTO API to the PassManagerBuilder.
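For context, a minimal sketch of the legacy pipeline setup involved here (PassManagerBuilder as it existed at the time of this review); any pass gated on SizeLevel only honors -Os/-Oz if the plugin forwards a size level:

```cpp
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

void buildPipeline(llvm::legacy::PassManager &MPM) {
  llvm::PassManagerBuilder PMB;
  PMB.OptLevel = 2;   // what -plugin-opt=O2 maps to
  PMB.SizeLevel = 1;  // 1 for -Os, 2 for -Oz; stays 0 if the driver drops them
  // The ThinLTO back end reuses this, so SizeLevel-gated passes are affected.
  PMB.populateModulePassManager(MPM);
}
```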
| lib/Driver/ToolChains/CommonArgs.cpp | |
| --- | --- |
| 369 | Remove added "{" |
| 377 | Ditto about unnecessary "{" |
Agree with @pcc. Unless anyone has a need to have "perfect" support for Os, this is the right direction.
| lib/Driver/ToolChains/CommonArgs.cpp | |
| --- | --- |
| 369 | Done. I thought keeping braces consistent across the if and else branches was the expected coding style, but apparently not: clang-format doesn't do this. |
Sure, but there is certainly a perception that optimization flags affecting the non-LTO pipeline should similarly affect the LTO pipeline. LTO should be transparent to the user, so if -Os behaves one way without LTO, it seems problematic to me if it behaves a different way with LTO.
That being said, agree that the best way to enforce that is to pass the relevant flags through the IR. (On the flip side, if the user passes -O1 to the link step, it does get passed through to the plugin and affects the LTO optimization pipeline...)
Pirama - can you file a bug (and cc me on it) for the need to convert existing users of the SizeLevel flag to instead use the relevant function attributes?
| lib/Driver/ToolChains/CommonArgs.cpp | |
| --- | --- |
| 376 | Note that -Os and -Oz are supposed to map to -O2 minus some size-increasing options. Not setting OOpt in that case only works because OptLevel in the gold plugin defaults to -O2. I'd rather make it explicit here, i.e. add an else that sets OOpt to "2" for these options (see the sketch below). You can add a check for that in your tests. |
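A rough sketch of the mapping suggested above, assuming the usual clang driver option APIs (the exact code in CommonArgs.cpp may differ, and the helper name is invented):

```cpp
#include "clang/Driver/Options.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Option/ArgList.h"

// Sketch only: map the last -O* flag on the link line to -plugin-opt=O<n>,
// making -Os/-Oz explicit as O2 instead of silently relying on the gold
// plugin's default optimization level.
static void addPluginOptLevel(const llvm::opt::ArgList &Args,
                              llvm::opt::ArgStringList &CmdArgs) {
  using namespace clang::driver;
  const llvm::opt::Arg *A = Args.getLastArg(options::OPT_O_Group);
  if (!A)
    return;
  llvm::StringRef OOpt = "2";
  if (A->getOption().matches(options::OPT_O4) ||
      A->getOption().matches(options::OPT_Ofast))
    OOpt = "3";
  else if (A->getOption().matches(options::OPT_O0))
    OOpt = "0";
  else if (A->getOption().matches(options::OPT_O)) {
    OOpt = A->getValue();
    if (OOpt == "s" || OOpt == "z")
      OOpt = "2"; // -Os/-Oz: -O2 minus the size-increasing pieces
  }
  CmdArgs.push_back(Args.MakeArgString(llvm::Twine("-plugin-opt=O") + OOpt));
}
```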
I agree that I don't like the discrepancy: the driver should *not* drop -Os silently if it passes down -O1/-O2/-O3; a warning is the minimum.
I don't like the discrepancy either, and I agree that we should be passing these other flags through the IR as well (even though, in the face of inlining, there is some ambiguity as to what the flags would mean). That having been said, I don't see the value in the warning. Forcing users to endure a warning solely because they use LTO and use -Os or -Oz for all of their compilation steps is not friendly. The information has been captured already, so there's nothing to warn about. You might worry about the opposite situation (the user uses only -Os or -Oz on the link step, but not for the compile steps), and that will have no effect. That, however, should be the expected behavior (optimization is associated with compiling, not linking, except perhaps for specifically called-out exceptions). The fact that our other optimization levels don't work that way is a bug, not a feature, and one that we should fix instead of further exposing to our users.
The warning here is only about the *link* step.
> The information has been captured already, so there's nothing to warn about. You might worry about the opposite situation (the user uses only -Os or -Oz on the link step, but not for the compile steps), and that will have no effect. That, however, should be the expected behavior (optimization is associated with compiling, not linking, except perhaps for specifically called-out exceptions). The fact that our other optimization levels don't work that way is a bug, not a feature, and one that we should fix instead of further exposing to our users.
Yes, the issue is only about how the driver accepts Os for the link even though it has no effect (O0/O1/O2/O3 *will* have an effect though).
The driver (accepts, but) ignores Os and other optimization flags for non-LTO link-only actions. That it has an effect for LTO seems to be an implementation detail. Since optimization flags are compiler-only options, and Clang already silently (without a warning) ignores these flags during link-only invocations, silently transforming them when passing to the plugin seems reasonable.
It would be reasonable to *not* forward any of these flags to the linker (plugin), but not to rewrite them with a different meaning.
(By the way, this is what is done on Darwin: the -O flag is *ignored* for the link step invocation, because we couldn't find a "right way" of correctly honoring it that would be logical to the user in all cases).
Do we agree that the desired endpoint here is that all optimization flags are encoded in the IR somehow? If so, in this desired end state, will it be true that -O[n] will have some effect on an LTO link step?
I don't think we have any plan for the optimization *levels* (1, 2, and 3), at least not that I know of. I don't even see how it would be possible to achieve it in a principled way without limiting what we're allowing ourselves to express at the PassManager level with these levels.
We can handle O0 (optnone), Os (optsize), and Oz (minsize) separately though, because we can make these independent of the passmanager setup.
Please elaborate.
> We can handle O0 (optnone), Os (optsize), and Oz (minsize) separately though, because we can make these independent of the passmanager setup.
I'm trying to figure out if these are really fundamentally different from 1,2,3 or if we just happen to handle them differently right now.
The fundamental difference is that Os/Oz especially are treated as optimization directives that are independent of the pass pipeline: instructing that "loop unroll should not increase size" is independent of *where* loop unroll is inserted in the pipeline.
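In other words, a size directive travels with the function and can be consulted wherever a pass happens to run; a minimal sketch using the existing function attributes (the helper name is invented):

```cpp
#include "llvm/IR/Function.h"

// Hypothetical helper: size directives are function attributes, so a pass
// can honor them regardless of its position in the pipeline.
static bool shouldLimitCodeGrowth(const llvm::Function &F) {
  return F.hasFnAttribute(llvm::Attribute::MinSize) ||        // -Oz
         F.hasFnAttribute(llvm::Attribute::OptimizeForSize);  // -Os
}
```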
The issue with O1 vs O2 is that the *ordering* of the passes changes, not only the threshold to apply. For some of the differences we could get away with adding the level as a function attribute, for instance, and have the PassManager itself skip passes depending on the level. The API for the pass manager could then be:
```cpp
PM.add(createLoopUnroll(),
       /* "shouldRun()" callback */
       [](Function &F) { return F.getAttribute("opt_level") > 1; });
```
And the PM would call back the lambda and actually skip the LoopUnroll when the level is < 2, in a way that the client can customize based on the position in the pipeline. But it can be very hard to set up a pipeline this way once we want to do more than just skip passes and need a different pass ordering entirely.
Also this wouldn't work with an SCC pass, as different functions in the SCC can have different levels; it gets quite tricky. It also becomes problematic for any module-level pass: Dead Global Elimination would need to leave out global variables that come from a module compiled with O1, same with Merge Duplicate Global Constants (I think the issue with GlobalVariables also affects Os/Oz, by the way).
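To make the module-pass concern concrete, a purely hypothetical sketch (the "compiled-at-O1" marker does not exist; it only illustrates the bookkeeping a pass like Dead Global Elimination would need):

```cpp
#include "llvm/IR/Module.h"

void pruneGlobals(llvm::Module &M) {
  for (llvm::GlobalVariable &GV : M.globals()) {
    // Hypothetical per-symbol provenance check; without something like it,
    // a module pass cannot respect per-module optimization levels.
    if (GV.hasAttribute("compiled-at-O1"))
      continue; // leave globals from -O1 modules alone
    // ... usual dead-global / merge-constants logic would go here ...
  }
}
```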
> The issue with O1 vs O2 is that the *ordering* of the passes changes, not only the threshold to apply.
Maybe we should stop here and ask: is this really a *fundamental* difference? Or is this just a difference in how we happen to handle Os/Oz today? Is there a fundamental reason why we might not want a different order for Oz vs. O2 if we want one for O2 vs. O3?
My view is that, no, there is no fundamental difference. Why? Because each optimization level has a meaning, and that meaning can almost always be applied per function.
- Oz - Make the binary as small as possible
- Os - Make the binary small while making only minor performance tradeoffs
- O0 - Be fast, and maybe maximize the ability to debug
- O1 - Make the code fast while making only minor debuggability tradeoffs
- O2 - Make the code fast - perform only transformations that will speed up the code with near certainty
- O3 - Make the code fast - perform transformations that will speed up the code with high probability
Believing that we can implement this model primarily through pass scheduling has proved false in the past (except for -O0 without LTO) and won't be any more true in the future. We essentially have one optimization pipeline, and I see no reason to assume this will change. It seems significantly more effective to have the passes become optimization-level aware than to implement optimization through changes in the pipeline structure itself. Especially in the context of the new pass manager, where the cost of scheduling a pass that will exit early should be small, I see no reason to alter the pipeline at all. For one thing, many optimizations are capable of performing several (or many) different transformations, and these often don't all fall into the same categories. In that sense, CGSCC- and function-scope passes are not really different from function/loop-scope passes.
As such, I think that we should tag functions/modules with optimization levels so that users can control the optimization level effectively even in the case of LTO. We could even add pragmas allowing users to control this at finer granularity, and that would be a positive thing (I dislike that we currently force users to put functions in different files to get different optimization levels - my users find this annoying too). We need to give users a simple model to follow: optimization is associated with compiling (even if we do, in fact, delay it until link time).
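For illustration, roughly what per-function control looks like with clang's existing extensions, versus the hypothetical per-function speed level being argued for here (the optlevel attribute in the comment is invented):

```cpp
// Real clang features today: per-function size / "no optimization" control.
__attribute__((minsize)) void tiny() {}     // roughly -Oz for this function
__attribute__((optnone)) void keep_raw() {} // roughly -O0 for this function

#pragma clang optimize off                  // optnone for everything below
void easy_to_debug() {}
#pragma clang optimize on

// Hypothetical: there is no __attribute__((optlevel(2))) or equivalent
// per-function -O1/-O2/-O3 control today; that is the missing piece.
```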
> Also this wouldn't work with an SCC pass, as different functions in the SCC can have different levels; it gets quite tricky. It also becomes problematic for any module-level pass: Dead Global Elimination would need to leave out global variables that come from a module compiled with O1, same with Merge Duplicate Global Constants (I think the issue with GlobalVariables also affects Os/Oz, by the way).
Yes, we should do this. I don't understand why this is tricky. Actually, I think having these kinds of decisions explicit in the code of the transformations would be a positive development.
I think a big reason why I consider this tricky is the part where I mentioned that a single pass can have a different behavior/level depending on where it is in the pipeline.
But recently @joerg split SimplifyCFG into two wrapper passes, which may be a good way of achieving what you're describing.
From the discussion, it seems theoretically feasible to make optimization for speed a function-level attribute as well. After looking at the PassManagerBuilder for this bug, I think that would make the optimization passes cleaner by keeping the passes and their behavior at the various levels in one place.
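A sketch of what consuming such a per-function speed attribute could look like on the LLVM side (the "opt-level" attribute name and the default are assumptions; no such attribute exists today):

```cpp
#include "llvm/IR/Function.h"

// Hypothetical: read a per-function speed level recorded by the front end.
static unsigned getFunctionOptLevel(const llvm::Function &F) {
  unsigned Level = 2; // assumed default when no attribute is present
  if (F.hasFnAttribute("opt-level"))
    F.getFnAttribute("opt-level").getValueAsString().getAsInteger(10, Level);
  return Level;
}
```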
Circling back to the issue at hand, what is the best way to handle Os and Oz at the LTO stage? I think we left off with @mehdi_amini's comment that the driver should emit a warning when dropping -Os/-Oz.