Among other things, this will help LTO correctly handle/honor files
compiled with -O0, which helps when debugging failures.
It also seems in line with how we handle other options, e.g. how
-fno-inline adds the appropriate attribute as well.
Maybe instead, pass a flag to enable setting optnone on everything when the driver sees -O0 -flto? The patch as-is obviously has a massive testing cost, and it's easy to imagine people being tripped up by this in the future.
I'm not fond of this: limiting discrepancy between LTO and non-LTO reduces the LTO specific bugs and reduces the maintenance of LTO.
The patch as-is obviously has a massive testing cost, and it's easy to imagine people being tripped up by this in the future.
Can you clarify what massive testing cost you're referring to?
Well, you just had to modify around 50 tests, and I'd expect some future tests to have to deal with it too. Maybe "massive" is overstating it but it seemed like an unusually large number.
I don't know that just slapping the option on all these tests is really the most appropriate fix, either, in some cases. I'll look at it more.
There are two things:
- tests are modified: when adding a new option, that does not seem unusual to me
- the impact on future testing: I still don't see any of this future "testing cost" you're referring to right now.
I don't know that just slapping the option on all these tests is really the most appropriate fix, either, in some cases. I'll look at it more.
For instance, the ARM tests rely on piping the output of clang through mem2reg to clean up the IR and get simpler check patterns (I assume). This is obviously not compatible with optnone.
On the other hand, I don't want to update the check lines for > 20000 lines in clang/test/CodeGen/aarch64-neon-intrinsics.c just to save passing an option.
It's likely that some of these tests could have their check lines adapted, but I didn't see much interest in doing this.
50 seems rather more than usual, but whatever. Granted it's not hundreds.
- the impact on future testing: I still don't see any of this future "testing cost" you're referring to right now.
Maybe I worry too much.
I am getting a slightly different set of test failures than you did though. I get these failures:
CodeGen/aarch64-neon-extract.c
CodeGen/aarch64-poly128.c
CodeGen/arm-neon-shifts.c
CodeGen/arm64-crc32.c
And I don't get these failures:
CodeGenCXX/apple-kext-indirect-virtual-dtor-call.cpp
CodeGenCXX/apple-kext-no-staticinit-section.cpp
CodeGenCXX/debug-info-global-ctor-dtor.cpp
clang/lib/CodeGen/CodeGenModule.cpp
900 | I'd set ShouldAddOptNone = false here, as it's already explicit.

clang/test/CodeGen/aarch64-neon-2velem.c
1 | Option specified twice.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
760–762 | At the point where we are doing 3 coupled calls in numerous places, we should add some routine to do this... Maybe we should have when I added the noinline bit. I don't have a good idea of where best to do this -- as part of, or as an alternative to, SetInternalFunctionAttributes? Something else? I'm imagining something like SetAlwaysInlinedRuntimeFunctionAttributes or something. Need a clang IRGen person to help push the organization in the right direction.
clang/lib/CodeGen/CodeGenModule.cpp
892 | attributes prevents -> attributes prevent. Actually, what do you mean by attributes here? Or should this comment instead go below, where we start to branch on the actual 'hasAttr' calls? After reading below, I understand better. Maybe:
  // Track whether we need to add the optnone LLVM attribute,
  // starting with the default for this optimization level.
899–900 | Unrelated (and unnecessary) formatting change?
910–912 | Is this still at all correct? Why? It seems pretty confusing, especially in conjunction with the code below. I think this may force you to either: I don't have any strong opinion about a vs. b.
962 | Why is optnone incompatible with *cold*?
clang/lib/CodeGen/CodeGenModule.cpp
962 | Because cold implies OptimizeForSize (just above this). I take no position on whether that is reasonable.
Address comments: reorganize the way ShouldAddOptNone is handled, hopefully making it easier to track.
Also, after talking with Chandler on IRC: the source attribute "cold" no longer adds the LLVM IR attribute "optsize" at O0; we add "optnone" instead.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
760–762 | Yes, some refactoring of all this custom handling would be welcome. I'll take any pointer on how to do it in clang (I'm not familiar enough with clang).
clang/lib/CodeGen/CodeGenModule.cpp
892 | Actually, I moved it all together instead.
910–912 | I believe it is still correct: during Os/Oz we reach this point and figure out that there is __attribute__((optnone)) in the *source* (not -O0); we remove the attributes, and nothing changes. Did I miss something?
962 | The source attribute "Cold" currently adds llvm::Attribute::OptimizeForSize even at O0; I changed this, and now we emit optnone at O0 in this case.
clang/lib/CodeGen/CodeGenModule.cpp
904 | Oh right! Will remove, thanks!
clang/lib/CodeGen/CodeGenModule.cpp
910–912 | Hmmm, the Os/Oz attributes are added in CGCall.cpp and are guarded with a check on the presence of the Optnone source attribute, so if the Optnone source attribute is present we should never see these. And Os/Oz set OptimizationLevel to 2, which is not zero, so we won't come through here for ShouldAddOptNone reasons either.
clang/lib/CodeGen/CodeGenModule.cpp
910–912 | The verifier already complains if we get this wrong, and indeed it complains if I remove these.

  int foo1(int);
  int foo2(int a) { return foo1(a + 2); }
  __attribute__((optnone)) int foo1(int a) { return a + 1; }

Here we have the attribute optnone on the definition but not the declaration, and the check you're mentioning in CGCall.cpp only applies to the declaration.
clang/lib/CodeGen/CodeGenModule.cpp
910–912 | This is all still incredibly confusing code. I think what would make me happy with this is to have a separate section for each mutually exclusive group of LLVM attributes added to the function. So:

  // Add the relevant optimization level to the LLVM function.
  if (...) {
    B.addAttribute(llvm::Attribute::OptimizeNone);
    F.removeFnAttr(llvm::Attribute::OptimizeForSize);
    ...
  } else if (...) {
    B.addAttribute(llvm::Attribute::OptimizeForSize);
  } else if (...) {
    ...
  }

  // Add the inlining control attributes.
  if (...) {
    <whatever to set NoInline>
  } else if (...) {
    <whatever to set AlwaysInline>
  } else if (...) {
    <whatever to set InlineHint>
  }

  // Add specific semantic attributes such as 'naked' and 'cold'.
  if (D->hasAttr<NakedAttr>()) {
    B.addAttribute(llvm::Attribute::Naked);
  }
  if (D->hasAttr<ColdAttr>()) {
    ...
  }

Even though this means testing the Clang-level attributes multiple times, I think it'll be much less confusing to read and update. We're actually already really close; we just need to hoist the non-inlining bits of optnone out, sink the naked attribute down, and hoist the cold sizeopt up.
clang/lib/CodeGen/CodeGenModule.cpp
910–912 | Since you answered below the example I gave above, I just want to be sure you understand that the attributes for the *declarations* are not even handled in the same file, right?
Over the weekend I had a thought: Why is -O0 so special here? That is, after going to all this trouble to propagate -O0 to LTO, how does this generalize to propagating -O1 or any other specific -O option? (Maybe this question would be better dealt with on the dev list...)
O0 is "special" like Os and Oz because we have an attribute for it and passes "know" how to handle this attribute.
I guess no-one cares enough about O1/O2/O3 to find a solution for these (in the context of LTO, I don't really care about O1/O2).
It is likely that Og would need special treatment at some point, maybe with a new attribute as well, to inhibit optimizations that can't preserve debug info properly.
"I don't care" doesn't seem like much of a principle.
Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended. I don't think -c -O0 should get this not-entirely-O0-like behavior.
The long version is: "There is no use-case and no users, so I don't have much motivation to push it forward for the sake of completeness alone." Does that sound like enough of a principle?
Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.
Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.
I don't think -c -O0 should get this not-entirely-O0-like behavior.
What is "not-entirely"? And why do you think that?
No. You still need to have adequate justification for your use case, which I think you do not.
Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.
Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.
IMO, '-O0' and '-flto' are conflicting options and therefore not deserving of special support.
In my experience, modifying source is by far simpler than hacking a build system to make a special case for compiler options for one module in an application. (If you have a way to build Clang with everything done LTO except one module built with -O0, on Linux with ninja, I would be very curious to hear how you do that.) But if your build system makes that easy, you can just as easily remove -flto as add -O0 and thus get the result you want without trying to pass conflicting options to the compiler. Or spending time implementing this patch.
I don't think -c -O0 should get this not-entirely-O0-like behavior.
What is "not-entirely"? And why do you think that?
"Not entirely" means that running the -O0 pipeline, and running an optimization pipeline but asking some subset of passes to turn themselves off, does not get you the same result. And I think that because I'm the one who put 'optnone' upstream in the first place. The case that particularly sticks in my memory is the register allocator, but I believe there are passes at every stage that do not turn themselves off for optnone.
Note that the source modification consists of adding #pragma clang optimize off to the top of the file. It is not a complicated thing.
I don't follow your logic.
IIUC, you asked "why not support O1/O2/O3"; how is *not supporting* these because they're not useful / don't have a use-case related to "supporting O0 is useful"?
Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.
Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.
IMO, '-O0' and '-flto' are conflicting options and therefore not deserving of special support.
You're advocating for *rejecting* O0-built modules at link-time? We'd still need to detect this, though. The status quo isn't acceptable.
Also, that's not practicable: what if I have an LTO static library for which I don't have the source? Now if I build my own file with -O0 -flto, I can't link anymore.
In my experience, modifying source is by far simpler than hacking a build system to make a special case for compiler options for one module in an application. (If you have a way to build Clang with everything done LTO except one module built with -O0, on Linux with ninja, I would be very curious to hear how you do that.)
Static library, separated projects, etc.
We have tons of users...
I don't think -c -O0 should get this not-entirely-O0-like behavior.
What is "not-entirely"? And why do you think that?
"Not entirely" means that running the -O0 pipeline, and running an optimization pipeline but asking some subset of passes to turn themselves off, does not get you the same result. And I think that because I'm the one who put 'optnone' upstream in the first place. The case that particularly sticks in my memory is the register allocator, but I believe there are passes at every stage that do not turn themselves off for optnone.
That's orthogonal: you're saying we are not handling it correctly yet; I'm just moving toward *fixing* all of this.
Also: LTO is required for some features like CFI. There are users who want CFI+O0 during development (possibly for debugging a subcomponent of the app).
Upfront, it seemed peculiar to handle only one optimization level. After more thought, the whole idea of mixing -O0 and LTO seems wrong. Sorry, should have signaled that I had changed my mind about it.
Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.
Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.
IMO, '-O0' and '-flto' are conflicting options and therefore not deserving of special support.
You're advocating for *rejecting* O0-built modules at link-time? We'd still need to detect this, though. The status quo isn't acceptable.
Also, that's not practicable: what if I have an LTO static library for which I don't have the source? Now if I build my own file with -O0 -flto, I can't link anymore.
No, I'm saying they are conflicting options on the same Clang command line.
As long as your linker can handle foo.o and bar.bc on the same command line, not a problem. (If your linker can't handle that, fix the linker first.)
In my experience, modifying source is by far simpler than hacking a build system to make a special case for compiler options for one module in an application. (If you have a way to build Clang with everything done LTO except one module built with -O0, on Linux with ninja, I would be very curious to hear how you do that.)
Static library, separated projects, etc.
We have tons of users...
Still waiting. Your up-front use case was about de-optimizing a module to assist debugging it within an LTO-built application, not building entire projects one way versus another. If that is not actually your use case, you need to start over with the correct description.
I don't think -c -O0 should get this not-entirely-O0-like behavior.
What is "not-entirely"? And why do you think that?
"Not entirely" means that running the -O0 pipeline, and running an optimization pipeline but asking some subset of passes to turn themselves off, does not get you the same result. And I think that because I'm the one who put 'optnone' upstream in the first place. The case that particularly sticks in my memory is the register allocator, but I believe there are passes at every stage that do not turn themselves off for optnone.
That's orthogonal: you're saying we are not handling it correctly yet; I'm just moving toward *fixing* all of this.
It's not orthogonal; that's exactly how 'optnone' behaves today. If you have proposed a redesign of how to mix optnone and non-optnone functions in the same compilation unit, in some way other than what's done today, I am not aware of it; can you point to your proposal?
Sorry, you lost me. CFI is part of DWARF and we do DWARF perfectly well without LTO (and at O0).
You just haven't articulated 1) why it is wrong and 2) what should we do about it.
Optnone does not equal -O0. It is a debugging aid for the programmer, because debugging optimized code sucks. If you have an LTO-built application and want to de-optimize parts of it to aid with debugging, then you can use the pragma, as originally intended.
Having to modify the source isn't friendly. Not being able to honor -O0 during LTO is not user-friendly.
IMO, '-O0' and '-flto' are conflicting options and therefore not deserving of special support.
You're advocating for *rejecting* O0-built modules at link-time? We'd still need to detect this, though. The status quo isn't acceptable.
Also, that's not practicable: what if I have an LTO static library for which I don't have the source? Now if I build my own file with -O0 -flto, I can't link anymore.
No, I'm saying they are conflicting options on the same Clang command line.
As long as your linker can handle foo.o and bar.bc on the same command line, not a problem. (If your linker can't handle that, fix the linker first.)
You just wrote above that "mixing -O0 and LTO" is wrong; *if* I were to agree with you at some point, then I'd make it a hard error.
In my experience, modifying source is by far simpler than hacking a build system to make a special case for compiler options for one module in an application. (If you have a way to build Clang with everything done LTO except one module built with -O0, on Linux with ninja, I would be very curious to hear how you do that.)
Static library, separated projects, etc.
We have tons of users...
Still waiting.
Waiting for what?
We have use-cases; I gave you a few (vendor static libraries are one). Again, if you think it is wrong to support O0 and LTO, then please elaborate.
I don't think -c -O0 should get this not-entirely-O0-like behavior.
What is "not-entirely"? And why do you think that?
"Not entirely" means that running the -O0 pipeline, and running an optimization pipeline but asking some subset of passes to turn themselves off, does not get you the same result. And I think that because I'm the one who put 'optnone' upstream in the first place. The case that particularly sticks in my memory is the register allocator, but I believe there are passes at every stage that do not turn themselves off for optnone.
That's orthogonal: you're saying we are not handling it correctly yet; I'm just moving toward *fixing* all of this.
It's not orthogonal; that's exactly how 'optnone' behaves today. If you have proposed a redesign of how to mix optnone and non-optnone functions in the same compilation unit, in some way other than what's done today, I am not aware of it; can you point to your proposal?
I don't follow: IMO, if I generate a module with optnone and pipe it to opt -O3, I expect no function IR to be touched. If that is not the case, it is a bug.
Your opinion and expectation are not supported by the IR spec. Optnone skips "most" optimization passes. It is not practical (or was not, at the time) to make the -O3 pipeline behave exactly the same as the -O0 pipeline, and also not actually necessary to support the purpose for which 'optnone' was invented.
If you have a goal of making 'optnone' functions use the actual -O0 pipeline, while non-optnone functions use the optimizing pipeline, more power to you and you will need to take up that particular design challenge with Chandler first.
Yes, I was not clear that I meant that -O0 -flto on the same clang command line just seems nonsensical. "Optimize my program without optimizing it" forsooth.
"Optimize without optimizing", really? Doesn't that sound confused to you? Persuade me why it makes sense.
If it doesn't make sense, then yes making the -O0 -flto combination an error would be the right path.
Unless you are taking the position that -flto doesn't mean "use LTO" and instead means something else, like "emit bitcode" in which case you should be advocating to change the name of the option to say what it means.
Your original use-case described debugging a module in an application. You claimed it was simpler to change the build options for a module than change the source, which I am still waiting to hear how/why that is simpler.
Your subsequent use cases are about entire sub-projects, which is entirely different and orthogonal to where you started. Please elaborate on the original use case.
Basically, I don't see why having clang always emit a real .o at -O0 would be a problem.
I haven't gotten through the other-CFI documentation yet though.
Oh, maybe you are thinking of eliminating the -O0 pipeline? Because if -O0 implies optnone then it's kinda-sorta the same thing as the optimization pipeline operating on nothing but optnone functions? I'd think that would make -O0 compilations slow down, which would not be a feature.
Actually, as mentioned before, I could be fine with making O0 incompatible with LTO; however, security features like CFI (or other sorts of whole-program analyses/instrumentations) require LTO.
Well, "requires LTO" is overstating the case, AFAICT from the link you gave me. Doesn't depend on optimization at all. It depends on some interprocedural analyses given some particular scope/visibility boundary, which it is convenient to define as a set of linked bitcode modules, that by some happy chance is the same set of linked bitcode modules that LTO will operate on.
If it's important to support combining a bitcode version of my-application with your-bitcode-library for this CFI or whatever, and you also want to let me have my-application be unoptimized while your-bitcode-library gets optimized, NOW we have a use-case. (Maybe that's what you had in mind earlier, but for some reason I wasn't able to extract that out of any prior comments. No matter.)
I'm now thinking along the lines of a -foptimize-off flag (bikesheds welcome) which would set the default for the pragma to 'off'. How is that different than what you wanted for -O0? It is defined in terms of an existing pragma, which is WAY easier to explain and WAY easier to implement. And, it still lets us say that -c -O0 -flto is a mistake, if that seems like a useful thing to say.
Does that seem reasonable? Fit your understanding of the needs?
I'm now thinking along the lines of a -foptimize-off flag (bikesheds welcome) which would set the default for the pragma to 'off'. How is that different than what you wanted for -O0? It is defined in terms of an existing pragma, which is WAY easier to explain and WAY easier to implement. And, it still lets us say that -c -O0 -flto is a mistake, if that seems like a useful thing to say.
Well, with -O0 actually meaning "disable optimization", I found it "way easier" to handle everything the same way (pragma, command line, etc.). I find it somewhat confusing for the user to have to differentiate -O0 from -foptimize-off. What is supposed to change between the two?
There is a pedantic difference, rooted in the still-true factoid that O0 != optnone.
If we redefine LTO as "Link Time Operation" (rather than Optimization; see my reply to Duncan) then -O0 -flto is no longer an oxymoron, but using the attribute to imply the optimization level is still not good fidelity to what the user asked for.
I have to say, I don't understand the confusion or problem here...
For me, the arguments you're raising against -O0 and -flto don't hold up on closer inspection:
- O0 != optnone: correct. But this is only visible in LTO. And in LTO, Os != optsize, and Oz != minsize. But we use optsize and minsize to communicate between the compilation and the LTO step to the best of our ability the intent of the programmer. It appears we can use optnone exactly the same way here.
- optnone isn't *really* no optimizations: clearly this is true, but then neither is -O0. We run the always inliner, a couple of other passes, and several parts of the code generator's optimizer. I understand why optnone deficiencies (i.e., too many optimizations) might be frustrating, but having *more users* seems likely to make this *better*.
- There is no use case for -O0 + -flto: I really don't understand this. CFI and other whole-program analyses or semantic transformations (*not* optimizations) require LTO but not any particular pipeline. And I *really* want the ability to bisect files going into an LTO build to chase miscompiles. There are large systems built to manipulate flags that are much more efficient and accessible than modifying source code. It seems an entirely reasonable (and quite low-cost) feature. The fact that the LTO acronym stands for Link Time Optimization seems like a relatively unimportant thing. It is just an acronym and a name. We shouldn't let it preclude interesting use cases.
But all of this seems like an attempt to argue "you are wrong to have your use case". I personally find that an unproductive line of discussion. I would suggest instead we look at this differently:
For example, you might ask: could we find some other way to solve the problem you are trying to solve here? Suggesting an alternative approach would seem constructive. So far, all we've got is modify source code, but I think that there is a clear explanation of why that doesn't address the particular use case.
You might also ask: is supporting this feature a reasonable maintenance burden for Clang to address the use case? That seems like a productive discussion. For example, I *am* concerned about the increasing attribute noise at -O0. I don't think it is something to be dismissed. However, given the options we have today, it seems like the most effective way to address this use case and I don't have any better ideas to solve the problems Mehdi is solving here.
But I'm also not one of the most active maintainers writing patches, fixing bugs, and improving the IRGen layer. So ultimately, I defer on the maintenance issue to those maintainers.
If the design decision is that relevant optimization controls are propagated into bitcode as function attributes, I grumble but concede it will do something similar to what was requested.
It does bother me that we keep finding things that LTO needs to know but which it does not know because it runs in a separate phase of the workflow. I hope it is not a serious problem to ask "is there a more sensible way to fix this?" Maybe I'm not so good at expressing that so it comes out as a question rather than an objection, but that's basically what it is.
This design decision leaves -O1/-Og needing yet another attribute, when we get around to that, but I suppose Og would not have the interaction-with-other-attributes problems that optnone has.
- optnone isn't *really* no optimizations: clearly this is true, but then neither is -O0. We run the always inliner, a couple of other passes, and we run several parts of the code generators optimizer. I understand why optnone deficiencies (ie, too many optimizations) might be frustrating, but having *more users* seems likely to make this *better*.
We have picked all the low-hanging fruit there, and probably some medium-hanging fruit. Mehdi did have the misunderstanding that optnone == -O0 and that I think was worth correcting.
- There is no use case for -O0 + -flto:
The email thread has an exchange between Duncan and me, where I accept the use case.
But all of this seems like an attempt to argue "you are wrong to have your use case". I personally find that an unproductive line of discussion.
Not saying it was *wrong* just the description did not convey adequate justification. Listing a few project types does not constitute a use case. We did get to one, eventually, and it even involved differences in optimization levels.
For example, you might ask: could we find some other way to solve the problem you are trying to solve here?
There is another way to make use of the attribute, which I think will be more robust:
Have Sema pretend the pragma is in effect at all times at -O0. Then all the existing conflict detection/resolution logic Just Works, and there's no need to spend 4 lines of code hoping to replicate the correct conditions in CodeGenModule.
Because Sema does not have a handle on CodeGenOptions and therefore does not know the optimization level a priori, probably the right thing to do is move the flag to LangOpts and set it under the correct conditions in CompilerInvocation. It wouldn't be the first codegen-like option in LangOpts.
As I stand right now, there hasn't been any correction.
I still consider the fact that optnone doesn't produce the "same" result as O0 (modulo corner cases around merging global variables, for instance) a bug that needs to be fixed.
(Disabling passes at O0 for compile time is still a compile-time improvement; I never suggested we stop doing this...)
Why? That's not the purpose of optnone. You've already admitted there are some differences. Why are other differences important?
Why not? What's the alternative?
If we want to support -O0 -flto, and optnone is the way to convey this to the optimizer, I don't see an alternative.
optsize != -Os (according to Chandler)
minsize != -Oz (according to Chandler)
optnone != -O0 (according to both me and Chandler)
optnone is not "the way to convey (-O0) to the optimizer." Please get that misunderstanding out of your head. Clang handles -O0 by creating a short, minimalist pipeline, and running everything through it. Clang handles -O2 by creating a fuller optimization pipeline (with not just more, but *different* passes), and functions with 'optnone' skip many of the passes in the pipeline.
These are architecturally different processes, you are not going to be able to make 'optnone' behave exactly like -O0 without major redesign of how the pipelines work.
Of course, but that's just an implementation limitation, mostly for historical reasons I believe, not by design. It does not have to be set in stone, and I'm giving the direction with respect to LTO in particular here: these attributes should be able to behave the same way as the corresponding '-O' command line.
optnone is not "the way to convey (-O0) to the optimizer." Please get that misunderstanding out of your head. Clang handles -O0 by creating a short, minimalist pipeline, and running everything through it. Clang handles -O2 by creating a fuller optimization pipeline, and functions with 'optnone' skip many of the passes in the pipeline.
Don't get me wrong: I believe I have a very good understanding of how the optimizer pipeline is set up and how the passes operate with respect to the attributes.
And it is because I understand the deficiencies (and how it is an issue with LTO) that I'm aligning all of this toward a consistent/coherent expected result for the users.
These are architecturally different processes, you are not going to be able to make 'optnone' behave exactly like -O0 without major redesign of how the pipelines work.
I'd disagree with your estimation of "major". It's not gonna be tomorrow, sure, but it doesn't have to be. The most difficult part will be the interprocedural passes, but there are not that many.
There is certainly a lot of history here influencing this, but I think there is also a fundamental difference. The flag is a request from the user to behave in a particular way. The LLVM attribute is a tool we use in some cases to try to satisfy that request.
When we're not doing LTO, it is easier to satisfy the requests of '-O' flags. The fact that we happen to not use attributes to do it today is just an implementation detail.
When we are doing LTO, satisfying different requests is hard. We should do our best, but I think it is reasonable to explain to the user that "with LTO, we can't fail to optimize with the Wombat optimization because of <reasons> when one file requests -O0 and another requests -O2". Same thing for the other levels.
This seems precisely analogous to the fact that even when the user requests -O0, we will do some inlining. Why? Because we *have to* for semantic reasons.
So I think what Mehdi is driving at is that if '-O0 -flto' has a mismatch from '-O0' in terms of what users expect, we should probably try to fix that. I'd suggest that there may be fundamental things we can't fix and that is OK. I don't think this is unprincipled either, we're doing the best we can to honor the user's request.
The other thing that might help is to point out that there *are* principles behind these flags. Unlike the differences between -O[123], all of -O0, -Os, and -Oz have non-threshold semantic implications. So with this change, I think we will have *all* the -O flags covered, because I view '-O[123]' as a single semantic space with a threshold modifier that we *don't* need to communicate to LTO. We model that state as the absence of any attribute. And -O0, -Os, and -Oz have dedicated attributes.
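A sketch of the mapping being described (the IR is illustrative; the exact attribute sets depend on the clang version):

```llvm
; -Os          -> optsize          (semantic: prefer size over speed)
; -Oz          -> minsize optsize  (semantic: minimize size aggressively)
; -O0          -> noinline optnone (with this patch)
; -O1/-O2/-O3  -> no level attribute; the threshold lives in the pipeline
define void @f() optsize {
  ret void
}
```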
If we ever want to really push on -Og, that might indeed require an attribute to distinguish it.
optnone is not "the way to convey (-O0) to the optimizer."
So, I view '-O0' as a request from the programmer to turn off the optimizer to the extent possible and give them as naive, minimally transformed a representation of the code as possible.
And based on that, I view optnone as a tool to implement precisely these semantics at a fine granularity and with survivability across bitcode roundtrip.
It just isn't the *only* tool, and sometimes we can use an easier (and, to Mehdi's compile-time point, cheaper) tool.
I think the text for spec'ing optnone in the LLVM langref needs to be updated to reflect this though. Currently it says:
This function attribute indicates that most optimization passes will skip this function, with the exception of interprocedural optimization passes.
This is demonstrably false:
% ag OptimizeNone lib/Transforms/IPO
lib/Transforms/IPO/ForceFunctionAttrs.cpp
47:      .Case("optnone", Attribute::OptimizeNone)
lib/Transforms/IPO/Inliner.cpp
813:  if (F.hasFnAttribute(Attribute::OptimizeNone))
lib/Transforms/IPO/InferFunctionAttrs.cpp
30:  if (F.isDeclaration() && !F.hasFnAttribute((Attribute::OptimizeNone)))
lib/Transforms/IPO/FunctionAttrs.cpp
1056:  if (F.hasFnAttribute(Attribute::OptimizeNone)) {
1137:  if (!F || F->hasFnAttribute(Attribute::OptimizeNone)) {
I'll send a patch.
I guess I'm getting irritated because people are trying to tell me what optnone means. I know what it means; I spent probably a whole year pushing to get it adopted.
Optnone means: When you are running optimizations, try not to optimize this part, if you can.
That's it. That's *all*. It has never meant anything else. Telling me different means you misunderstand, and trying to persuade me that *I* misunderstand is going to be a waste of time and effort.
I fully understand that this is not the definition of optnone that you *want*. Please feel free to propose a redefinition. But don't go telling me that the thing you *want* is what the thing already *is* and that any difference is a bug.
This is implementing a debugging option, not skipping a pass.
lib/Transforms/IPO/Inliner.cpp
813:  if (F.hasFnAttribute(Attribute::OptimizeNone))
This is declining to operate on a callee, not skipping a pass. Given that optnone is supposed to be paired with noinline, wouldn't this be redundant?
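For reference, the LangRef already ties the two attributes together: 'optnone' requires 'noinline' on the same function, and the IR verifier rejects 'optnone' combined with 'alwaysinline'. A minimal well-formed example:

```llvm
; 'optnone' must be accompanied by 'noinline';
; 'optnone' + 'alwaysinline' is rejected by the IR verifier.
define void @g() noinline optnone {
  ret void
}
```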
lib/Transforms/IPO/InferFunctionAttrs.cpp
30:  if (F.isDeclaration() && !F.hasFnAttribute((Attribute::OptimizeNone)))
lib/Transforms/IPO/FunctionAttrs.cpp
1056:  if (F.hasFnAttribute(Attribute::OptimizeNone)) {
1137:  if (!F || F->hasFnAttribute(Attribute::OptimizeNone)) {
This is SCC stuff, and I don't really understand what it is trying to do. I wonder whether 'noinline' ought to be the relevant attribute here, though.
@rsmith could you say whether it seems reasonable to have a LangOpts flag that basically means "pragma clang optimize off is always in effect." I think it would make the other optnone-related logic simpler. It would not be the only sort-of-codegen-related flag in LangOpts (e.g. the PIC/PIE stuff).
To clarify my understanding of this thread, it seems like there are three ways forward here:
- To have -O0 add optnone to the generated functions (enabling some degree of lack of optimization of those functions even when used with -flto)
- To have -O0 -flto essentially turn off LTO (so that we get unoptimized objects directly for things we're debugging)
- Add a separate flag to make optnone the default
(1) is this patch. The disadvantage of (2) is that it also precludes CFI (and other whole-program transformations). This seems highly unfortunate at best and a non-starter in the general case. The disadvantage of (3) is that it might seem confusing to users (i.e. how to explain the difference between -O0 and -foptimize-off?) and is an unnecessary exposure of implementation details to users. On this point I agree.
It is true that -O0 != optnone, in a technical sense, but in the end, both are best effort. Moreover, there is a tradeoff between disabling optimization of the functions you don't want to optimize and keeping the remainder of the code as similar as possible to how it would be if everything were being optimized. What optnone provides seems like a reasonable point in that tradeoff space. I think that we should move forward with this approach.
Also note that @chandlerc in r290398 made clang add "noinline" to every function at -O0 by default, which seems very similar to what I'm doing here.
We're still waiting for @rsmith to comment on whether it'd be better to have a LangOpts flag that basically means "pragma clang optimize off is always in effect," and have Sema pretend the pragma is in effect at all times at -O0.
Just to be explicit, I agree with Hal's summary. This seems like the right engineering tradeoff and I don't find anything particularly unsatisfying about it.
FWIW, I have no real opinion about this side of it, I see it more as a detail of how Clang wants to implement this kind of thing.
That was my suggestion as it seemed like this patch is essentially replicating the attribute-conflict detection logic that's in place for attributes specified in the source. And we do like to say DRY.
But I won't insist; the patch can proceed as far as I'm concerned.
FWIW, I think this makes sense.
Moving -O0 and optnone closer together seems sensible, even though -O3 with an optnone function indeed gives you different results today.
We are basically maintaining two things for the same "do not optimize" goal.
This obviously won't make -O0 and optnone the same in today's pass managers, but it is a step in the right direction.
Actually, looking through the comments, it appears that everyone (eventually) agreed with the approach in the patch. I agree too. LGTM.
Mehdi, are you able to rebase and commit, or should someone take over?
At the point where we are doing 3 coupled calls in numerous places, we should add some routine to do this... Maybe we should have done that when I added the noinline bit.
I don't have a good idea of where best to do this -- as part of or as an alternative to SetInternalFunctionAttributes? Something else?
I'm imagining something like SetAlwaysInlinedRuntimeFunctionAttributes or something. Need a clang IRGen person to help push the organization in the right direction.