This is an archive of the discontinued LLVM Phabricator instance.

Cleanup the handling of noinline function attributes, -fno-inline, -fno-inline-functions, -O0, and optnone.
ClosedPublic

Authored by chandlerc on Dec 22 2016, 4:56 AM.

Details

Summary

These were really, really tangled together:

  • We used the noinline LLVM attribute for -fno-inline
    • But not for -fno-inline-functions (breaking LTO)
    • But we did use it for -finline-hint-functions (yay, LTO is happy!)
    • But we didn't for -O0 (LTO is sad yet again...)
  • We had weird structuring of CodeGenOpts with both an inlining enumeration and a boolean. They interacted in weird ways and needlessly.
  • A *lot* of set smashing went on with setting these, and then got worse when we considered optnone and other inlining-effecting attributes.
  • A bunch of inline affecting attributes were managed in a completely different place from -fno-inline.
  • Even with -fno-inline we failed to put the LLVM noinline attribute onto many generated function definitions because they didn't show up as AST-level functions.
  • If you passed -O0 but -finline-functions we would run the normal inliner pass in LLVM despite it being in the O0 pipeline, which really doesn't make much sense.

Sadly, this causes a bunch of churn in tests because we don't run the
optimizer in the tests and check the contents of attribute sets. It
would be awesome if attribute sets were a bit more FileCheck friendly,
but oh well.

I think this is a significant improvement and should remove the semantic
need to change what inliner pass we run in order to comply with the
requested inlining semantics by relying completely on attributes. It
also cleans up tho optnone and related handling a bit.

One unfortunate aspect of this is that for generating alwaysinline
routines like those in OpenMP we end up removing noinline and then
adding alwaysinline. I tried a bunch of other approaches, but because we
recompute function attributes from scratch and don't have a declaration
here I couldn't find anything substantially cleaner than this.

Depends on D28047

Event Timeline

chandlerc updated this revision to Diff 82329.Dec 22 2016, 4:56 AM
chandlerc retitled this revision from to Cleanup the handling of noinline function attributes, -fno-inline, -fno-inline-functions, -O0, and optnone..
chandlerc updated this object.
chandlerc added reviewers: rsmith, rnk, mehdi_amini.
chandlerc added a subscriber: cfe-commits.
rnk edited edge metadata.Dec 22 2016, 9:09 AM

The big change here is that clang -O0 now applies the noinline attribute everywhere. I can see why someone might expect things to work that way, but it seems surprising to me at first glance.

Before this change I can imagine someone distributing a static archive of bitcode which could be used to produce debug binaries and release binaries. There are some issues with that (lack of lifetime markers), but it should mostly work.

Before this change, this command line would do inlining: clang -S -emit-llvm foo.c -o - | opt -O2 -S Now it won't, even though I didn't explicitly pass -O0. Are we happy with that?

mehdi_amini edited edge metadata.Dec 22 2016, 9:54 AM

It would be awesome if attribute sets were a bit more FileCheck friendly, but oh well.

I've been wondering about that, what's the point of attribute sets in the textual IR?
I understand the idea for saving space in the Bitcode, but the IR does not have to use the same representation.

In D28053#629768, @rnk wrote:

The big change here is that clang -O0 now applies the noinline attribute everywhere. I can see why someone might expect things to work that way, but it seems surprising to me at first glance.

Before this change I can imagine someone distributing a static archive of bitcode which could be used to produce debug binaries and release binaries. There are some issues with that (lack of lifetime markers), but it should mostly work.

Note that LTO runs a bunch of optimizations during the compile phase that you wouldn't get with that.

Steven had issues in the past when mixing O0 and O2 in LTO, I don't remember the details unfortunately (and Steven is OOO).
Right now we don't add the optnone attribute when building with O0, maybe we should as well if we go this route.

mehdi_amini added inline comments.Dec 22 2016, 10:15 AM
lib/Frontend/CompilerInvocation.cpp
453

I'd switch the two if

mehdi_amini added inline comments.Dec 22 2016, 10:22 AM
lib/Frontend/CompilerInvocation.cpp
453

The test > 1 and the comment about -O0 are confusing to me as well. What about the following (IIUC):

if (Opts.OptimizationLevel <= 1) {
  // "always-inline" required for correctness at O0 and O1.
  Opts.setInlining(CodeGenOptions::OnlyAlwaysInlining);
else if (Opts.OptimizationLevel > 1) {
  // Normal inlining at O2 and above
  Opts.setInlining((CodeGenOptions::NormalInlining);
  // -fno-inline-functions overrides
  if (Arg *InlineArg = Args.getLastArg(
          options::OPT_finline_functions, options::OPT_finline_hint_functions,
          options::OPT_fno_inline_functions, options::OPT_fno_inline)) {
     if (InlineOpt.matches(options::OPT_finline_functions))
        Opts.setInlining(CodeGenOptions::NormalInlining);
      else if (InlineOpt.matches(options::OPT_finline_hint_functions))
        Opts.setInlining(CodeGenOptions::OnlyHintInlining);
      else
        Opts.setInlining(CodeGenOptions::OnlyAlwaysInlining);
  }
}

It would be awesome if attribute sets were a bit more FileCheck friendly, but oh well.

I've been wondering about that, what's the point of attribute sets in the textual IR?
I understand the idea for saving space in the Bitcode, but the IR does not have to use the same representation.

Will return to the actual patch review, but as an aside, I would *strongly* support going back to printing attributes in the canonical location in the function declaration / definition. It has huge advantages:

  • Can FileCheck individual function's attributes rather than a set at a time
  • Can place those checks after a CHECK-LABEL for the function name
  • Can have a CHECK-LABEL for the function name, CHECKs for attributes *and* CHECKs for function body stuff, which today is extremely hard to combine into a single test.

Happy to help with this if you want to pursue it.

chandlerc updated this revision to Diff 82356.Dec 22 2016, 11:46 AM
chandlerc edited edge metadata.

Fix an thinko found in review, clean up comments, and clean up pass pipeline
selection as that actually needs to happen in this patch as well.

Also update one test using -O1 and observing this, including making it use
FileCheck more effectively.

In D28053#629768, @rnk wrote:

The big change here is that clang -O0 now applies the noinline attribute everywhere. I can see why someone might expect things to work that way, but it seems surprising to me at first glance.

Before this change I can imagine someone distributing a static archive of bitcode which could be used to produce debug binaries and release binaries. There are some issues with that (lack of lifetime markers), but it should mostly work.

Note that LTO runs a bunch of optimizations during the compile phase that you wouldn't get with that.

There are other issues with this workflow as well. For example, Clang at -O0 will intentionally reuse local allocas to save stack space without the backend having to track lifetime. This destroys aliasing information left and right though, so you really wouldn't want this for a mixed-use bitcode archive.

I think a much more effective technique (if folks actually want to do this) would be (with whatever flag spelling you want)

% clang -O -emit-llvm -mllvm -disable-llvm-passes

This will generate a frontend-optimized but backend pristine bitcode file that can be processed more or less depending on the desire of the user...

But I also don't think we really need to do a ton of work to support this hypothetical. If someone really wants to make this work, they should specifically request some mode and we should surface a clear flag documented to provide it. Joerg's idea about '-emit-raw-llvm' would be a nice step in this direction IMO.

Steven had issues in the past when mixing O0 and O2 in LTO, I don't remember the details unfortunately (and Steven is OOO).
Right now we don't add the optnone attribute when building with O0, maybe we should as well if we go this route.

Fundamentally, yes. This patch is trying to follow the path we've been going down where we try to propagate optimization restrictions to LTO. We do this for -Os and -Oz, but not -O0. This propagates one aspect (inlining), but I would be interested in using optnone to more effectively propagate O0.

That said, I agree with Reid that it is a bit surprising. After thinking about it a lot, the reason I personally find it surprising is actually that I find -O0 being the default surprising. But I'm OK with not changing that here, and maybe never changing that at the CC1 layer.

Thoughts? Generally happy with this direction?

There is another semantic change here that I intended but Mehdi highlighted I failed to fully implement: I'm also *not* using 'noinline' at -O1, and instead just building a simpler pass pipeline. I think this better fits with the intent: -O1 vs -O2 vs -O3 don't really change the *target* of optimization but instead *how much* optimization to do. As such, they feel like they don't need to be in attributes where -O0, -Os, and -Oz do.

lib/Frontend/CompilerInvocation.cpp
453

Yea, this was confusing, and I hadn't actually implemented my intent here either.

My goal was to make the Inlining setting correspond to whether we *forcibly* disable some aspects of inlining or not, rather than anything to do with what inliner pass we run.

This means we should only be setting OnlyAlwaysInlining in O0, not at O1, and the O1 behavior should be up to the backend's construction of a pass pipeline appropriate to that optimization level. (I actually think using the always inliner is a mistake here, but that's a separate discussion.)

To make all of this work requires me to merge in the next patch I was working on which I've done. It doesn't really scope creep much.

I've updated the formulation of the conditions and the comments to hopefully be more clear.

I'm still doing the weird order of testing because i want to always run getLastArg to mark these flags as used.

mehdi_amini added inline comments.Dec 22 2016, 11:56 AM
lib/Frontend/CompilerInvocation.cpp
453

always run getLastArg to mark these flags as used.

Oh I missed this side effect in the test, that's nasty. Is it covered by a test? (i.e. if someone refactor this code in the future will we warn for unused and break a test?).

That said, I agree with Reid that it is a bit surprising. After thinking about it a lot, the reason I personally find it surprising is actually that I find -O0 being the default surprising. But I'm OK with not changing that here, and maybe never changing that at the CC1 layer.

We could reduce the "surprising effect" for LTO behavior by requiring an explicit -O when -flto is passed to the driver.

Thoughts? Generally happy with this direction?

LGTM.

rsmith accepted this revision.Dec 22 2016, 5:01 PM
rsmith edited edge metadata.
This revision is now accepted and ready to land.Dec 22 2016, 5:01 PM
This revision was automatically updated to reflect the committed changes.