This is an archive of the discontinued LLVM Phabricator instance.

Size LTO (1/3): Standardizing the use of OptimizationLevel
Needs ReviewPublic

Authored by rcorcs on Jun 4 2020, 10:53 PM.

Details

Summary

This patch is the first in the sequence of three patches for supporting size optimization with LTO. The planned patches are:
1: Standardizing the use of OptimizationLevel across pass builders, which includes both SpeedupLevel and SizeLevel.
2: Enable the support for -Os and -Oz for LTO in lld.
3: Tune the LTO pipeline for size optimization.

Since we already have a class that describes both speed and size levels of optimization, I believe it is a good idea to use across the code base when defining optimization levels.
In the next patch, instead of adding a SizeLevel variable for the LTO configuration, I'll be able to simply use this OptimizationLevel variable.

Diff Detail

Event Timeline

rcorcs created this revision.Jun 4 2020, 10:53 PM
Herald added a reviewer: MaskRay. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript

This patch is the first in the sequence of three patches for supporting size optimization with LTO.

Can you start by describing the problem you're trying to solve and the overall approach, including the end-to-end user-interface?

rcorcs added a comment.Jun 5 2020, 7:57 AM

LLVM already has the class PassBuilder::OptimizationLevel that encapsulates the logic of both speed and size optimization levels. This class already checks which values for SpeedLevel and SizeLevel are valid.

However, other parts of the code define two separate variables to describe speed and size optimization levels with their semantic specified either in comments or code. Note that when SizeLevel!=0, OptLevel (or SpeedLevel) is usually expected to be 2, that is, their values are interdependent.

From my understanding, ideally, LLVM would use the same OptimizationLevel encapsulation everywhere. If the same encapsulation is used everywhere we can avoid conversions and guarantee
that they always have the same semantics.

For example, the class PassManagerBuilder defines these two separate variables with their semantics in comments:

/// The Optimization Level - Specify the basic optimization level.
///    0 = -O0, 1 = -O1, 2 = -O2, 3 = -O3 
unsigned OptLevel;

/// SizeLevel - How much we're optimizing for size.
///    0 = none, 1 = -Os, 2 = -Oz
unsigned SizeLevel;

On the other hand, the class ThinLTOCodeGenerator defines the semantics of OptLevel in code:

/// IR optimization level: from 0 to 3.
void setOptLevel(unsigned NewOptLevel) {
  OptLevel = (NewOptLevel > 3) ? 3 : NewOptLevel;
}

This patch standardizes the use of OptimizationLevel across PassBuilder, PassManagerBuilder, LTO configuration, and LTO code generators. Even with this patch, further work is still needed to standardize the use of OptimizationLevel across LLVM.

rcorcs added a comment.EditedJun 5 2020, 8:02 AM

If reviewers think that this patch is touching in too many files, I could try to focus it only on the LTO related files, converting OptimizationLevel back to two separate values when necessary.

mehdi_amini added a comment.EditedJun 5 2020, 8:29 PM

This patch standardizes the use of OptimizationLevel across PassBuilder, PassManagerBuilder, LTO configuration, and LTO code generators. Even with this patch, further work is still needed to standardize the use of OptimizationLevel across LLVM.

The distinction of size level for LTO isn't obvious to me, this is why I'm asking some clarification here.
In general with LTO the Os/Oz frontend flags will trigger the addition of function attributes, however the Os/Oz flags aren't impacting the optimizer pipeline during LTO (so they basically have no effect during LTO and get mapped to O2 directly).

The way I see it, with size level for LTO, we could have a different LTO optimization pipeline for size or runtime performance. For example, we could have a different tuning for inlining, vectorization, etc. We could also use the size level to automatically enable optimizations such as HotColdSplitting, MergeFunctions, etc., instead of relying on specific enabling flags. We could also have other size-specific optimizations in the future, such as MergeSimilarFunctions (https://reviews.llvm.org/D52896).

I believe that function attributes for size are useful for optimizing cold functions that have been outlined by HotColdSplitting, for example. However, an ideal size level LTO would involve a different optimization pipeline and also a different tuning of those optimizations.

For example, when optimizing for size, we could disable loop vectorization and have SLP optimizing based on the code-size cost model. We could also have MergeFunctions enabled with Os and both MergeFunctions and MergeSimilarFunctions enabled with Oz. A similar logic could be applied to other optimizations, such as Inlining, HotColdSplitting, etc.

Sorry I haven't had a chance to review this yet. Added @mtrofin who recently added the OptimizationLevel class that this is leveraging, and @yamauchi who has also been looking at size optimizations in llvm.

At a first glance it isn't clear to me how much of this is NFC related refactoring/cleanup vs behavior change. If it has both, it would be helpful to split into an NFC patch first and then a follow on with actual behavior changes. Can you clarify?

mtrofin added inline comments.Jun 11 2020, 9:04 AM
llvm/include/llvm/IR/PassManager.h
413

I think this change - moving OptimizationLevel out - should be in its own patch, to avoid noise.

llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h
22

It's unfortunate we now need to pull pass management into places that didn't have that dependency. IIUC, the goal of this overall effort includes piping though the full user-requested optimization parameters (i.e. both speed and size). Given the likely diversity of the consumers, it may make sense to move OptimizationLevel in its own header?

llvm/tools/opt/CMakeLists.txt
20

Nit: make this change separately, and since it's just a style change, it can probably be just submitted with no review.

The way I see it, with size level for LTO, we could have a different LTO optimization pipeline for size or runtime performance.

So this is the important point to settle before going on with any patch: this isn't how LTO is setup today.

For example, we could have a different tuning for inlining, vectorization, etc.

All these are covered by the function attributes already.

We could also use the size level to automatically enable optimizations such as HotColdSplitting, MergeFunctions, etc., instead of relying on specific enabling flags. We could also have other size-specific optimizations in the future, such as MergeSimilarFunctions (https://reviews.llvm.org/D52896).

All these could be in the LTO pipeline and driven by the attribute as well.

I believe that function attributes for size are useful for optimizing cold functions that have been outlined by HotColdSplitting, for example.

The attribute is added by the frontend and can change per translation-unit / per function though.