This is an archive of the discontinued LLVM Phabricator instance.

Differential D19827

Do not disable completely loop unroll when optimizing for size.
ClosedPublic

Authored by mamai on May 2 2016, 1:15 PM.

Details

Reviewers

chandlerc
rnk

Commits

rG21ac3bfc691a: Do not disable completely loop unroll when optimizing for size.
rC268509: Do not disable completely loop unroll when optimizing for size.
rL268509: Do not disable completely loop unroll when optimizing for size.

Summary

Disabling loop unrolling completely when optimizing for size (/Os) means that #pragma unroll has no effect at that optimization level.

This contradicts a paragraph in an LLVM blog post about the loop pragmas (http://blog.llvm.org/2014/11/loop-vectorization-diagnostics-and.html), which says:

For example, when compiling for size (-Os) it's a good idea to vectorize the hot loops of the application to improve performance. Vectorization, interleaving, and unrolling can be explicitly specified using the #pragma clang loop directive prior to any for, while, do-while, or c++11 range-based for loop.

Also, as explained in a previous commit, the loop unroll pass already has the logic to handle optimizing for size (http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20130805/085399.html).
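
To make the affected case concrete, here is a minimal sketch of a loop that carries an unroll pragma. The function name `sum16` and the unroll count of 4 are illustrative assumptions, not taken from the patch; `#pragma clang loop unroll_count(N)` is one of the loop directives covered by the blog post cited above. According to the summary, a pragma like this had no effect at /Os before this change.

```cpp
#include <cstddef>

// Hypothetical hot loop; the name and the unroll count are illustrative.
int sum16(const int *data) {
  int total = 0;
  // Ask Clang to unroll this loop by a factor of 4.
  // Compilers that do not recognize the pragma ignore it (possibly with a
  // warning), so the function behaves the same either way.
#pragma clang loop unroll_count(4)
  for (std::size_t i = 0; i < 16; ++i)
    total += data[i];
  return total;
}
```

The pragma only changes how the loop is compiled, never what it computes, so the result is the same whether or not the unroller honors it.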

Diff Detail

Repository: rL LLVM

Event Timeline

mamai updated this revision to Diff 55866. May 2 2016, 1:15 PM

mamai retitled this revision from to Do not disable completely loop unroll according to optimization level..

mamai updated this object.

mamai added a reviewer: chandlerc.

mamai set the repository for this revision to rL LLVM.

mamai added a subscriber: cfe-commits.

I believe the LLVM blog post is in error. Loop vectorization commonly generates two versions of the loop: vectorized and scalar. The scalar loop is necessary to handle the case where the trip count isn't evenly divisible by the vectorization factor. Therefore, it's reasonable to disable these types of optimizations (e.g., vectorization, unrolling) when we're optimizing for size. It's also reasonable to disable these optimizations at -O0.

However, I also understand the argument being made by Chandler. Can you please create an LLVM patch that shows the loop unroll pass respects the equivalent of -Os/-Oz/-O0 in LLVM IR?

I believe the first two are handled by checking Function::optForSize() and for the latter you can check the function for the optnone attribute.

I think the blog comment is right. The pragma should make the loop unroll even in /Os. I think it is essential to allow the user to optimize some specific loops even if he generally wants to optimize for size the rest of the code. I will add tests that show the behavior of the loop unroll pass when optnone or optsize are specified.

In D19827#419870, @mamai wrote:

I think the blog comment is right. The pragma should make the loop unroll even in /Os. I think it is essential to allow the user to optimize some specific loops even if he generally wants to optimize for size the rest of the code. I will add tests that show the behavior of the loop unroll pass when optnone or optsize are specified.

You're correct. No more reviewing patches after my bedtime. :)

mcrosier mentioned this in D19870: Adding test cases showing the behavior of LoopUnrollPass according to optnone and optsize attributes. May 3 2016, 8:18 AM

Modified the patch so that it does not affect the /O1 optimization level.

lgtm

To be clear, loop unrolling just lowers its size threshold when -Os is on:

// Apply size attributes
if (L->getHeader()->getParent()->optForSize()) {
  UP.Threshold = UP.OptSizeThreshold;
  UP.PartialThreshold = UP.PartialOptSizeThreshold;
}

So this does more than enable loop unrolling when pragmas are present. However, if that behavior is wrong then we should fix it in LLVM.
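
The threshold selection quoted above can be modeled without any LLVM headers. This is only a sketch: `UnrollPrefs` and `applySizeAttributes` are hypothetical stand-ins whose field names mirror the snippet, and the boolean parameter stands in for the `optForSize()` query.

```cpp
// Simplified stand-in for the unroller's preference struct. The field names
// mirror the snippet above, but the type itself is hypothetical.
struct UnrollPrefs {
  unsigned Threshold;
  unsigned PartialThreshold;
  unsigned OptSizeThreshold;
  unsigned PartialOptSizeThreshold;
};

// Returns the preferences after applying the size attribute: functions that
// are optimized for size get the (typically smaller) size budgets, and all
// other functions keep their budgets unchanged.
UnrollPrefs applySizeAttributes(UnrollPrefs UP, bool OptForSize) {
  if (OptForSize) {
    UP.Threshold = UP.OptSizeThreshold;
    UP.PartialThreshold = UP.PartialOptSizeThreshold;
  }
  return UP;
}
```

In this model the unroller stays active for a size-optimized function and simply works with the size budgets, which is the point made in the comment above.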

This revision is now accepted and ready to land. May 4 2016, 8:11 AM

Closed by commit rL268509: Do not disable completely loop unroll when optimizing for size. (authored by mamai). May 4 2016, 8:32 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: cfe/trunk/lib/Frontend/CompilerInvocation.cpp
Size: 2 lines

Diff 56158

cfe/trunk/lib/Frontend/CompilerInvocation.cpp

@@ static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK,
   Opts.NoImplicitFloat = Args.hasArg(OPT_no_implicit_float);
   Opts.OptimizeSize = getOptimizationLevelSize(Args);
   Opts.SimplifyLibCalls = !(Args.hasArg(OPT_fno_builtin) ||
                             Args.hasArg(OPT_ffreestanding));
   if (Opts.SimplifyLibCalls)
     getAllNoBuiltinFuncValues(Args, Opts.NoBuiltinFuncs);
   Opts.UnrollLoops =
       Args.hasFlag(OPT_funroll_loops, OPT_fno_unroll_loops,
-                   (Opts.OptimizationLevel > 1 && !Opts.OptimizeSize));
+                   (Opts.OptimizationLevel > 1));
   Opts.RerollLoops = Args.hasArg(OPT_freroll_loops);

   Opts.DisableIntegratedAS = Args.hasArg(OPT_fno_integrated_as);
   Opts.Autolink = !Args.hasArg(OPT_fno_autolink);
   Opts.SampleProfileFile = Args.getLastArgValue(OPT_fprofile_sample_use_EQ);

   setPGOInstrumentor(Opts, Args, Diags);
   Opts.InstrProfileOutput =
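
The effect of the changed default can be sketched as a small standalone model. The names `UnrollFlag`, `resolveUnrollLoops`, `defaultBefore` and `defaultAfter` are hypothetical and exist only for this illustration; the model also assumes that an explicit -funroll-loops or -fno-unroll-loops takes precedence over the computed default, which is how the `hasFlag` call above is used.

```cpp
// Hypothetical model of the command-line state for the unroll flag pair.
enum class UnrollFlag { Unset, On, Off };

// An explicit flag wins; otherwise the computed default applies.
bool resolveUnrollLoops(UnrollFlag Flag, bool Default) {
  if (Flag == UnrollFlag::On)
    return true;
  if (Flag == UnrollFlag::Off)
    return false;
  return Default;
}

// Default before the patch: off below level 2 and whenever optimizing for size.
bool defaultBefore(unsigned OptLevel, bool OptimizeSize) {
  return OptLevel > 1 && !OptimizeSize;
}

// Default after the patch: only the optimization level matters.
bool defaultAfter(unsigned OptLevel, bool /*OptimizeSize*/) {
  return OptLevel > 1;
}
```

The two defaults differ only when the optimization level is above 1 and the size flag is set. Levels 0 and 1 stay off in both versions, consistent with the note in the thread that the patch does not affect /O1.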