Currently, rotation-max-header-size is set to 16, and it has been that way since the beginning (rL35714).
It is not very obvious (to me?) what the reasoning behind this threshold is:
is it there to cap the maximal amount of work to be done to rotate the loop,
to cap the growth in instruction count, or something else?
Either way, I think the threshold of 16 could use adjustment.
In particular, I've identified these loops here as not being rotated,
because they have more instructions than the threshold allows (16 < x < 64, ~50).
That, in turn, prevents the vectorizer from dealing with them,
which is not optimal, since they should be vectorizable.
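
For context, here is a hand-written sketch of what rotation does to a loop. It is illustrative C++ only, not the pass's actual output, and the names `sum_while` and `sum_rotated` are made up for this example. The exit test in the header is duplicated once in front of the loop as a guard, and the loop becomes a do-while; presumably that duplication is what the header-size threshold is meant to bound.

```cpp
#include <cstddef>
#include <vector>

// Original shape: the exit test lives in the loop header and runs before
// every iteration, including the first.
int sum_while(const std::vector<int> &v) {
  int sum = 0;
  std::size_t i = 0;
  while (i < v.size()) {
    sum += v[i];
    ++i;
  }
  return sum;
}

// Hand-rotated shape: the header's test is duplicated once as a guard in
// front of the loop, and the loop itself becomes a do-while whose exit test
// sits at the bottom (in the latch).
int sum_rotated(const std::vector<int> &v) {
  int sum = 0;
  std::size_t i = 0;
  if (i < v.size()) {         // copy of the header test (the guard)
    do {
      sum += v[i];
      ++i;
    } while (i < v.size());   // exit test now in the latch
  }
  return sum;
}
```

Both functions compute the same result; only the control-flow shape differs. The larger the header, the more code the guard has to duplicate.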
On my benchmarks, I'm not seeing any perf impact from this change.
And yes, the loops in question still aren't being vectorized; there are more issues.
llvm-compile-time-tracker is also not very unhappy with this:
http://llvm-compile-time-tracker.com/compare.php?from=07c4c7e7959b7fd09830bbdf4dcd533e98aa45ab&to=b40c8ea61a4bea615f2b3e1bbd0eaa67b7a13b44&stat=instructions
The change clearly does affect those tests:
- there is a pretty small +0.01% .. +0.05% compile-time regression
- size-text increases by +0.03% on average
So I'm wondering: what is the procedure here, and would this be a reasonable change?