This patch enables partial unrolling and runtime unrolling for the AArch64 target. Applied together with the runtime unrolling prologue changes I just sent, it improves SPEC2000 by 0.6% and SPEC2006 by 0.8%. On code size, the images of both benchmarks grow by the same 20%. This experiment was done on A57.
Thanks,
Kevin
I recommend not enabling it this way. Instead, you should set LoopMicroOpBufferSize in the relevant scheduling model. You'll see this is done in lib/Target/X86/X86SchedHaswell.td, for example. This way you can take advantage of the default logic in BasicTargetTransformInfo.cpp (which does things like exclude loops with function calls).
Also, it will force you to pick a threshold, which is really a core-specific property (generally speaking). In my experience, this helps on OOO cores only for small loops. For in-order cores, especially if you use AA during instruction scheduling, it can help for larger loops too (but obviously not *too* large).
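To make the suggestion concrete, here is a rough stand-alone model of what that default logic does once LoopMicroOpBufferSize is set. It is only a sketch: Instr, Loop, SchedModel, UnrollPrefs and getUnrollPrefs are hypothetical stand-ins invented for illustration, not the real LLVM classes or signatures, and the actual code in BasicTargetTransformInfo.cpp has more detail.

```cpp
#include <vector>

// Hypothetical stand-ins for the LLVM types involved; not the real API.
struct Instr { bool IsCall; };
struct Loop { std::vector<Instr> Body; };

// Models the per-core scheduling model. 0 means the field was never set.
struct SchedModel { unsigned LoopMicroOpBufferSize; };

// Models the unrolling preferences the target hands back to the unroller.
struct UnrollPrefs {
  bool Partial = false;
  bool Runtime = false;
  unsigned PartialThreshold = 0;
};

// Simplified model of the default behaviour described above.
UnrollPrefs getUnrollPrefs(const SchedModel &SM, const Loop &L) {
  UnrollPrefs UP;

  // No buffer size in the scheduling model: leave unrolling at its defaults.
  if (SM.LoopMicroOpBufferSize == 0)
    return UP;

  // Exclude loops that contain function calls.
  for (const Instr &I : L.Body)
    if (I.IsCall)
      return UP;

  // Enable partial and runtime unrolling, with the core-specific buffer
  // size acting as the threshold for the unrolled loop body.
  UP.Partial = true;
  UP.Runtime = true;
  UP.PartialThreshold = SM.LoopMicroOpBufferSize;
  return UP;
}
```

The point of this shape is that the threshold comes from the scheduling model of the specific core rather than from a target-wide switch, so each core can pick its own value, or leave the field unset to opt out.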