If the nested loop is an innermost loop, prefer a 32-byte alignment, which can reduce cache misses and branch-prediction misses. The actual alignment of the loop still depends on the hotness check and other logic in alignBlocks.
The old code aligns a hot loop to 32 bytes only when LoopSize is larger than 16 bytes and smaller than 32 bytes. This patch aligns every innermost hot loop to 32 bytes, not just hot loops whose size is 16~32 bytes.
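The difference between the old and new rule can be sketched as a standalone model. This is not the patch itself: `LoopSketch`, both function names, and the 16-byte fallback are hypothetical, the exact boundary handling of the size window is assumed from the description above, and the hotness check in alignBlocks is left out.

```cpp
#include <cstdint>

// Hypothetical stand-in for the loop properties the target hook inspects.
struct LoopSketch {
  bool isInnermost;           // true when the loop contains no sub-loops
  std::uint32_t sizeInBytes;  // encoded size of the loop body
};

// Alignments are expressed as log2 values: 5 means 32 bytes.
constexpr unsigned kLog2Align32 = 5;
// Assumed fallback of 16 bytes; the real default comes from the target.
constexpr unsigned kLog2AlignDefault = 4;

// Old rule as described: only loops in the 16~32 byte window get 32 bytes.
unsigned oldPreferredLog2Align(const LoopSketch &L) {
  if (L.sizeInBytes > 16 && L.sizeInBytes < 32)
    return kLog2Align32;
  return kLog2AlignDefault;
}

// New rule as described: innermost loops always prefer 32 bytes; other
// loops are assumed to keep the old size-based rule.
unsigned newPreferredLog2Align(const LoopSketch &L) {
  if (L.isInnermost)
    return kLog2Align32;
  return oldPreferredLog2Align(L);
}
```

In this model, a 64-byte innermost loop moves from the default alignment to 32 bytes, while a 64-byte outer loop is unaffected.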
In some special cases, this patch improves performance on PowerPC by more than 30%.
This patch depends on D61227: [NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll.
This only applies to innermost loops. Could we use a name like DisableInnerMostLoopAlign32 / disable-ppc-innermost-loop-align32?
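A minimal sketch of how such a disable switch might gate the behavior, assuming a plain boolean stands in for the command-line option. `AlignOptionsSketch` and `shouldForceAlign32` are hypothetical names used only for illustration, not code from the patch.

```cpp
// Hypothetical stand-in for the proposed option
// (DisableInnerMostLoopAlign32 / disable-ppc-innermost-loop-align32).
struct AlignOptionsSketch {
  bool disableInnermostLoopAlign32 = false;  // off by default in this sketch
};

// The 32-byte preference applies only to innermost loops, and only while
// the disable switch is not set.
bool shouldForceAlign32(bool isInnermostLoop, const AlignOptionsSketch &Opts) {
  return isInnermostLoop && !Opts.disableInnermostLoopAlign32;
}
```

Naming the switch after the innermost-loop case keeps its scope clear: setting it would leave the alignment of non-innermost loops untouched.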