This patch removes the limit on the number of padding bytes that may
be inserted to align a loop block, provided the block has no
fallthrough edge into it and is either a loop header or is preceded in
the layout by a block in a different loop.
This change gives some small performance improvements on AArch64 and
also makes benchmark results less susceptible to variations due to
block placement.
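
As a rough illustration of the condition described above, here is a minimal standalone sketch. It does not use the LLVM API: the Block struct, its fields, and canPadWithoutLimit are hypothetical names invented for this example, and the real check in the block placement pass may differ in detail.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a machine basic block (NOT LLVM's
// MachineBasicBlock); it models only what the condition needs.
struct Block {
  int LoopId = -1;                   // innermost containing loop, -1 if none
  bool IsLoopHeader = false;         // true if this block heads its loop
  std::vector<const Block *> Succs;  // CFG successors

  bool isSuccessor(const Block *B) const {
    return std::find(Succs.begin(), Succs.end(), B) != Succs.end();
  }
};

// Returns true when BB could be aligned with no cap on padding bytes:
// it is in a loop, its layout predecessor cannot fall through into it
// (so the padding is never executed), and it is either a loop header
// or follows a block that belongs to a different loop.
bool canPadWithoutLimit(const Block &LayoutPred, const Block &BB) {
  if (BB.LoopId < 0)
    return false;  // not a loop block
  if (LayoutPred.isSuccessor(&BB))
    return false;  // fallthrough edge: padding would be executed
  return BB.IsLoopHeader || LayoutPred.LoopId != BB.LoopId;
}
```

In this model, a non-header block whose layout predecessor sits in the same loop keeps the existing padding limit, even when there is no fallthrough edge between them.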
The !LayoutPred->isSuccessor(ChainBB) check already ensures the padding will never be executed. Given that, I guess the remaining checks here are to try to maximize icache hits. In that context, why are loop headers special? Do we care if LayoutPred is part of a subloop of L?