This patch adds an option to disable speculation of a triangle when its
tail is the only latch block of the loop. For now, the option
-aarch64-ccmp-disable-triangle-latch is disabled by default. I'm hoping for
feedback from others on its profitability on other targets.
When the tail of the triangle is the only latch block of the loop, we end up
inserting the ccmp into the loop's critical path. If the speculated code is
cold, we still execute it on every loop iteration. If it is hot, the branch
predictor would take that direction anyway, so speculating it gains little.
Disabling the speculation reduces the chances of forming ld/st pairs, because
the loads may now end up in different blocks. However, when tested on Kryo,
performance was slightly better on the SPEC2006 CINT/CFP benchmarks, with no
regressions above the noise range.