Before reaching the header block of the vectorized loop, the Loop Vectorizer generates two conditions to make sure the vectorized loop will iterate at least once: first, emitMinimumIterationCountCheck() compares the Scalar Trip Count (STC) to VF*UF; then, if the former is not less than the latter, emitVectorLoopEnteredCheck() compares the Vector Trip Count (VTC) to zero.
In general, VTC = STC div (VF*UF), unless this does not comply with an optional constraint to execute at least a single scalar iteration in the epilog, aka requiresScalarEpilogue(). In that case, when STC is a multiple of VF*UF, a single (the last) vector iteration is peeled and replaced with VF*UF scalar iterations (instead of none), reducing VTC by 1.
This patch replaces the above two comparisons with a single comparison that holds exactly when VTC == 0:
- STC < VF*UF, when requiresScalarEpilogue() does not hold; or
- STC <= VF*UF, when requiresScalarEpilogue() does hold,
effectively removing the basic-block originally named "min.iters.checked".
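
The equivalence between the single comparison and VTC == 0 can be checked with a small standalone sketch. This is illustrative only and is not the patch itself: vectorTripCount(), bypassVectorLoop() and checksAgree() are hypothetical helpers, not LLVM APIs; Step stands for VF*UF and is assumed to be nonzero; VTC is counted in vector iterations, as in the description above.

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical helper (not an LLVM API): vector trip count, in vector
// iterations, for scalar trip count STC and Step = VF * UF (assumed nonzero).
uint64_t vectorTripCount(uint64_t STC, uint64_t Step,
                         bool RequiresScalarEpilogue) {
  uint64_t VTC = STC / Step;
  // When a scalar epilogue is required and STC is a multiple of Step, the
  // last vector iteration is peeled and left to the scalar epilogue.
  if (RequiresScalarEpilogue && STC % Step == 0 && VTC > 0)
    VTC -= 1;
  return VTC;
}

// The single comparison: true when the vector loop must be bypassed.
bool bypassVectorLoop(uint64_t STC, uint64_t Step,
                      bool RequiresScalarEpilogue) {
  return RequiresScalarEpilogue ? STC <= Step : STC < Step;
}

// Returns true if the single comparison agrees with "VTC == 0" for every
// STC in [0, MaxSTC], both with and without a required scalar epilogue.
bool checksAgree(uint64_t Step, uint64_t MaxSTC) {
  for (uint64_t STC = 0; STC <= MaxSTC; ++STC)
    for (bool Epilogue : {false, true})
      if (bypassVectorLoop(STC, Step, Epilogue) !=
          (vectorTripCount(STC, Step, Epilogue) == 0))
        return false;
  return true;
}
```

For example, with Step = 8 and STC = 16, the sketch yields VTC = 2 without a required scalar epilogue and VTC = 1 with one; with STC = 8, the vector loop is bypassed only when the scalar epilogue is required.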
The original documentation of emitMinimumIterationCountCheck(), which claims that it checks for overflow, seems obsolete (right?).
Original observation and initial patch by Evgeny Stupachenko.