For low trip counts the vectoriser will attempt to create a single
predicated loop that folds the scalar tail into the vector body. For
some combinations of the trip count and the VF it is possible to
determine at compile time whether there will be only a single vector
iteration. If so, we can avoid creating the comparison at the end of
the loop and just always branch to the loop exit. This improves the
code quality for smaller loops with low trip counts because the
compare + branch add a relatively high cost to the loop.
This optimisation can also apply to unpredicated vector loops with
low trip counts, hence the change in test X86/pr42674.ll.
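
As a rough illustration of the compile-time check described above, the sketch below decides whether the vector body runs exactly once. It is not the code in this patch: the function name and parameters are hypothetical, and it assumes a constant trip count, a fixed-width VF, and no required scalar epilogue.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true when a loop with a
// known constant trip count executes the vector body exactly once.
// Assumes a fixed-width VF and that no scalar epilogue is required.
bool isSingleVectorIteration(uint64_t TripCount, uint64_t VF, uint64_t UF,
                             bool FoldTailByMasking) {
  const uint64_t Step = VF * UF; // elements processed per vector iteration
  if (TripCount == 0 || Step == 0)
    return false;

  if (FoldTailByMasking) {
    // Predicated loop: the tail is folded into the vector body, so the body
    // runs ceil(TripCount / Step) times. That is 1 iff TripCount <= Step.
    return TripCount <= Step;
  }

  // Unpredicated loop: the body runs floor(TripCount / Step) times and any
  // remainder is left to the scalar loop.
  return TripCount / Step == 1;
}
```

With VF = 4 and UF = 1, a predicated loop with trip count 3 and an unpredicated loop with trip count 7 both yield a single vector iteration, so in either case the latch compare could be dropped in favour of an unconditional exit.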
Initially I wanted to suggest creating a new (unconditional) branch instruction here, but at this point we don't know which VF will be chosen for the given VPlan, so the decision has to be deferred until codegen.