When the trip count is not known at compile time, there is additional overhead to make sure it is safe to perform the next VF iterations. Thus, if the vector loop is skipped at runtime, such vectorization is unprofitable. When the trip count is known to be small enough, there is a high chance of getting into this situation. Currently, LV is not able to cost model this case properly since it does not account for the cost of the epilogue loop; instead, the "short trip count" heuristic is employed.
While "short trip count" heuristic makes sense in general (at least for current state) it can be slightly lifted up when trip count is compile time known constant. In this case it's known at compile time how many vector iterations will be executed and there is no implied overhead by trip count checks as well. Cost modeling is simple as well, if one vector iteration costs less than one scalar iteration multiple VF then vectorization is profitable.
Note: one may argue that the "short trip count" heuristic is needed to reduce code size, on the assumption that short trip count loops cannot be performance critical. That assumption turns out to be false in many cases (for example, nested loops; see the sketch below) and should not be the driving factor.
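As an illustration (the function below is a made-up example, not taken from any particular benchmark), an inner loop with a tiny compile-time-constant trip count can dominate total runtime when it sits inside a hot outer loop:

```cpp
// The inner loop has a trip count of 8, known at compile time, yet it
// accounts for essentially all of the work: vectorizing it pays off on
// every iteration of the (large, runtime-trip-count) outer loop.
void addRows(float *A, const float *B, int Rows) {
  for (int i = 0; i < Rows; ++i)     // large, runtime trip count
    for (int j = 0; j < 8; ++j)      // short, compile-time-constant trip count
      A[i * 8 + j] += B[i * 8 + j];
}
```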
I don't quite understand this change. The whole point of getMinTripCountTailFoldingThreshold() was to give targets control over this behaviour based on their understanding of how the cost model has been implemented.
Admittedly this was in part due to the immaturity of the cost modelling, but this change essentially removes that flexibility, to the point where there's no value in keeping getMinTripCountTailFoldingThreshold()?
If your previous patches improve the cost model in this regard then I'd rather getMinTripCountTailFoldingThreshold() be removed entirely. That said, @dtemirbulatov can you help ascertain whether this option is still required based on this new patch series?