When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip count TC does not overflow, if TC is a known constant, and if TC is a multiple of the VF. Targets with efficient vector masking can overcome the last three (TC-related) conditions: see “Direction #1” in [[ http://lists.llvm.org/pipermail/llvm-dev/2018-August/125042.html | [llvm-dev] Vectorizing remainder loop ]]. This patch applies that transformation. It sets the trip count of the vector loop to TC rounded up to a multiple of VF, and masks the vector body under a newly introduced "if (i < TC)" condition, or rather "if (i <= TC-1)". Comparing against TC-1 (the backedge-taken count) avoids the overflow hazard mentioned above, because TC-1 stays representable even when TC itself wraps around.
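To illustrate the idea, here is a minimal C++ sketch of a tail-folded loop, emulated in scalar code. It is not what the patch emits (the patch rewrites LLVM IR and relies on masked vector operations). The names `scalar_loop`, `masked_vector_loop`, `lane_active_lt_tc`, `lane_active_le_btc` and the choice `VF = 4` are hypothetical. The inner `lane` loop stands in for a single masked vector instruction, and the 8-bit helpers show why the mask compares against TC-1 rather than TC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vectorization factor, chosen only for illustration.
constexpr std::size_t VF = 4;

// Original scalar loop: a[i] += b[i] for i in [0, tc).
void scalar_loop(int *a, const int *b, std::size_t tc) {
  for (std::size_t i = 0; i < tc; ++i)
    a[i] += b[i];
}

// Tail-folded "vector" loop, emulated lane by lane.
// btc is the backedge-taken count (TC - 1); the body runs btc + 1 times.
void masked_vector_loop(int *a, const int *b, std::size_t btc) {
  // ceil(TC / VF) vector iterations, computed without forming TC = btc + 1,
  // so nothing overflows even if btc is the largest representable value.
  const std::size_t n_vec_iters = btc / VF + 1;
  for (std::size_t v = 0; v < n_vec_iters; ++v) {
    const std::size_t base = v * VF;
    // One vector iteration: every lane is guarded by the mask i <= btc.
    for (std::size_t lane = 0; lane < VF; ++lane) {
      const std::size_t i = base + lane;
      if (i <= btc)  // masked-off lanes must not touch memory
        a[i] += b[i];
    }
  }
}

// Overflow hazard with an 8-bit induction variable: a loop that runs
// 256 times has btc = 255, but tc = btc + 1 wraps to 0.
bool lane_active_lt_tc(std::uint8_t i, std::uint8_t btc) {
  const std::uint8_t tc = static_cast<std::uint8_t>(btc + 1);  // wraps to 0
  return i < tc;  // with tc == 0 every lane is wrongly disabled
}

bool lane_active_le_btc(std::uint8_t i, std::uint8_t btc) {
  return i <= btc;  // correct for every btc, including 255
}
```

For TC = 7 (btc = 6) and VF = 4, the sketch runs two vector iterations covering lanes 0..7 and masks off lane 7, so no remainder loop is needed. With an 8-bit counter and btc = 255, `lane_active_lt_tc` rejects every lane while `lane_active_le_btc` accepts them all.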
The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost-model considerations. It also applies to loops with small trip counts under -O2, which are currently handled as if under -Os.
Handling loops with reductions and live-outs is left as a TODO for subsequent extensions.