Vectors built from zeros plus elements that sit at the same positions as in
another (source) vector are now lowered to a single insertps
instruction.
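The sketch below shows why one insertps is enough. It assumes a 4 x float build_vector in which every non-zero lane i is element i of a single source vector V. In that case "insertps V, V, imm" copies one of V's own lanes onto itself, and the zero mask (imm[3:0]) clears the remaining lanes. The immediate layout (imm[7:6] source lane, imm[5:4] destination lane, imm[3:0] zero mask) is the documented INSERTPS encoding. The Lane struct and the matchInsertPS function are hypothetical illustrations, not LLVM's API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical description of one lane of a 4 x float build_vector.
struct Lane {
  bool isZero;    // lane is the constant 0.0f
  int srcVector;  // id of the vector the lane is extracted from (ignored if isZero)
  int srcIndex;   // lane index within that source vector (ignored if isZero)
};

// Returns an insertps immediate if every non-zero lane i is element i of one
// common source vector V; then "insertps V, V, imm" reproduces the build_vector.
std::optional<std::uint8_t> matchInsertPS(const std::array<Lane, 4>& lanes) {
  int source = -1;    // id of the common source vector, once seen
  int keptLane = -1;  // a lane whose value is kept from V
  std::uint8_t zmask = 0;
  for (int i = 0; i < 4; ++i) {
    if (lanes[i].isZero) {
      zmask |= static_cast<std::uint8_t>(1u << i);  // zero this lane
      continue;
    }
    if (lanes[i].srcIndex != i) return std::nullopt;  // element moved: not in order
    if (source == -1) {
      source = lanes[i].srcVector;
    } else if (lanes[i].srcVector != source) {
      return std::nullopt;  // elements come from more than one vector
    }
    keptLane = i;
  }
  if (source == -1) return std::nullopt;  // all zeros: not this pattern
  // imm[7:6] = source lane, imm[5:4] = destination lane, imm[3:0] = zero mask.
  // Inserting V[keptLane] into lane keptLane of V leaves V unchanged.
  return static_cast<std::uint8_t>((keptLane << 6) | (keptLane << 4) | zmask);
}
```

For example, lanes {V[0], 0, V[2], 0} give keptLane = 2 and zmask = 0b1010, so the immediate is 0xAA. This sketch handles only the simplest pattern. The TODOs mentioned below cover other cases that it does not attempt.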
Further optimizations are possible; they are described in TODO comments.
I plan to implement at least some of them in the near future.
Added tests for several cases in which this optimization triggers.
CorrectIdx here is a boolean (1 or 0), yet further down you increment it with ++.
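The review point can be shown in isolation. A bool used as a counter can only ever hold 0 or 1. Applying ++ to a bool was deprecated before C++17 and is ill-formed since C++17. The sketch below keeps the per-lane test as a bool and accumulates the matches in an unsigned integer. The function name and its input encoding (source index per lane, -1 for a zero lane) are hypothetical and are not taken from the patch.

```cpp
#include <array>

// Hypothetical: count the lanes of a 4-lane build_vector that are taken from
// the source vector at their own index (-1 marks a zero lane). An unsigned
// counter can reach 4; a bool "counter" would saturate at 1.
unsigned countInPlaceLanes(const std::array<int, 4>& srcIndexOrMinusOne) {
  unsigned numCorrectIdx = 0;
  for (int i = 0; i < 4; ++i) {
    bool correctIdx = (srcIndexOrMinusOne[i] == i);  // per-lane test: 0 or 1
    numCorrectIdx += correctIdx;                     // accumulate in an integer
  }
  return numCorrectIdx;
}
```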