The current cost calculation for conversion instructions has several problems:
- The provided numbers are inaccurate.
- Huge costs are reported when a vector has to be split.
I changed the approach for vector cost calculation:
- If the original types are simple, check them first.
- If the original vector has to be split, take the cost for the legal types and multiply it by the split factor.
- The split factor is the larger of the source and destination split factors.
- Do not list a separate cost for v8i32 -> v8i64. The calculated cost should be 2 * (v4i32 -> v4i64).
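
The rule in the list above can be sketched as follows. This is only an illustration, not the code in the patch: the names `LEGAL_COST`, `split_factor` and `conversion_cost` are hypothetical, the cost value 3 is a placeholder rather than a real X86 number, and the 256-bit register width is an assumption chosen to model AVX. It also assumes power-of-two element counts, so the split always divides evenly.

```python
import math

# Hypothetical cost table for conversions between legal types.
# The value is a placeholder, not a measured X86 cost.
LEGAL_COST = {("v4i32", "v4i64"): 3}


def parse(vt):
    """Split a name such as 'v8i32' into (element count, element bits)."""
    count, bits = vt[1:].split("i")
    return int(count), int(bits)


def split_factor(num_elts, elt_bits, max_reg_bits):
    """Number of legal registers needed to hold the whole vector."""
    return max(1, math.ceil(num_elts * elt_bits / max_reg_bits))


def conversion_cost(src, dst, max_reg_bits=256, table=LEGAL_COST):
    # Simple types that already have an entry are checked first.
    if (src, dst) in table:
        return table[(src, dst)]

    n_src, src_bits = parse(src)
    n_dst, dst_bits = parse(dst)
    if n_src != n_dst:
        raise ValueError("a conversion keeps the number of elements")

    # The split factor is the larger of the source and destination factors.
    factor = max(split_factor(n_src, src_bits, max_reg_bits),
                 split_factor(n_dst, dst_bits, max_reg_bits))

    # Cost of the conversion on the legal (post-split) types, times the factor.
    legal_n = n_src // factor
    legal_key = (f"v{legal_n}i{src_bits}", f"v{legal_n}i{dst_bits}")
    return factor * table[legal_key]
```

With these assumptions, an 8-element i32 to i64 conversion has a source factor of 1 and a destination factor of 2, so its cost is twice the v4i32 -> v4i64 entry and no dedicated entry for the wide type is needed.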
I also checked all SSE numbers and replaced them with the actual number of instructions.
I understand that instruction latency differs between SSE2 and AVX, but
the purpose of this cost model is to let the vectorizer choose the right VF (comparing VFx to VFy for the same target).