In loop-vectorize, the interleave count (IC) and vectorization factor (VF) depend on the number of registers the target provides. Currently, the cost model does not estimate register pressure for integer and floating-point types separately (especially for scalar types), so the estimate is inaccurate. In particular, it can choose too high an interleave (unroll) count, which causes many register spills in the loop body and hurts performance.
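To make the effect concrete, here is a simplified C++ sketch of a per-register-class interleave-count heuristic. It follows the general idea behind `LoopVectorizationCostModel::selectInterleaveCount` but is not copied from it: the `RegClass` enum, the function names, and the exact formula (free registers in each class divided by that class's peak live values, rounded down to a power of two, then the minimum over classes) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Hypothetical register classes for illustration only; in LLVM the classes
// are target-defined IDs, not this enum.
enum class RegClass { ScalarInt, ScalarFloat, Vector };

// Largest power of two that is <= x (returns 0 for x == 0).
unsigned powerOf2Floor(unsigned x) {
  if (x == 0) return 0;
  unsigned p = 1;
  while (p <= x / 2) p *= 2;
  return p;
}

// Simplified sketch: every interleaved copy of the loop body needs its own
// live values, so in each register class IC * users must fit in the free
// registers. The chosen IC is the minimum over all classes used by the loop.
unsigned selectInterleaveCount(
    const std::map<RegClass, unsigned>& numRegs,        // registers per class
    const std::map<RegClass, unsigned>& maxLocalUsers,  // peak live values per class
    const std::map<RegClass, unsigned>& invariantRegs,  // loop-invariant values per class
    unsigned maxIC) {
  unsigned ic = maxIC;
  for (const auto& [rc, users] : maxLocalUsers) {
    if (users == 0) continue;  // class not used in this loop
    auto regIt = numRegs.find(rc);
    assert(regIt != numRegs.end() && "every used class needs a register budget");
    unsigned total = regIt->second;
    auto invIt = invariantRegs.find(rc);
    unsigned inv = (invIt == invariantRegs.end()) ? 0 : invIt->second;
    unsigned freeRegs = total > inv ? total - inv : 0;
    ic = std::min(ic, powerOf2Floor(freeRegs / users));
  }
  return std::max(ic, 1u);  // IC of 1 means no interleaving
}
```

Using the POWER/VSX counts given in this description (32 integer scalar registers, 64 floating-point scalar registers) and a loop that keeps 12 scalar integers and 4 scalar floats live, the per-class version picks IC = 2, limited by the 32 GPRs. If all 16 scalar values were instead checked against a single 64-register budget, the same formula would pick IC = 4, which needs about 48 integer registers and therefore spills.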
For the POWER target, the register count is special when VSX is enabled: there are 32 scalar integer registers (GPRs) but 64 scalar floating-point registers (VSRs), while integer and floating-point vectors both use the same 64 VSRs.
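Expressed as code, the POWER numbers above amount to a per-class lookup. The following C++ sketch is hypothetical: the `PPCRegClass` enum and both functions are illustrative names, not the PPC backend's actual `TargetTransformInfo` implementation, and the non-VSX counts (32 FPRs, 32 Altivec VRs) are included only for contrast.

```cpp
// Hypothetical register classes named after the POWER register files;
// not the actual PPC backend enum.
enum class PPCRegClass { GPR, FPR, VR, VSR };

// Which register file a value lives in. Assumption: with VSX, scalar floats
// and all vectors are allocated from the VSRs; scalar integers stay in GPRs.
PPCRegClass classForType(bool isVector, bool isFloat, bool hasVSX) {
  if (!isVector && !isFloat) return PPCRegClass::GPR;   // scalar integer
  if (hasVSX) return PPCRegClass::VSR;                   // scalar FP or any vector
  return isVector ? PPCRegClass::VR : PPCRegClass::FPR;  // pre-VSX split
}

// Number of allocatable registers per class (ignoring reserved registers).
unsigned numberOfRegisters(PPCRegClass rc) {
  switch (rc) {
    case PPCRegClass::GPR: return 32;
    case PPCRegClass::FPR: return 32;
    case PPCRegClass::VR:  return 32;  // Altivec vector registers
    case PPCRegClass::VSR: return 64;  // VSR0-31 overlay FPRs, VSR32-63 overlay VRs
  }
  return 0;
}
```

With this split, a loop with many live scalar integers is limited by the 32 GPRs even though 64 VSRs are available for floating-point and vector values, which is exactly the case a single shared scalar register count cannot express.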
I tested it on the POWER target: it gives a large (+~30%) performance improvement on one specific benchmark of SPEC2017 and no other obvious regressions. Could anyone help adjust the register counts for other targets and verify the change there?