In loop-vectorize, the interleave count and vectorization factor depend on the number of target registers. Currently, register pressure is not estimated separately for each register class (in particular for scalar types, where float values should not be counted against the same registers as int values), so the estimate is not accurate. Specifically, it causes too much interleaving/unrolling, which results in too many register spills in the loop body and hurts performance.
So we need to classify the register classes at the IR level. Importantly, these are **abstract** register classes, not the target register classes that the backend provides in its td files. They are used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit some set of those types.
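To make the idea of per-class pressure concrete, here is a minimal standalone sketch. It is not the code in this patch and does not use any LLVM API: `AbstractRegClass`, `classifyValue`, `powerOf2Floor`, `PerClassCount` and `interleaveBound` are all hypothetical names, and the formula is a simplification that ignores loop-invariant values.

```cpp
#include <algorithm>
#include <map>

// Hypothetical abstract register classes. These are NOT the backend register
// classes defined in td files.
enum class AbstractRegClass { IntScalar, FloatScalar, Vector };

// Map an IR value's type (reduced here to two flags) to an abstract class.
inline AbstractRegClass classifyValue(bool IsVector, bool IsFloat) {
  if (IsVector)
    return AbstractRegClass::Vector;
  return IsFloat ? AbstractRegClass::FloatScalar : AbstractRegClass::IntScalar;
}

// Largest power of two that is <= N, or 0 when N is 0.
inline unsigned powerOf2Floor(unsigned N) {
  if (N == 0)
    return 0;
  unsigned P = 1;
  while (P <= N / 2)
    P *= 2;
  return P;
}

using PerClassCount = std::map<AbstractRegClass, unsigned>;

// Bound the interleave count by the most constrained class.
// MaxLive: peak number of simultaneously live values per class in the loop.
// NumRegs: registers available per class (a class missing here counts as 0).
// Cap:     upper limit coming from other heuristics (assumed to be given).
inline unsigned interleaveBound(const PerClassCount &MaxLive,
                                const PerClassCount &NumRegs, unsigned Cap) {
  unsigned Bound = Cap;
  for (const auto &Entry : MaxLive) {
    unsigned Live = std::max(Entry.second, 1u);
    auto It = NumRegs.find(Entry.first);
    unsigned Regs = (It == NumRegs.end()) ? 0u : It->second;
    Bound = std::min(Bound, powerOf2Floor(Regs / Live));
  }
  return std::max(Bound, 1u); // never go below "no interleaving"
}
```

For example, with 10 live int scalars and 4 live float scalars against 32 and 64 registers respectively, the int class limits the bound to 2, even though the float class on its own would allow 16.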
For the POWER target, the register count is special when VSX is enabled. With VSX enabled, there are 32 int scalar registers (GPR) and 64 float scalar registers (VSR), while int and float vectors both use the 64 VSRs. So there should be 2 kinds of register class when VSX is enabled, and 3 kinds of register class when VSX is NOT enabled.
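The POWER case can be sketched as a small lookup. This is only an illustration under stated assumptions, not the patch's implementation: `PowerRegClass`, `classFor` and `numRegisters` are hypothetical names, and the 32-register counts for the non-VSX float and vector classes are not given in this description (they are the usual FPR and Altivec VR counts).

```cpp
// Hypothetical class IDs for illustration only.
enum class PowerRegClass { GPR, FPR, VR, VSR };

// Pick the abstract class for a value, depending on whether VSX is enabled.
// With VSX:    int scalars -> GPR, everything else -> VSR  (2 classes).
// Without VSX: int scalars -> GPR, float scalars -> FPR,
//              vectors -> VR                               (3 classes).
inline PowerRegClass classFor(bool IsVector, bool IsFloat, bool HasVSX) {
  if (!IsVector && !IsFloat)
    return PowerRegClass::GPR;
  if (HasVSX)
    return PowerRegClass::VSR;
  return IsVector ? PowerRegClass::VR : PowerRegClass::FPR;
}

// Number of registers assumed for each class.
inline unsigned numRegisters(PowerRegClass RC) {
  switch (RC) {
  case PowerRegClass::GPR:
    return 32; // from the description
  case PowerRegClass::VSR:
    return 64; // from the description
  case PowerRegClass::FPR:
    return 32; // assumption: not stated in the description
  case PowerRegClass::VR:
    return 32; // assumption: not stated in the description
  }
  return 0; // unreachable for valid enum values
}
```

With VSX enabled, `numRegisters(classFor(false, true, true))` yields 64 for a scalar float, while an int scalar still sees only the 32 GPRs.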
I tested it on the POWER target: it gives a big (about +30%) performance improvement in one specific SPEC2017 benchmark and no other obvious regressions. Could anyone help adjust the register numbers and verify on other targets?