In loop-vectorize, the interleave count and vectorization factor depend on the number of registers the target provides. Currently, register pressure is not estimated separately for each register class. For scalar types in particular, float values should not be counted against the same pool as int values, so the estimate is inaccurate. Specifically, it leads to over-interleaving (unrolling), which causes many register spills in the loop body and hurts performance.
So we need to classify registers into register classes at the IR level. Importantly, these are abstract register classes, not the target register classes that the backend defines in its .td files. They are used to map sets of IR value types to the maximum number of simultaneous live ranges we'd like to allow for values of those types.
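To make the per-class idea concrete, here is a minimal standalone sketch, not LoopVectorize's actual code. Live values are tagged with a hypothetical abstract class ID, the peak number of simultaneously live values is computed per class, and the interleave count is capped by the tightest class. The structs, function names, and the power-of-two rounding are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical model of a value live inside the loop body, tagged with the
// abstract register class it would occupy (class IDs are illustrative).
struct LiveValue {
  unsigned ClassID;
  unsigned Start, End; // half-open live range [Start, End) in instruction order
};

// Peak number of simultaneously live values, computed separately per class.
std::map<unsigned, unsigned> maxUsagePerClass(const std::vector<LiveValue> &Vals,
                                              unsigned NumInsts) {
  std::map<unsigned, unsigned> Max;
  for (unsigned I = 0; I < NumInsts; ++I) {
    std::map<unsigned, unsigned> Cur; // live values per class at instruction I
    for (const LiveValue &V : Vals)
      if (V.Start <= I && I < V.End)
        ++Cur[V.ClassID];
    for (const auto &KV : Cur)
      Max[KV.first] = std::max(Max[KV.first], KV.second);
  }
  return Max;
}

// Interleave count capped by the tightest class: for each class take
// NumRegs / PeakUsers, round down to a power of two (at least 1), then
// take the minimum across all classes and the MaxIC upper bound.
unsigned interleaveCount(const std::map<unsigned, unsigned> &PeakUsers,
                         const std::map<unsigned, unsigned> &NumRegs,
                         unsigned MaxIC) {
  assert(MaxIC >= 1 && "interleave count must be at least 1");
  unsigned IC = MaxIC;
  for (const auto &KV : PeakUsers) {
    unsigned Limit = NumRegs.at(KV.first) / std::max(KV.second, 1u);
    unsigned Pow2 = 1;
    while (Pow2 * 2 <= Limit)
      Pow2 *= 2;
    IC = std::min(IC, Pow2);
  }
  return IC;
}
```

For example, with 12 int and 4 float values live at once, a 32-register int class allows only 2 copies and a 64-register float class would allow 16, so the minimum is 2. Pooling all 16 values into one 64-register class would instead allow 4, which is the kind of over-interleaving described above.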
For the POWER target, the register counts are special when VSX is enabled. With VSX, there are 32 integer scalar registers (GPRs) and 64 float scalar registers (VSRs), while integer and float vectors both use the same 64 VSRs. So there should be 2 register classes when VSX is enabled, and 3 when VSX is NOT enabled.
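A sketch of the POWER mapping described above, assuming a simplified hypothetical `TypeKind` in place of an IR type; the class IDs and function names are illustrative, not the patch's code. Without VSX the three classes are the 32 GPRs, 32 FPRs and 32 Altivec VRs; with VSX, FP scalars and all vectors share the 64 VSRs.

```cpp
#include <cassert>

// Hypothetical stand-in for an IR type: only what matters for picking a class.
enum class TypeKind { IntScalar, FloatScalar, IntVector, FloatVector };

// Illustrative abstract class IDs (not the backend's .td register classes).
enum PPCRegClass : unsigned { GPRClass = 0, FPRClass = 1, VRClass = 2, VSXRClass = 3 };

unsigned classForType(TypeKind K, bool HasVSX) {
  switch (K) {
  case TypeKind::IntScalar:
    return GPRClass; // integer scalars always live in GPRs
  case TypeKind::FloatScalar:
    return HasVSX ? VSXRClass : FPRClass; // FP scalars: VSRs with VSX, else FPRs
  case TypeKind::IntVector:
  case TypeKind::FloatVector:
    return HasVSX ? VSXRClass : VRClass; // vectors: VSRs with VSX, else VRs
  }
  return GPRClass; // unreachable: all enumerators handled above
}

unsigned numRegsInClass(unsigned ClassID) {
  switch (ClassID) {
  case GPRClass:
  case FPRClass:
  case VRClass:
    return 32;
  case VSXRClass:
    return 64;
  }
  assert(false && "unknown register class");
  return 0;
}
```

With VSX this yields two classes (GPR and VSR); without it, three (GPR, FPR, VR).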
I tested it on the POWER target: it gives a large (~+30%) performance improvement on one specific SPEC2017 benchmark and no other obvious regressions. Could someone help adjust the register counts and verify this on other targets?
I'm not sure that these defaults make sense. Many targets won't even have these as distinct classes (e.g., PowerPC with VSX). I think the default implementation should just return one register class, 0, with its current default (which I suppose is 8 registers), and put everything in that one class. Then I don't think we need this enum at all.
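A minimal sketch of the single-class default suggested here, using hypothetical, simplified hooks (`DefaultRegInfo`, `registerClassFor`, `numRegisters` are illustrative names, not LLVM's TargetTransformInfo API). The count of 8 is the "I suppose" value from the comment above.

```cpp
#include <cassert>

// Hypothetical default target hooks: every type maps to a single class, ID 0.
struct DefaultRegInfo {
  virtual ~DefaultRegInfo() = default;

  // Every type, scalar or vector, maps to class 0 by default.
  virtual unsigned registerClassFor(bool /*IsVector*/) const { return 0; }

  // Class 0 keeps the current default register count.
  virtual unsigned numRegisters(unsigned ClassID) const {
    assert(ClassID == 0 && "the default has only one register class");
    (void)ClassID; // silence unused-parameter warnings when asserts are off
    return 8;
  }
};
```

Because no generic class meanings are exposed, no shared enum is needed; a target that wants more classes overrides both hooks.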
My impression is that you decided to do it this way so that you could write in the other targets:
but I think it's better to just give each of the other targets that did something special for vector types two register classes, and return the second one for all vector types. That should match the current behavior, and then the targets can customize as they see fit. But I'd keep all of this within each target (there's no need to expose generic classes, because there's no need for a generic meaning).
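A sketch of the two-class approach for targets that previously distinguished vectors, again with hypothetical simplified hooks rather than LLVM's real signatures. Class 0 holds scalars and class 1 holds vectors, so `numRegisters(registerClassFor(IsVector))` returns the same count a per-`Vector` query would have; the counts are supplied by the target.

```cpp
#include <cassert>

// Hypothetical target hooks (simplified stand-ins, not LLVM's actual API).
struct TargetRegInfo {
  virtual ~TargetRegInfo() = default;
  virtual unsigned registerClassFor(bool IsVector) const = 0;
  virtual unsigned numRegisters(unsigned ClassID) const = 0;
};

// Two classes: 0 for scalar types, 1 for vector types. The meaning of each
// class stays private to this target.
struct VectorTargetRegInfo : TargetRegInfo {
  unsigned NumScalarRegs, NumVectorRegs; // illustrative counts from the target
  VectorTargetRegInfo(unsigned S, unsigned V) : NumScalarRegs(S), NumVectorRegs(V) {}

  unsigned registerClassFor(bool IsVector) const override { return IsVector ? 1 : 0; }

  unsigned numRegisters(unsigned ClassID) const override {
    assert(ClassID <= 1 && "only two register classes");
    return ClassID == 1 ? NumVectorRegs : NumScalarRegs;
  }
};
```

A target such as PowerPC could then override these hooks with its own class layout without touching any shared enum.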