Previously, getRegUsageForType was implemented using
getTypeLegalizationCost. getRegUsageForType is used by the loop
vectorizer to estimate the register pressure caused by using a vector
type. However, getTypeLegalizationCost currently appears to
understand only splitting, not scalarization, so it significantly
underestimates the register requirements.
Instead, use getNumRegisters, which understands when scalarization
can occur (via computeRegisterProperties).
This was discovered while investigating D118979 (Set maximum VF with
shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the
loop vectorizer ended up costing a v128i1 as 2 v64i* registers when it
actually occupies 128 i32 registers.
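To make the size of the discrepancy concrete, here is a minimal toy model of the two ways of counting. It is only a sketch and does not call any LLVM API: the helper names (splitOnlyEstimate, scalarizationAwareEstimate, checkExample) and the IsScalarized flag are hypothetical, and the figures (64 elements per legal part, one scalar register per element once scalarized) are taken from the v128i1 case described above rather than queried from a target.

```cpp
#include <cassert>

// Toy model only; none of these helpers exist in LLVM.

// Split-only view: keep halving the vector until each part holds at most
// MaxLegalElts elements, then report the number of parts.
inline unsigned splitOnlyEstimate(unsigned NumElts, unsigned MaxLegalElts) {
  if (MaxLegalElts == 0)
    return NumElts; // No legal vector width given: one register per element.
  unsigned Parts = 1;
  while (NumElts > MaxLegalElts) {
    NumElts = (NumElts + 1) / 2;
    Parts *= 2;
  }
  return Parts;
}

// Scalarization-aware view: if the type ends up scalarized, every element
// occupies its own scalar register; otherwise fall back to splitting.
// IsScalarized is an assumed input here, not something this sketch derives.
inline unsigned scalarizationAwareEstimate(unsigned NumElts,
                                           unsigned MaxLegalElts,
                                           bool IsScalarized) {
  return IsScalarized ? NumElts : splitOnlyEstimate(NumElts, MaxLegalElts);
}

// The v128i1 case from the description: 2 registers versus 128.
inline void checkExample() {
  assert(splitOnlyEstimate(128, 64) == 2);
  assert(scalarizationAwareEstimate(128, 64, /*IsScalarized=*/true) == 128);
}
```

In this model the split-only estimate is off by a factor of 64 for the scalarized case, which is the kind of underestimate that would skew the vectorizer's register pressure check.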
I'm sending this patch early for comment; I'm still doing some sanity checking
with LNT. I note that getRegisterClassForType appears to return VectorRC even
though the types in question (large vNi1 types) end up occupying scalar
registers. That might be worth fixing too.
I lack some historical knowledge here, but I agree it does look like the current implementation is answering the wrong question.
Assuming others agree with the intent of the change, I think the function's return type should also be changed. Returning InstructionCost seems wrong and is likely just a leftover from the original call to getTypeLegalizationCost(). I think unsigned is more representative of the function's intent.