As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.
This is a blocker for generic vector CTPOP expansion (P32655). ARM's vXi64 CTPOP currently expands, and the generic expansion would generate a vXi64 MUL. ARM's lowering expands the general MUL case, and vectors aren't well handled in LegalizeDAG, so improving the CTPOP lowering was a lot easier than fixing the MUL lowering.
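
For reviewers who want to see the widening arithmetic, here is a minimal scalar model of the approach for the v2i64 case: bitcast to v16i8, take the per-byte CTPOP, then apply three pairwise add-long steps (i8 -> i16 -> i32 -> i64). This is only an illustrative sketch, not the code in the patch; the helper names (`popcount8`, `pairwise_add_long`, `ctpop_v2i64`) and the little-endian byte order are assumptions made for the example.

```python
def popcount8(byte):
    """Population count of a single 8-bit lane."""
    return bin(byte & 0xFF).count("1")


def pairwise_add_long(lanes):
    """Model of a pairwise add-long: sum adjacent lane pairs,
    producing half as many lanes of twice the width."""
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]


def ctpop_v2i64(vec):
    """Model CTPOP on a v2i64 value given as a list of two unsigned 64-bit ints."""
    # Bitcast v2i64 -> v16i8 (little-endian byte order assumed for illustration).
    byte_lanes = [(x >> (8 * i)) & 0xFF for x in vec for i in range(8)]
    counts = [popcount8(b) for b in byte_lanes]  # v16i8 CTPOP
    counts = pairwise_add_long(counts)           # v8i16
    counts = pairwise_add_long(counts)           # v4i32
    counts = pairwise_add_long(counts)           # v2i64
    return counts
```

Each 64-bit lane ends up holding the sum of its eight byte counts, so no vector MUL is needed. Narrower element types would stop after fewer pairwise steps.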