Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign bit instead of using PSRAD+PSHUFD.
Avoiding bitcasts, this improves combines that utilize computeNumSignBits, permits memory folding, and reduces pipe pressure.
Although it does require a second register, that register is a (cheap) zero register, so the impact is minimal.
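
For reference, the identity behind the lowering can be checked with a small scalar model. This is only a sketch, not the patch itself: `V2i64`, `pcmpgtq`, `ashr63`, `splatSignViaCompare` and `verifySplat` are hypothetical names made up for illustration, and the model assumes the shift in question is a 64-bit-lane ASHR by 63 (which is what a PCMPGTQ against zero would replace).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for a v2i64 vector; not an LLVM type.
using V2i64 = std::array<int64_t, 2>;

// Models PCMPGTQ(a, b): each lane is all-ones if a > b (signed), else zero.
V2i64 pcmpgtq(const V2i64 &a, const V2i64 &b) {
  V2i64 r{};
  for (int i = 0; i < 2; ++i)
    r[i] = a[i] > b[i] ? int64_t(-1) : int64_t(0);
  return r;
}

// Reference semantics: arithmetic shift right by 63 in each lane.
// Assumes >> on a negative int64_t is an arithmetic shift (guaranteed
// since C++20, and the behaviour of mainstream compilers before that).
V2i64 ashr63(const V2i64 &x) {
  V2i64 r{};
  for (int i = 0; i < 2; ++i)
    r[i] = x[i] >> 63;
  return r;
}

// The lowering under discussion: splat the sign bit as PCMPGTQ(0, x).
// The zero vector is the extra (cheap) register; x stays as the second
// operand, which is the one that could come from memory.
V2i64 splatSignViaCompare(const V2i64 &x) {
  const V2i64 zero{0, 0};
  return pcmpgtq(zero, x);
}

// Asserts that both forms agree for the given input.
void verifySplat(const V2i64 &x) {
  assert(splatSignViaCompare(x) == ashr63(x));
}
```

Calling `verifySplat` on a handful of inputs (negative, zero, `INT64_MIN`, `INT64_MAX`) is enough to convince yourself that the compare form and the shift form agree lane by lane.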
You don't need "else".
If VT is v4i64, you need AVX2 to emit the PCMPGT.