The problem Alexander reported on D127982 was caused by an optimization
for AVX512-FP16 instructions. It must be limited to targets where that
feature is enabled. During the investigation, I also found that we didn't
expand fp_round/fp_extend without F16C. This may result in a runtime
crash, so change them too.
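
The two guards described above can be sketched in isolation. This is a toy model, not the actual X86ISelLowering code: `ToySubtarget`, `Action`, `canUseFP16Optimization`, and `halfConvertAction` are hypothetical names invented for illustration, and the `Native` case is only a placeholder for whatever the backend does when hardware support is present.

```cpp
// Hypothetical stand-in for a subtarget feature query (not LLVM's X86Subtarget).
struct ToySubtarget {
  bool HasFP16 = false; // models AVX512-FP16 being enabled
  bool HasF16C = false; // models F16C being enabled
};

// Hypothetical outcomes for an fp_round/fp_extend involving half precision.
enum class Action { Native, Expand };

// The AVX512-FP16 optimization is only valid when the feature is enabled.
bool canUseFP16Optimization(const ToySubtarget &ST) { return ST.HasFP16; }

// Without any half-precision conversion support, the node must be expanded
// rather than left for instruction selection to fail on later.
Action halfConvertAction(const ToySubtarget &ST) {
  if (ST.HasFP16 || ST.HasF16C)
    return Action::Native; // placeholder: some hardware-backed path exists
  return Action::Expand;   // assumption: fall back to a generic expansion
}
```

The point of the sketch is only that both decisions are keyed on the feature bits, so a target with neither feature never reaches the optimized path.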
It seems the condition can be refined to hasAVX(), or we can add a hasAVX() check in lowerShuffleAsBroadcast.
https://www.felixcloutier.com/x86/vbroadcast.html