This will coax the truncate lowering to emit an extract_subvector and a packuswb instead of an extract_subvector, 2 pshufbs, and a punpcklqdq. But don't do this if we have an AVX512 truncate instruction available.
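For readers unfamiliar with the pack-based sequence, here is a minimal scalar sketch of why an extract_subvector plus a packuswb can stand in for a 16 x i16 to 16 x i8 truncate. It is not the patch itself. The function names (`packus_lane`, `packuswb_model`, `truncate_via_pack`, `truncate_reference`) are hypothetical, and the explicit `& 0x00FF` mask is an assumption standing in for whatever guarantees that the upper bits are zero in the real lowering. packuswb saturates rather than truncates, so it only matches a truncate when each 16-bit lane is already in the range 0..255.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of one PACKUSWB lane: a signed 16-bit input is
// saturated to the unsigned 8-bit range [0, 255].
inline std::uint8_t packus_lane(std::int16_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return static_cast<std::uint8_t>(v);
}

// Scalar model of PACKUSWB on two 8 x i16 inputs, giving 16 x i8.
inline std::array<std::uint8_t, 16> packuswb_model(
    const std::array<std::int16_t, 8>& lo,
    const std::array<std::int16_t, 8>& hi) {
    std::array<std::uint8_t, 16> out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = packus_lane(lo[i]);
        out[i + 8] = packus_lane(hi[i]);
    }
    return out;
}

// Hypothetical illustration: split the 16 x i16 source into two halves
// (the extract_subvector step), clear the high byte of every lane
// (assumed here; the real lowering may know the bits are zero some
// other way), then pack. With the high byte cleared, saturation never
// triggers, so the result equals a plain truncate.
inline std::array<std::uint8_t, 16> truncate_via_pack(
    const std::array<std::uint16_t, 16>& src) {
    std::array<std::int16_t, 8> lo{};
    std::array<std::int16_t, 8> hi{};
    for (int i = 0; i < 8; ++i) {
        lo[i] = static_cast<std::int16_t>(src[i] & 0x00FF);
        hi[i] = static_cast<std::int16_t>(src[i + 8] & 0x00FF);
    }
    return packuswb_model(lo, hi);
}

// Reference truncate: keep the low 8 bits of every lane.
inline std::array<std::uint8_t, 16> truncate_reference(
    const std::array<std::uint16_t, 16>& src) {
    std::array<std::uint8_t, 16> out{};
    for (int i = 0; i < 16; ++i) {
        out[i] = static_cast<std::uint8_t>(src[i]);
    }
    return out;
}
```

Comparing `truncate_via_pack(src)` against `truncate_reference(src)` for inputs with nonzero high bytes shows the two agree once the mask is applied, while `packus_lane` alone would clamp such lanes to 0 or 255.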
Event Timeline
It looks OK, but wouldn't we be better off trying to improve the truncation lowering?
There is an extra truncate on the last step. Maybe we need some SimplifyDemandedBits/SimplifyDemandedVectorElts enhancement here?