AVX512: Remove VSHRI kmask patterns from the TD file. It is incorrect to use kshiftw to implement VSHRI on v4i1: bits 15-4 are undef, so the upper bits of the v4i1 may not be zeroed. The v4i1 should first be zero-extended to v16i1 (or any natively supported vector type).
The only use of the removed patterns I identified was in Insert1BitVector(); update its implementation to use a natively supported shift.
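To illustrate the problem, here is a minimal bit-level sketch. It is a hypothetical model and not LLVM code: it assumes a v4i1 value occupies bits 3..0 of a 16-bit mask register and that bits 15-4 hold arbitrary garbage. The names shiftRight16, zeroExtendV4, lanes and selfCheck are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a v4i1 lives in bits 3..0 of a 16-bit mask register.
// Bits 15..4 are undef, so they may contain anything.

// Models a logical right shift of the whole 16-bit register (kshiftrw-like).
uint16_t shiftRight16(uint16_t reg, unsigned amt) {
  return static_cast<uint16_t>(reg >> amt);
}

// Models zero-extending v4i1 to v16i1: clear everything above the low 4 bits.
uint16_t zeroExtendV4(uint16_t reg) {
  return static_cast<uint16_t>(reg & 0x000Fu);
}

// Reads back only the 4 meaningful lanes.
unsigned lanes(uint16_t reg) { return reg & 0xFu; }

// Small usage example with one concrete garbage pattern.
void selfCheck() {
  // Lanes are 0101 (0x5); the undef upper bits happen to be all ones.
  uint16_t reg = 0xFFF5;
  // Shifting the raw register pulls garbage bit 4 into lane 3.
  assert(lanes(shiftRight16(reg, 1)) == 0xAu);
  // Zero-extending first gives the expected 0010.
  assert(lanes(shiftRight16(zeroExtendV4(reg), 1)) == 0x2u);
}
```

With the raw register, lane 3 of the result depends on whatever happened to be in bit 4, which is exactly the undef behaviour described above.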
Fix 2 additional bugs in the Insert1BitVector implementation:
IdxVal == 0 case: SubVec should be zero-extended.
IdxVal + SubVecNumElems == NumElems case: Vec was used instead of SubVec.
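The intended semantics of the insertion can be sketched at the bit level as follows. This is a simplified, hypothetical model and not the actual Insert1BitVector code: it uses AND/OR masks where the real lowering uses shifts, it assumes lane i is bit i of a 16-bit register, and the helper names lowMask, insertSubMask and selfCheck are invented for the example. The parameter names mirror the ones used in this description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a mask with the low n bits set, for n in [0, 16].
uint16_t lowMask(unsigned n) {
  return static_cast<uint16_t>((1u << n) - 1u);
}

// Hypothetical model of inserting SubVec into Vec at lane IdxVal.
// Bits above each vector's lane count are treated as undef on input.
uint16_t insertSubMask(uint16_t vec, uint16_t subVec, unsigned idxVal,
                       unsigned subVecNumElems, unsigned numElems) {
  // Zero-extend SubVec first so its undef upper bits cannot leak in.
  uint16_t sub = static_cast<uint16_t>(subVec & lowMask(subVecNumElems));
  // Move the sub-vector into position (no movement when idxVal == 0).
  uint16_t placed = static_cast<uint16_t>(sub << idxVal);
  // Clear the destination lanes in Vec, then merge the two.
  uint16_t hole = static_cast<uint16_t>(lowMask(subVecNumElems) << idxVal);
  uint16_t kept = static_cast<uint16_t>(vec & ~hole);
  // Normalize the result to numElems lanes for easy comparison.
  return static_cast<uint16_t>((kept | placed) & lowMask(numElems));
}

// Usage example covering the two cases named above.
void selfCheck() {
  uint16_t vec = 0xABAA; // v8i1 lanes 0xAA, garbage above
  uint16_t sub = 0xFFF3; // v4i1 lanes 0x3, garbage above
  // IdxVal == 0: SubVec must be zero-extended before merging.
  assert(insertSubMask(vec, sub, 0, 4, 8) == 0xA3);
  // IdxVal + SubVecNumElems == NumElems: SubVec fills the top lanes.
  assert(insertSubMask(vec, sub, 4, 4, 8) == 0x3A);
}
```

In both cases the garbage in the upper bits of SubVec has no effect on the result, which is the property the fixes are meant to guarantee.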
I failed to create a test case for IdxVal == 0; maybe you can suggest one.
Most probably, in many cases inserting a kmask sub-vector into a zero vector doesn't require any instructions (shiftl + shiftr) to zero the upper part, because it is already zero.
I think only extract_subvector (low part) in the current implementation doesn't zero the upper part, but I am not sure. In any case, I believe this performance improvement (if it is possible) should be implemented as a separate patch.
Thanks.
Add a " " after if.