[AVX512] Fix insertelement i1 lowering.
1. Use a shuffle to insert an i1 element into a vector. The previous implementation was incorrect: it computed dest_bit OR src_bit, which does not clear the bit when src_bit=0.
2. Improve shuffling of i1 vectors: use CVT2MASK, if supported, instead of TRUNCATE.
3. Insertion of one bit into the first or last position can be done with two SHIFTs + OR.
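
For illustration only, here is a scalar model of the bit manipulation described above. It is not the lowering code from this patch: the 16-lane mask is modelled as a plain integer, and Mask16, insertBitOrOnly, insertBitFirst and insertBitLast are hypothetical names. It shows why OR alone cannot clear a bit, and how two shifts clear the first or last lane before the OR.

```cpp
#include <cstdint>

// Hypothetical model: a 16-lane i1 vector stored one bit per lane.
using Mask16 = std::uint16_t;

// Flawed approach: OR can set the destination bit but never clear it.
constexpr Mask16 insertBitOrOnly(Mask16 dest, bool bit, unsigned idx) {
  return static_cast<Mask16>(dest | ((bit ? 1u : 0u) << idx));
}

// First lane: shift right, then left, by one to clear bit 0, then OR.
constexpr Mask16 insertBitFirst(Mask16 dest, bool bit) {
  return static_cast<Mask16>(((dest >> 1) << 1) | (bit ? 1u : 0u));
}

// Last lane: shift left, then right, by one to clear bit 15, then OR.
// The cast after the left shift drops the bit that left the 16-bit mask.
constexpr Mask16 insertBitLast(Mask16 dest, bool bit) {
  return static_cast<Mask16>((static_cast<Mask16>(dest << 1) >> 1) |
                             (bit ? 0x8000u : 0u));
}

// Inserting 0 with OR leaves a set bit set; the shift forms clear it.
static_assert(insertBitOrOnly(0xFFFF, false, 0) == 0xFFFF, "bit not cleared");
static_assert(insertBitFirst(0xFFFF, false) == 0xFFFE, "bit 0 cleared");
static_assert(insertBitLast(0xFFFF, false) == 0x7FFF, "bit 15 cleared");
```

The shift pairs in this model stand in for mask-register shifts; positions other than the first and last lane are not covered by this trick and go through the shuffle path.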