[AVX512] Fix insertelement i1 lowering.
1. Use a shuffle to insert an i1 element into a vector. The previous implementation was incorrect: it computed dest_bit OR src_bit, which does not clear the bit when src_bit = 0.
2. Improve i1 vector shuffles: use CVT2MASK, if supported, instead of TRUNCATE.
You can do this if IdxVal != 0 and Vec is defined.
Otherwise, it can be done more cheaply with shifts and "or"s.
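
The problem with the old OR-based insert can be illustrated on a plain integer bitmask. This is only a minimal sketch, not the actual lowering code: the helper names insertBitOr and insertBitCorrect are hypothetical, and a 16-bit integer stands in for a v16i1 mask register.

```cpp
#include <cstdint>

// Hypothetical model of the old lowering: dest_bit OR src_bit.
// If the destination bit is already 1, inserting 0 leaves it set.
uint16_t insertBitOr(uint16_t mask, bool bit, unsigned idx) {
  return static_cast<uint16_t>(mask | (static_cast<unsigned>(bit) << idx));
}

// Hypothetical model of a correct insert: clear the destination bit first,
// then merge in the new value, so inserting 0 really clears the bit.
uint16_t insertBitCorrect(uint16_t mask, bool bit, unsigned idx) {
  unsigned cleared = mask & ~(1u << idx);
  return static_cast<uint16_t>(cleared | (static_cast<unsigned>(bit) << idx));
}
```

With mask = 0xF and idx = 2, inserting 0 through insertBitOr leaves the mask at 0xF, while insertBitCorrect yields 0xB. Inserting 1 gives the same result in both versions, which is why the bug only shows up when src_bit = 0.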