Lower _mm_mask{|z}_move_{ss|sd} to generic LLVM IR instead of x86 intrinsics.
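
For reviewers, a minimal scalar sketch of the semantics the generic lowering has to preserve for the ss variants, as described in the Intel intrinsics guide. The type vec4f and the model_* helpers are hypothetical stand-ins for illustration only; they are not part of this patch or of any header. The sd variants behave the same way on two double lanes.

```c
#include <stdint.h>

/* Hypothetical 4-lane float vector standing in for __m128. */
typedef struct { float lane[4]; } vec4f;

/* Models _mm_mask_move_ss(src, k, a, b):
 * lane 0 is taken from b when mask bit 0 is set, otherwise from src;
 * lanes 1..3 are always copied from a. */
static vec4f model_mask_move_ss(vec4f src, uint8_t k, vec4f a, vec4f b) {
    vec4f dst = a;                                  /* upper lanes from a */
    dst.lane[0] = (k & 1) ? b.lane[0] : src.lane[0];
    return dst;
}

/* Models _mm_maskz_move_ss(k, a, b):
 * same as above, but lane 0 is zeroed when mask bit 0 is clear. */
static vec4f model_maskz_move_ss(uint8_t k, vec4f a, vec4f b) {
    vec4f dst = a;                                  /* upper lanes from a */
    dst.lane[0] = (k & 1) ? b.lane[0] : 0.0f;
    return dst;
}
```

In IR terms this is a select on the low mask bit followed by an insert into lane 0 of a; the exact instruction sequence emitted by the patch may differ.
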
Add support for missing intrinsics:
- _mm_mask{|z}_load_{ss|sd}
- _mm_mask_store_{ss|sd}
- _mm512_int2mask
- _mm512_mask2int
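
As a reference for the expected behavior of the new intrinsics, here is a scalar sketch of the ss variants and the mask conversions, following the Intel intrinsics guide. The type xmm_f32 and the model_* helpers are hypothetical names used only for illustration; they are not the code added by this patch.

```c
#include <stdint.h>

/* Hypothetical 4-lane float vector standing in for __m128. */
typedef struct { float lane[4]; } xmm_f32;

/* Models _mm_mask_load_ss(src, k, mem): lane 0 is loaded from memory when
 * mask bit 0 is set, otherwise copied from src; upper lanes are zeroed.
 * Memory is not read when the mask bit is clear. */
static xmm_f32 model_mask_load_ss(xmm_f32 src, uint8_t k, const float *mem) {
    xmm_f32 dst = {{0.0f, 0.0f, 0.0f, 0.0f}};
    dst.lane[0] = (k & 1) ? *mem : src.lane[0];
    return dst;
}

/* Models _mm_maskz_load_ss(k, mem): lane 0 is zeroed when the bit is clear. */
static xmm_f32 model_maskz_load_ss(uint8_t k, const float *mem) {
    xmm_f32 dst = {{0.0f, 0.0f, 0.0f, 0.0f}};
    dst.lane[0] = (k & 1) ? *mem : 0.0f;
    return dst;
}

/* Models _mm_mask_store_ss(mem, k, a): memory is written only when
 * mask bit 0 is set. */
static void model_mask_store_ss(float *mem, uint8_t k, xmm_f32 a) {
    if (k & 1) {
        *mem = a.lane[0];
    }
}

/* Models _mm512_int2mask: keep the low 16 bits of the integer. */
static uint16_t model_int2mask(int m) {
    return (uint16_t)(m & 0xFFFF);
}

/* Models _mm512_mask2int: zero-extend the 16-bit mask to int. */
static int model_mask2int(uint16_t k) {
    return (int)k;
}
```

Only bit 0 of the mask matters for the scalar load and store; the remaining bits are ignored.
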
Clarification:
- In the load and store intrinsics, the vectors are widened to 512-bit vectors so that the types remain legal for AVX512F (otherwise illegal types are generated, e.g. v2i2).
- An LLVM patch with the patterns for these intrinsics is attached as a child revision (D26022).