In this patch, I convert the above intrinsic into generic IR, which allows further optimizations to be applied to it. In addition, I added a new InstCombine transform that folds an ExtractElementInst combined with a bitcast into a shufflevector instruction. This fold fixes an issue that arises when the result of an AVX2 intrinsic is used as an input to an AVX-512 intrinsic.
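As a rough illustration of why an extractelement plus bitcast can be expressed as a shuffle, the C++ sketch below models one hypothetical pattern, `bitcast (extractelement <4 x i64> %x, idx) to <2 x i32>`, and its shuffle equivalent: bitcast `%x` to `<8 x i32>`, then select lanes `2*idx` and `2*idx+1`. This is an assumed example chosen for clarity, not necessarily the exact pattern the new InstCombine matches. The type aliases and function names are hypothetical, and `std::memcpy` stands in for LLVM's `bitcast`, which reinterprets the same bytes.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical models of LLVM vector types.
using Wide = std::array<uint64_t, 4>;    // models <4 x i64>
using Narrow = std::array<uint32_t, 8>;  // models <8 x i32>
using Pair = std::array<uint32_t, 2>;    // models <2 x i32>

// Original form: extract one 64-bit element, then bitcast the scalar to <2 x i32>.
Pair extract_then_bitcast(const Wide& x, int idx) {
    uint64_t elt = x[idx];                       // extractelement
    Pair out;
    std::memcpy(out.data(), &elt, sizeof(out));  // bitcast i64 -> <2 x i32>
    return out;
}

// Folded form: bitcast the whole vector to <8 x i32>, then take the two
// narrow lanes that overlap the wide element (shuffle mask <2*idx, 2*idx+1>).
Pair bitcast_then_shuffle(const Wide& x, int idx) {
    Narrow n;
    std::memcpy(n.data(), x.data(), sizeof(n));  // bitcast <4 x i64> -> <8 x i32>
    return Pair{n[2 * idx], n[2 * idx + 1]};     // shufflevector
}
```

Both functions read exactly the same bytes in the same order, so they agree for every index on any endianness; that byte-level identity is what lets a compiler replace the scalar round trip with a single vector shuffle.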
This patch is one of a pair: the clang side and this LLVM side.
clang side: https://reviews.llvm.org/D40465