AVX512: Implemented DAG lowering for shuff64x2/shufi64x2 instructions (shuffle packed values at 128-bit granularity)
Example:
shufflevector <8 x double> %x, <8 x double> %x1, <8 x i32> <i32 0, i32 1, i32 4, i32 5, i32 0, i32 1, i32 4, i32 5>
Before optimization, the following instructions were generated:
vmovdqa64 LCPI0_0(%rip), %zmm1
vpermpd %zmm0, %zmm1, %zmm0
After optimization:
vshuff64x2 $136, %zmm0, %zmm0, %zmm0
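The immediate $136 (0x88) in the example follows from the mask: each 2-bit field of imm8 selects one 128-bit source lane for the corresponding destination lane. A minimal sketch of that encoding for the single-input case is given below. The helper name encodeShuf64x2Imm is hypothetical and this is not the code added to the X86 backend; it only illustrates how the immediate relates to the shuffle mask.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only (not the LLVM lowering code).
// Takes an 8-element shuffle mask over 64-bit elements that reads from a
// single input vector (indices 0..7) and returns the imm8 that
// vshuff64x2/vshufi64x2 would need when both source operands are that same
// register. Returns -1 if the mask does not move whole 128-bit lanes.
int encodeShuf64x2Imm(const std::array<int, 8> &mask) {
  int imm = 0;
  for (std::size_t lane = 0; lane < 4; ++lane) {
    int lo = mask[2 * lane];
    int hi = mask[2 * lane + 1];
    // Single-input sketch: reject undef (-1) and second-operand indices.
    if (lo < 0 || lo > 6)
      return -1;
    // A destination lane must copy one aligned pair of 64-bit elements.
    if (lo % 2 != 0 || hi != lo + 1)
      return -1;
    int srcLane = lo / 2;                         // source 128-bit lane, 0..3
    imm |= srcLane << (2 * static_cast<int>(lane)); // 2 bits per dest lane
  }
  assert(imm >= 0 && imm <= 255);
  return imm;
}
```

For the mask <0, 1, 4, 5, 0, 1, 4, 5> the selected source lanes are 0, 2, 0, 2, which gives 0b10001000 = 136, matching the instruction above.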
case_VSHUF(64x2)
case_VSHUF(32x4)