Implemented DAG lowering for the shuff64x2/shufi64x2 instructions (Shuffle Packed Values at 128-bit Granularity)
Tests added; vector-shuffle-512-v8.ll regenerated.
Example:
shufflevector <8 x double> %x, <8 x double> %x1, <8 x i32> <i32 0, i32 1, i32 4, i32 5, i32 0, i32 1, i32 4, i32 5>
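The mask qualifies for a 128-bit shuffle because every pair of elements selects an aligned, consecutive pair of doubles, so it collapses into four 128-bit lane indices. The sketch below shows that widening step. `widen_to_128bit_lanes` is a hypothetical helper written for illustration, not LLVM's code, and it assumes a fully defined mask (no undef elements).

```python
def widen_to_128bit_lanes(mask):
    """Widen an 8 x f64 shuffle mask into four 128-bit lane indices.

    Hypothetical helper for illustration (not LLVM's implementation).
    Each pair of mask elements must pick an aligned, consecutive pair of
    doubles (2k, 2k+1) from the 16-element concatenation of both operands;
    lane indices 0-3 then refer to the first operand, 4-7 to the second.
    Returns None if the mask cannot be expressed at 128-bit granularity.
    Undefined (-1) mask elements are not handled in this sketch.
    """
    if len(mask) != 8:
        raise ValueError("expected an 8-element mask")
    lanes = []
    for i in range(0, 8, 2):
        lo, hi = mask[i], mask[i + 1]
        # The pair must start on an even element and stay within 16 elements.
        if not (0 <= lo < 16 and lo % 2 == 0 and hi == lo + 1):
            return None
        lanes.append(lo // 2)
    return lanes


# Mask from the shufflevector example: doubles (0,1) and (4,5) of %x, twice.
print(widen_to_128bit_lanes([0, 1, 4, 5, 0, 1, 4, 5]))  # [0, 2, 0, 2]
```

Every lane index here is below 4, so the whole result comes from %x alone.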
Before optimization, the following instructions were generated:
vmovdqa64 LCPI0_0(%rip), %zmm1
vpermpd %zmm0, %zmm1, %zmm0
After optimization:
vshuff64x2 $136, %zmm0, %zmm0, %zmm0
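The immediate $136 (0b10001000) can be checked by hand. In the 512-bit form, result lanes 0 and 1 come from the first source (in Intel operand order) and lanes 2 and 3 from the second source, each picked by a 2-bit field of imm8. AT&T syntax lists the operands in reverse, but here both sources are %zmm0. The sketch below encodes the lane indices [0, 2, 0, 2] selected by the example mask and models the instruction on plain Python lists. `encode_vshuf64x2` and `vshuff64x2` are hypothetical names used only for this illustration, not LLVM functions.

```python
def encode_vshuf64x2(lanes):
    """Encode four 128-bit lane indices (0-3: first operand, 4-7: second)
    into a VSHUFF64x2 imm8.

    Returns (imm8, low_src, high_src), where low_src/high_src say which
    operand (0 or 1) feeds result lanes 0-1 and 2-3, or None if a half
    of the result would need lanes from both operands.
    Illustrative sketch only, not LLVM's lowering code.
    """
    low_src = {lane // 4 for lane in lanes[:2]}
    high_src = {lane // 4 for lane in lanes[2:]}
    if len(low_src) != 1 or len(high_src) != 1:
        return None
    imm = 0
    for i, lane in enumerate(lanes):
        imm |= (lane % 4) << (2 * i)  # 2-bit selector per result lane
    return imm, low_src.pop(), high_src.pop()


def vshuff64x2(imm, src1, src2):
    """Model the 512-bit form on lists of 8 doubles (Intel operand order:
    result lanes 0-1 from src1, lanes 2-3 from src2)."""
    out = []
    for i in range(4):
        src = src1 if i < 2 else src2
        sel = (imm >> (2 * i)) & 3
        out.extend(src[2 * sel:2 * sel + 2])
    return out


imm, low_src, high_src = encode_vshuf64x2([0, 2, 0, 2])
print(imm)  # 136
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # stand-in for %x / %zmm0
print(vshuff64x2(imm, x, x))  # [0.0, 1.0, 4.0, 5.0, 0.0, 1.0, 4.0, 5.0]
```

Because the selection is fully encoded in imm8, the single vshuff64x2 needs no constant-pool load, unlike the vmovdqa64 + vpermpd sequence.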