Currently, for this node:
  vector int test(int a, int b, int c, int d) {
    return (vector int) { a, b, c, d };
  }
we get this on Power9:
  mtvsrdd 34, 5, 3
  mtvsrdd 35, 6, 4
  vmrgow 2, 3, 2
and this on Power8:
  mtvsrwz 0, 3
  mtvsrwz 1, 5
  mtvsrwz 2, 4
  mtvsrwz 3, 6
  xxmrghd 34, 1, 0
  xxmrghd 35, 3, 2
  vmrgow 2, 3, 2
This can be improved to this on LE Power9:
  rldimi 3, 4, 32, 0
  rldimi 5, 6, 32, 0
  mtvsrdd 34, 5, 3
and this on LE Power8:
  rldimi 3, 4, 32, 0
  rldimi 5, 6, 32, 0
  mtvsrd 34, 3
  mtvsrd 35, 5
  xxpermdi 34, 35, 34, 0
This patch updates the TD pattern to generate the optimized sequence for both Power8 and Power9 on LE and BE.
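
As a rough illustration of why the shorter LE sequence is sufficient, the C sketch below models the data movement: `rldimi x, y, 32, 0` keeps the low word of x and inserts the low word of y into its upper half, and `mtvsrdd` then pairs the two resulting doublewords into one 128-bit register. The helper names (`pack_words`, `le_element`) are hypothetical and not part of this patch; this is a model under those assumptions, not the code the pattern emits.

```c
#include <stdint.h>

/* Models `rldimi lo, hi, 32, 0`: the low 32 bits of `lo` are kept and
   the low 32 bits of `hi` are inserted into the upper 32 bits. */
static uint64_t pack_words(uint32_t lo, uint32_t hi) {
  return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Reads 32-bit element i (0..3) from a 128-bit value held as two
   doublewords, using little-endian element numbering: element 0 is the
   least significant word of the low doubleword. */
static uint32_t le_element(uint64_t high_dw, uint64_t low_dw, int i) {
  uint64_t dw = (i < 2) ? low_dw : high_dw;
  return (uint32_t)(dw >> (32 * (i & 1)));
}
```

With a, b, c, d arriving in r3, r4, r5, r6, `pack_words(a, b)` corresponds to r3 after the first `rldimi` and `pack_words(c, d)` to r5 after the second; `mtvsrdd 34, 5, 3` places r5 in the high doubleword and r3 in the low one, so elements 0 through 3 read back as a, b, c, d.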