This is my first hack at getting us out of the custom x86 shuffle intrinsic business.
This was suggested in: http://reviews.llvm.org/D7866
Let's start with the vinsertf128 troika. I'll post a clang sibling patch shortly.
Please let me know if I've missed anything. I'm basing these changes on:
http://llvm.org/viewvc/llvm-project?view=revision&revision=230860
If anyone can explain why vinsertf128 with a 0 immediate exists, I'm curious...
What happens if Imm is, for example, 2 or 4?
According to the Intel documentation: "The high 7 bits of the immediate are ignored". So only the lowest bit (bit 0) of 'Imm' has any meaning in this context.
However, your code doesn't clear the upper bits of 'Imm'. So (unless I misread the code) the for loops between lines 665 and 674 would propagate the wrong indices if Imm is an even number bigger than zero.
Can you add a test for the case where Imm is a non-zero even number?
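To make the concern concrete, here is a minimal standalone sketch of the masking I have in mind. It is not the code from the patch: the helper name `insert128ShuffleMask` and its signature are hypothetical, and it assumes the insert is modeled as a two-input shuffle where indices 0..NumElts-1 pick from the 256-bit destination and indices from NumElts upward pick from the 128-bit source widened to 256 bits.

```cpp
#include <vector>

// Hypothetical helper for illustration only; not the code under review.
// Builds two-input shuffle indices that model a 128-bit lane insert into a
// 256-bit vector with NumElts elements.
std::vector<int> insert128ShuffleMask(unsigned NumElts, unsigned Imm) {
  // Only bit 0 of the immediate selects the lane; the high 7 bits are ignored.
  Imm &= 1;

  // Start from the identity mask, i.e. every element comes from the destination.
  std::vector<int> Mask(NumElts);
  for (unsigned I = 0; I != NumElts; ++I)
    Mask[I] = static_cast<int>(I);

  // Overwrite the selected half with the elements of the inserted subvector.
  unsigned Half = NumElts / 2;
  unsigned Start = Imm * Half;
  for (unsigned I = 0; I != Half; ++I)
    Mask[Start + I] = static_cast<int>(NumElts + I);

  return Mask;
}
```

With the `Imm &= 1` line in place, an immediate of 2 or 4 yields the same mask as 0, and 3 yields the same mask as 1. Without it, `Start` would index past the end of the mask for any immediate larger than 1.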