This is the sibling patch for the LLVM half of this change:
http://reviews.llvm.org/D8086
We want to remove as much custom x86 shuffling nonsense as possible.
For the avxintrin.h header, it's not clear to me how we decide between macro implementations and inline function implementations. I thought macros were generally frowned upon, so I went with inlines.
The use of macros leaves these open to -Wshadow warnings. What would be the effect of casting the macro arguments directly in the __builtin_shufflevector args instead of copying them into locals first?