This is the sibling patch for the LLVM half of this change:
http://reviews.llvm.org/D8086
The goal is to remove as much of the custom x86 shuffling nonsense as possible.
For the avxintrin.h header, it's not clear to me how we decide between macro implementations and inline function implementations. I thought macros were generally frowned upon, so I went with inline functions.
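To make the macro-versus-inline question concrete, here is a rough sketch of the two styles in portable C. Everything in it is hypothetical: the vec4d and vec2d types and the insert_half_fn and INSERT_HALF_MACRO names are stand-ins for illustration, not what avxintrin.h or this patch defines. The usual trade-off, as far as I understand it, is that an inline function gets type checking and evaluates each argument once, while a macro pastes the selector into the expression, so a literal selector stays a compile-time constant at the point of use.

```c
#include <stddef.h>

/* Hypothetical stand-ins for 256-bit and 128-bit double vectors.
   These are NOT the real __m256d / __m128d types. */
typedef struct { double v[4]; } vec4d;
typedef struct { double v[2]; } vec2d;

/* Inline-function style: arguments are type-checked and evaluated once.
   The selector arrives as an ordinary runtime parameter. */
static inline vec4d insert_half_fn(vec4d a, vec2d b, int hi)
{
    size_t base = (hi & 1) ? 2 : 0;  /* low bit picks the upper or lower half */
    a.v[base]     = b.v[0];
    a.v[base + 1] = b.v[1];
    return a;
}

/* Macro style: the selector text is substituted into the expression, so a
   literal remains a constant expression. The cost is no type checking, and
   `hi` is expanded twice, so side effects in it would run twice. */
#define INSERT_HALF_MACRO(dst, src, hi)              \
    do {                                             \
        (dst).v[((hi) & 1) * 2]     = (src).v[0];    \
        (dst).v[((hi) & 1) * 2 + 1] = (src).v[1];    \
    } while (0)
```

With a = {1, 2, 3, 4} and b = {9, 8}, insert_half_fn(a, b, 1) replaces the upper half of a copy of a, and INSERT_HALF_MACRO(m, b, 0) overwrites the lower half of m in place. Whether the real header can use the inline form presumably depends on whether the underlying builtin needs its selector as an immediate.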