These instructions have weird register allocation constraints
that require 4 consecutive registers. LLVM has support for
this using REG_SEQUENCE.
These instructions are also weird in that they do a 128-bit load
conditional on whether any bit of the mask is non-zero. This
differs from normal masked loads. They also have no register form.
We need to use target-specific memory ISD opcodes to make sure
memory operands are propagated correctly to the machine instruction.
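
To illustrate the load behaviour described above, here is a rough sketch in
plain C++ (not LLVM code) contrasting a conventional per-lane masked load with
a load that reads the full 128 bits whenever any mask bit is set. The function
names, the use of four floats to model the 128-bit operand, and returning
zeros when nothing is loaded are all assumptions made for illustration; they
are not taken from the patch or from the instruction definitions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Models a 128-bit memory operand as four 32-bit floats (assumption).
using Vec4 = std::array<float, 4>;

// Conventional masked load: a lane is read only if its own mask bit is set.
// Hypothetical helper, for comparison only.
Vec4 maskedLoadPerLane(const float *Mem, uint8_t Mask) {
  assert(Mem && "memory operand must be valid");
  Vec4 Result{}; // unselected lanes stay zero in this sketch
  for (unsigned I = 0; I != 4; ++I)
    if (Mask & (1u << I))
      Result[I] = Mem[I];
  return Result;
}

// Behaviour described in the text: the whole 128-bit operand is loaded if
// any bit of the mask is non-zero, and nothing is loaded otherwise.
// Hypothetical helper; the zero result for an empty mask is a placeholder.
Vec4 loadIfAnyMaskBit(const float *Mem, uint16_t Mask) {
  assert(Mem && "memory operand must be valid");
  Vec4 Result{};
  if (Mask != 0)
    std::memcpy(Result.data(), Mem, sizeof(Result));
  return Result;
}
```

With a mask of 0x1, the per-lane version touches only the first element,
while the second version reads all four. That difference is why the memory
operand cannot be described as an ordinary masked load.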
I've had part of this patch lying around for a couple years. I
tried to clean it up some this weekend. Still need to add tests
and finish the patterns for the scalar instructions. Not completely
sure I'm happy with the intrinsic format for those yet.
Some of the test changes are just from introducing the quad register
classes. This seems to break something in the critical anti-dependency
breaker, if I remember from when I investigated this a year or two