This adds an Sdot2d op, which is similar to the usual Neon intrinsic
except that it takes 2d vector operands, reflecting the structure of the
arithmetic that it performs: 4 separate 4-dimensional dot products, hence
the vector<4x4xi8> shape.
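
As a rough illustration of that arithmetic, here is a plain-Python
reference model. It is a sketch, not the op's implementation: the name
`sdot2d_ref` is hypothetical, the accumulator operand is assumed from the
usual Neon sdot semantics, and i32 wraparound is not modeled.

```python
def sdot2d_ref(acc, b, c):
    """Reference model of a 2d signed dot product (hypothetical helper).

    acc: 4 integers (stand-ins for the i32 accumulator lanes, assumed).
    b, c: 4x4 nested lists of integers in the i8 range [-128, 127].
    Returns 4 integers: lane i is acc[i] plus the dot product of row i.
    """
    assert len(acc) == 4 and len(b) == 4 and len(c) == 4
    out = []
    for i in range(4):
        assert len(b[i]) == 4 and len(c[i]) == 4
        # One of the 4 separate 4-dimensional dot products.
        dot = sum(b[i][k] * c[i][k] for k in range(4))
        out.append(acc[i] + dot)
    return out
```

Each row of the 4x4 operands feeds exactly one output lane, which is the
structure the 2d shape makes explicit.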

This also adds a new pass, arm-neon-2d-to-intr, that lowers this new 2d
op to the 1d intrinsic.
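
The idea behind the lowering can be sketched in plain Python as
flattening each 4x4 operand to 16 elements and applying the 1d form.
This is only a model under stated assumptions: the helper names are
hypothetical, row-major flattening is assumed, and the actual pass
operates on MLIR ops rather than Python lists.

```python
def sdot_1d_ref(acc, b, c):
    """Model of the 1d form: b and c hold 16 i8 values each (assumed layout).

    Output lane i accumulates the products of elements 4*i .. 4*i+3.
    """
    assert len(acc) == 4 and len(b) == 16 and len(c) == 16
    return [acc[i] + sum(b[4 * i + k] * c[4 * i + k] for k in range(4))
            for i in range(4)]


def flatten_4x4(m):
    """Row-major flattening of a 4x4 nested list (assumed to match the pass)."""
    return [x for row in m for x in row]


def lower_sdot2d(acc, b, c):
    """Model of the 2d-to-1d lowering: flatten both operands, then use the 1d form."""
    return sdot_1d_ref(acc, flatten_4x4(b), flatten_4x4(c))
```

Under the row-major assumption, row i of the 2d operand lands in elements
4*i .. 4*i+3 of the flattened vector, so the 1d form computes the same
four dot products.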