This adds a StructuredSdot op, which is similar to the usual Neon
sdot intrinsic except that it takes 2d vector operands, reflecting
the structure of the arithmetic that it performs: 4 separate
4-dimensional dot products, hence the vector<4x4xi8> shape.
This also adds a new pass, arm-neon-structured-to-sdot, which lowers
the new 2d structured op to the 1d intrinsic.
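
For illustration only, the arithmetic described above can be sketched in
plain Python. This is a reference model, not the op's implementation: the
function names are hypothetical, the accumulator operand and the row-major
flattening of the 4x4 operands into 16 elements are assumptions about how
the 2d form maps onto the 1d intrinsic, and i32 wraparound is not modeled.

```python
def structured_sdot(acc, a, b):
    """2d form: a and b are 4x4 grids of int8 values, acc has 4 lanes.

    Lane i accumulates the 4-dimensional dot product of row i of a
    with row i of b.
    """
    return [acc[i] + sum(a[i][j] * b[i][j] for j in range(4))
            for i in range(4)]


def flat_sdot(acc, a, b):
    """1d form: a and b are flat lists of 16 int8 values.

    Lane i uses elements 4*i .. 4*i+3 of each operand.
    """
    return [acc[i] + sum(a[4 * i + j] * b[4 * i + j] for j in range(4))
            for i in range(4)]


def flatten(m):
    """Row-major flattening of a 4x4 grid (assumed lowering layout)."""
    return [x for row in m for x in row]
```

Under these assumptions, structured_sdot(acc, a, b) and
flat_sdot(acc, flatten(a), flatten(b)) agree lane by lane, which is the
property the lowering pass relies on.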