These instructions compute an integer multiply-add, with one of the
operands being a splat of a scalar. (VMLA and VMLAS differ in whether
the splat operand is a multiplier or the addend.)
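
As a rough reference model (not the ISel code in this patch), the
lane-wise behaviour of the two unpredicated forms can be sketched in
C++. The names QReg, vmla and vmlas are illustrative only. Lanes are
modelled as unsigned integers so that wraparound is well defined; the
low n bits of an add or multiply are the same either way.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative model: one 128-bit MVE Q register viewed as unsigned lanes.
template <typename Lane>
using QReg = std::array<Lane, 16 / sizeof(Lane)>;

// VMLA: the splat scalar is a multiplier.  qda[i] = qda[i] + qn[i] * rm
template <typename Lane>
QReg<Lane> vmla(QReg<Lane> qda, const QReg<Lane>& qn, std::uint32_t rm) {
  const Lane s = static_cast<Lane>(rm);  // only the bottom n bits of rm are used
  for (std::size_t i = 0; i < qda.size(); ++i)
    qda[i] = static_cast<Lane>(std::uint64_t{qda[i]} + std::uint64_t{qn[i]} * s);
  return qda;
}

// VMLAS: the splat scalar is the addend.  qda[i] = qda[i] * qn[i] + rm
template <typename Lane>
QReg<Lane> vmlas(QReg<Lane> qda, const QReg<Lane>& qn, std::uint32_t rm) {
  const Lane s = static_cast<Lane>(rm);  // only the bottom n bits of rm are used
  for (std::size_t i = 0; i < qda.size(); ++i)
    qda[i] = static_cast<Lane>(std::uint64_t{qda[i]} * qn[i] + s);
  return qda;
}
```

For example, with i8 lanes holding 10 and 3 and a scalar of -3 (0xFD
in the bottom byte), vmla gives 10 + 3 * 0xFD = 769, which wraps to 1.
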
I've represented the unpredicated forms in IR using existing standard
IR operations, and the predicated forms with target-specific
intrinsics, as usual.
When operating on n-bit vector lanes, only the bottom n bits of the
i32 scalar operand are used. So we have to tell isel lowering this, so
that it can remove a redundant sign- or zero-extension instruction on
that input register. That's done in PerformIntrinsicCombine, but first
I had to enable PerformIntrinsicCombine for MVE targets (previously
all the intrinsics it handled were for NEON), and make it a method of
ARMTargetLowering so that it can call SimplifyDemandedBits.
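
A small C++ sketch of why the extension is removable, for 8-bit lanes.
The helpers vmla_lane_i8, sext8, zext8 and extension_is_redundant are
hypothetical and model arithmetic only, not the DAG combine itself.
Whichever way an i8 value was extended into the i32 register, the lane
result is the same, which is what a demanded-bits mask covering only
the bottom n bits tells SimplifyDemandedBits.

```cpp
#include <cstdint>

// Hypothetical model of one 8-bit lane of VMLA: acc + x * rm, where the
// instruction reads rm as a full i32. The arithmetic wraps in uint32_t and
// is only then truncated to the lane width.
std::uint8_t vmla_lane_i8(std::uint8_t acc, std::uint8_t x, std::uint32_t rm) {
  return static_cast<std::uint8_t>(std::uint32_t{acc} + std::uint32_t{x} * rm);
}

// The two extensions that might otherwise be emitted for an i8 value held in
// an i32 register. (v ^ 0x80) - 0x80 is a branch-free 8-to-32-bit sign extend.
std::uint32_t zext8(std::uint8_t v) { return v; }
std::uint32_t sext8(std::uint8_t v) { return (std::uint32_t{v} ^ 0x80u) - 0x80u; }

// The low 8 bits of a sum or product depend only on the low 8 bits of its
// inputs, so bits 8..31 of rm never reach the lane result and both
// extensions give the same answer.
bool extension_is_redundant(std::uint8_t acc, std::uint8_t x, std::uint8_t v) {
  return vmla_lane_i8(acc, x, sext8(v)) == vmla_lane_i8(acc, x, zext8(v));
}
```
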