It's effectively impossible to verify that I've found every issue, because arm_neon.h relies so heavily on macros, but hopefully this time it will take more than a few hours for someone to find another one.
I have no idea why, but apparently there's a rule that some, but not all, builtins that should take an fp16 vector actually take an int8 vector as an argument. Handle this correctly, and add test coverage.