[ARM,AArch64] Fix rev16l and rev16ll intrinsics
These two intrinsics are defined in arm_acle.h.
__rev16l needs to rotate by 16 bits, bit it was actually rotating by 2 bits.
For AArch64, where long is 64 bits, this would still be wrong.
rev16ll was incorrect, it reversed the bytes in each 32-bit word, rather than
each 16-bit halfword. The correct implementation is to apply rev16 to the top
and bottom words of the 64-bit value.
For AArch32 targets, these get compiled down to the hardware rev16 instruction
at -O1 and above. For AArch64 targets, the 64-bit ones get compiled to two
32-bit rev16 instructions, because there is not currently a pattern for the
64-bit rev16 instruction.
Differential Revision: http://reviews.llvm.org/D14609