[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32
This patch fixes shifts by a 128/256 bit shift amount. It also fixes
codegen for shifts of 32 by delegating to LLVM's default optimisation
instead of emitting a long shift.
Tests that used to generate long shifts of 32 are updated to check for the
more optimised codegen.
Differential revision: https://reviews.llvm.org/D66519