Improves the codegen for VECREDUCE_{AND,OR,XOR} operations on AArch64. Currently, these are fully scalarized unless the vector is a <N x i1>. For vectors whose elements are not i1, this patch brings the reduction down to O(log(N)) vector operations, where N is the number of elements in the vector, by repeatedly applying the bitwise operation to the two halves of the vector until only one element is left; that element holds the final result. <N x i1> bitwise reductions are handled using VECREDUCE_{UMAX,UMIN,ADD} instead.
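To illustrate the halving strategy, here is a minimal scalar model in C++. It is not the code from this patch: `reduceByHalving` and the `reduce*I1` helpers are hypothetical names, vectors are modelled as `std::vector`, and a power-of-two length is assumed. The i1 helpers rely on the identities AND = unsigned min, OR = unsigned max, and XOR = low bit of the sum for 0/1 lanes, which is presumably why UMIN, UMAX and ADD can stand in for the bitwise reductions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper, not LLVM code: folds the upper half of the vector
// into the lower half until one element remains. Each outer iteration
// models a single vector-wide operation, so there are log2(N) of them.
template <typename Op>
uint32_t reduceByHalving(std::vector<uint32_t> v, Op op) {
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0 &&
         "sketch assumes a power-of-two length");
  for (std::size_t half = v.size() / 2; half >= 1; half /= 2) {
    // Lane i is combined with lane i + half.
    for (std::size_t i = 0; i < half; ++i)
      v[i] = op(v[i], v[i + half]);
  }
  return v[0];
}

// Scalar models of i1 reductions; each lane is assumed to hold 0 or 1.
// AND over i1 lanes equals the unsigned minimum of the lanes.
bool reduceAndI1(const std::vector<uint8_t> &v) {
  assert(!v.empty());
  return *std::min_element(v.begin(), v.end()) != 0;
}

// OR over i1 lanes equals the unsigned maximum of the lanes.
bool reduceOrI1(const std::vector<uint8_t> &v) {
  assert(!v.empty());
  return *std::max_element(v.begin(), v.end()) != 0;
}

// XOR over i1 lanes equals the low bit of the sum of the lanes.
bool reduceXorI1(const std::vector<uint8_t> &v) {
  unsigned sum = 0;
  for (uint8_t lane : v)
    sum += lane;
  return (sum & 1u) != 0;
}
```

For example, `reduceByHalving({1, 2, 4, 8}, std::bit_or<uint32_t>())` performs two folding steps on a four-element vector, rather than three scalar ORs.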
I had to update quite a few codegen tests with these changes; instruction counts generally went down. Since the vector reductions already have tests, I haven't added any new ones.
This is my first patch submitted to LLVM, so please tell me if I did anything wrong or if I should change anything.
special -> Special