If the llvm.aarch64.neon.uaddlv intrinsic has a v8i8 input, it returns a 16-bit value.
For the vaddlv_u8 NEON intrinsic, clang generates llvm.aarch64.neon.uaddlv.i32.v8i8 followed by a trunc to i16. This causes an additional "and ..., #0xffff" instruction, as in the output for the attached example below.
foo:                                    // @foo
        uaddlv  h0, v0.8b
        fmov    w8, s0
        and     w0, w8, #0xffff
        ret
If we mark the high 16 bits of the uaddlv intrinsic's output as known zero for a v8i8 input, we can avoid the additional and with 0xffff.
foo:                                    // @foo
        uaddlv  h0, v0.8b
        fmov    w0, s0
        ret
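
As a sanity check on the known-zero claim, the following is a minimal scalar sketch of what uaddlv computes for a v8i8 input. The names uaddlv_v8i8_model and kAllOnes are hypothetical and exist only for this illustration; the code is a portable stand-in, not the NEON intrinsic or the LLVM change itself. It shows that even in the worst case the sum is 8 * 255 = 2040, so masking the i32 result with 0xffff cannot change it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of uaddlv on a v8i8 input: widen each
// unsigned 8-bit lane and add them all together into a 32-bit result.
constexpr std::uint32_t uaddlv_v8i8_model(const std::array<std::uint8_t, 8> &lanes) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i)
        sum += lanes[i];
    return sum;
}

// Worst-case input: every lane holds 0xff.
constexpr std::array<std::uint8_t, 8> kAllOnes = {
    {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}};

// The largest possible sum still fits well inside 16 bits, so the high
// 16 bits of the 32-bit result are always zero and the mask is a no-op.
static_assert(uaddlv_v8i8_model(kAllOnes) == 2040u, "max sum is 8 * 255");
static_assert((uaddlv_v8i8_model(kAllOnes) & 0xffff0000u) == 0u,
              "high 16 bits are zero");
```

Because the bound holds for every input, the result of the model is unchanged by "& 0xffff", which is the property that lets the extra and be dropped.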