If the llvm.aarch64.neon.uaddlv intrinsic has a v8i8 input, it returns a 16-bit value.
For the vaddlv_u8 NEON intrinsic, clang generates llvm.aarch64.neon.uaddlv.i32.v8i8 followed by a trunc to i16. In the attached example, this results in an additional and 0xffff instruction, as shown below.
foo:                                    // @foo
        uaddlv  h0, v0.8b
        fmov    w8, s0
        and     w0, w8, #0xffff
        ret

If we mark the high 16 bits of the output of the uaddlv intrinsic with a v8i8 input as known zero, we can avoid the additional and 0xffff.
foo:                                    // @foo
        uaddlv  h0, v0.8b
        fmov    w0, s0
        ret
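
The reason the mask is redundant can be illustrated with a small portable C model. This is only a sketch: the names uaddlv_v8i8_model and mask_is_redundant are hypothetical, and the code models the arithmetic of the intrinsic in scalar C rather than reproducing the attached example or the compiler change itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of uaddlv on a v8i8 input:
 * zero-extend each of the 8 unsigned byte lanes and sum them. */
static uint32_t uaddlv_v8i8_model(const uint8_t lanes[8]) {
    uint32_t sum = 0;
    for (int i = 0; i < 8; ++i)
        sum += lanes[i];
    /* Eight lanes of at most 255 each can never exceed 2040. */
    assert(sum <= 8u * 255u);
    return sum;
}

/* Returns 1 when masking the i32 result with 0xffff changes nothing,
 * i.e. when the high 16 bits are already zero. */
static int mask_is_redundant(const uint8_t lanes[8]) {
    uint32_t r = uaddlv_v8i8_model(lanes);
    return (r & 0xffffu) == r;
}
```

With every lane set to 255 the model returns 2040 (0x7f8), which fits well within 16 bits, so the high 16 bits of the i32 result are zero for any input and the and with 0xffff has no effect.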