If the llvm.aarch64.neon.uaddlv intrinsic has a v8i8 input, it returns a 16-bit value.
For the vaddlv_u8 NEON intrinsic, clang generates llvm.aarch64.neon.uaddlv.i32.v8i8 followed by a trunc to i16. This results in an additional and 0xffff instruction, as shown below for the attached example.
foo: // @foo
uaddlv h0, v0.8b
fmov w8, s0
and w0, w8, #0xffff
ret
If we mark the high 16 bits of the uaddlv intrinsic's output as known zero for a v8i8 input, we can avoid the additional and 0xffff.
foo: // @foo
uaddlv h0, v0.8b
fmov w0, s0
ret
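
The reasoning behind the known-zero claim can be checked with a small scalar model. This is only an illustrative sketch in portable C, not the patch itself: uaddlv_v8i8_model and foo_model are hypothetical names, and the model assumes the intrinsic simply sums eight unsigned 8-bit lanes into a 32-bit result. Since the largest possible sum is 8 * 255 = 2040, the high 16 bits of the result are always zero, so masking with 0xffff cannot change the value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of uaddlv on a v8i8 input: widen each
 * unsigned 8-bit lane and add all eight lanes into a 32-bit result. */
static uint32_t uaddlv_v8i8_model(const uint8_t lanes[8]) {
    uint32_t sum = 0;
    for (int i = 0; i < 8; ++i)
        sum += lanes[i];
    /* The sum is at most 8 * 255 = 2040, so bits 16..31 are zero. */
    assert((sum >> 16) == 0);
    return sum;
}

/* Hypothetical model of foo before the change: the i32 result is
 * truncated to i16 and zero-extended, which is the and with 0xffff. */
static uint32_t foo_model(const uint8_t lanes[8]) {
    return uaddlv_v8i8_model(lanes) & 0xffffu;
}
```

For every input, foo_model returns the same value as uaddlv_v8i8_model, which is why the and becomes removable once the high bits are known to be zero.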