These intrinsics return the number of elements in a streaming
vector, for example aarch64.sme.cntsw returns the number of
32-bit elements. When in streaming mode these are equivalent
to aarch64.sve.cntb/h/w/d with an input value of 1.
I have implemented these intrinsics using the rdsvl instruction
and added tests here:
CodeGen/AArch64/SME/sme-intrinsics-rdsvl.ll