Compilation and autovectorisation of an fp16 reduction kernel like this:
  _Float16 sum = .0F16;
  for (unsigned i = 0; i < N; i++)
    sum += A[i];
  return sum;
fails with an instruction selection 'cannot match' error. A BUILD_VECTOR node is created to hold the 'sum' vector, and it is initialised with a VMOVIMM. The problem was that the BUILD_VECTOR nodes for v4f16 and v8f16 were assigned the wrong type, so instruction selection did not know how to lower the VMOVIMM.
There are several ways to initialise vectors with constants, e.g. constant pool loads or VMOV with an immediate. This BUILD_VECTOR node is yet another case: it is created for constant-initialised phi nodes, which, again, we were not handling.
In a follow-up commit, I will add support for 'extractelt' from v4f16 and v8f16 vectors, which is the last step needed to get this fully working.
Why does this matter?