Page MenuHomePhabricator

[ARM] FP16: constant initialised v4f16 and v8f16 vectors
Needs ReviewPublic

Authored by SjoerdMeijer on Nov 29 2018, 9:10 AM.

Details

Summary

Compilation and autovectorisation of a fp16 reduction kernel like this:

    
_Float16 sum = .0F16;
for (unsigned i = 0; i < N; i++)
  sum += A[i];
return sum;

fails with an instruction selection 'cannot match' error. A BUILD_VECTOR node is created to hold the 'sum' vector, which gets initialised with VMOVIMM. The problem was that BUILD_VECTOR nodes for v4f16 and v8f16 were assigned the wrong type so that it didn't know how to lower the VMOVIMM.

There are different ways to initialise vectors with constants, e.g. constant pool loads or vmov with immediates. But this BUILD_VECTOR node is another case, that gets created for constant initialised phi nodes, which again, we were not handling.

In a follow up commit, I will add support for 'extractelt' from v4f16 and v8f16 vectors, which is the last step to get this fully working.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Nov 29 2018, 9:10 AM

The lib/Target/ARM/ARMInstrNEON.td look like they belong in a separate patch. And it looks like there isn't any explicit test coverage of bitcasts.

Please clean up the testcase so it only contains the relevant operations; the whole loop isn't necessary.

lib/Target/ARM/ARMISelLowering.cpp
5744

Why does this matter?

olista01 added inline comments.Nov 30 2018, 1:04 AM
lib/Target/ARM/ARMInstrNEON.td
7145

There are still a lot of combinations of types missing here, I think it would be better to add all of them at once. It might also be better to re-write all of these patterns as a pair of tablegen foreach loops, so that we don't miss any combinations.

7228

Why does this need a VREV? All of the other cases where the lane widths are the same are no-ops for BE and LE.

Thanks for taking a look!

I am actually going to split this up in 3 patches:

  • the changes in ARMISelDAGToDAG.cpp, the are related to (post-increment) VLD1, and should also be tested explicitly
  • the changes in ARMInstrNEON.td, the bitcasts,
  • and then this one to support BUILD_VECTOR.

Please clean up the testcase so it only contains the relevant operations; the whole loop isn't necessary.

I think for testing this BUILD_VECTOR change, a loop is necessary, as this code is generated from:

%vec.phi = phi <8 x half> [ zeroinitializer, %vector.ph ], [ %2, %vector.body ]

but I will have a look again.