Page MenuHomePhabricator

[ARM] Add IR intrinsics for a sample of MVE instructions.
Needs ReviewPublic

Authored by simon_tatham on Wed, Sep 4, 5:47 AM.

Details

Summary

This is a representative sample of the kind of IR intrinsics that
we'll need in order to implement the user-facing ACLE intrinsics for
MVE. We'll need a lot more like this to cover the whole MVE
instruction set, but this commit introduces an intrinsic or two from
lots of different categories:

  • a vector gather load, with and without writeback
  • an interleaving load and store
  • a vector format conversion
  • a vector-to-vector operation
  • a vector-to-scalar reduction
  • the unusual VADC instruction that also uses the FPSCR carry flag
  • predicated versions of all of the above
  • a scalar shift

Where possible, instruction selection for these intrinsics is done
using DAG ISel patterns. But if the intrinsic has to return multiple
values (writeback load, VADC producing the output carry) then the
selection has to be done in C++.

For the predicated intrinsics, we want to specify the predicate value
as a vector of the appropriate number of i1. So I've invented a sort
of 'subroutine' class in the Tablegen (you can write mkpred<T>.p to
get the predicate vector type that goes with a given value vector),
and also provided a couple of intrinsics to convert between those and
the integer form of the predicate as a C programmer will provide it.

Event Timeline

simon_tatham created this revision.Wed, Sep 4, 5:47 AM
dmgreen added inline comments.Mon, Sep 9, 2:49 AM
llvm/include/llvm/IR/IntrinsicsARM.td
806

A vminv is very close to a llvm.experimental.vector.reduce.umin/llvm.experimental.vector.reduce.smin, although that doesn't contain the first parameter. (I'm just mentioning this for general interest mostly, I don't think we should be using the thing yet. It may be more interesting in the future if llvm started optimising the vector.reduce better, but for the moment I think that the arm intrinsic is a good idea).

816

Any reason this is called fltnarrow? As opposed to something closer to the name of the instruction?

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
212

Maybe call this something like "AddMVEPredicateToOps"?

llvm/lib/Target/ARM/ARMInstrMVE.td
283

The intel backend has a different way of doing this in the X86InstrAVX512.td file. It defined a set of "X86VectorVTInfo" classes for each type, that can contains extra information about that vector type. That way we could store the predicate type with the other info, and instead of adding the ValueType to the MVE_VADDSUB, for example, we could add this class data. (And potentially add extra stuff like signed_suffix and unsigned_suffix, etc)

I'm not sure if that method is better of worse than this, either from an efficiency of tblgen, or if it's simpler to read.

695

Are the instructions correct here? Should they be s8/s16/s32?

1498

I feel like this deserves at least a little indentation. Maybe not for each level of for, but something to make it easier to parse. The last 2 levels are really just defining variables, right? And could be written in-place.

5246

There is an isel node called predicate_cast that does the same thing as this. It may be possible/beneficial to convert this earlier (during lowering, but I'm not sure that would do anything yet). The patterns are further up, near a lot of the vcmp patterns.

llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc.ll
2

-verify-machineinstrs is probably worth putting on most of these tests.

22

The #1 can be removed?

llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll
10

Can you add test for 0 too

llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll
4

Test other VT's too, please.

simon_tatham marked 8 inline comments as done.Wed, Sep 11, 4:53 AM
simon_tatham added inline comments.
llvm/include/llvm/IR/IntrinsicsARM.td
816

Not any particularly good reason. I had the vague idea of naming things after their functionality as long as it didn't turn the name into a giant essay (there's probably no hope for vrmlaldavhax), on the vague principle that that might make them easier to find if anyone later goes looking for platform-specific intrinsics that could be pulled out into standard IR representations. But it's not something I'm strongly committed to, just a harebrained idea I thought I'd throw out there to see if anyone had opinions about it.

Addressed many review comments.