Page MenuHomePhabricator

[ARM] Begin adding IR intrinsics for MVE instructions.
ClosedPublic

Authored by simon_tatham on Sep 4 2019, 5:47 AM.

Details

Summary

This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.

This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).

When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.

(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new MVEVectorVTInfo records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)

The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.

Diff Detail

Event Timeline

simon_tatham created this revision.Sep 4 2019, 5:47 AM
dmgreen added inline comments.Sep 9 2019, 2:49 AM
llvm/include/llvm/IR/IntrinsicsARM.td
810

A vminv is very close to a llvm.experimental.vector.reduce.umin/llvm.experimental.vector.reduce.smin, although that doesn't contain the first parameter. (I'm just mentioning this for general interest mostly, I don't think we should be using the thing yet. It may be more interesting in the future if llvm started optimising the vector.reduce better, but for the moment I think that the arm intrinsic is a good idea).

820

Any reason this is called fltnarrow? As opposed to something closer to the name of the instruction?

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
212 ↗(On Diff #218662)

Maybe call this something like "AddMVEPredicateToOps"?

llvm/lib/Target/ARM/ARMInstrMVE.td
284

The intel backend has a different way of doing this in the X86InstrAVX512.td file. It defined a set of "X86VectorVTInfo" classes for each type, that can contains extra information about that vector type. That way we could store the predicate type with the other info, and instead of adding the ValueType to the MVE_VADDSUB, for example, we could add this class data. (And potentially add extra stuff like signed_suffix and unsigned_suffix, etc)

I'm not sure if that method is better of worse than this, either from an efficiency of tblgen, or if it's simpler to read.

746

Are the instructions correct here? Should they be s8/s16/s32?

1591

I feel like this deserves at least a little indentation. Maybe not for each level of for, but something to make it easier to parse. The last 2 levels are really just defining variables, right? And could be written in-place.

5534

There is an isel node called predicate_cast that does the same thing as this. It may be possible/beneficial to convert this earlier (during lowering, but I'm not sure that would do anything yet). The patterns are further up, near a lot of the vcmp patterns.

llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc.ll
2 ↗(On Diff #218662)

-verify-machineinstrs is probably worth putting on most of these tests.

22 ↗(On Diff #218662)

The #1 can be removed?

llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll
11

Can you add test for 0 too

llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll
5

Test other VT's too, please.

simon_tatham marked 8 inline comments as done.Sep 11 2019, 4:53 AM
simon_tatham added inline comments.
llvm/include/llvm/IR/IntrinsicsARM.td
820

Not any particularly good reason. I had the vague idea of naming things after their functionality as long as it didn't turn the name into a giant essay (there's probably no hope for vrmlaldavhax), on the vague principle that that might make them easier to find if anyone later goes looking for platform-specific intrinsics that could be pulled out into standard IR representations. But it's not something I'm strongly committed to, just a harebrained idea I thought I'd throw out there to see if anyone had opinions about it.

Addressed many review comments.

dmgreen added inline comments.Sep 23 2019, 12:12 AM
llvm/include/llvm/IR/IntrinsicsARM.td
820

I would say that in this case, if it is going from something called vcvt to something called vcvt then similar name would be less confusing (presuming that you can search for arm_mve_vcvt to distinguish the intrinsic from the instruction from the builtin).

Rebased, and renamed the fltnarrow intrinsic as suggested.

This is a bit large to review in a single patch, and I don't think all the parts are necessarily interrelated. Mind pulling a few logically separable parts out into separate patches, to make what's left simpler?

Split this patch into three as requested. This one now contains only
the subset of the previous IR intrinsics that can be implemented by
Tablegen patterns. The ones using C++ are moved out to two followup
patches.

simon_tatham retitled this revision from [ARM] Add IR intrinsics for a sample of MVE instructions. to [ARM] Begin adding IR intrinsics for MVE instructions..Oct 9 2019, 6:00 AM
simon_tatham edited the summary of this revision. (Show Details)

Thanks for splitting this up.

llvm/lib/Target/ARM/ARMInstrMVE.td
729

I feel like we should come up with a style and try and stick with it.

The adds/subs below add in VT to the existing instructions and use it in the new Patterns, mixed in with the old patterns.

These ones add the intrinsics to the multiclass so the top level can include the pattern (but also has patterns outside (below) too.

I think there's value in making this somewhat structured, if we can.

2867

Little bit of indenting, please.

llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
5

We probably don't _need_ tests for simple instructions like this, they should be covered elsewhere (fine to leave them if you wish).

25

For the rest of the tests, at least for codegen we have tried to fill in all the combinations for type and operations (at least the legal types). It can be useful for making sure nothing is missed (here or in the future when some refactoring happens).

Whether you want to do the same thing here is up to you, or whether you think that having interesting combinations is enough (adds with v16i8, subs with v4f32 for example).

53

Do we care about what happens when there is a fp intrinsic but we don't have mve.fp? I presume this will be a fail to select or some sort of legalisation error, which is probably fine considering what is happening.

simon_tatham edited the summary of this revision. (Show Details)

Rewritten this patch to do all its jobs in the same reasonably
consistent way.

Also, I've replaced my mkpred function-oid with a system of
MVEVectorVTInfo, similar to the x86 system that @dmgreen pointed
out. I think it works out more nicely, for two reasons. Firstly, I can
make separate classes for vectors of signed and unsigned integers,
which MVE distinguishes even though LLVM IR doesn't. Secondly, I can
also have those info records contain bits and pieces that can be used
in the main instruction definition: the type suffix on the mnemonic
(.s32 or .f16 or whatever), and at least the _usual_ way that
vector types are represented in MVE instruction encodings. So now
every time I have to add an MVEVectorVTInfo as an extra template
parameter, I'll be able to remove a few existing ones that it makes
redundant, so the declarations shouldn't get too complicated.

simon_tatham edited the summary of this revision. (Show Details)Oct 15 2019, 7:50 AM

Oh, nearly forgot: I renamed the vcvt intrinsic again, to put "narrow" back into the name (it's now vcvt_narrow).

Rationale: the MVE VCVT instructions can't all be treated exactly the same way by their IR intrinsics, because conversions to a narrower element type need an extra input parameter giving the previous value of their output register, which they only overwrite half of. Non-narrowing VCVTs will have a simpler type signature without that parameter.

dmgreen added inline comments.Oct 23 2019, 2:10 AM
llvm/lib/Target/ARM/ARMInstrMVE.td
312

Maybe call these MVEv16i8? Or v16i8_t or v16i8info or something?

Otherwise it looks very useful.

llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
25

This could probably do with a reply, one way or the other.

I was previously in the "don't mind either way" camp, now I feel more in the "why not just add them" camp, unless there is some reason not to.

simon_tatham marked 2 inline comments as done.Oct 23 2019, 6:46 AM
simon_tatham added inline comments.
llvm/lib/Target/ARM/ARMInstrMVE.td
312

I copied the `v16i8_info` naming system from the similar x86 case you pointed out to me. But I can call them something with "MVE" in the name if you prefer, sure.

llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
25

I had intended to stop at "interesting subset of combinations", along the lines of one of each operation, and one of each type, but not the full cross product unless absolutely necessary.

(There are about 2000 of these to come in future work, so at some point adding a test for every single one won't deserve the word "just" any more!)

dmgreen added inline comments.Oct 23 2019, 8:31 AM
llvm/lib/Target/ARM/ARMInstrMVE.td
312

Ah righteo. The underscore threw me off. Whatever you think is best. MVE might not be the worst idea, just in case someone decides to do something similar in NEON.

llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
25

The add int and add float are two different instructions, same for sub, so it probably best to make sure we have at least one of each of those combinations.

I was looking for prior art of only adding subsets of the combinations for codegen, I don't immediately see anywhere where we have done that. I'm not too worried about this set of tests not catching problems now. But in the future as things are refactored (in ways that might be difficult to predict), it might be expected that the test coverage is more complete and lead to things getting missed.

I'd suggest we at least fill in the combinations for add and sub. Later on when we get to vrmlsldavhax it probably wont be as important.

simon_tatham edited the summary of this revision. (Show Details)

Renamed the info classes from vXXyY_info to MVE_vXXyY, and added
some more tests of addition. Also rebased to current master.

dmgreen accepted this revision.Oct 24 2019, 7:11 AM
dmgreen added a subscriber: samparker.

Thanks! LGTM, with just one comment that might be better left for Sam.

llvm/lib/Target/ARM/ARMInstrMVE.td
2841

Can this move into MVE_VADDSUBFMA_fp? I'm pretty sure VFMA should be fine there too. Feel free to leave that for @samparker though.

This revision is now accepted and ready to land.Oct 24 2019, 7:11 AM
This revision was automatically updated to reflect the committed changes.