This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Set up infrastructure for MVE vector instructions.
ClosedPublic

Authored by simon_tatham on May 30 2019, 8:20 AM.

Details

Summary

This commit prepares the way to start adding the main collection of
MVE instructions, which operate on the 128-bit vector registers.

The most obvious thing that's needed, and the simplest, is to add the
MQPR register class, which is like the existing QPR except that it has
fewer registers in it.

The more complicated part: MVE defines a system of vector predication,
in which instructions operating on 128-bit vector registers can be
constrained to operate on only a subset of the lanes, using a system
of prefix instructions similar to the existing Thumb IT, in that you
have one prefix instruction which designates up to 4 following
instructions as subject to predication, and within that sequence, the
predicate can be inverted by means of T/E suffixes ('Then' / 'Else').

To support instructions of this type, we've added two new Tablegen
classes vpred_n and vpred_r for standard clusters of MC operands
to add to a predicated instruction. Both include a flag indicating how
the instruction is predicated at all (options are T, E and 'not
predicated'), and an input register field for the register controlling
the set of active lanes. They differ from each other in that vpred_r
also includes an input operand for the previous value of the output
register, for instructions that leave inactive lanes unchanged.
vpred_n lacks that extra operand; it will be used for instructions
that don't preserve inactive lanes in their output register (either
because inactive lanes are zeroed, as the MVE load instructions do, or
because the output register isn't a vector at all).

This commit also adds the family of prefix instructions themselves
(VPT / VPST), and all the machinery needed to work with them in
assembly and disassembly (e.g. generating the 't' and 'e' mnemonic
suffixes on disassembled instructions within a predicated block)

I've added a couple of demo instructions that derive from the new
Tablegen base classes and use those two operand clusters. The bulk of
the vector instructions will come in followup commits small enough to
be manageable. (One exception is that I've added the full version of
isMnemonicVPTPredicable in the AsmParser, because it seemed
pointless to carefully split it up.)

Diff Detail

Repository
rL LLVM

Event Timeline

simon_tatham created this revision.May 30 2019, 8:20 AM
samparker added inline comments.Jun 3 2019, 4:56 AM
llvm/lib/Target/ARM/ARMRegisterInfo.td
461 ↗(On Diff #202204)

Is this really necessary..? Maybe it would be clearer to just have the MQPR class for MVE and not touch QPR?

ostannard added inline comments.
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
504 ↗(On Diff #202204)

mve.fp implies mve.int, so one of these functions is redundant.

2114 ↗(On Diff #202204)

getCondCode already returns an ARMCC::CondCodes, so these casts aren't needed.

6317 ↗(On Diff #202204)

MQPR is a subset of QPR, so there's no point in checking both.

6457 ↗(On Diff #202204)

Why is this inverted compared to IT? Is the format of these masks written down anywhere?

6907 ↗(On Diff #202204)

Could the bit-twiddling on the VPS state mask be split into a helper function, like currentITCond?

6925 ↗(On Diff #202204)

Why do we need to check hasMVE here?

llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
534 ↗(On Diff #202204)

I think this needs a comment explaining what is different about MVE.

llvm/test/MC/ARM/mve-vpt.s
9 ↗(On Diff #202204)

Are these instructions testing anything related to VPT blocks? If not, they should be moved to a separate file (and we should check the actual error which is emitted in the non-FP case).

66 ↗(On Diff #202204)

We should also check the errors which are reported for invalid instructions.

dmgreen added inline comments.Jun 4 2019, 12:44 PM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
2183 ↗(On Diff #202204)

I think this should be using MII.get(Inst.getOpcode()), if it can.

llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
535 ↗(On Diff #202204)

Does HasMVEFloatOps imply HasMVEIntegerOps?

Remastered patch to apply cleanly against current trunk.

miyuki added a subscriber: miyuki.Jun 11 2019, 5:47 AM
miyuki added inline comments.Jun 11 2019, 6:05 AM
llvm/lib/Target/ARM/MCTargetDesc/ARMMCCodeEmitter.cpp
410 ↗(On Diff #204023)

GCC complains that this function is unused. It can probably be moved to D62678.

simon_tatham marked an inline comment as done.Jun 11 2019, 7:57 AM
simon_tatham added inline comments.
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
2183 ↗(On Diff #202204)

Unfortunately, that's rather fiddly, because the context here is that we're in a method of ARMOperand, which doesn't have an MCInstrInfo conveniently accessible that I can find – the ARMAsmParser as a whole is given one at construction time, but it's not passed through to ARMOperand.

I wondered about changing the prototype of this addVPTPredROperands method so that it received the MCInstrDesc from its caller, and then that could look it up in the ambient MCInstrInfo. But the caller is the tablegenerated AsmMatcher, and that has hard-coded function call code for all the RenderMethods which expects them to conform to the same API, so that would be a huge piece of plumbing to change.

I think leaving it as it is is safe from a modules perspective. We're referring directly to the llvm::ARMInsts array, which is defined in the LLVMARMDesc library. But that library also defines the llvm::ARMMCRegisterClasses array, which we refer to constantly in this same source file. So anyone linking against this module must surely also have LLVMARMDesc available, or else they'd already have had a link error.

dmgreen added inline comments.Jun 11 2019, 8:08 AM
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
2183 ↗(On Diff #202204)

Ah, that makes sense. Thanks for checking.

simon_tatham marked an inline comment as done.Jun 12 2019, 2:26 AM
simon_tatham added inline comments.
llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
6457 ↗(On Diff #202204)

Recording the result of our investigation just now:

It's IT, not VPT, that is doing something strange. The ARMOperand coming out of the operand parsing stage has the IT mask in one format; it is translated directly into an MCOperand storing the mask in the same format; but then processInstruction translates it into an MCOperand in a different format. For example, if you feed the instruction itete eq into llvm-mc -debug -triple=thumbv8a -show-inst, you see

Parsed as: <MCInst #2921 t2IT <MCOperand Imm:0> <MCOperand Imm:5>>
        itete   eq                      @ <MCInst #2921 t2IT
                                        @  <MCOperand Imm:0>
                                        @  <MCOperand Imm:11>>

in which you can see that the mask is originally parsed into an MCOperand with value 5, and processInstruction later converts that into 11, without changing the type of structure it's stored in. So that MCOperand has different semantics depending on which part of the assembly pipeline you're in!

Apparently the reason for having two formats in the case of IT is because the transient initial format is what has to be passed to startExplicitITBlock in order to have the assembler statically track the IT block state and validate it against the suffixes on the next few instructions. So processInstruction receives the IT mask in the right format to pass to that function, and then transforms it into the second format.

The VPT code here is avoiding this complication, by using the same format for the mask throughout, in both the ARMOperand and MCOperand structures. So vpstete, for example, would have the mask value 11 from beginning to end of the parsing process. (The architectural encoding is different, but that's taken care of by getVPTMaskOpValue at the point of translation from MCOperand to an instruction encoding, which seems to me the sensible place to do it.)

I think it's very confusing for the immediate value in an MCOperand to have different semantics at different points in the code, so I propose to make IT work more like VPT rather than vice versa: I think all masks in operand structures should be stored in the same format, and processInstruction will have to do a local translation into the format that startExplicitITBlock expects.

simon_tatham marked an inline comment as done.

I think this revision addresses all the previous review comments. In
particular, IT and VPT mask handling is now identical, because
IT handling was simplified by D63219 (which also introduced that
helper function @ostannard asked for to do the mask bit extraction).

Also I've moved the tests that aren't specifically about VPT into test
files that further MVE assembly patches will add more things to. (Each
of the handful of example instructions in this patch is pulled forward
from one of those later patches, and now when those later patches
land, each one will end up being tested in the same test input files
as its siblings.)

ostannard accepted this revision.Jun 13 2019, 5:24 AM

LGTM, just one minor nit.

llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
126 ↗(On Diff #204479)

This could be a SmallVector<unsigned char, 4> to avoid some heap allocations.

This revision is now accepted and ready to land.Jun 13 2019, 5:24 AM
This revision was automatically updated to reflect the committed changes.