Page MenuHomePhabricator

[ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics.
ClosedPublic

Authored by simon_tatham on Jan 17 2020, 9:17 AM.

Details

Summary

Immediate vmvnq is code-generated as a simple vector constant in IR,
and left to the backend to recognize that it can be created with an
MVE VMVN instruction. The predicated version is represented as a
select between the input and the same constant, and I've added a
Tablegen isel rule to turn that into a predicated VMVN. (That should
be better than the previous VMVN + VPSEL: it's the same number of
instructions but now it can fold into an adjacent VPT block.)

The unpredicated forms of VBIC and VORR are done by enabling the same
isel lowering as for NEON, recognizing appropriate immediates and
rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I
then instruction-select into the right MVE instructions (now that I've
also reworked those instructions to use the same MC operand encoding).
In order to do that, I had to promote the Tablegen SDNode instance
NEONvorrImm to a general ARMvorrImm available in MVE as well, and
similarly for NEONvbicImm.

The predicated forms of VBIC and VORR are represented as a vector
select between the original input vector and the output of the
unpredicated operation. The main convenience of this is that it still
lets me use the existing isel lowering for VBICIMM/VORRIMM, and not
have to write another copy of the operand encoding translation code.

This intrinsic family is the first to use the imm_simd system I put
into the MveEmitter tablegen backend. So, naturally, it showed up a
bug or two (emitting bogus range checks and the like). Fixed those,
and added a full set of tests for the permissible immediates in the
existing Sema test.

Also adjusted the isel pattern for vmovlb.u8, which stopped matching
because lowering started turning its input into a VBICIMM. Now it
recognizes the VBICIMM instead.

Diff Detail

Event Timeline

simon_tatham created this revision.Jan 17 2020, 9:17 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 17 2020, 9:17 AM

What is the reason that this can't be lowered in tablegen, in the same way as the VMOVimm's are?

For vbic vs vmovlb, the vmovlb does include a free register move, so may under some circumstances be slightly better. Like you say, it's mostly benign, but may be worth updating the MVE_VMOVL patterns.

Do you have any tests for what would be invalid bic values under MVE?

llvm/lib/Target/ARM/ARMISelLowering.cpp
12181

This is OK because we are passing OtherModImm to isVMOVModifiedImm, and MVE supports the same patterns as NEON?

simon_tatham marked 2 inline comments as done.Jan 20 2020, 5:43 AM

What is the reason that this can't be lowered in tablegen, in the same way as the VMOVimm's are?

In NEON, immediate VBIC is represented as a single MC instruction, which takes its immediate operand already encoded into the NEON format (8 data bits, op and cmode). That's the same format that ARMISD::VBICIMM has encoded the operand in after lowering. So you only need one tablegen pattern, which passes the immediate through unchanged between the input and output SDNode types.

In MVE, immediate VBIC is represented as four separate MC instructions, for an 8-bit immediate shifted left by 0, 8, 16 or 24 bits. Each one takes the immediate operand in the 'natural' form, i.e. the numerical value that would be combined into the vector lane and shown in assembly. For example, MVE_VBICIZ16v4i32 takes an operand such as 0xab0000 which NEON VBIC would represent as 0xab | (control bits << 8). So the C++ isel code I've written has to undo the NEON encoding and turn it back into the 'natural' immediate value plus a choice of which MVE opcode to use.

I suppose an alternative would be to rework the MC representation of MVE VBIC/VORR so that they look more like the NEON versions. I don't exactly know why MVE was done differently in the first place (the commit here has my name on it, but it was a team effort). One possibility is that the pseudo-instruction reversed forms vand and vorn might be hard to represent that way, but I don't know.

Do you have any tests for what would be invalid bic values under MVE?

True, I suppose I could provide some immediates that are valid for other VMOVModImmTypes, like 0xabff, and make sure nothing goes wrong.

llvm/lib/Target/ARM/ARMISelLowering.cpp
12181

Yes: OtherModImm only matches values of the form '8-bit number shifted left by a multiple of 8 bits', which is just what MVE VBIC and VORR take as well.

What is the reason that this can't be lowered in tablegen, in the same way as the VMOVimm's are?

In NEON, immediate VBIC is represented as a single MC instruction, which takes its immediate operand already encoded into the NEON format (8 data bits, op and cmode). That's the same format that ARMISD::VBICIMM has encoded the operand in after lowering. So you only need one tablegen pattern, which passes the immediate through unchanged between the input and output SDNode types.

In MVE, immediate VBIC is represented as four separate MC instructions, for an 8-bit immediate shifted left by 0, 8, 16 or 24 bits. Each one takes the immediate operand in the 'natural' form, i.e. the numerical value that would be combined into the vector lane and shown in assembly. For example, MVE_VBICIZ16v4i32 takes an operand such as 0xab0000 which NEON VBIC would represent as 0xab | (control bits << 8). So the C++ isel code I've written has to undo the NEON encoding and turn it back into the 'natural' immediate value plus a choice of which MVE opcode to use.

I suppose an alternative would be to rework the MC representation of MVE VBIC/VORR so that they look more like the NEON versions. I don't exactly know why MVE was done differently in the first place (the commit here has my name on it, but it was a team effort). One possibility is that the pseudo-instruction reversed forms vand and vorn might be hard to represent that way, but I don't know.

I believe that the downstream VMOVimm's were rewritten like this when the other BUILDVECTOR handling was added by DavidS. If it is possible to structure this way for BIC's too, it sounds like it might be a little cleaner.

simon_tatham marked an inline comment as done.
simon_tatham edited the summary of this revision. (Show Details)

I've revised the MC representations of VBIC and VORR as suggested, but that was a big enough patch that I've done it separately as D73205. This patch now sits on top of that one.

Changing VBIC and VORR meant I could do the isel for the unpredicated forms in pure Tablegen. But the predicated ones would still have needed C++, because the IR intrinsics would have wanted the immediate in its natural form, but by the time you generate an instruction, it has to be re-encoded as NEON. The simplest way was to stop adding new IR intrinsics, and instead encode the predicated instructions as a select. Then I still get to use isel lowering's conversion into VBICIMM/VORRIMM which does the immediate translation for me.

Adjusting the VMOVL pattern to expect the result of my modified lowering has made all those unrelated MVE codegen tests go back to the way they were before, so the new version of this patch doesn't have to change anything there.

Also added a negative llc test with an immediate that doesn't fit into VBICIMM, to prove that it gets sensibly selected as a different instruction sequence and nothing crashes.

dmgreen accepted this revision.Jan 23 2020, 1:13 AM

Looks good, from what I can tell.

I especially like the selects. We know that we have to do more work there, but adding this for more instructions would go a long way towards creating more predicated instructions (before the ability to do this in IR comes along).

This revision is now accepted and ready to land.Jan 23 2020, 1:13 AM
This revision was automatically updated to reflect the committed changes.