Page MenuHomePhabricator

[SVE] Add ISD nodes for predicated integer extend inreg operations

Authored by paulwalker-arm on Aug 7 2020, 12:20 PM.



These are useful instructions when lowering fixed length vector
extends, so I've broken this patch out as kind of NFC like work.

Diff Detail

Event Timeline

paulwalker-arm created this revision.Aug 7 2020, 12:20 PM
paulwalker-arm requested review of this revision.Aug 7 2020, 12:20 PM
efriedma added inline comments.Aug 7 2020, 12:31 PM

The signed versions are essentially predicated versions of ISD::SIGN_EXTEND_INREG? Can we use names that reflect that?

paulwalker-arm added inline comments.Aug 7 2020, 4:44 PM

It is more than a rename but I can do that, it might even allow an easier path to lowering for scalable vector types. I do want the signed and unsigned variants to be consistent though. So, any objections to me creating a matching predicated zero_extend_inreg?

efriedma added inline comments.Aug 7 2020, 4:52 PM

That's fine. We actually have a function on SelectionDAG called getZeroExtendInReg(), so that isn't completely new terminology.

Changed new nodes to SIGN/ZERO_EXTEND_INREG_MERGE_PATHRU that closer resemble SIGN_EXTEND_INREG but using a predicate and passthru positioned as required by _MERGE_PATHRU style nodes.

paulwalker-arm retitled this revision from [SVE] Add ISD nodes for integer extend inreg operations. to [SVE] Add ISD nodes for predicated integer extend inreg operations.Aug 7 2020, 6:16 PM
This revision is now accepted and ready to land.Aug 10 2020, 10:31 AM
cameron.mcinally reopened this revision.Aug 21 2020, 2:47 PM
cameron.mcinally added a subscriber: cameron.mcinally.

These patterns might need attention. ISD::SIGN_EXTEND_INREG expects both the input and output registers to have the same type, extending the small values in place. I.e. the input is unpacked.

But the AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU patterns are expecting to explicitly change the element size. I.e. the input is packed.

I have another patch that's needed to show the problem, but legalize can produce something like this:

    t78: v16i8 = BUILD_VECTOR t217, t216, t215, t214, t213, t212, t211, t210, t209, t208, t207, t206, t205, t204, t203, t202
  t80: v16i16 = any_extend t78
t82: v16i16 = sign_extend_inreg t80, ValueType:ch:v16i1
This revision is now accepted and ready to land.Aug 21 2020, 2:47 PM

I'm not sure I understand what you think the issue is here. ISD::SIGN_EXTEND_INREG, AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU, and the SVE sxtb instruction are all the same, as far as I know. The operand and result types are the same, and the operation in each lane is independent.

By contrast, ISD::ANY_EXTEND does change the element size.

Ah, you're right. I misread the register classes.

def : SVE_InReg_Extend<nxv8i16, op, nxv8i1, nxv8i8, !cast<Instruction>(NAME # _H)>;

I was reading nxv16i8 instead of nxv8i8 for the input operand. Since it's nxv8i8, they are unpacked.

I'll have to find a way to generate the nxv8i8 type, to cast the operand to it. Will take that to another Diff though...

Eh, thinking some more, it's still a little weird:

class SVE_InReg_Extend<ValueType vt, SDPatternOperator op, ValueType pt,
                       ValueType inreg_vt, Instruction inst>
: Pat<(vt (op pt:$Pg, vt:$Src, inreg_vt, vt:$PassThru)),
      (inst $PassThru, $Pg, $Src)>;

ISD::SIGN_EXTEND_INREG has the same type for the input and output reg, and an additional VT operand to encode the input type.

AArch64ISD::ZERO_EXTEND_INREG_MERGE_PASSTHRU encodes the input type in the register, and tosses the VT operand.

Maybe it would be better to lower based on the VT operand? That way we don't need to get into the half vector type business (e.g. nxv8i8).

Perhaps I've misunderstood but based on Cameron's original message I suspect he's hit a bug because I lowered all SIGN_EXTEND_INREG operations regardless of the inreg type. This is wrong because there's no patterns for non-power-of-2 non-byte-based inreg types and thus I guess Cameron has hit a selection failure?

If so then the fix is easy enough, we should expand those cases, which looking at the common code does the correct thing of replacing with shifts. I've create D86394, even if I have misunderstood it fixes a bug anyway.

D86394 addresses a different issue, I think. Posted some new code in D85364 to expose the AArch64ISD::ZERO_EXTEND_INREG_MERGE_PASSTHRU issue.