This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Canonicalize vrot{l,r} to vrev8 when lowering shuffle as rotate
ClosedPublic

Authored by luke on Aug 17 2023, 9:05 AM.

Details

Reviewers

craig.topper
reames

Commits

rG976244bb845c: [RISCV] Canonicalize vrot{l,r} to vrev8 when lowering shuffle as rotate

Summary

A rotate of 8 bits of an e16 vector in either direction is equivalent to a
byteswap, i.e. vrev8. There is a generic combine on ISD::ROT{L,R} to
canonicalize these rotations to byteswaps, but on fixed vectors they are
legalized before they have the chance to be combined. This patch teaches the
rotate vector_shuffle lowering to emit these rotations as byteswaps to match
the scalable vector behaviour.
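
The equivalence claimed in the summary can be checked exhaustively on scalars. The following is a minimal standalone sketch, not code from the patch; the helper names (`rotl16`, `rotr16`, `bswap16`, `rotateBy8IsByteSwap`) are made up for illustration.

```cpp
#include <cstdint>

// Rotate a 16-bit value left by Amt bits (Amt is taken modulo 16).
constexpr std::uint16_t rotl16(std::uint16_t X, unsigned Amt) {
  return static_cast<std::uint16_t>((X << (Amt % 16)) |
                                    (X >> ((16 - Amt % 16) % 16)));
}

// Rotate a 16-bit value right by Amt bits (Amt is taken modulo 16).
constexpr std::uint16_t rotr16(std::uint16_t X, unsigned Amt) {
  return static_cast<std::uint16_t>((X >> (Amt % 16)) |
                                    (X << ((16 - Amt % 16) % 16)));
}

// Swap the two bytes of a 16-bit value.
constexpr std::uint16_t bswap16(std::uint16_t X) {
  return static_cast<std::uint16_t>((X << 8) | (X >> 8));
}

// Spot checks evaluated at compile time.
static_assert(rotl16(0x1234, 8) == 0x3412, "rotl by 8 swaps the bytes");
static_assert(rotr16(0x1234, 8) == 0x3412, "rotr by 8 swaps the bytes");
static_assert(bswap16(0x1234) == 0x3412, "bswap swaps the bytes");

// Exhaustive runtime check over every 16-bit value.
bool rotateBy8IsByteSwap() {
  for (unsigned V = 0; V <= 0xFFFF; ++V) {
    const auto X = static_cast<std::uint16_t>(V);
    if (rotl16(X, 8) != bswap16(X) || rotr16(X, 8) != bswap16(X))
      return false;
  }
  return true;
}
```

The same identity does not hold for wider elements: a rotate by 8 on a 32-bit lane moves one byte to the other end rather than reversing all four.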

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
130 ms	x64 debian > LLVM.CodeGen/X86::bitreverse.ll
110 ms	x64 debian > LLVM.CodeGen/X86::bswap-vector.ll
60 ms	x64 debian > LLVM.CodeGen/X86::named-vector-shuffle-reverse.ll
1,320 ms	x64 debian > LLVM.CodeGen/X86::oddshuffles.ll
460 ms	x64 debian > LLVM.CodeGen/X86::pr57340.ll

21 tests failed in total.

Event Timeline

luke created this revision. · Aug 17 2023, 9:05 AM

Herald added a project: Restricted Project. · Aug 17 2023, 9:05 AM

Herald added subscribers: jobnoorman, asb, sunshaoce and 29 others.

luke requested review of this revision. · Aug 17 2023, 9:05 AM

Herald added a project: Restricted Project. · Aug 17 2023, 9:05 AM

Herald added subscribers: llvm-commits, wangpc, eopXD, MaskRay.

luke added a parent revision: D157417: [RISCV][SelectionDAG] Lower shuffles as bitrotates with vror.vi when possible. · Aug 17 2023, 9:05 AM

Do we have reason to believe vrev8 is better than vror?

In D158195#4595901, @craig.topper wrote:

Do we have reason to believe vrev8 is better than vror?

@reames alluded to it here: https://reviews.llvm.org/D157417#inline-1529308
Tangentially, DAGCombiner canonicalises rotr/rotl to bswap anyway so this brings fixed-length vector behaviour more in line with scalable

If the rotate came in as an fshl/fshr intrinsic or as shl+shr+or, would we already get vrev8 for fixed vectors? Is it only the shuffle case that is being optimized?

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-reverse.ll
278	How does this patch create new rotates?

Harbormaster completed remote builds in B253242: Diff 551167. · Aug 17 2023, 10:48 AM

In D158195#4596052, @craig.topper wrote:

If the rotate came in as an fshl/fshr intrinsic or as shl+shr+or, would we already get vrev8 for fixed vectors? Is it only the shuffle case that is being optimized?

Yeah we already get vrev8 for these, DAGCombiner canonicalises them before they would be legalised to vl nodes:

define <4 x i16> @rot_via_fshr(<4 x i16> %a) {
  %res = call <4 x i16> @llvm.fshr.v4i16(<4 x i16> %a, <4 x i16> %a, <4 x i16> <i16 8, i16 8, i16 8, i16 8>)
  ret <4 x i16> %res
}

declare <4 x i16> @llvm.fshr.v4i16(<4 x i16> %a, <4 x i16> %b, <4 x i16> %c)

define <4 x i16> @rot_via_shift(<4 x i16> %a, <4 x i16> %amt) {
  %1 = shl <4 x i16> %a, <i16 8, i16 8, i16 8, i16 8>
  %2 = lshr <4 x i16> %a, <i16 8, i16 8, i16 8, i16 8>
  %3 = or <4 x i16> %1, %2
  ret <4 x i16> %3
}

=== rot_via_fshr
Initial selection DAG: %bb.0 'rot_via_fshr:'
SelectionDAG has 13 nodes:
  t0: ch,glue = EntryToken
          t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
        t4: v4i16 = extract_subvector t2, Constant:i64<0>
        t6: v4i16 = BUILD_VECTOR Constant:i16<8>, Constant:i16<8>, Constant:i16<8>, Constant:i16<8>
      t7: v4i16 = rotr t4, t6
    t9: nxv2i16 = insert_subvector undef:nxv2i16, t7, Constant:i64<0>
  t11: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t9
  t12: ch = RISCVISD::RET_GLUE t11, Register:nxv2i16 $v8, t11:1


Optimized lowered selection DAG: %bb.0 'rot_via_fshr:'
SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
          t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
        t4: v4i16 = extract_subvector t2, Constant:i64<0>
      t13: v4i16 = bswap t4
    t9: nxv2i16 = insert_subvector undef:nxv2i16, t13, Constant:i64<0>
  t11: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t9
  t12: ch = RISCVISD::RET_GLUE t11, Register:nxv2i16 $v8, t11:1

=== rot_via_shift
Initial selection DAG: %bb.0 'rot_via_shift:'
SelectionDAG has 18 nodes:
  t0: ch,glue = EntryToken
    t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
  t4: v4i16 = extract_subvector t2, Constant:i64<0>
    t6: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %1
  t7: v4i16 = extract_subvector t6, Constant:i64<0>
  t9: v4i16 = BUILD_VECTOR Constant:i16<8>, Constant:i16<8>, Constant:i16<8>, Constant:i16<8>
        t10: v4i16 = shl t4, t9
        t11: v4i16 = srl t4, t9
      t12: v4i16 = or t10, t11
    t14: nxv2i16 = insert_subvector undef:nxv2i16, t12, Constant:i64<0>
  t16: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t14
  t17: ch = RISCVISD::RET_GLUE t16, Register:nxv2i16 $v8, t16:1


Optimized lowered selection DAG: %bb.0 'rot_via_shift:'
SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
          t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
        t4: v4i16 = extract_subvector t2, Constant:i64<0>
      t19: v4i16 = bswap t4
    t14: nxv2i16 = insert_subvector undef:nxv2i16, t19, Constant:i64<0>
  t16: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t14
  t17: ch = RISCVISD::RET_GLUE t16, Register:nxv2i16 $v8, t16:1
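
As a cross-check of the two IR functions above, here is a standalone sketch that models them lane by lane on four 16-bit values. It is an illustration only: `Vec4`, `fshr16`, `rotViaFshr`, `rotViaShift` and `bswapLanes` are hypothetical names, and `fshr16` follows the usual funnel-shift definition (concatenate the two operands, shift right, keep the low half).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::uint16_t, 4>;

// Funnel shift right on 16-bit operands: concatenate Hi:Lo into 32 bits,
// shift right by Amt modulo 16, and keep the low 16 bits.
constexpr std::uint16_t fshr16(std::uint16_t Hi, std::uint16_t Lo,
                               unsigned Amt) {
  return static_cast<std::uint16_t>(
      ((static_cast<std::uint32_t>(Hi) << 16) | Lo) >> (Amt % 16));
}

// Models @rot_via_fshr: fshr(a, a, 8) in every lane.
Vec4 rotViaFshr(Vec4 A) {
  for (std::size_t I = 0; I < A.size(); ++I)
    A[I] = fshr16(A[I], A[I], 8);
  return A;
}

// Models @rot_via_shift: (a << 8) | (a >> 8) in every lane.
Vec4 rotViaShift(Vec4 A) {
  for (std::size_t I = 0; I < A.size(); ++I)
    A[I] = static_cast<std::uint16_t>((A[I] << 8) | (A[I] >> 8));
  return A;
}

// Per-lane byte swap, i.e. what the optimized DAG's bswap node computes.
Vec4 bswapLanes(Vec4 A) {
  for (std::size_t I = 0; I < A.size(); ++I)
    A[I] = static_cast<std::uint16_t>(((A[I] & 0xFF) << 8) | (A[I] >> 8));
  return A;
}
```

All three functions return the same vector for any input, which is consistent with both DAGs above ending in a single bswap node.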

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-reverse.ll
278	Not sure how I didn't notice these. Looks like it always emitted rotates on zvbb, there's just an issue with the filecheck prefixes.

luke added inline comments. · Aug 18 2023, 6:31 AM

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-reverse.ll
278	*It emits rotates on zvbb after D157417
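
For context on why a two-element reverse shows up as a rotate in the ZVBB output (`vsetivli ... e32` followed by `vror.vi v8, v8, 16` for `reverse_v2i16`): reversing two adjacent elements is the same as rotating the double-width lane that contains them by half its width. A standalone sketch, assuming little-endian lane packing; `pack`, `rotr32` and `reverseViaRotate` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Pack two 16-bit elements into one 32-bit lane, element 0 in the low half
// (little-endian element order).
constexpr std::uint32_t pack(std::uint16_t E0, std::uint16_t E1) {
  return static_cast<std::uint32_t>(E0) |
         (static_cast<std::uint32_t>(E1) << 16);
}

// Rotate a 32-bit value right by Amt bits (Amt is taken modulo 32).
constexpr std::uint32_t rotr32(std::uint32_t X, unsigned Amt) {
  return (X >> (Amt % 32)) | (X << ((32 - Amt % 32) % 32));
}

// Reverse a <2 x i16> vector by rotating its containing 32-bit lane by 16.
std::array<std::uint16_t, 2> reverseViaRotate(std::array<std::uint16_t, 2> V) {
  const std::uint32_t Lane = rotr32(pack(V[0], V[1]), 16);
  return {static_cast<std::uint16_t>(Lane & 0xFFFF),
          static_cast<std::uint16_t>(Lane >> 16)};
}
```

For `reverse_v2i8` the containing lane is 16 bits wide and the rotate amount is 8, which is the byteswap case this revision is about.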

Fix extraneous test diffs caused by a dodgy rebase (the filecheck prefixes are actually fine; sorry for the noise).

Harbormaster completed remote builds in B253479: Diff 551495. · Aug 18 2023, 8:18 AM

How ugly would it be to do it as a special case during the shuffle lowering instead?

In D158195#4595909, @luke wrote:

In D158195#4595901, @craig.topper wrote:

Do we have reason to believe vrev8 is better than vror?

@reames alluded to it here: https://reviews.llvm.org/D157417#inline-1529308
Tangentially, DAGCombiner canonicalises rotr/rotl to bswap anyway so this brings fixed-length vector behaviour more in line with scalable

For context, I do *not* think there's a performance difference. This was mostly a canonicalization thing.

In D158195#4599594, @craig.topper wrote:

How ugly would it be to do it as a special case during the shuffle lowering instead?

Another possibility would be a RISCV shuffle to bswap combine before lowering, but having this be a special case in the lowering doesn't seem bad to me.

Since these rotates are only emitted in one place during lowering, remove the combine and just
handle it there.
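
The special case described here boils down to one extra check at the point where the shuffle lowering has already decided on a rotate type and amount. A rough standalone sketch of that decision follows; it is not the committed code, and `RotateLowering` and `chooseRotateLowering` are hypothetical names standing in for the choice between emitting a byteswap node and a rotate node.

```cpp
// Which node the shuffle-as-rotate lowering would emit in this sketch.
enum class RotateLowering { ByteSwap, RotateLeft };

// EltBits is the element width of the rotate type; RotateAmt is the rotate
// amount in bits. Only a rotate by 8 on 16-bit elements is a byteswap.
constexpr RotateLowering chooseRotateLowering(unsigned EltBits,
                                              unsigned RotateAmt) {
  return (EltBits == 16 && RotateAmt == 8) ? RotateLowering::ByteSwap
                                           : RotateLowering::RotateLeft;
}

static_assert(chooseRotateLowering(16, 8) == RotateLowering::ByteSwap,
              "e16 rotate by 8 becomes a byteswap");
static_assert(chooseRotateLowering(32, 8) == RotateLowering::RotateLeft,
              "e32 rotate by 8 stays a rotate");
```

Doing the check at the single place where the rotate is created avoids needing a separate combine on the already-legalized VL nodes.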

luke retitled this revision from [RISCV] Combine (vrot{l,r} vxi16, 8) -> vrev8 to [RISCV] Canonicalize vrot{l,r} to vrev8 when lowering shuffle as rotate. · Aug 21 2023, 4:22 AM

luke edited the summary of this revision.

Herald added a subscriber: alextsao1999. · Aug 21 2023, 4:22 AM

Harbormaster completed remote builds in B253806: Diff 551964. · Aug 21 2023, 4:29 AM

LGTM

This revision is now accepted and ready to land. · Aug 21 2023, 10:31 AM

craig.topper added inline comments. · Aug 21 2023, 10:33 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
4213	Do we handle FP shuffles here? If so should this be f16 too?

luke added inline comments. · Aug 22 2023, 3:10 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
4213	I don't think we can ever get an f16 element type in RotateVT, since it's always going to be larger than the original element type. E.g. i16 has to come from a shuffle of i8s
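
To illustrate why the rotate type is always wider than the shuffle's element type: a shuffle only matches as a bit rotate when groups of at least two adjacent elements are rotated together, so the rotate lane is at least twice the element width. The sketch below is a simplified standalone matcher written for this explanation, not LLVM's implementation; `matchBitRotate` is a hypothetical name, elements are assumed to be packed little-endian, and negative mask entries are treated as undef.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Returns the left-rotate amount in bits if Mask rotates every group of
// GroupElts adjacent elements (each EltBits wide) by the same non-zero
// amount, otherwise std::nullopt. The rotate lane is EltBits * GroupElts wide.
std::optional<unsigned> matchBitRotate(const std::vector<int> &Mask,
                                       unsigned EltBits, unsigned GroupElts) {
  if (GroupElts < 2 || Mask.empty() || Mask.size() % GroupElts != 0)
    return std::nullopt;

  std::optional<unsigned> RotElts;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] < 0)
      continue; // Undef element: compatible with any rotation.
    const std::size_t Base = I - I % GroupElts;
    const std::size_t Src = static_cast<std::size_t>(Mask[I]);
    if (Src < Base || Src >= Base + GroupElts)
      return std::nullopt; // Element comes from outside its own group.
    // A left rotate by R elements moves source position S to S + R.
    const unsigned R = static_cast<unsigned>(
        (I % GroupElts + GroupElts - (Src - Base)) % GroupElts);
    if (RotElts && *RotElts != R)
      return std::nullopt; // Groups disagree on the rotation.
    RotElts = R;
  }
  if (!RotElts || *RotElts == 0)
    return std::nullopt; // All undef, or the identity shuffle.
  return *RotElts * EltBits;
}
```

With the mask from `shuffle_v8i8_as_i16`, `matchBitRotate({1, 0, 3, 2, 5, 4, 7, 6}, 8, 2)` yields 8 on a 16-bit lane. A 16-bit lane can only be built from i8 elements, so an f16 rotate type cannot arise this way.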

Closed by commit rG976244bb845c: [RISCV] Canonicalize vrot{l,r} to vrev8 when lowering shuffle as rotate (authored by luke). · Aug 30 2023, 3:02 AM

This revision was automatically updated to reflect the committed changes.

luke added a commit: rG976244bb845c: [RISCV] Canonicalize vrot{l,r} to vrev8 when lowering shuffle as rotate.

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVISelLowering.cpp	18 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-reverse.ll	100 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-rotate.ll	4 lines

Diff 551167

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

[4,204 lines collapsed]
   SDValue RotateAmtSplat = DAG.getNode(
       RISCVISD::VMV_V_X_VL, DL, ContainerVT, DAG.getUNDEF(ContainerVT),
       DAG.getConstant(RotateAmt, DL, Subtarget.getXLenVT()), VL);
   RotateAmtSplat =
       convertFromScalableVector(RotateVT, RotateAmtSplat, DAG, Subtarget);

   SDValue Rotate =
       DAG.getNode(ISD::ROTL, DL, RotateVT,
                   DAG.getBitcast(RotateVT, SVN->getOperand(0)), RotateAmtSplat);
   return DAG.getBitcast(VT, Rotate);
       craig.topper: Do we handle FP shuffles here? If so should this be f16 too?
       luke: I don't think we can ever get an f16 element type in RotateVT, since it's always going to be larger than the original element type. E.g. i16 has to come from a shuffle of i8s
 }

 static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
                                    const RISCVSubtarget &Subtarget) {
   SDValue V1 = Op.getOperand(0);
   SDValue V2 = Op.getOperand(1);
   SDLoc DL(Op);
   MVT XLenVT = Subtarget.getXLenVT();
[9,437 lines collapsed]
     if (auto Gather = matchSplatAsGather(N->getOperand(0), VT.getSimpleVT(), N,
                                          DAG, Subtarget))
       return Gather;
     break;
   }
   case ISD::CONCAT_VECTORS:
     if (SDValue V = performCONCAT_VECTORSCombine(N, DAG, Subtarget, *this))
       return V;
     break;
+  case RISCVISD::ROTR_VL:
+  case RISCVISD::ROTL_VL: {
+    // An i16 bitrotate of 8 in either direction is equivalent to a swapping the
+    // bytes (bswap). This is normally caught by a generic ISD::ROT{L,R}
+    // combine, but on fixed vectors they are legalized before they can be
+    // combined, so handle it later here too.
+    EVT VT = N->getValueType(0);
+    if (VT.getScalarType() == MVT::i16 &&
+        // The splat of 8 will have been legalized to a vmv_v_x_vl.
+        N->getOperand(1).getOpcode() == RISCVISD::VMV_V_X_VL &&
+        N->getOperand(1).getOperand(0).isUndef() &&
+        isa<ConstantSDNode>(N->getOperand(1).getOperand(1)) &&
+        N->getOperand(1).getConstantOperandVal(1) == 8) {
+      return DAG.getNode(RISCVISD::BSWAP_VL, SDLoc(N), VT, N->getOperand(0),
+                         N->getOperand(2), N->getOperand(3), N->getOperand(4));
+    }
+    break;
+  }
   case RISCVISD::VMV_V_X_VL: {
     // Tail agnostic VMV.V.X only demands the vector element bitwidth from the
     // scalar input.
     unsigned ScalarSize = N->getOperand(1).getValueSizeInBits();
     unsigned EltWidth = N->getValueType(0).getScalarSizeInBits();
     if (ScalarSize > EltWidth && N->getOperand(0).isUndef())
       if (SimplifyDemandedLowBitsHelper(1, EltWidth))
         return SDValue(N, 0);
[4,318 lines collapsed]

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-reverse.ll

[163 lines collapsed]
 ; CHECK-LABEL: reverse_v1i8:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: ret
   %res = call <1 x i8> @llvm.experimental.vector.reverse.v1i8(<1 x i8> %a)
   ret <1 x i8> %res
 }

 define <2 x i8> @reverse_v2i8(<2 x i8> %a) {
-; CHECK-LABEL: reverse_v2i8:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e8, mf8, ta, ma
-; CHECK-NEXT: vslidedown.vi v9, v8, 1
-; CHECK-NEXT: vslideup.vi v9, v8, 1
-; CHECK-NEXT: vmv1r.v v8, v9
-; CHECK-NEXT: ret
+; NO-ZVBB-LABEL: reverse_v2i8:
+; NO-ZVBB: # %bb.0:
+; NO-ZVBB-NEXT: vsetivli zero, 2, e8, mf8, ta, ma
+; NO-ZVBB-NEXT: vslidedown.vi v9, v8, 1
+; NO-ZVBB-NEXT: vslideup.vi v9, v8, 1
+; NO-ZVBB-NEXT: vmv1r.v v8, v9
+; NO-ZVBB-NEXT: ret
+;
+; ZVBB-LABEL: reverse_v2i8:
+; ZVBB: # %bb.0:
+; ZVBB-NEXT: vsetivli zero, 1, e16, mf4, ta, ma
+; ZVBB-NEXT: vrev8.v v8, v8
+; ZVBB-NEXT: ret
   %res = call <2 x i8> @llvm.experimental.vector.reverse.v2i8(<2 x i8> %a)
   ret <2 x i8> %res
 }

 define <4 x i8> @reverse_v4i8(<4 x i8> %a) {
 ; CHECK-LABEL: reverse_v4i8:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
[66 lines collapsed]
 ; CHECK-LABEL: reverse_v1i16:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: ret
   %res = call <1 x i16> @llvm.experimental.vector.reverse.v1i16(<1 x i16> %a)
   ret <1 x i16> %res
 }

 define <2 x i16> @reverse_v2i16(<2 x i16> %a) {
-; CHECK-LABEL: reverse_v2i16:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
-; CHECK-NEXT: vslidedown.vi v9, v8, 1
-; CHECK-NEXT: vslideup.vi v9, v8, 1
-; CHECK-NEXT: vmv1r.v v8, v9
-; CHECK-NEXT: ret
+; NO-ZVBB-LABEL: reverse_v2i16:
+; NO-ZVBB: # %bb.0:
+; NO-ZVBB-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
+; NO-ZVBB-NEXT: vslidedown.vi v9, v8, 1
+; NO-ZVBB-NEXT: vslideup.vi v9, v8, 1
+; NO-ZVBB-NEXT: vmv1r.v v8, v9
+; NO-ZVBB-NEXT: ret
+;
+; ZVBB-LABEL: reverse_v2i16:
+; ZVBB: # %bb.0:
+; ZVBB-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
+; ZVBB-NEXT: vror.vi v8, v8, 16
       craig.topper: How does this patch create new rotates?
       luke: Not sure how I didn't notice these. Looks like it always emitted rotates on zvbb, there's just an issue with the filecheck prefixes.
       luke: It emits rotates on zvbb after D157417
+; ZVBB-NEXT: ret
   %res = call <2 x i16> @llvm.experimental.vector.reverse.v2i16(<2 x i16> %a)
   ret <2 x i16> %res
 }

 define <4 x i16> @reverse_v4i16(<4 x i16> %a) {
 ; CHECK-LABEL: reverse_v4i16:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
[51 lines collapsed]
 ; CHECK-LABEL: reverse_v1i32:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: ret
   %res = call <1 x i32> @llvm.experimental.vector.reverse.v1i32(<1 x i32> %a)
   ret <1 x i32> %res
 }

 define <2 x i32> @reverse_v2i32(<2 x i32> %a) {
-; CHECK-LABEL: reverse_v2i32:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT: vslidedown.vi v9, v8, 1
-; CHECK-NEXT: vslideup.vi v9, v8, 1
-; CHECK-NEXT: vmv1r.v v8, v9
-; CHECK-NEXT: ret
+; NO-ZVBB-LABEL: reverse_v2i32:
+; NO-ZVBB: # %bb.0:
+; NO-ZVBB-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; NO-ZVBB-NEXT: vslidedown.vi v9, v8, 1
+; NO-ZVBB-NEXT: vslideup.vi v9, v8, 1
+; NO-ZVBB-NEXT: vmv1r.v v8, v9
+; NO-ZVBB-NEXT: ret
+;
+; ZVBB-LABEL: reverse_v2i32:
+; ZVBB: # %bb.0:
+; ZVBB-NEXT: vsetivli zero, 1, e64, m1, ta, ma
+; ZVBB-NEXT: vror.vi v8, v8, 32
+; ZVBB-NEXT: ret
   %res = call <2 x i32> @llvm.experimental.vector.reverse.v2i32(<2 x i32> %a)
   ret <2 x i32> %res
 }

 define <4 x i32> @reverse_v4i32(<4 x i32> %a) {
 ; CHECK-LABEL: reverse_v4i32:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
[217 lines collapsed]
 ; CHECK-LABEL: reverse_v1f16:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: ret
   %res = call <1 x half> @llvm.experimental.vector.reverse.v1f16(<1 x half> %a)
   ret <1 x half> %res
 }

 define <2 x half> @reverse_v2f16(<2 x half> %a) {
-; CHECK-LABEL: reverse_v2f16:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
-; CHECK-NEXT: vslidedown.vi v9, v8, 1
-; CHECK-NEXT: vslideup.vi v9, v8, 1
-; CHECK-NEXT: vmv1r.v v8, v9
-; CHECK-NEXT: ret
+; NO-ZVBB-LABEL: reverse_v2f16:
+; NO-ZVBB: # %bb.0:
+; NO-ZVBB-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
+; NO-ZVBB-NEXT: vslidedown.vi v9, v8, 1
+; NO-ZVBB-NEXT: vslideup.vi v9, v8, 1
+; NO-ZVBB-NEXT: vmv1r.v v8, v9
+; NO-ZVBB-NEXT: ret
+;
+; ZVBB-LABEL: reverse_v2f16:
+; ZVBB: # %bb.0:
+; ZVBB-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
+; ZVBB-NEXT: vror.vi v8, v8, 16
+; ZVBB-NEXT: ret
   %res = call <2 x half> @llvm.experimental.vector.reverse.v2f16(<2 x half> %a)
   ret <2 x half> %res
 }

 define <4 x half> @reverse_v4f16(<4 x half> %a) {
 ; CHECK-LABEL: reverse_v4f16:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
[51 lines collapsed]
 ; CHECK-LABEL: reverse_v1f32:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: ret
   %res = call <1 x float> @llvm.experimental.vector.reverse.v1f32(<1 x float> %a)
   ret <1 x float> %res
 }

 define <2 x float> @reverse_v2f32(<2 x float> %a) {
-; CHECK-LABEL: reverse_v2f32:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT: vslidedown.vi v9, v8, 1
-; CHECK-NEXT: vslideup.vi v9, v8, 1
-; CHECK-NEXT: vmv1r.v v8, v9
-; CHECK-NEXT: ret
+; NO-ZVBB-LABEL: reverse_v2f32:
+; NO-ZVBB: # %bb.0:
+; NO-ZVBB-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; NO-ZVBB-NEXT: vslidedown.vi v9, v8, 1
+; NO-ZVBB-NEXT: vslideup.vi v9, v8, 1
+; NO-ZVBB-NEXT: vmv1r.v v8, v9
+; NO-ZVBB-NEXT: ret
+;
+; ZVBB-LABEL: reverse_v2f32:
+; ZVBB: # %bb.0:
+; ZVBB-NEXT: vsetivli zero, 1, e64, m1, ta, ma
+; ZVBB-NEXT: vror.vi v8, v8, 32
+; ZVBB-NEXT: ret
   %res = call <2 x float> @llvm.experimental.vector.reverse.v2f32(<2 x float> %a)
   ret <2 x float> %res
 }

 define <4 x float> @reverse_v4f32(<4 x float> %a) {
 ; CHECK-LABEL: reverse_v4f32:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
[507 lines collapsed]

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-rotate.ll

[196 lines collapsed]
 ; CHECK-NEXT: vle8.v v10, (a0)
 ; CHECK-NEXT: vrgather.vv v9, v8, v10
 ; CHECK-NEXT: vmv1r.v v8, v9
 ; CHECK-NEXT: ret
 ;
 ; ZVBB_V-LABEL: shuffle_v8i8_as_i16:
 ; ZVBB_V: # %bb.0:
 ; ZVBB_V-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
-; ZVBB_V-NEXT: vror.vi v8, v8, 8
+; ZVBB_V-NEXT: vrev8.v v8, v8
 ; ZVBB_V-NEXT: ret
 ;
 ; ZVBB_ZVE32X-LABEL: shuffle_v8i8_as_i16:
 ; ZVBB_ZVE32X: # %bb.0:
 ; ZVBB_ZVE32X-NEXT: vsetivli zero, 4, e16, m2, ta, ma
-; ZVBB_ZVE32X-NEXT: vror.vi v8, v8, 8
+; ZVBB_ZVE32X-NEXT: vrev8.v v8, v8
 ; ZVBB_ZVE32X-NEXT: ret
   %shuffle = shufflevector <8 x i8> %v, <8 x i8> poison, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>
   ret <8 x i8> %shuffle
 }

 define <8 x i8> @shuffle_v8i8_as_i32_8(<8 x i8> %v) {
 ; CHECK-LABEL: shuffle_v8i8_as_i32_8:
 ; CHECK: # %bb.0:
[548 lines collapsed]
