This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][GlobalISel] Combine funnel shifts to AArch64 rotate opcodes.
Abandoned · Public

Authored by aemerson on Mar 24 2021, 10:19 AM.

Details

Reviewers

paquette
arsenm
foad

Summary

Adds AArch64::G_ROR opcode to allow us to import patterns for selection.

This fixes the 0.5% size regression on ClamAV introduced when G_FSHL/G_FSHR started being lowered instead of falling back.

The 0.7% regression on consumer-typeset remains.
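
For readers unfamiliar with funnel shifts, here is a minimal reference model of why the combine is valid. It is only a sketch for 32-bit values: the helper names (fshl32, fshr32, rotl32, rotr32) are made up for illustration and are not LLVM APIs. When both data operands of a funnel shift are the same register, the result is a plain rotate.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only; these helpers are hypothetical, not LLVM functions.
// fshl: concatenate Hi:Lo, shift left by Amt (mod 32), keep the high 32 bits.
inline uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  const uint32_t N = Amt % 32;
  return N == 0 ? Hi : (Hi << N) | (Lo >> (32 - N));
}

// fshr: concatenate Hi:Lo, shift right by Amt (mod 32), keep the low 32 bits.
inline uint32_t fshr32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  const uint32_t N = Amt % 32;
  return N == 0 ? Lo : (Hi << (32 - N)) | (Lo >> N);
}

inline uint32_t rotl32(uint32_t X, uint32_t Amt) {
  const uint32_t N = Amt % 32;
  return N == 0 ? X : (X << N) | (X >> (32 - N));
}

inline uint32_t rotr32(uint32_t X, uint32_t Amt) {
  const uint32_t N = Amt % 32;
  return N == 0 ? X : (X >> N) | (X << (32 - N));
}

// With identical data operands, fshr32(X, X, N) computes the same value as
// rotr32(X, N), and fshl32(X, X, N) the same value as rotl32(X, N).
```

This is the condition the match function checks (operand 1 equals operand 2) before rewriting to G_ROR.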

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aemerson created this revision. Mar 24 2021, 10:19 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls, rovka. Mar 24 2021, 10:19 AM

aemerson requested review of this revision. Mar 24 2021, 10:19 AM

Herald added a subscriber: wdng. Mar 24 2021, 10:19 AM

@arsenm @foad Is it worth making this rotate op generic? I'm happy to keep it in AArch64, but wasn't sure what the overall plan was for optimizing funnels to rotates.

In D99281#2648193, @aemerson wrote:

@arsenm @foad Is it worth making this rotate op generic? I'm happy to keep it in AArch64, but wasn't sure what the overall plan was for optimizing funnels to rotates.

SelectionDAG has rotate nodes, so probably yes.

There are a bunch of rotate combines in the DAGCombiner, so it would make sense to have a generic G_ROR and G_ROL. Looks like some low-hanging fruit.

  // fold (or (shl x, C1), (srl x, C2)) -> (rotl x, C1)

  // fold (or (shl x, C1), (srl x, C2)) -> (rotr x, C2)

  // fold (xor (shl 1, x), -1) -> (rotl ~1, x)

  // (or (and (shl (A, 8)), 0xff00ff00), (and (srl (A, 8)), 0x00ff00ff)) -> (rotr (bswap A), 16)

  // fold (or (shl x, (*ext y)),
  //          (srl x, (*ext (sub 32, y)))) ->
  //   (rotl x, y) or (rotr x, (sub 32, y))

  // fold (or (shl x, (*ext (sub 32, y))),
  //          (srl x, (*ext y))) ->
  //   (rotr x, y) or (rotl x, (sub 32, y))
  
... and so on ...
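
As a rough illustration of the first two folds listed above, the sketch below checks when an OR of a left shift and a right shift of the same value is a rotate. The function names are hypothetical and this is not the DAGCombiner's implementation; it only models the constant-amount case on 32-bit values.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only; not LLVM code.
inline uint32_t rotl32(uint32_t X, unsigned Amt) {
  const unsigned N = Amt % 32;
  return N == 0 ? X : (X << N) | (X >> (32 - N));
}

// Computes (or (shl x, C1), (srl x, C2)). Both constants must be below 32,
// otherwise the C++ shifts would be undefined.
inline uint32_t orOfShifts32(uint32_t X, unsigned C1, unsigned C2) {
  return (X << C1) | (X >> C2);
}

// Hypothetical matcher: returns the rotate-left amount if the shift pair
// forms a rotate of a Bits-wide value (C1 + C2 == Bits), or -1 otherwise.
inline int rotlAmountForOrOfShifts(unsigned C1, unsigned C2, unsigned Bits) {
  if (C1 == 0 || C1 >= Bits || C1 + C2 != Bits)
    return -1;
  return static_cast<int>(C1);
}
```

The same pair can equally be expressed as a rotate right by C2, which is why the comments list both a rotl and a rotr form.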

Harbormaster completed remote builds in B95525: Diff 333035. Mar 24 2021, 6:56 PM

aemerson mentioned this in D99383: [GlobalISel] Add G_ROTR and G_ROTL opcodes for rotates. Mar 25 2021, 2:54 PM

This is being implemented as generic operations in other patches.

Revision Contents

Path: Size

llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h: 3 lines
llvm/lib/Target/AArch64/AArch64Combine.td: 11 lines
llvm/lib/Target/AArch64/AArch64InstrGISel.td: 8 lines
llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp: 44 lines
llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-funnel-shifts-to-rotates.mir: 151 lines
llvm/test/CodeGen/AArch64/GlobalISel/select-rotates.mir: 73 lines

Diff 333035

llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h

[501 lines hidden] public:
   bool matchExtractAllEltsFromBuildVector(
       MachineInstr &MI,
       SmallVectorImpl<std::pair<Register, MachineInstr *>> &MatchInfo);
   void applyExtractAllEltsFromBuildVector(
       MachineInstr &MI,
       SmallVectorImpl<std::pair<Register, MachineInstr *>> &MatchInfo);

+  bool matchFunnelShiftToRotate(MachineInstr &MI);
+  void applyFunnelShiftToRotate(MachineInstr &MI);
+
   /// Try to transform \p MI by using all of the above
   /// combine functions. Returns true if changed.
   bool tryCombine(MachineInstr &MI);

 private:
   // Memcpy family optimization helpers.
   bool optimizeMemcpy(MachineInstr &MI, Register Dst, Register Src,
                       unsigned KnownLen, Align DstAlign, Align SrcAlign,
[48 lines hidden]

llvm/lib/Target/AArch64/AArch64Combine.td

[27 lines hidden]
 def fold_global_offset_matchdata : GIDefMatchData<"std::pair<uint64_t, uint64_t>">;
 def fold_global_offset : GICombineRule<
   (defs root:$root, fold_global_offset_matchdata:$matchinfo),
   (match (wip_match_opcode G_GLOBAL_VALUE):$root,
           [{ return matchFoldGlobalOffset(*${root}, MRI, ${matchinfo}); }]),
   (apply [{ return applyFoldGlobalOffset(*${root}, MRI, B, Observer, ${matchinfo});}])
 >;

+def funnel_shift_to_rotate : GICombineRule<
+  (defs root:$root),
+  (match (wip_match_opcode G_FSHL, G_FSHR):$root,
+          [{ return matchFunnelShiftToRotate(*${root}, MRI); }]),
+  (apply [{ applyFunnelShiftToRotate(*${root}, MRI, B, Observer); }])
+>;
+
+
 def AArch64PreLegalizerCombinerHelper: GICombinerHelper<
   "AArch64GenPreLegalizerCombinerHelper", [all_combines,
                                            fconstant_to_constant,
                                            icmp_redundant_trunc,
-                                           fold_global_offset]> {
+                                           fold_global_offset,
+                                           funnel_shift_to_rotate]> {
   let DisableRuleOption = "aarch64prelegalizercombiner-disable-rule";
   let StateClass = "AArch64PreLegalizerCombinerHelperState";
   let AdditionalArguments = [];
 }

 // Matchdata for combines which replace a G_SHUFFLE_VECTOR with a
 // target-specific opcode.
 def shuffle_matchdata : GIDefMatchData<"ShuffleVectorPseudo">;
[129 lines hidden]

llvm/lib/Target/AArch64/AArch64InstrGISel.td

[150 lines hidden] def G_SITOF : AArch64GenericInstruction {
   let OutOperandList = (outs type0:$dst);
   let InOperandList = (ins type0:$src);
 }
 def G_UITOF : AArch64GenericInstruction {
   let OutOperandList = (outs type0:$dst);
   let InOperandList = (ins type0:$src);
 }

+// Represents rotate right.
+def G_ROR : AArch64GenericInstruction {
+  let OutOperandList = (outs type0:$dst);
+  let InOperandList = (ins type0:$src1, type0:$src2);
+  let hasSideEffects = 0;
+}
+
 def : GINodeEquiv<G_REV16, AArch64rev16>;
 def : GINodeEquiv<G_REV32, AArch64rev32>;
 def : GINodeEquiv<G_REV64, AArch64rev64>;
 def : GINodeEquiv<G_UZP1, AArch64uzp1>;
 def : GINodeEquiv<G_UZP2, AArch64uzp2>;
 def : GINodeEquiv<G_ZIP1, AArch64zip1>;
 def : GINodeEquiv<G_ZIP2, AArch64zip2>;
 def : GINodeEquiv<G_DUP, AArch64dup>;
 def : GINodeEquiv<G_DUPLANE8, AArch64duplane8>;
 def : GINodeEquiv<G_DUPLANE16, AArch64duplane16>;
 def : GINodeEquiv<G_DUPLANE32, AArch64duplane32>;
 def : GINodeEquiv<G_DUPLANE64, AArch64duplane64>;
 def : GINodeEquiv<G_TRN1, AArch64trn1>;
 def : GINodeEquiv<G_TRN2, AArch64trn2>;
 def : GINodeEquiv<G_EXT, AArch64ext>;
 def : GINodeEquiv<G_VASHR, AArch64vashr>;
 def : GINodeEquiv<G_VLSHR, AArch64vlshr>;
 def : GINodeEquiv<G_SITOF, AArch64sitof>;
 def : GINodeEquiv<G_UITOF, AArch64uitof>;
+def : GINodeEquiv<G_ROR, rotr>;

 def : GINodeEquiv<G_EXTRACT_VECTOR_ELT, vector_extract>;

 // These are patterns that we only use for GlobalISel via the importer.
 def : Pat<(f32 (fadd (vector_extract (v2f32 FPR64:$Rn), (i64 0)),
                      (vector_extract (v2f32 FPR64:$Rn), (i64 1)))),
           (f32 (FADDPv2i32p (v2f32 FPR64:$Rn)))>;

llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp

[211 lines hidden] static bool applyFoldGlobalOffset(MachineInstr &MI, MachineRegisterInfo &MRI,
   MI.getOperand(0).setReg(NewGVDst);
   Observer.changedInstr(MI);
   B.buildPtrAdd(
       Dst, NewGVDst,
       B.buildConstant(LLT::scalar(64), -static_cast<int64_t>(MinOffset)));
   return true;
 }

+/// Match a scalar G_FSHL or G_FSHR that can be turned into an AArch64::G_ROR.
+static bool matchFunnelShiftToRotate(MachineInstr &MI,
+                                     MachineRegisterInfo &MRI) {
+  unsigned Opc = MI.getOpcode();
+  assert(Opc == TargetOpcode::G_FSHL || Opc == TargetOpcode::G_FSHR);
+  Register X = MI.getOperand(1).getReg();
+  Register Y = MI.getOperand(2).getReg();
+  LLT Ty = MRI.getType(X);
+  if (!Ty.isScalar())
+    return false;
+  unsigned Size = Ty.getSizeInBits();
+  if (Size == 32 || Size == 64)
+    return X == Y;
+  return false; // Illegal size for ROR.
+}
+
+static void applyFunnelShiftToRotate(MachineInstr &MI,
+                                     MachineRegisterInfo &MRI,
+                                     MachineIRBuilder &B,
+                                     GISelChangeObserver &Observer) {
+  unsigned Opc = MI.getOpcode();
+  assert(Opc == TargetOpcode::G_FSHL || Opc == TargetOpcode::G_FSHR);
+  bool IsFSHL = Opc == TargetOpcode::G_FSHL;
+  Register ShiftReg = MI.getOperand(3).getReg();
+  LLT ShiftTy = MRI.getType(ShiftReg);
+  B.setInstrAndDebugLoc(MI);
+  // For rotate-left, we can negate the shift and use ROR.
+  if (IsFSHL) {
+    auto Neg = B.buildSub(ShiftTy, B.buildConstant(ShiftTy, 0), ShiftReg);
+    ShiftReg = Neg.getReg(0);
+  }
+  Observer.changingInstr(MI);
+  MI.setDesc(B.getTII().get(AArch64::G_ROR));
+  MI.RemoveOperand(1);
+  // If we have a 32 bit shift, then extend the amount to 64b for selection.
+  if (ShiftTy.getSizeInBits() == 32) {
+    auto Ext = B.buildSExt(LLT::scalar(64), ShiftReg);
+    MI.getOperand(2).setReg(Ext.getReg(0));
+  } else {
+    MI.getOperand(2).setReg(ShiftReg);
+  }
+  Observer.changedInstr(MI);
+}
+
 class AArch64PreLegalizerCombinerHelperState {
 protected:
   CombinerHelper &Helper;

 public:
   AArch64PreLegalizerCombinerHelperState(CombinerHelper &Helper)
       : Helper(Helper) {}
 };
[140 lines hidden]

Lint: Pre-merge checks: clang-format: please reformat the code (the applyFunnelShiftToRotate signature).
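
The apply step above relies on two facts: a rotate left by n equals a rotate right by the negated amount, and sign-extending a 32-bit amount to 64 bits does not change the result as long as the rotate only looks at the amount modulo the data size (which is how AArch64's variable rotate, RORV, is documented to behave). The following sketch models that reasoning on plain integers. The names rorv32, rotl32 and rotlViaRor are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only; not LLVM or AArch64 backend code.
// 32-bit rotate right that uses the amount modulo 32, taking a 64-bit amount
// to mirror the extended shift operand produced by the combine.
inline uint32_t rorv32(uint32_t X, uint64_t Amt) {
  const unsigned N = static_cast<unsigned>(Amt % 32);
  return N == 0 ? X : (X >> N) | (X << (32 - N));
}

inline uint32_t rotl32(uint32_t X, uint32_t Amt) {
  const unsigned N = Amt % 32;
  return N == 0 ? X : (X << N) | (X >> (32 - N));
}

// Mirrors the rewrite for G_FSHL x, x, amt on s32:
//   neg = G_SUB 0, amt ; ext = G_SEXT neg ; result = G_ROR x, ext
inline uint32_t rotlViaRor(uint32_t X, uint32_t Amt) {
  const uint32_t Neg = 0u - Amt;                  // wraps modulo 2^32
  const int64_t Ext = static_cast<int32_t>(Neg);  // sign-extend to 64 bits
  return rorv32(X, static_cast<uint64_t>(Ext));
}
```

Because 2^32 and 2^64 are both multiples of 32, the wrapped negation and the sign extension leave the amount unchanged modulo 32, so rotlViaRor agrees with rotl32 for every amount, including 0 and 32.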

llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-funnel-shifts-to-rotates.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple aarch64-apple-ios -run-pass=aarch64-prelegalizer-combiner %s -o - -verify-machineinstrs | FileCheck %s

				# Tests that we combine funnel shifts to AArch64-specific rotate opcodes.
				---
				name: test_ror
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$w0' }
				- { reg: '$w1' }
				frameInfo:
				maxAlignment: 1
				machineFunctionInfo: {}
				body: |
				bb.1.entry:
				liveins: $w0, $w1

				; CHECK-LABEL: name: test_ror
				; CHECK: liveins: $w0, $w1
				; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
				; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
				; CHECK: [[SEXT:%[0-9]+]]:_(s64) = G_SEXT [[COPY1]](s32)
				; CHECK: [[ROR:%[0-9]+]]:_(s32) = G_ROR [[COPY]], [[SEXT]]
				; CHECK: $w0 = COPY [[ROR]](s32)
				; CHECK: RET_ReallyLR implicit $w0
				%0:_(s32) = COPY $w0
				%1:_(s32) = COPY $w1
				%2:_(s32) = G_FSHR %0, %0, %1(s32)
				$w0 = COPY %2(s32)
				RET_ReallyLR implicit $w0

				...
				---
				name: test_ror64
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$x0' }
				- { reg: '$x1' }
				frameInfo:
				maxAlignment: 1
				machineFunctionInfo: {}
				body: |
				bb.1.entry:
				liveins: $x0, $x1

				; CHECK-LABEL: name: test_ror64
				; CHECK: liveins: $x0, $x1
				; CHECK: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
				; CHECK: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
				; CHECK: [[ROR:%[0-9]+]]:_(s64) = G_ROR [[COPY]], [[COPY1]]
				; CHECK: $x0 = COPY [[ROR]](s64)
				; CHECK: RET_ReallyLR implicit $x0
				%0:_(s64) = COPY $x0
				%1:_(s64) = COPY $x1
				%2:_(s64) = G_FSHR %0, %0, %1(s64)
				$x0 = COPY %2(s64)
				RET_ReallyLR implicit $x0

				...
				---
				name: test_rotl
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$w0' }
				- { reg: '$w1' }
				frameInfo:
				maxAlignment: 1
				machineFunctionInfo: {}
				body: |
				bb.1.entry:
				liveins: $w0, $w1

				; CHECK-LABEL: name: test_rotl
				; CHECK: liveins: $w0, $w1
				; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
				; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
				; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
				; CHECK: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[C]], [[COPY1]]
				; CHECK: [[SEXT:%[0-9]+]]:_(s64) = G_SEXT [[SUB]](s32)
				; CHECK: [[ROR:%[0-9]+]]:_(s32) = G_ROR [[COPY]], [[SEXT]]
				; CHECK: $w0 = COPY [[ROR]](s32)
				; CHECK: RET_ReallyLR implicit $w0
				%0:_(s32) = COPY $w0
				%1:_(s32) = COPY $w1
				%2:_(s32) = G_FSHL %0, %0, %1(s32)
				$w0 = COPY %2(s32)
				RET_ReallyLR implicit $w0

				...
				---
				name: test_rotl64
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$x0' }
				- { reg: '$x1' }
				frameInfo:
				maxAlignment: 1
				machineFunctionInfo: {}
				body: |
				bb.1.entry:
				liveins: $x0, $x1

				; CHECK-LABEL: name: test_rotl64
				; CHECK: liveins: $x0, $x1
				; CHECK: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
				; CHECK: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
				; CHECK: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
				; CHECK: [[SUB:%[0-9]+]]:_(s64) = G_SUB [[C]], [[COPY1]]
				; CHECK: [[ROR:%[0-9]+]]:_(s64) = G_ROR [[COPY]], [[SUB]]
				; CHECK: $x0 = COPY [[ROR]](s64)
				; CHECK: RET_ReallyLR implicit $x0
				%0:_(s64) = COPY $x0
				%1:_(s64) = COPY $x1
				%2:_(s64) = G_FSHL %0, %0, %1(s64)
				$x0 = COPY %2(s64)
				RET_ReallyLR implicit $x0

				...
				# Just do this for scalars for now.
				---
				name: test_no_vector
				alignment: 4
				tracksRegLiveness: true
				liveins:
				- { reg: '$w0' }
				- { reg: '$w1' }
				frameInfo:
				maxAlignment: 1
				machineFunctionInfo: {}
				body: |
				bb.1.entry:
				liveins: $q0, $q1

				; CHECK-LABEL: name: test_no_vector
				; CHECK: liveins: $q0, $q1
				; CHECK: [[COPY:%[0-9]+]]:_(<4 x s32>) = COPY $q0
				; CHECK: [[COPY1:%[0-9]+]]:_(<4 x s32>) = COPY $q1
				; CHECK: [[FSHR:%[0-9]+]]:_(<4 x s32>) = G_FSHR [[COPY]], [[COPY]], [[COPY1]](<4 x s32>)
				; CHECK: $q0 = COPY [[FSHR]](<4 x s32>)
				; CHECK: RET_ReallyLR implicit $q0
				%0:_(<4 x s32>) = COPY $q0
				%1:_(<4 x s32>) = COPY $q1
				%2:_(<4 x s32>) = G_FSHR %0, %0, %1(<4 x s32>)
				$q0 = COPY %2(<4 x s32>)
				RET_ReallyLR implicit $q0

				...

llvm/test/CodeGen/AArch64/GlobalISel/select-rotates.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -O0 -mtriple=arm64-unknown-unknown -global-isel -run-pass=instruction-select -global-isel-abort=1 %s -o - | FileCheck %s

				---
				name: test_ror
				alignment: 4
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gpr }
				- { id: 1, class: gpr }
				- { id: 2, class: gpr }
				- { id: 3, class: gpr }
				liveins:
				- { reg: '$w0' }
				- { reg: '$w1' }
				frameInfo:
				maxAlignment: 1
				machineFunctionInfo: {}
				body: |
				bb.1.entry:
				liveins: $w0, $w1

				; CHECK-LABEL: name: test_ror
				; CHECK: liveins: $w0, $w1
				; CHECK: [[COPY:%[0-9]+]]:gpr32 = COPY $w0
				; CHECK: [[COPY1:%[0-9]+]]:gpr32 = COPY $w1
				; CHECK: [[RORVWr:%[0-9]+]]:gpr32 = RORVWr [[COPY]], [[COPY1]]
				; CHECK: $w0 = COPY [[RORVWr]]
				; CHECK: RET_ReallyLR implicit $w0
				%0:gpr(s32) = COPY $w0
				%1:gpr(s32) = COPY $w1
				%3:gpr(s64) = G_SEXT %1(s32)
				%2:gpr(s32) = G_ROR %0, %3
				$w0 = COPY %2(s32)
				RET_ReallyLR implicit $w0

				...
				---
				name: test_ror64
				alignment: 4
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gpr }
				- { id: 1, class: gpr }
				- { id: 2, class: gpr }
				liveins:
				- { reg: '$x0' }
				- { reg: '$x1' }
				frameInfo:
				maxAlignment: 1
				machineFunctionInfo: {}
				body: |
				bb.1.entry:
				liveins: $x0, $x1

				; CHECK-LABEL: name: test_ror64
				; CHECK: liveins: $x0, $x1
				; CHECK: [[COPY:%[0-9]+]]:gpr64 = COPY $x0
				; CHECK: [[COPY1:%[0-9]+]]:gpr64 = COPY $x1
				; CHECK: [[RORVXr:%[0-9]+]]:gpr64 = RORVXr [[COPY]], [[COPY1]]
				; CHECK: $x0 = COPY [[RORVXr]]
				; CHECK: RET_ReallyLR implicit $x0
				%0:gpr(s64) = COPY $x0
				%1:gpr(s64) = COPY $x1
				%2:gpr(s64) = G_ROR %0, %1
				$x0 = COPY %2(s64)
				RET_ReallyLR implicit $x0

				...