This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Back end improvements to vec_splat
ClosedPublic

Authored by nemanjai on Mar 30 2016, 4:27 AM.


Details

Reviewers

wschmidt
kbarton
amehsan
seurer
hfinkel

Summary

We currently have no way to emit the xxspltw instruction for a word splat, and a call to vec_splat for vectors with 8-byte elements gets translated to a vperm with a mask vector that is loaded from memory.
This patch provides intrinsics to get at xxspltw and xxspltd (extended mnemonic). In a subsequent patch, altivec.h will be modified to use these intrinsics for the respective vec_splat definitions.

This provides a significant improvement in one benchmark that uses vec_splat.

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 52035. Mar 30 2016, 4:27 AM

nemanjai retitled this revision to [PowerPC] Back end improvements to vec_splat.

nemanjai updated this object.

nemanjai added reviewers: hfinkel, kbarton, wschmidt, amehsan, seurer.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added a subscriber: llvm-commits.

amehsan added inline comments. Mar 30 2016, 4:37 AM

include/llvm/IR/IntrinsicsPowerPC.td
752–755 ↗	(On Diff #52035)	Are you going to define a builtin and emit this intrinsic in clang when that builtin is seen? Can't we use the shufflevector instruction instead of a target-specific intrinsic?

nemanjai mentioned this in D18593: [PowerPC] Front end improvements for vec_splat. Mar 30 2016, 5:24 AM

nemanjai added inline comments. Mar 30 2016, 8:07 AM

include/llvm/IR/IntrinsicsPowerPC.td
752–755 ↗	(On Diff #52035)	Yes, D18593 has the FE portions of this patch. I don't think having this intrinsic prevents us from also using these instructions for shufflevector; they can subsequently be added as options to the framework for emitting vector shuffles. The issue is that shufflevector is probably too general an instruction to handle all of this in the .td files. Notice that all the Altivec instructions that can be used for vector shuffles have code fragments that check whether the node has a mask that is eligible for that instruction. The canonical type for all the vector shuffles is v16i8, which isn't a type that can be in vsrc. Of course, this can be changed... I think the simplest approach would be similar to the way the QPX QVESPLATI instruction is handled (namely, check for VSX, check that the node is a splat of the correct type, and emit xxspltw/xxpermdi). Of course, this is just my opinion, and if people feel that I should implement this as a vector shuffle immediately, I can do so.

amehsan added inline comments. Mar 30 2016, 11:48 AM

include/llvm/IR/IntrinsicsPowerPC.td
752–755 ↗	(On Diff #52035)	You definitely know more about our codegen than I do, and I may be missing something here. It would also be good to hear from other reviewers about this (@hfinkel, @kbarton, @wschmidt, @seurer). The way I see it (and I may be wrong) is this. We have two options: 1- the current implementation, which is simpler; 2- using vector shuffles, which requires extra C++ code to get right and is more complicated. I think (1) introduces a new target-specific intrinsic that is not understood by optimizations. This intrinsic may also be used for other purposes in the future, potentially causing optimization problems. So even though (2) is more complicated, it seems to me that (2) is the right way to go. One reason we might choose (1) is if the complexity of (2) is too much. There might be other reasons that I have not realized.

hfinkel added inline comments. Mar 30 2016, 12:16 PM

include/llvm/IR/IntrinsicsPowerPC.td
752–755 ↗	(On Diff #52035)	I don't really see a difference between what the two of you are proposing, except whether the frontend uses an intrinsic or not. Is this right? Our general preference, across all targets, is that the frontend headers emit generic IR whenever possible (including using things like __builtin_shufflevector, or equivalent syntax). Then we match it in the backend.

Yes, please use generic IR if it is possible to represent. Thanks.

nemanjai added inline comments. Mar 30 2016, 2:38 PM

include/llvm/IR/IntrinsicsPowerPC.td
752–755 ↗	(On Diff #52035)	OK, I think it's settled. I'll update the patch so that we retain the information that a vec_splat is a vector_shuffle rather than using a target-specific builtin. Thank you all for your comments. New patch coming up tomorrow :).

I've changed this around a bit to match an SDAG pattern that will be emitted when lowering vector_shuffle nodes rather than matching an intrinsic. This way, we'll get the FE to emit vector shuffles for splats (in the update to D18593).

Also, since XXSPLTW is a splat, I've added the handling for it to the swap removal pass.

A couple of comments, otherwise LGTM.

lib/Target/PowerPC/PPCVSXSwapRemoval.cpp
410	Looks like the "This is not yet implemented. When it is, we need to uncomment the following:" part of this comment is outdated. If so, please remove it.
test/CodeGen/PowerPC/swaps-le-2.ll
90	Please add a few more test cases for other index values. The space of bugs that would still happen to produce 0 for this index but compute the others incorrectly is pretty large.

This revision is now accepted and ready to land. Apr 26 2016, 6:11 PM

Committed revision 268516.

Revision Contents

    Path                                        Size
    lib/Target/PowerPC/ (four files)            4, 11, 5, and 4 lines
    lib/Target/PowerPC/PPCVSXSwapRemoval.cpp    17 lines
    test/CodeGen/PowerPC/swaps-le-2.ll          2 lines

Diff 52216

lib/Target/PowerPC/PPCISelLowering.h

    @@ ... @@ enum NodeType : unsigned {
           // VMADDFP, VNMSUBFP - The VMADDFP and VNMSUBFP instructions, taking
           // three v4f32 operands and producing a v4f32 result.
           VMADDFP, VNMSUBFP,

           /// VPERM - The PPC VPERM Instruction.
           ///
           VPERM,

    +      /// XXSPLT - The PPC VSX splat instructions
    +      ///
    +      XXSPLT,
    +
           /// The CMPB instruction (takes two operands of i32 or i64).
           CMPB,

           /// Hi/Lo - These represent the high and low 16-bit parts of a global
           /// address respectively. These nodes have two operands, the first of
           /// which must be a TargetGlobalAddress, and the second of which must be a
           /// Constant. Selected naively, these turn into 'lis G+C' and 'li G+C',
           /// though these are usually folded into other nodes.

lib/Target/PowerPC/PPCISelLowering.cpp


    @@ ... @@ const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
       case PPCISD::FCTIDUZ: return "PPCISD::FCTIDUZ";
       case PPCISD::FCTIWUZ: return "PPCISD::FCTIWUZ";
       case PPCISD::FRE: return "PPCISD::FRE";
       case PPCISD::FRSQRTE: return "PPCISD::FRSQRTE";
       case PPCISD::STFIWX: return "PPCISD::STFIWX";
       case PPCISD::VMADDFP: return "PPCISD::VMADDFP";
       case PPCISD::VNMSUBFP: return "PPCISD::VNMSUBFP";
       case PPCISD::VPERM: return "PPCISD::VPERM";
    +  case PPCISD::XXSPLT: return "PPCISD::XXSPLT";
       case PPCISD::CMPB: return "PPCISD::CMPB";
       case PPCISD::Hi: return "PPCISD::Hi";
       case PPCISD::Lo: return "PPCISD::Lo";
       case PPCISD::TOC_ENTRY: return "PPCISD::TOC_ENTRY";
       case PPCISD::DYNALLOC: return "PPCISD::DYNALLOC";
       case PPCISD::DYNAREAOFFSET: return "PPCISD::DYNAREAOFFSET";
       case PPCISD::GlobalBaseReg: return "PPCISD::GlobalBaseReg";
       case PPCISD::SRL: return "PPCISD::SRL";
    @@ ... @@ SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
                                                    SelectionDAG &DAG) const {
       SDLoc dl(Op);
       SDValue V1 = Op.getOperand(0);
       SDValue V2 = Op.getOperand(1);
       ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);
       EVT VT = Op.getValueType();
       bool isLittleEndian = Subtarget.isLittleEndian();

    +  if (Subtarget.hasVSX()) {
    +    if (V2.isUndef() && PPC::isSplatShuffleMask(SVOp, 4)) {
    +      int SplatIdx = PPC::getVSPLTImmediate(SVOp, 4, DAG);
    +      SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V1);
    +      SDValue Splat = DAG.getNode(PPCISD::XXSPLT, dl, MVT::v4i32, Conv,
    +                                  DAG.getConstant(SplatIdx, dl, MVT::i32));
    +      return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Splat);
    +    }
    +  }
    +
       if (Subtarget.hasQPX()) {
         if (VT.getVectorNumElements() != 4)
           return SDValue();

         if (V2.isUndef()) V2 = V1;

         int AlignIdx = PPC::isQVALIGNIShuffleMask(SVOp);
         if (AlignIdx != -1) {

lib/Target/PowerPC/PPCInstrInfo.td

    @@ ... @@
     def SDT_PPCCallSeqStart : SDCallSeqStart<[ SDTCisVT<0, i32> ]>;
     def SDT_PPCCallSeqEnd : SDCallSeqEnd<[ SDTCisVT<0, i32>,
                                            SDTCisVT<1, i32> ]>;
     def SDT_PPCvperm : SDTypeProfile<1, 3, [
       SDTCisVT<3, v16i8>, SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>
     ]>;

    +def SDT_PPCVecSplat : SDTypeProfile<1, 2, [ SDTCisVec<0>,
    +  SDTCisVec<1>, SDTCisInt<2>
    +]>;
    +
     def SDT_PPCvcmp : SDTypeProfile<1, 3, [
       SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>, SDTCisVT<3, i32>
     ]>;

     def SDT_PPCcondbr : SDTypeProfile<0, 3, [
       SDTCisVT<0, i32>, SDTCisVT<2, OtherVT>
     ]>;

    @@ ... @@
     def PPCaddiTlsldLAddr : SDNode<"PPCISD::ADDI_TLSLD_L_ADDR",
                                    SDTypeProfile<1, 3, [
                                      SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>,
                                      SDTCisSameAs<0, 3>, SDTCisInt<0> ]>>;
     def PPCaddisDtprelHA : SDNode<"PPCISD::ADDIS_DTPREL_HA", SDTIntBinOp>;
     def PPCaddiDtprelL : SDNode<"PPCISD::ADDI_DTPREL_L", SDTIntBinOp>;

     def PPCvperm : SDNode<"PPCISD::VPERM", SDT_PPCvperm, []>;
    +def PPCxxsplt : SDNode<"PPCISD::XXSPLT", SDT_PPCVecSplat, []>;

     def PPCqvfperm : SDNode<"PPCISD::QVFPERM", SDT_PPCqvfperm, []>;
     def PPCqvgpci : SDNode<"PPCISD::QVGPCI", SDT_PPCqvgpci, []>;
     def PPCqvaligni : SDNode<"PPCISD::QVALIGNI", SDT_PPCqvaligni, []>;
     def PPCqvesplati : SDNode<"PPCISD::QVESPLATI", SDT_PPCqvesplati, []>;

     def PPCqbflt : SDNode<"PPCISD::QBFLT", SDT_PPCqbflt, []>;

lib/Target/PowerPC/PPCInstrVSX.td

    @@ ... @@ def XXSEL : XX4Form<60, 3,
             (outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB, vsrc:$XC),
             "xxsel $XT, $XA, $XB, $XC", IIC_VecPerm, []>;

       def XXSLDWI : XX3Form_2<60, 2,
             (outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB, u2imm:$SHW),
             "xxsldwi $XT, $XA, $XB, $SHW", IIC_VecPerm, []>;
       def XXSPLTW : XX2Form_2<60, 164,
             (outs vsrc:$XT), (ins vsrc:$XB, u2imm:$UIM),
    -        "xxspltw $XT, $XB, $UIM", IIC_VecPerm, []>;
    +        "xxspltw $XT, $XB, $UIM", IIC_VecPerm,
    +        [(set v4i32:$XT,
    +              (PPCxxsplt v4i32:$XB, imm32SExt16:$UIM))]>;
     } // hasSideEffects

     // SELECT_CC_* - Used to implement the SELECT_CC DAG operation. Expanded after
     // instruction selection into a branch sequence.
     let usesCustomInserter = 1, // Expanded after instruction selection.
         PPC970_Single = 1 in {

       def SELECT_CC_VSRC: Pseudo<(outs vsrc:$dst),

lib/Target/PowerPC/PPCVSXSwapRemoval.cpp

    @@ ... @@ for (MachineInstr &MI : MBB) {
               SwapVector[VecIdx].IsSwappable = 1;
               SwapVector[VecIdx].SpecialHandling = SHValues::SH_COPYWIDEN;
             }
             break;
           }
           case PPC::VSPLTB:
           case PPC::VSPLTH:
           case PPC::VSPLTW:
    +      case PPC::XXSPLTW:
             // Splats are lane-sensitive, but we can use special handling
             // to adjust the source lane for the splat. This is not yet
             // implemented. When it is, we need to uncomment the following:
             SwapVector[VecIdx].IsSwappable = 1;
             SwapVector[VecIdx].SpecialHandling = SHValues::SH_SPLAT;
             break;
           // The presence of the following lane-sensitive operations in a
           // web will kill the optimization, at least for now. For these
           // we do nothing, causing the optimization to fail.
           // FIXME: Some of these could be permitted with special handling,
           // and will be phased in as time permits.
    @@ ... @@ for (MachineInstr &MI : MBB) {
           case PPC::VUPKLSH:
           case PPC::VUPKLSW:
           case PPC::XXMRGHW:
           case PPC::XXMRGLW:
           // XXSLDWI could be replaced by a general permute with one of three
           // permute control vectors (for shift values 1, 2, 3). However,
           // VPERM has a more restrictive register class.
           case PPC::XXSLDWI:
    -      case PPC::XXSPLTW:
             break;
           }
         }
       }

       if (RelevantFunction) {
         DEBUG(dbgs() << "Swap vector when first built\n\n");
         dumpSwapVector();
    @@ ... @@ case SHValues::SH_SPLAT: {
         DEBUG(dbgs() << "Changing splat: ");
         DEBUG(MI->dump());

         switch (MI->getOpcode()) {
         default:
           llvm_unreachable("Unexpected splat opcode");
         case PPC::VSPLTB: NElts = 16; break;
         case PPC::VSPLTH: NElts = 8; break;
    -    case PPC::VSPLTW: NElts = 4; break;
    +    case PPC::VSPLTW:
    +    case PPC::XXSPLTW: NElts = 4; break;
         }

    -    unsigned EltNo = MI->getOperand(1).getImm();
    +    unsigned EltNo;
    +    if (MI->getOpcode() == PPC::XXSPLTW)
    +      EltNo = MI->getOperand(2).getImm();
    +    else
    +      EltNo = MI->getOperand(1).getImm();
    +
         EltNo = (EltNo + NElts / 2) % NElts;
    -    MI->getOperand(1).setImm(EltNo);
    +    if (MI->getOpcode() == PPC::XXSPLTW)
    +      MI->getOperand(2).setImm(EltNo);
    +    else
    +      MI->getOperand(1).setImm(EltNo);

         DEBUG(dbgs() << " Into: ");
         DEBUG(MI->dump());
         break;
       }

       // For an XXPERMDI that isn't handled otherwise, we need to
       // reverse the order of the operands. If the selector operand

test/CodeGen/PowerPC/swaps-le-2.ll

    @@ ... @@
     ; CHECK-LABEL: @sfoo
     ; CHECK: lxvd2x
     ; CHECK: vsplth {{[0-9]+}}, {{[0-9]+}}, 5
     ; CHECK: stxvd2x

     ; CHECK-LABEL: @ifoo
     ; CHECK: lxvd2x
    -; CHECK: vspltw {{[0-9]+}}, {{[0-9]+}}, 0
    +; CHECK: xxspltw {{[0-9]+}}, {{[0-9]+}}, 0
     ; CHECK: stxvd2x