This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Improve insert_vector_elt for fixed mask registers.
Needs Revision · Public

Authored by jacquesguan on Feb 7 2022, 1:39 AM.

Details

Summary

Currently the backend promotes a mask vector to an i8 vector and inserts the element into that. Instead, we can bitcast to a wider-element vector, extract the containing element into a GPR, use scalar instructions to set or clear the target bit, and insert the element back into the wider-element vector.
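In scalar terms, the proposed lowering amounts to the following C sketch. The function and names here are illustrative only, not the actual SelectionDAG code in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: view a fixed mask vector (e.g. v16i1) as a vector of i8
 * wide elements, pull the byte containing the target bit into a
 * "GPR", set or clear that bit with scalar ops, and write it back. */
static void insert_mask_bit(uint8_t *mask_bytes, unsigned idx, unsigned elt) {
    unsigned byte = idx / 8;      /* which wide element holds the bit */
    unsigned bit  = idx % 8;      /* bit position within that element */
    uint8_t w = mask_bytes[byte]; /* extract wide element to scalar   */
    w = (uint8_t)((w & ~(1u << bit)) | ((elt & 1u) << bit));
    mask_bytes[byte] = w;         /* insert the wide element back     */
}
```

This naive form clears the old bit with a mask before or-ing in the new one; the review discussion below arrives at a shorter xor-based variant.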

Diff Detail

Event Timeline

jacquesguan created this revision. Feb 7 2022, 1:39 AM
jacquesguan requested review of this revision. Feb 7 2022, 1:39 AM

Improve code.

Improve code.

craig.topper added inline comments. Feb 7 2022, 5:42 PM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-i1.ll
109–117

The upper xlen-1 bits of a0 have an unknown value; the value of %elt is only in the lowest bit. You need to mask off the other bits. A mask would be unnecessary only if the i1 were passed with the zeroext attribute.

Address comment.

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-i1.ll
109–117

Done, thanks.

I may have an alternate suggestion for the bit manipulation

%old_bit = srl %wideelt, %bitindex  // shift old_bit down to bit 0
%b = xor %old_bit, %new_bit         // xor the old and new values together
%c = andi %b, 1                     // clear all bits except bit 0; this drops the extra bits from the shift and any extra bits that came with %new_bit
%shl = shl %c, %bitindex            // shift the xored value back to the correct position
%xor = xor %wideelt, %shl           // at bitindex this is old_bit^(old_bit^new_bit), which cancels to new_bit; every other bit xors with 0

This might be a shorter sequence, but I'm not sure.
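In C, the suggested xor sequence corresponds to something like this (the function name is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Set bit `idx` of `w` to the low bit of `b` without a separate
 * clear-then-set: xor the old and new bit values, keep only bit 0,
 * shift back, and xor into place. At position idx the result is
 * old ^ (old ^ new) == new; every other position xors with 0. */
static uint64_t xor_insert_bit(uint64_t w, unsigned idx, uint64_t b) {
    uint64_t old = w >> idx; /* srl: old bit now at position 0        */
    uint64_t x   = old ^ b;  /* xor old and new values                */
    uint64_t c   = x & 1;    /* andi: drop shift and new-value junk   */
    return w ^ (c << idx);   /* shl + xor back into the wide element  */
}
```

Note the andi also handles an unmasked incoming bool, addressing the earlier inline comment about the upper bits of a0.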

refactor the bit manipulation

I may have an alternate suggestion for the bit manipulation

%old_bit = srl %wideelt, %bitindex  // shift old_bit down to bit 0
%b = xor %old_bit, %new_bit         // xor the old and new values together
%c = andi %b, 1                     // clear all bits except bit 0; this drops the extra bits from the shift and any extra bits that came with %new_bit
%shl = shl %c, %bitindex            // shift the xored value back to the correct position
%xor = xor %wideelt, %shl           // at bitindex this is old_bit^(old_bit^new_bit), which cancels to new_bit; every other bit xors with 0

This might be a shorter sequence, but I'm not sure.

Great idea, thanks.

Herald added a project: Restricted Project. Mar 14 2022, 12:39 AM

I'm sorry it took me so long to get round to this! I think it LGTM now other than questions/suggestions I've left. @craig.topper are you happy with it?

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
4411

It's a bit of a shame we have to indent twice. We could do if (VecVT.isFixedLengthVector() && VecVT.getVectorNumElements() >= 8), which might be clearer for readers, even if we have to recompute NumElts later. Just a style suggestion though.

4413

I think isPowerOf2_32(NumElts) would be most correct here. I realise there's an assertion in one case below, but I'm worried that someone experimenting with differently-sized vectors might not hit it and get incorrect code.

Something worth thinking about is that this may not be a good sequence for a core with decoupled scalar and vector units. This introduces a copy from vector to scalar which prevents the scalar core from running ahead of the vector unit in such a design.

reames requested changes to this revision. Sep 14 2022, 11:29 AM
reames added a subscriber: reames.

I don't think we should land this patch. It involves moving a value from vector to scalar and then back again. The vector to scalar domain crossing is likely to be expensive on at least some real hardware.

We can probably do better by forming the bitmask on the vector side entirely. Given an index in a scalar register, we can produce a single bit mask by comparing a vid vector against that index. With that, we can construct a vmerge with a scalar operand to set or clear the desired bit in the mask.

This revision now requires changes to proceed. Sep 14 2022, 11:29 AM

I don't think we should land this patch. It involves moving a value from vector to scalar and then back again. The vector to scalar domain crossing is likely to be expensive on at least some real hardware.

We can probably do better by forming the bitmask on the vector side entirely. Given an index in a scalar register, we can produce a single bit mask by comparing a vid vector against that index. With that, we can construct a vmerge with a scalar operand to set or clear the desired bit in the mask.

I don't quite understand your last sentence. I don't think vmerge can be used on mask types, so how would we merge only the desired bit?

I don't think we should land this patch. It involves moving a value from vector to scalar and then back again. The vector to scalar domain crossing is likely to be expensive on at least some real hardware.

We can probably do better by forming the bitmask on the vector side entirely. Given an index in a scalar register, we can produce a single bit mask by comparing a vid vector against that index. With that, we can construct a vmerge with a scalar operand to set or clear the desired bit in the mask.

I don't quite understand your last sentence. I don't think vmerge can be used on mask types, so how would we merge only the desired bit?

You're right on the vmerge point. I made this same mistake in another review, caught it, and then didn't realize I'd already posted this comment. Sorry for the confusion.

However, you can replace the vmerge in the above with vmandn and a vmor.

So, code sequence looks something like:

v2 = incoming mask
x1 = incoming index
x2 = incoming bool (elt)
v1 = vid
v0 = vseteq v1, <idx>
v2 = vmandn v2, v0 // original mask with lane cleared
v3 = vmv.v.x x2
v3 = vseteq v3, 1
v3 = vmand v3, v0
v2 = vmor v2, v3

OR

x1 = incoming index
x2 = incoming bool (elt)
v2 = incoming mask
v1 = vid
v0 = vseteq v1, <idx>
v2 = vmandn v2, v0 // original mask with lane cleared
bnez x2, skip
v0 = vxor v0, v0 // elt is 0, so there is nothing to set
skip:
v2 = vmor v2, v0

I'm not claiming this is optimal codegen; there may be better. This is just what occurs to me with a bit of thought.