This is an archive of the discontinued LLVM Phabricator instance.


Differential D123581

[RISCV] Teach vsetvli insertion to handle PseudoReadVL.
Abandoned, Public

Authored by fakepaper56 on Apr 12 2022, 2:55 AM.


Details

Reviewers

craig.topper
jacquesguan
frasercrmck
reames

Summary

The VSETVLIInfo right after a VLEFF/VLSEGFF is unknown because these
instructions modify VL. An unknown VSETVLIInfo forces the pass to insert a
VSET(I)VLI before the next vector operation.

Take the C code below as an example:

vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl);
vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl);

Vsetvli insertion adds a redundant vsetvli for it:

vsetvli a2,a2,e8,m4,ta,mu
vle8ff.v v28,(a0)
csrr a3,vl ; redundant
vsetvli zero,a3,e8,m4,ta,mu ; redundant
vmseq.vi v25,v28,0

The patch uses PseudoReadVL to get the VSETVLIInfo at that location, which
can prevent vsetvli insertion from inserting a redundant VSET(I)VLI.
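
The effect described in the summary can be illustrated with a toy model. The sketch below is not the pass itself: `ToyVLState`, `ToyInstr`, `toyCountInserted` and the other `toy*` names are hypothetical stand-ins, and the state comparison is a strict equality rather than the real compatibility check. It only shows that treating the VL read as a source of known state removes the extra vsetvli in the example sequence.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the pass's tracked state; NOT the real VSETVLIInfo class.
struct ToyVLState {
  bool Known = false; // false models the "Unknown" state
  std::string AVL;    // register that holds the AVL
  unsigned SEW = 0;   // element width in bits
  unsigned LMUL = 0;  // register group multiplier
};

// Hypothetical helper that builds a known state.
ToyVLState toyKnown(const std::string &AVL, unsigned SEW, unsigned LMUL) {
  assert(SEW != 0 && LMUL != 0 && "a known state needs a real SEW and LMUL");
  ToyVLState S;
  S.Known = true;
  S.AVL = AVL;
  S.SEW = SEW;
  S.LMUL = LMUL;
  return S;
}

// Strict equality; the real pass uses a more permissive compatibility test.
bool toySatisfies(const ToyVLState &Cur, const ToyVLState &Require) {
  return Cur.Known && Require.Known && Cur.AVL == Require.AVL &&
         Cur.SEW == Require.SEW && Cur.LMUL == Require.LMUL;
}

enum class ToyOp { VSetVLI, FaultFirstLoad, ReadVL, VectorOp };

// Info is the state a VSetVLI establishes, the state a ReadVL reports,
// or the state a load / vector op requires.
struct ToyInstr {
  ToyOp Op;
  ToyVLState Info;
};

// Count the vsetvli instructions a simple forward walk would insert.
unsigned toyCountInserted(const std::vector<ToyInstr> &Block, bool UseReadVL) {
  ToyVLState Cur; // starts unknown
  unsigned Inserted = 0;
  for (const ToyInstr &I : Block) {
    switch (I.Op) {
    case ToyOp::VSetVLI:
      Cur = I.Info;
      break;
    case ToyOp::FaultFirstLoad:
      if (!toySatisfies(Cur, I.Info))
        ++Inserted;
      Cur = ToyVLState(); // the load may shrink VL, so the state is unknown
      break;
    case ToyOp::ReadVL:
      if (UseReadVL)
        Cur = I.Info; // VL now lives in the read's result register
      break;
    case ToyOp::VectorOp:
      if (!toySatisfies(Cur, I.Info)) {
        ++Inserted;
        Cur = I.Info;
      }
      break;
    }
  }
  return Inserted;
}

// The sequence from the summary: vsetvli, vle8ff.v, read of vl, vmseq.vi.
std::vector<ToyInstr> toySummaryExample() {
  return {
      {ToyOp::VSetVLI, toyKnown("a2", 8, 4)},
      {ToyOp::FaultFirstLoad, toyKnown("a2", 8, 4)},
      {ToyOp::ReadVL, toyKnown("a3", 8, 4)},
      {ToyOp::VectorOp, toyKnown("a3", 8, 4)},
  };
}
```

With `UseReadVL` set to false the walk inserts one vsetvli before the compare; with it set to true it inserts none.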

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,130 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
	60,130 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	740 ms	x64 debian > SanitizerCommon-tsan-x86_64-Linux.Linux::decorate_proc_maps.cpp

Event Timeline

fakepaper56 created this revision. Apr 12 2022, 2:55 AM

Herald added a project: Restricted Project. Apr 12 2022, 2:55 AM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 29 others.

fakepaper56 requested review of this revision. Apr 12 2022, 2:55 AM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD, MaskRay.

Harbormaster completed remote builds in B159181: Diff 422156. Apr 12 2022, 3:32 AM

jacquesguan added inline comments. Apr 12 2022, 6:00 AM

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
Line 59: Maybe there is no need to save a pointer in the `VSETVLIInfo`? I think we only care whether the previous `VSETVLIInfo` comes from a VL-modifying MI and whether the current AVL is read from the result of that modified VL. Maybe we could use a local pointer in `emitVSETVLIs`, or wherever you would like to use it.

kito-cheng added a reviewer: frasercrmck. Apr 12 2022, 7:35 AM

Would it simplify things if the VLEFF pseudo instruction had the GPR output and the vector register output. And we expanded it PseudoReadVL after register allocation?

fakepaper56 added inline comments. Apr 12 2022, 6:50 PM

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
Line 59: The patch also supports `emitVSETVLIPHIs`, which inspects the `VSETVLIInfo` of each predecessor. I think it is hard to extend a "previous" `VSETVLIInfo` to multiple predecessors.
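
The multiple-predecessor case mentioned in this reply can be sketched with a simplified model of the PHI check. Everything named `Toy*` or `toy*` below is hypothetical, registers and vtype values are plain integers, and the comparison is strict equality; the shape of the loop follows the PHI handling shown in the diff of this revision, but it is not the pass's actual code.

```cpp
#include <cassert>
#include <vector>

// Toy model of a predecessor block's exit state.
struct ToyExitInfo {
  bool Known;
  unsigned AVLReg;
  unsigned VType;
};

// One incoming value of the PHI that feeds the AVL operand.
struct ToyPHIInput {
  bool DefIsVSETVLIOrReadVL; // defining instruction is a recognized kind
  unsigned DefAVLReg;        // AVL register derived from that definition
  unsigned DefVType;         // vtype derived from that definition
  ToyExitInfo PredExit;      // exit state of the matching predecessor
};

// Returns true when a vsetvli is still needed in the join block.
bool toyNeedVSETVLIForPHI(const std::vector<ToyPHIInput> &Inputs,
                          unsigned RequiredVType) {
  for (const ToyPHIInput &In : Inputs) {
    // The predecessor must leave with a known, matching vtype.
    if (!In.PredExit.Known || In.PredExit.VType != RequiredVType)
      return true;
    // The incoming value must come from a vsetvli or a VL read.
    if (!In.DefIsVSETVLIOrReadVL)
      return true;
    // The definition must describe the same state the predecessor exits with.
    if (In.DefAVLReg != In.PredExit.AVLReg ||
        In.DefVType != In.PredExit.VType)
      return true;
  }
  return false; // every incoming edge already provides the required state
}

// Two predecessors loosely modeled on the cross_bb test: one edge carries a
// VL read after a fault-first load, the other carries a vsetvli result.
std::vector<ToyPHIInput> toyCrossBBInputs(bool ThenDefIsRecognized) {
  return {
      {ThenDefIsRecognized, 10, 0x42, {true, 10, 0x42}}, // if.then
      {true, 11, 0x42, {true, 11, 0x42}},                // if.else
  };
}
```

Because the check runs per incoming edge, a single "previous" state local to one block is not enough; each predecessor's exit state has to be available.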

In D123581#3445896, @craig.topper wrote:

Would it simplify things if the VLEFF pseudo instruction had the GPR output and the vector register output. And we expanded it PseudoReadVL after register allocation?

I think we could remove the function collectVLCopy() and the table RegToVLModifer if we change the output of VLEFF.

By the way, I think we could just expand PseudoReadVL in the RISCVInsertVSETVLI pass instead of in an after-RA pass, since scheduling may decrease the live range of the output of PseudoReadVL.

In D123581#3445896, @craig.topper wrote:

Would it simplify things if the VLEFF pseudo instruction had the GPR output and the vector register output. And we expanded it PseudoReadVL after register allocation?

Do you think we should change the output of VLEFF and VLSEGFF? I tried that locally, but I am unsure about it: changing the output seems to benefit only this pass, and the code we would need to add to the tablegen files and RISCVISelDAGToDAG.* exceeds the code it removes from this patch.

In D123581#3482481, @fakepaper56 wrote:

In D123581#3445896, @craig.topper wrote:

Would it simplify things if the VLEFF pseudo instruction had the GPR output and the vector register output. And we expanded it PseudoReadVL after register allocation?

Do you think we should change the output of VLEFF and VLSEGFF? I tried that locally, but I am unsure about it: changing the output seems to benefit only this pass, and the code we would need to add to the tablegen files and RISCVISelDAGToDAG.* exceeds the code it removes from this patch.

The InsertVSETVLI pass is complex and buggy. We've had 3 bugs in it in 2022. I want to reduce complexity here if possible. Would adding SEW and LMUL from the VLEFF to the PseudoReadVL that gets emitted for VLEFF help without requiring us to give the VLEFF the GPR output?

In D123581#3482841, @craig.topper wrote:

Would adding SEW and LMUL from the VLEFF to the PseudoReadVL that gets emitted for VLEFF help without requiring us to give the VLEFF the GPR output?

I think that is a better solution: we could just check the vtype of PseudoReadVL for PseudoReadVL's users, and we would not even need the new VLModified state.

Should I create a new revision for changing PseudoReadVL, or should I make the change in this revision and update the commit title and summary?
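
To make the vtype comparison in this proposal concrete, here is a minimal sketch of the vtype bit layout from the RVV 1.0 specification (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7). The function names `toyEncodeVType` and `toyCanReuseVL` are hypothetical, and the reuse check is deliberately strict equality; the real pass applies its own compatibility rules.

```cpp
#include <cassert>
#include <cstdint>

// Encode a vtype immediate from its fields.
// VLMulField: 2 means m4. VSEWField: 0 means e8, 1 means e16.
constexpr std::uint64_t toyEncodeVType(unsigned VLMulField, unsigned VSEWField,
                                       bool TailAgnostic, bool MaskAgnostic) {
  return (std::uint64_t(MaskAgnostic) << 7) |
         (std::uint64_t(TailAgnostic) << 6) |
         (std::uint64_t(VSEWField & 0x7) << 3) |
         std::uint64_t(VLMulField & 0x7);
}

// Hypothetical check: a user of the VL read can skip a new vsetvli only if
// the vtype recorded on the read equals the vtype the user requires.
constexpr bool toyCanReuseVL(std::uint64_t ReadVLVType,
                             std::uint64_t RequiredVType) {
  return ReadVLVType == RequiredVType;
}

constexpr std::uint64_t ToyE8M4TaMu = toyEncodeVType(2, 0, true, false);
constexpr std::uint64_t ToyE16M4TaMu = toyEncodeVType(2, 1, true, false);

static_assert(ToyE8M4TaMu == 0x42, "e8, m4, ta, mu");
static_assert(ToyE16M4TaMu == 0x4A, "e16, m4, ta, mu");
```

Under this model, an e8/m4 fault-first load followed by an e8/m4 operation can reuse VL, while an e16/m4 operation after the same load (as in the `no_work` test of this revision) still needs its own vsetvli.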

In D123581#3484656, @fakepaper56 wrote:

Should I create a new revision for changing PseudoReadVL, or should I make the change in this revision and update the commit title and summary?

Yes, create a new revision. Then we can rebase this patch on top of that new revision.

khchen added a subscriber: khchen. May 3 2022, 12:12 AM

fakepaper56 mentioned this in D125199: [RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF. May 8 2022, 7:54 PM

fakepaper56 mentioned this in rG4537aae0d57e: [RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF. May 10 2022, 11:08 PM

Directly use PseudoReadVL to get the VSETVLIInfo, and change the commit title and summary accordingly.
Additionally, the update removes useless PseudoReadVL instructions in the pass.

Herald added a subscriber: shiva0217. May 12 2022, 9:05 AM

craig.topper added a reviewer: reames. May 12 2022, 9:08 AM

Harbormaster completed remote builds in B164113: Diff 428962. May 12 2022, 10:29 AM

fakepaper56 retitled this revision from "[RISCV] Teach vsetvli insertion to handle VSETVLIInfo of vl-modified instruction." to "[RISCV] Teach vsetvli insertion to handle PseudoReadVL." May 14 2022, 1:06 AM

fakepaper56 edited the summary of this revision.

reames mentioned this in rGbc2fe4a0d675: [RISCV] Add basic fault-first load coverage for VSETVLI insertion. May 23 2022, 10:29 AM

Coming into this a bit late, but been spending a decent amount of time looking at related code recently.

I am generally not a fan of this patch. The entire PseudoReadVL mechanism feels like a hack, and the unchecked assumption that the SEW and policy bits on the ReadVL match the prior VLE is worrying. Beyond that, this adds a decent amount of complexity.

I find myself agreeing with @craig.topper's comment above. It really feels like we need a pseudo instruction for VLEnFF itself here. Such a pseudo would produce two outputs, the second being the GPR version of VL. After this pass, we could lower the pseudo into two instructions (e.g. the actual VLEnFF and the CSR read). This would make the state update on this patch very minimal. This really feels to me like the "right" approach here.

An alternative you could pursue is to approximate the simplicity of matching the pseudo with a local peephole on top of the current split structure. I mocked up a POC of this idea (https://reviews.llvm.org/D126227). It needs some cleanup, but it's less invasive/risky than this is. Note that the test diff shows a slightly different impact from this patch. I'm not sure which is correct.
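
The two-output pseudo suggested in this comment can be sketched as a small lowering step. This is only an illustration under assumptions: `ToyMachineInstr`, the opcode strings `ToyPseudoVLEFF_VL`, `ToyVLEFF` and `ToyReadVLCSR`, and the `VLResultUsed` flag are all hypothetical and do not correspond to real LLVM opcodes or APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction record; NOT llvm::MachineInstr.
struct ToyMachineInstr {
  std::string Opcode;
  std::vector<std::string> Defs;
  std::vector<std::string> Uses;
};

// Lower a hypothetical two-result fault-first load pseudo into the load
// itself plus a read of the vl CSR. The CSR read is skipped when nothing
// uses the GPR result (an assumption of this sketch).
std::vector<ToyMachineInstr> toyLowerFFLoad(const ToyMachineInstr &Pseudo,
                                            bool VLResultUsed) {
  assert(Pseudo.Opcode == "ToyPseudoVLEFF_VL" && "unexpected opcode");
  assert(Pseudo.Defs.size() == 2 && "expected a vector and a GPR result");

  std::vector<ToyMachineInstr> Out;

  ToyMachineInstr Load;
  Load.Opcode = "ToyVLEFF";
  Load.Defs.push_back(Pseudo.Defs[0]); // vector result
  Load.Uses = Pseudo.Uses;
  Out.push_back(Load);

  if (VLResultUsed) {
    ToyMachineInstr Read;
    Read.Opcode = "ToyReadVLCSR";
    Read.Defs.push_back(Pseudo.Defs[1]); // GPR copy of the new VL
    Out.push_back(Read);
  }
  return Out;
}

// Example pseudo: vector result v8, VL result a3, base address a0.
ToyMachineInstr toyExamplePseudo() {
  ToyMachineInstr P;
  P.Opcode = "ToyPseudoVLEFF_VL";
  P.Defs.push_back("v8");
  P.Defs.push_back("a3");
  P.Uses.push_back("a0");
  return P;
}
```

In this shape the insertion pass would see a single instruction whose second result is the new VL, so the SEW and policy bits would be recorded in one place only.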

In D123581#3532019, @reames wrote:

I am generally not a fan of this patch. The entire PseudoReadVL mechanism feels like a hack, and the unchecked assumption that the SEW and policy bits on the ReadVL match the prior VLE is worrying. Beyond that, this adds a decent amount of complexity.

I find myself agreeing with @craig.topper's comment above. It really feels like we need a pseudo instruction for VLEnFF itself here. Such a pseudo would produce two outputs, the second being the GPR version of VL. After this pass, we could lower the pseudo into two instructions (e.g. the actual VLEnFF and the CSR read). This would make the state update on this patch very minimal. This really feels to me like the "right" approach here.

I agree that calculating the VTYPE of VLEnFF in two different places is worrying. I am sorry that I didn't think deeply about the two approaches and just assumed that modifying PseudoReadVL would need less code.

This functionality is implemented by D127576.

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp	43 lines
llvm/test/CodeGen/RISCV/rvv/vsetvli-modify-vl.ll	83 lines

Diff 428962

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp

@@ ... @@ class VSETVLIInfo {

   enum : uint8_t {
     Uninitialized,
     AVLIsReg,
     AVLIsImm,
     Unknown,
   } State = Uninitialized;

   // Fields from VTYPE.
   RISCVII::VLMUL VLMul = RISCVII::LMUL_1;
   uint8_t SEW = 0;
   uint8_t TailAgnostic : 1;
   uint8_t MaskAgnostic : 1;
   uint8_t MaskRegOp : 1;
   uint8_t StoreOp : 1;
   uint8_t ScalarMovOp : 1;
   uint8_t SEWLMULRatioOnly : 1;
@@ ... @@ assert((AVLReg != RISCV::X0 || MI.getOperand(0).getReg() != RISCV::X0) &&
            "Can't handle X0, X0 vsetvli yet");
     NewInfo.setAVLReg(AVLReg);
   }
   NewInfo.setVTYPE(MI.getOperand(2).getImm());

   return NewInfo;
 }

+// Return a VSETVLIInfo representing the VSETVLIInfo of PseudoReadVL.
+// Although PseudoReadVL does not change VL/VTYPE, its operands could provide
+// the value of VL/VTYPE at the location of it.
+static VSETVLIInfo getInfoForReadVL(const MachineInstr &ReadVL) {
+  assert(ReadVL.getOpcode() == RISCV::PseudoReadVL);
+  VSETVLIInfo NewInfo;
+  NewInfo.setAVLReg(ReadVL.getOperand(0).getReg());
+  NewInfo.setVTYPE(ReadVL.getOperand(1).getImm());
+  return NewInfo;
+}
+
 bool RISCVInsertVSETVLI::needVSETVLI(const VSETVLIInfo &Require,
                                      const VSETVLIInfo &CurInfo) {
   if (CurInfo.isCompatible(Require, /*Strict*/ false))
     return false;

   // We didn't find a compatible value. If our AVL is a virtual register,
   // it might be defined by a VSET(I)VLI. If it has the same VTYPE we need
   // and the last VL/VTYPE we observed is the same, we don't need a
@@ ... @@ bool RISCVInsertVSETVLI::computeVLVTYPEChanges(const MachineBasicBlock &MBB) {
   for (const MachineInstr &MI : MBB) {
     // If this is an explicit VSETVLI or VSETIVLI, update our state.
     if (isVectorConfigInstr(MI)) {
       HadVectorOp = true;
       BBInfo.Change = getInfoForVSETVLI(MI);
       continue;
     }

+    if (MI.getOpcode() == RISCV::PseudoReadVL) {
+      BBInfo.Change = getInfoForReadVL(MI);
+      continue;
+    }
+
     uint64_t TSFlags = MI.getDesc().TSFlags;
     if (RISCVII::hasSEWOp(TSFlags)) {
       HadVectorOp = true;

       VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);

       if (!BBInfo.Change.isValid()) {
         BBInfo.Change = NewInfo;
@@ ... @@ for (unsigned PHIOp = 1, NumOps = PHI->getNumOperands(); PHIOp != NumOps;
     MachineBasicBlock *PBB = PHI->getOperand(PHIOp + 1).getMBB();
     const BlockData &PBBInfo = BlockInfo[PBB->getNumber()];
     // If the exit from the predecessor has the VTYPE we are looking for
     // we might be able to avoid a VSETVLI.
     if (PBBInfo.Exit.isUnknown() ||
         !PBBInfo.Exit.hasCompatibleVTYPE(Require, /*Strict*/ false))
       return true;

-    // We need the PHI input to the be the output of a VSET(I)VLI.
+    // We need the PHI input to the be the output of a VSET(I)VLI or
+    // PseudoReadVL.
     MachineInstr *DefMI = MRI->getVRegDef(InReg);
-    if (!DefMI || !isVectorConfigInstr(*DefMI))
+    if (!DefMI || (!isVectorConfigInstr(*DefMI) &&
+                   DefMI->getOpcode() != RISCV::PseudoReadVL))
       return true;

-    // We found a VSET(I)VLI make sure it matches the output of the
-    // predecessor block.
-    VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI);
+    // We found a VSET(I)VLI or PseudoReadVL make sure it matches the output of
+    // the predecessor block.
+    VSETVLIInfo DefInfo = (DefMI->getOpcode() == RISCV::PseudoReadVL)
+                              ? getInfoForReadVL(*DefMI)
+                              : getInfoForVSETVLI(*DefMI);
     if (!DefInfo.hasSameAVL(PBBInfo.Exit) ||
         !DefInfo.hasSameVTYPE(PBBInfo.Exit))
       return true;
   }

   // If all the incoming values to the PHI checked out, we don't need
   // to insert a VSETVLI.
   return false;
@@ ... @@ if (isVectorConfigInstr(MI)) {
              "Unexpected operands where VL and VTYPE should be");
       MI.getOperand(3).setIsDead(false);
       MI.getOperand(4).setIsDead(false);
       CurInfo = getInfoForVSETVLI(MI);
       PrevVSETVLIMI = &MI;
       continue;
     }

+    if (MI.getOpcode() == RISCV::PseudoReadVL) {
+      CurInfo = getInfoForReadVL(MI);
+      continue;
+    }
+
     uint64_t TSFlags = MI.getDesc().TSFlags;
     if (RISCVII::hasSEWOp(TSFlags)) {
       VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);
       if (RISCVII::hasVLOp(TSFlags)) {
         MachineOperand &VLOp = MI.getOperand(getVLOpNum(MI));
         if (VLOp.isReg()) {
           // Erase the AVL operand from the instruction.
           VLOp.setReg(RISCV::NoRegister);
@@ ... @@ for (MachineBasicBlock &MBB : MF)
     emitVSETVLIs(MBB);

   // Once we're fully done rewriting all the instructions, do a final pass
   // through to check for VSETVLIs which write to an unused destination.
   // For the non X0, X0 variant, we can replace the destination register
   // with X0 to reduce register pressure. This is really a generic
   // optimization which can be applied to any dead def (TODO: generalize).
   for (MachineBasicBlock &MBB : MF) {
-    for (MachineInstr &MI : MBB) {
+    for (auto I = MBB.begin(), E = MBB.end(); I != E;) {
+      MachineInstr &MI = *I++;
       if (MI.getOpcode() == RISCV::PseudoVSETVLI ||
           MI.getOpcode() == RISCV::PseudoVSETIVLI) {
         Register VRegDef = MI.getOperand(0).getReg();
         if (VRegDef != RISCV::X0 && MRI->use_nodbg_empty(VRegDef))
           MI.getOperand(0).setReg(RISCV::X0);
       }
+
+      // PseudoReadVL MIs may be dead after emitVSETVLIs.
+      if (MI.getOpcode() == RISCV::PseudoReadVL &&
+          MRI->use_nodbg_empty(MI.getOperand(0).getReg()))
+        MI.eraseFromParent();
     }
   }

   BlockInfo.clear();
   return HaveVectorOp;
 }

 /// Returns an instance of the Insert VSETVLI pass.
 FunctionPass *llvm::createRISCVInsertVSETVLIPass() {
   return new RISCVInsertVSETVLI();
 }
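
The final cleanup loop in this file switches from a range-based loop to an explicit iterator that is advanced before the current instruction may be erased. The standalone sketch below shows the same pattern on a `std::list`; `ToyListInstr`, `toyEraseDeadReadVL` and the set of used registers are hypothetical stand-ins for the machine instruction list and the use query in the pass.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <string>

// Toy instruction with a single defined register; NOT llvm::MachineInstr.
struct ToyListInstr {
  std::string Opcode;
  unsigned DefReg;
};

// Erase every "PseudoReadVL" whose result register has no users.
// Returns the number of erased instructions.
unsigned toyEraseDeadReadVL(std::list<ToyListInstr> &Block,
                            const std::set<unsigned> &UsedRegs) {
  unsigned Erased = 0;
  for (auto I = Block.begin(), E = Block.end(); I != E;) {
    auto Cur = I++; // advance first so erasing Cur cannot invalidate I
    if (Cur->Opcode == "PseudoReadVL" && UsedRegs.count(Cur->DefReg) == 0) {
      Block.erase(Cur);
      ++Erased;
    }
  }
  return Erased;
}

// A block with two VL reads; which ones are dead depends on UsedRegs.
std::list<ToyListInstr> toyExampleBlock() {
  return {{"vle8ff", 1}, {"PseudoReadVL", 2}, {"PseudoReadVL", 3}, {"vadd", 4}};
}
```

A range-based loop would keep using an iterator to the erased element, which is why the loop header changes in the hunk above.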

llvm/test/CodeGen/RISCV/rvv/vsetvli-modify-vl.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv64 -mattr=+v \
				; RUN: -target-abi=lp64d -verify-machineinstrs -< %s | FileCheck %s

				declare i64 @llvm.riscv.vsetvli.i64(i64, i64 immarg, i64 immarg)
				declare { <vscale x 32 x i8>, i64 } @llvm.riscv.vleff.nxv32i8.i64(<vscale x 32 x i8>, <vscale x 32 x i8>* nocapture, i64)
				declare <vscale x 32 x i1> @llvm.riscv.vmseq.nxv32i8.i8.i64(<vscale x 32 x i8>, i8, i64)
				declare <vscale x 32 x i8> @llvm.riscv.vadd.nxv32i8.i8.i64(<vscale x 32 x i8>, <vscale x 32 x i8>, i8, i64)
				declare <vscale x 16 x i16> @llvm.riscv.vadd.nxv16i16.i16.i64(<vscale x 16 x i16>, <vscale x 16 x i16>, i16, i64)

				define <vscale x 32 x i1> @seq1(i1 zeroext %cond, i8* %str, i64 %n, i8 %x) {
				; CHECK-LABEL: seq1:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vsetvli zero, a2, e8, m4, ta, mu
				; CHECK-NEXT: vle8ff.v v8, (a1)
				; CHECK-NEXT: vadd.vx v8, v8, a3
				; CHECK-NEXT: vmseq.vi v0, v8, 0
				; CHECK-NEXT: ret
				entry:
				%0 = tail call i64 @llvm.riscv.vsetvli.i64(i64 %n, i64 0, i64 2)
				%1 = bitcast i8* %str to <vscale x 32 x i8>*
				%2 = tail call { <vscale x 32 x i8>, i64 } @llvm.riscv.vleff.nxv32i8.i64(<vscale x 32 x i8> undef, <vscale x 32 x i8>* %1, i64 %0)
				%3 = extractvalue { <vscale x 32 x i8>, i64 } %2, 0
				%4 = extractvalue { <vscale x 32 x i8>, i64 } %2, 1
				%5 = tail call <vscale x 32 x i8> @llvm.riscv.vadd.nxv32i8.i8.i64(<vscale x 32 x i8> undef, <vscale x 32 x i8> %3, i8 %x, i64 %4)
				%6 = tail call <vscale x 32 x i1> @llvm.riscv.vmseq.nxv32i8.i8.i64(<vscale x 32 x i8> %5, i8 0, i64 %4)
				ret <vscale x 32 x i1> %6
				}

				define <vscale x 32 x i8> @cross_bb(i1 zeroext %cond, i8 zeroext %x, <vscale x 32 x i8> %vv, i8* %str, i64 %n) {
				; CHECK-LABEL: cross_bb:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vsetvli zero, a3, e8, m4, ta, mu
				; CHECK-NEXT: beqz a0, .LBB1_2
				; CHECK-NEXT: # %bb.1: # %if.then
				; CHECK-NEXT: vle8ff.v v12, (a2)
				; CHECK-NEXT: j .LBB1_3
				; CHECK-NEXT: .LBB1_2: # %if.else
				; CHECK-NEXT: vsetvli a0, a3, e8, m4, ta, mu
				; CHECK-NEXT: .LBB1_3: # %if.end
				; CHECK-NEXT: vadd.vx v8, v8, a1
				; CHECK-NEXT: vadd.vx v8, v8, a1
				; CHECK-NEXT: ret
				entry:
				%0 = tail call i64 @llvm.riscv.vsetvli.i64(i64 %n, i64 0, i64 2)
				br i1 %cond, label %if.then, label %if.else

				if.then: ; preds = %entry
				%1 = bitcast i8* %str to <vscale x 32 x i8>*
				%2 = tail call { <vscale x 32 x i8>, i64 } @llvm.riscv.vleff.nxv32i8.i64(<vscale x 32 x i8> undef, <vscale x 32 x i8>* %1, i64 %0)
				%3 = extractvalue { <vscale x 32 x i8>, i64 } %2, 1
				br label %if.end

				if.else: ; preds = %entry
				%4 = tail call i64 @llvm.riscv.vsetvli.i64(i64 %n, i64 0, i64 2)
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%new_vl.0 = phi i64 [ %3, %if.then ], [ %4, %if.else ]
				%5 = tail call <vscale x 32 x i8> @llvm.riscv.vadd.nxv32i8.i8.i64(<vscale x 32 x i8> undef, <vscale x 32 x i8> %vv, i8 %x, i64 %new_vl.0)
				%6 = tail call <vscale x 32 x i8> @llvm.riscv.vadd.nxv32i8.i8.i64(<vscale x 32 x i8> undef, <vscale x 32 x i8> %5, i8 %x, i64 %new_vl.0)
				ret <vscale x 32 x i8> %6
				}

				; Test not eleminating useful vsetvli.
				define <vscale x 16 x i16> @no_work(i1 zeroext %cond, i8* %str, i64 %n, <vscale x 16 x i16> %v, i16 %x) {
				; CHECK-LABEL: no_work:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vsetvli zero, a2, e8, m4, ta, mu
				; CHECK-NEXT: vle8ff.v v12, (a1)
				; CHECK-NEXT: csrr a0, vl
				; CHECK-NEXT: vsetvli zero, a0, e16, m4, ta, mu
				; CHECK-NEXT: vadd.vx v8, v8, a3
				; CHECK-NEXT: ret
				entry:
				%0 = tail call i64 @llvm.riscv.vsetvli.i64(i64 %n, i64 0, i64 2)
				%1 = bitcast i8* %str to <vscale x 32 x i8>*
				%2 = tail call { <vscale x 32 x i8>, i64 } @llvm.riscv.vleff.nxv32i8.i64(<vscale x 32 x i8> undef, <vscale x 32 x i8>* %1, i64 %0)
				%3 = extractvalue { <vscale x 32 x i8>, i64 } %2, 1
				%4 = tail call <vscale x 16 x i16> @llvm.riscv.vadd.nxv16i16.i16.i64(<vscale x 16 x i16> undef, <vscale x 16 x i16> %v, i16 %x, i64 %3)
				ret <vscale x 16 x i16> %4
				}