This is an archive of the discontinued LLVM Phabricator instance.

[PoC][RISCV] Use scalar register for fixed-length vectors
Abandoned · Public

Authored by wangpc on Jul 18 2023, 12:14 AM.

Details

Reviewers

asb
reames
craig.topper
kito-cheng
zixuan-wu

Summary

This PoC lets fixed-length vectors live in scalar registers so that we can vectorize some loops with a small element size.

Small vectors such as v4i8, v8i8 and v4i16 fit entirely in a single scalar register.

With this patch we can vectorize loads and stores, but there are no vector operations on scalar registers (the RVP extension is limited too).

I don't know whether this is the right way to go, and no other target has done anything like this. The changes seem intrusive, and there is a lot of work left if we want to go further.

As for the example, it should in fact be optimized to a memcpy call.

Related discussion:

https://discourse.llvm.org/t/freertos-queue-much-slower-on-riscv-when-compiled-with-llvm-rather-than-gcc
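
To make the idea in the summary concrete, here is a minimal standalone sketch (not part of the patch) of what holding a v8i8 in one 64-bit scalar register amounts to on RV64: eight byte elements travel through a single 64-bit value, so eight byte loads and stores become one load and one store. The helper name `copy8` is hypothetical, and `std::memcpy` is used only to keep the sketch free of alignment and aliasing assumptions.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: move eight i8 elements through one 64-bit scalar,
// mirroring a v8i8 value that lives in a single GPR on RV64.
void copy8(unsigned char *dst, const unsigned char *src) {
  std::uint64_t word;                     // stands in for the GPR
  std::memcpy(&word, src, sizeof(word));  // one 8-byte load
  std::memcpy(dst, &word, sizeof(word));  // one 8-byte store
}
```

Nothing here operates on the individual elements, which matches the limitation noted in the summary: only loads and stores benefit.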

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	3,870 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases/Linux::auto_memory_profile_test.cpp
	4,210 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases/Linux::auto_memory_profile_test.cpp
	180 ms	x64 debian > LLVM.Analysis/CostModel/RISCV::logicalop.ll
	1,000 ms	x64 debian > LLVM.CodeGen/RISCV::nontemporal.ll
	890 ms	x64 debian > LLVM.CodeGen/RISCV::vector-abi.ll
		10 tests failed in total; only the five above are listed.

Event Timeline

wangpc created this revision. · Jul 18 2023, 12:14 AM

Herald added a project: Restricted Project. · Jul 18 2023, 12:14 AM

Herald added subscribers: jobnoorman, luke, VincentWu and 26 others.

wangpc requested review of this revision. · Jul 18 2023, 12:14 AM

Herald added a project: Restricted Project. · Jul 18 2023, 12:14 AM

Herald added subscribers: llvm-commits, eopXD, MaskRay.

wangpc edited the summary of this revision. · Jul 18 2023, 12:16 AM

wangpc edited the summary of this revision. · Jul 18 2023, 12:24 AM

Harbormaster completed remote builds in B246091: Diff 541345. · Jul 18 2023, 3:13 AM

zixuan-wu added a comment.

Does this optimization only apply to load and store sequences that implement memcpy, memset, or something similar?

If so, enabling vectorization in the middle end and making these vector types legal in the backend may be too aggressive and complicated, because no operations are legal on such types. Introducing vectorization and backend support for this case can work, but it affects too much, such as the cost model evaluation and other optimizations, because such types are treated as legal. Recognizing this pattern as memcpy would make more sense instead.

wangpc added a comment.

In D155551#4517444, @zixuan-wu wrote:

Does this optimization only apply to load and store sequences that implement memcpy, memset, or something similar?

If so, enabling vectorization in the middle end and making these vector types legal in the backend may be too aggressive and complicated, because no operations are legal on such types. Introducing vectorization and backend support for this case can work, but it affects too much, such as the cost model evaluation and other optimizations, because such types are treated as legal. Recognizing this pattern as memcpy would make more sense instead.

Yes, it's limited. And we know that RVP is also not suitable for vectorization. I just posted this PoC here to gather some feedback. :-)
But I think this kind of vectorization is feasible for the XCVsimd extension (D153721).

wangpc abandoned this revision. · Jul 20 2023, 12:59 AM

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVISelLowering.cpp	3 lines
llvm/lib/Target/RISCV/RISCVInstrInfo.td	6 lines
llvm/lib/Target/RISCV/RISCVRegisterInfo.td	2 lines
llvm/lib/Target/RISCV/RISCVSubtarget.h	2 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h	2 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp	9 lines
llvm/test/CodeGen/RISCV/vectorization-scalar-register/memcpy.ll	165 lines

Diff 541345

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

@@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
   case RISCVABI::ABI_LP64D:
     break;
   }

   MVT XLenVT = Subtarget.getXLenVT();

   // Set up the register classes.
   addRegisterClass(XLenVT, &RISCV::GPRRegClass);
+  if (Subtarget.is64Bit())
+    addRegisterClass(MVT::v8i8, &RISCV::GPRRegClass);
+  addRegisterClass(MVT::v4i8, &RISCV::GPRRegClass);

   if (Subtarget.hasStdExtZfhOrZfhmin())
     addRegisterClass(MVT::f16, &RISCV::FPR16RegClass);
   if (Subtarget.hasStdExtF())
     addRegisterClass(MVT::f32, &RISCV::FPR32RegClass);
   if (Subtarget.hasStdExtD())
     addRegisterClass(MVT::f64, &RISCV::FPR64RegClass);
   if (Subtarget.hasStdExtZhinxOrZhinxmin())

llvm/lib/Target/RISCV/RISCVInstrInfo.td

@@ class LdPat<PatFrag LoadOp, RVInst Inst, ValueType vt = XLenVT>
     : Pat<(vt (LoadOp (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
           (Inst GPR:$rs1, simm12:$imm12)>;

 def : LdPat<sextloadi8, LB>;
 def : LdPat<extloadi8, LBU>; // Prefer unsigned due to no c.lb in Zcb.
 def : LdPat<sextloadi16, LH>;
 def : LdPat<extloadi16, LH>;
 def : LdPat<load, LW, i32>, Requires<[IsRV32]>;
+def : LdPat<load, LW, v4i8>, Requires<[IsRV32]>;
 def : LdPat<zextloadi8, LBU>;
 def : LdPat<zextloadi16, LHU>;

 /// Stores

 class StPat<PatFrag StoreOp, RVInst Inst, RegisterClass StTy,
             ValueType vt>
     : Pat<(StoreOp (vt StTy:$rs2), (AddrRegImm (XLenVT GPR:$rs1),
                    simm12:$imm12)),
           (Inst StTy:$rs2, GPR:$rs1, simm12:$imm12)>;

 def : StPat<truncstorei8, SB, GPR, XLenVT>;
 def : StPat<truncstorei16, SH, GPR, XLenVT>;
 def : StPat<store, SW, GPR, i32>, Requires<[IsRV32]>;
+def : StPat<store, SW, GPR, v4i8>, Requires<[IsRV32]>;

 /// Fences

 // Refer to Table A.6 in the version 2.3 draft of the RISC-V Instruction Set
 // Manual: Volume I.

 // fence acquire -> fence r, rw
 def : Pat<(atomic_fence (XLenVT 4), (timm)), (FENCE 0b10, 0b11)>;
@@
 def : Pat<(binop_allwusers<xor> GPR:$rs1, u32simm12:$imm),
           (XORI GPR:$rs1, u32simm12:$imm)>;
 /// Loads

 def : LdPat<sextloadi32, LW, i64>;
 def : LdPat<extloadi32, LW, i64>;
 def : LdPat<zextloadi32, LWU, i64>;
 def : LdPat<load, LD, i64>;
+def : LdPat<load, LD, v8i8>;
+def : LdPat<load, LW, v4i8>;

 /// Stores

 def : StPat<truncstorei32, SW, GPR, i64>;
 def : StPat<store, SD, GPR, i64>;
+def : StPat<store, SD, GPR, v8i8>;
+def : StPat<store, SW, GPR, v4i8>;
 } // Predicates = [IsRV64]

 /// readcyclecounter
 // On RV64, we can directly read the 64-bit "cycle" CSR.
 let Predicates = [IsRV64] in
 def : Pat<(i64 (readcyclecounter)), (CSRRS CYCLE.Encoding, (XLenVT X0))>;
 // On RV32, ReadCycleWide will be expanded to the suggested loop reading both
 // halves of the 64-bit "cycle" CSR.

llvm/lib/Target/RISCV/RISCVRegisterInfo.td

@@
 // Allow f64 in GPR for ZDINX on RV64.
 def XLenFVT : ValueTypeByHwMode<[RV64],
                                 [f64]>;
 def XLenRI : RegInfoByHwMode<
       [RV32, RV64],
       [RegInfo<32,32,32>, RegInfo<64,64,64>]>;

 class GPRRegisterClass<dag regList>
-    : RegisterClass<"RISCV", [XLenVT, XLenFVT, i32], 32, regList> {
+    : RegisterClass<"RISCV", [XLenVT, XLenFVT, i32, v4i8, v8i8], 32, regList> {
   let RegInfos = XLenRI;
 }

 // The order of registers represents the preferred allocation sequence.
 // Registers are listed in the order caller-save, callee-save, specials.
 def GPR : GPRRegisterClass<(add (sequence "X%u", 10, 17),
                                 (sequence "X%u", 5, 7),
                                 (sequence "X%u", 28, 31),

llvm/lib/Target/RISCV/RISCVSubtarget.h

@@ #include "RISCVGenSubtargetInfo.inc"
   // FIXME: Consider Zfinx in the future
   bool hasVInstructionsF32() const { return HasStdExtZve32f && HasStdExtF; }
   // FIXME: Consider Zdinx in the future
   bool hasVInstructionsF64() const { return HasStdExtZve64d && HasStdExtD; }
   // F16 and F64 both require F32.
   bool hasVInstructionsAnyF() const { return hasVInstructionsF32(); }
   bool hasVInstructionsFullMultiply() const { return HasStdExtV; }
   unsigned getMaxInterleaveFactor() const {
-    return hasVInstructions() ? MaxInterleaveFactor : 1;
+    return hasVInstructions() ? MaxInterleaveFactor : MaxInterleaveFactor;
   }

   // Returns VLEN divided by DLEN. Where DLEN is the datapath width of the
   // vector hardware implementation which may be less than VLEN.
   unsigned getDLenFactor() const {
     if (DLenFactor2)
       return 2;
     return 1;

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

@@ case RISCVRegisterClass::VRRC:
       // vector registers?
       return ST->hasVInstructions() ? 32 : 0;
     }
     llvm_unreachable("unknown register class");
   }

   unsigned getRegisterClassForType(bool Vector, Type *Ty = nullptr) const {
     if (Vector)
-      return RISCVRegisterClass::VRRC;
+      return ST->hasVInstructions() ? RISCVRegisterClass::VRRC : GPRRC;
     if (!Ty)
       return RISCVRegisterClass::GPRRC;

     Type *ScalarTy = Ty->getScalarType();
     if ((ScalarTy->isHalfTy() && ST->hasStdExtZfhOrZfhmin()) ||
         (ScalarTy->isFloatTy() && ST->hasStdExtF()) ||
         (ScalarTy->isDoubleTy() && ST->hasStdExtD())) {
       return RISCVRegisterClass::FPRRC;

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

@@
 TypeSize
 RISCVTTIImpl::getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const {
   unsigned LMUL =
       llvm::bit_floor(std::clamp<unsigned>(RVVRegisterWidthLMUL, 1, 8));
   switch (K) {
   case TargetTransformInfo::RGK_Scalar:
     return TypeSize::getFixed(ST->getXLen());
   case TargetTransformInfo::RGK_FixedWidthVector:
-    return TypeSize::getFixed(
-        ST->useRVVForFixedLengthVectors() ? LMUL * ST->getRealMinVLen() : 0);
+    return TypeSize::getFixed(ST->useRVVForFixedLengthVectors()
+                                  ? LMUL * ST->getRealMinVLen()
+                                  : ST->getXLen());
   case TargetTransformInfo::RGK_ScalableVector:
     return TypeSize::getScalable(
         (ST->hasVInstructions() &&
          ST->getRealMinVLen() >= RISCV::RVVBitsPerBlock)
             ? LMUL * RISCV::RVVBitsPerBlock
             : 0);
   }

@@ return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
                                   CostKind, OpInfo, I);

   InstructionCost Cost = 0;
   if (Opcode == Instruction::Store && OpInfo.isConstant())
     Cost += getStoreImmCost(Src, OpInfo, CostKind);
   InstructionCost BaseCost =
       BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
                              CostKind, OpInfo, I);
+
+  if (!ST->hasVInstructions())
+    return BaseCost;
+
   // Assume memory ops cost scale with the number of vector registers
   // possible accessed by the instruction. Note that BasicTTI already
   // handles the LT.first term for us.
   if (std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Src);
       LT.second.isVector())
     BaseCost *= getLMULCost(LT.second);
   return Cost + BaseCost;
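
The getRegisterBitWidth change makes RGK_FixedWidthVector report XLEN instead of 0 when RVV is not used for fixed-length vectors. As a rough model of why that enables vectorization, the sketch below divides the reported register width by the element width to get the largest number of elements per register. The function `maxElementsPerRegister` is a hypothetical illustration, not an LLVM API, and the real vectorizer weighs much more than this single division.

```cpp
// Hypothetical helper (not an LLVM API): how many elements of ElementBits
// fit in a register that is RegisterBits wide. A result of 0 models the
// pre-patch behaviour, where no fixed-width vector register is reported.
constexpr unsigned maxElementsPerRegister(unsigned RegisterBits,
                                          unsigned ElementBits) {
  return ElementBits == 0 ? 0 : RegisterBits / ElementBits;
}

// RV64: XLEN = 64, i8 elements -> v8i8 fits one GPR.
static_assert(maxElementsPerRegister(64, 8) == 8, "v8i8 on RV64");
// RV32: XLEN = 32, i8 elements -> v4i8 fits one GPR.
static_assert(maxElementsPerRegister(32, 8) == 4, "v4i8 on RV32");
// Before the patch the reported width was 0, so no elements fit.
static_assert(maxElementsPerRegister(0, 8) == 0, "no vector register");
```

These two results line up with the v8i8 and v4i8 types that the patch registers for the GPR class.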

llvm/test/CodeGen/RISCV/vectorization-scalar-register/memcpy.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -O2 -mtriple=riscv32 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefixes=RV32-NONVEC %s
; RUN: llc -O2 -mtriple=riscv64 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefixes=RV64-NONVEC %s
; RUN: opt -O2 -mtriple=riscv32 %s | llc -O2 -mtriple=riscv32 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefixes=RV32-VEC %s
; RUN: opt -O2 -mtriple=riscv64 %s | llc -O2 -mtriple=riscv64 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefixes=RV64-VEC %s

; C code:
; void foo(char *restrict a, char *restrict b, int n) {
;   for (int i = 0; i < n; i++)
;     a[i] = b[i];
; }

define void @foo(ptr noalias nocapture noundef writeonly %a, ptr noalias nocapture noundef readonly %b, i32 noundef signext %n) #0 {
; RV32-NONVEC-LABEL: foo:
; RV32-NONVEC: # %bb.0: # %entry
; RV32-NONVEC-NEXT: blez a2, .LBB0_3
; RV32-NONVEC-NEXT: # %bb.1: # %for.body.preheader
; RV32-NONVEC-NEXT: li a3, 0
; RV32-NONVEC-NEXT: li a4, 0
; RV32-NONVEC-NEXT: .LBB0_2: # %for.body
; RV32-NONVEC-NEXT: # =>This Inner Loop Header: Depth=1
; RV32-NONVEC-NEXT: add a5, a1, a3
; RV32-NONVEC-NEXT: lbu a5, 0(a5)
; RV32-NONVEC-NEXT: add a6, a0, a3
; RV32-NONVEC-NEXT: addi a3, a3, 1
; RV32-NONVEC-NEXT: seqz a7, a3
; RV32-NONVEC-NEXT: add a4, a4, a7
; RV32-NONVEC-NEXT: xor a7, a3, a2
; RV32-NONVEC-NEXT: or a7, a7, a4
; RV32-NONVEC-NEXT: sb a5, 0(a6)
; RV32-NONVEC-NEXT: bnez a7, .LBB0_2
; RV32-NONVEC-NEXT: .LBB0_3: # %for.cond.cleanup
; RV32-NONVEC-NEXT: ret
;
; RV64-NONVEC-LABEL: foo:
; RV64-NONVEC: # %bb.0: # %entry
; RV64-NONVEC-NEXT: blez a2, .LBB0_2
; RV64-NONVEC-NEXT: .LBB0_1: # %for.body
; RV64-NONVEC-NEXT: # =>This Inner Loop Header: Depth=1
; RV64-NONVEC-NEXT: lbu a3, 0(a1)
; RV64-NONVEC-NEXT: sb a3, 0(a0)
; RV64-NONVEC-NEXT: addi a2, a2, -1
; RV64-NONVEC-NEXT: addi a0, a0, 1
; RV64-NONVEC-NEXT: addi a1, a1, 1
; RV64-NONVEC-NEXT: bnez a2, .LBB0_1
; RV64-NONVEC-NEXT: .LBB0_2: # %for.cond.cleanup
; RV64-NONVEC-NEXT: ret
;
; RV32-VEC-LABEL: foo:
; RV32-VEC: # %bb.0: # %entry
; RV32-VEC-NEXT: blez a2, .LBB0_7
; RV32-VEC-NEXT: # %bb.1: # %for.body.preheader
; RV32-VEC-NEXT: li a3, 8
; RV32-VEC-NEXT: bgeu a2, a3, .LBB0_3
; RV32-VEC-NEXT: # %bb.2:
; RV32-VEC-NEXT: li a3, 0
; RV32-VEC-NEXT: li a4, 0
; RV32-VEC-NEXT: j .LBB0_6
; RV32-VEC-NEXT: .LBB0_3: # %vector.ph
; RV32-VEC-NEXT: li a4, 0
; RV32-VEC-NEXT: andi a3, a2, -8
; RV32-VEC-NEXT: li a6, 0
; RV32-VEC-NEXT: li a5, 0
; RV32-VEC-NEXT: .LBB0_4: # %vector.body
; RV32-VEC-NEXT: # =>This Inner Loop Header: Depth=1
; RV32-VEC-NEXT: add a7, a1, a6
; RV32-VEC-NEXT: lw t0, 0(a7)
; RV32-VEC-NEXT: lw a7, 4(a7)
; RV32-VEC-NEXT: add t1, a0, a6
; RV32-VEC-NEXT: sw t0, 0(t1)
; RV32-VEC-NEXT: addi t0, a6, 8
; RV32-VEC-NEXT: sltu a6, t0, a6
; RV32-VEC-NEXT: add a5, a5, a6
; RV32-VEC-NEXT: xor a6, t0, a3
; RV32-VEC-NEXT: or t2, a6, a5
; RV32-VEC-NEXT: sw a7, 4(t1)
; RV32-VEC-NEXT: mv a6, t0
; RV32-VEC-NEXT: bnez t2, .LBB0_4
; RV32-VEC-NEXT: # %bb.5: # %middle.block
; RV32-VEC-NEXT: beq a3, a2, .LBB0_7
; RV32-VEC-NEXT: .LBB0_6: # %for.body
; RV32-VEC-NEXT: # =>This Inner Loop Header: Depth=1
; RV32-VEC-NEXT: add a5, a1, a3
; RV32-VEC-NEXT: lbu a5, 0(a5)
; RV32-VEC-NEXT: add a6, a0, a3
; RV32-VEC-NEXT: addi a3, a3, 1
; RV32-VEC-NEXT: seqz a7, a3
; RV32-VEC-NEXT: add a4, a4, a7
; RV32-VEC-NEXT: xor a7, a3, a2
; RV32-VEC-NEXT: or a7, a7, a4
; RV32-VEC-NEXT: sb a5, 0(a6)
; RV32-VEC-NEXT: bnez a7, .LBB0_6
; RV32-VEC-NEXT: .LBB0_7: # %for.cond.cleanup
; RV32-VEC-NEXT: ret
;
; RV64-VEC-LABEL: foo:
; RV64-VEC: # %bb.0: # %entry
; RV64-VEC-NEXT: blez a2, .LBB0_8
; RV64-VEC-NEXT: # %bb.1: # %for.body.preheader
; RV64-VEC-NEXT: li a3, 16
; RV64-VEC-NEXT: bgeu a2, a3, .LBB0_3
; RV64-VEC-NEXT: # %bb.2:
; RV64-VEC-NEXT: li a3, 0
; RV64-VEC-NEXT: j .LBB0_6
; RV64-VEC-NEXT: .LBB0_3: # %vector.ph
; RV64-VEC-NEXT: andi a3, a2, -16
; RV64-VEC-NEXT: addi a4, a1, 8
; RV64-VEC-NEXT: addi a5, a0, 8
; RV64-VEC-NEXT: mv a6, a3
; RV64-VEC-NEXT: .LBB0_4: # %vector.body
; RV64-VEC-NEXT: # =>This Inner Loop Header: Depth=1
; RV64-VEC-NEXT: ld a7, -8(a4)
; RV64-VEC-NEXT: ld t0, 0(a4)
; RV64-VEC-NEXT: sd a7, -8(a5)
; RV64-VEC-NEXT: sd t0, 0(a5)
; RV64-VEC-NEXT: addi a4, a4, 16
; RV64-VEC-NEXT: addi a6, a6, -16
; RV64-VEC-NEXT: addi a5, a5, 16
; RV64-VEC-NEXT: bnez a6, .LBB0_4
; RV64-VEC-NEXT: # %bb.5: # %middle.block
; RV64-VEC-NEXT: beq a3, a2, .LBB0_8
; RV64-VEC-NEXT: .LBB0_6: # %for.body.preheader2
; RV64-VEC-NEXT: sub a2, a2, a3
; RV64-VEC-NEXT: add a0, a0, a3
; RV64-VEC-NEXT: add a1, a1, a3
; RV64-VEC-NEXT: .LBB0_7: # %for.body
; RV64-VEC-NEXT: # =>This Inner Loop Header: Depth=1
; RV64-VEC-NEXT: lbu a3, 0(a1)
; RV64-VEC-NEXT: sb a3, 0(a0)
; RV64-VEC-NEXT: addi a2, a2, -1
; RV64-VEC-NEXT: addi a0, a0, 1
; RV64-VEC-NEXT: addi a1, a1, 1
; RV64-VEC-NEXT: bnez a2, .LBB0_7
; RV64-VEC-NEXT: .LBB0_8: # %for.cond.cleanup
; RV64-VEC-NEXT: ret
entry:
  %cmp6 = icmp sgt i32 %n, 0
  br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader: ; preds = %entry
  %wide.trip.count = zext i32 %n to i64
  br label %for.body

for.cond.cleanup.loopexit: ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
  ret void

for.body: ; preds = %for.body.preheader, %for.body
  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i8, ptr %b, i64 %indvars.iv
  %0 = load i8, ptr %arrayidx, align 1
  %arrayidx2 = getelementptr inbounds i8, ptr %a, i64 %indvars.iv
  store i8 %0, ptr %arrayidx2, align 1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
  br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body
}

attributes #0 = { "no-builtins" }
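
For readers who prefer source over assembly, the RV64-VEC check lines can be read as the loop shape sketched below: a main loop that moves 16 bytes per iteration as two 64-bit words (the two ld/sd pairs), followed by a byte-wise remainder loop. This is a hand-written model under that reading, not output of the patch; `copyBytes` is a hypothetical name, and `std::memcpy` stands in for the 8-byte loads and stores so the sketch makes no alignment assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the RV64-VEC code shape for foo(a, b, n).
void copyBytes(unsigned char *a, const unsigned char *b, int n) {
  if (n <= 0)
    return;                                         // blez a2, .LBB0_8
  const std::size_t count = static_cast<std::size_t>(n);
  std::size_t i = 0;
  if (count >= 16) {                                // li a3, 16; bgeu a2, a3
    const std::size_t vecEnd = count & ~std::size_t{15}; // andi a3, a2, -16
    for (; i < vecEnd; i += 16) {                   // %vector.body
      std::uint64_t lo, hi;
      std::memcpy(&lo, b + i, 8);                   // first ld
      std::memcpy(&hi, b + i + 8, 8);               // second ld
      std::memcpy(a + i, &lo, 8);                   // first sd
      std::memcpy(a + i + 8, &hi, 8);               // second sd
    }
  }
  for (; i < count; ++i)                            // %for.body remainder
    a[i] = b[i];                                    // lbu / sb
}
```

The RV32-VEC output has the same structure with 8 bytes per iteration, moved as two 32-bit words.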
