
Details

Reviewers

craig.topper
evandro
rogfer01
frasercrmck
khchen
sdesmalen

Commits

rGa5b07a221a57: [RISCV] Initial support of LoopVectorizer for RISC-V Vector.

Summary

Define an option, -riscv-vector-bits-max, that specifies the maximum vector
register size in bits for the vectorizer. The loop vectorizer uses this value
to check whether it is safe to vectorize a loop using whole vector registers.

This is not the optimal solution for loop vectorization with scalable vectors.
It assumes that whole vector registers will be used to vectorize the code.
Where possible, we should configure vl to vectorize instead of using whole
vector registers.
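
The role of the option can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the LoopVectorizer's actual code: `maxVScale`, `isScalableVFSafe`, and `MaxSafeElements` are hypothetical names, and the check is simplified to a single comparison. Only `RVVBitsPerBlock` and its value of 64 come from the patch.

```cpp
#include <cassert>

// Known (non-scalable) part of an RVV scalable type, in bits. The patch adds
// a constant with this name and value.
constexpr unsigned RVVBitsPerBlock = 64;

// Hypothetical helper: upper bound on vscale implied by the value passed to
// -riscv-vector-bits-max. Returns 0 when the option is unset (the patch
// falls back to the base implementation in that case).
unsigned maxVScale(unsigned MaxVectorBits) {
  return MaxVectorBits / RVVBitsPerBlock;
}

// Hypothetical safety check: a scalable VF of <vscale x MinElts> touches at
// most MinElts * maxVScale elements per vector iteration, which must not
// exceed the largest dependence-safe element count of the loop.
bool isScalableVFSafe(unsigned MinElts, unsigned MaxVectorBits,
                      unsigned MaxSafeElements) {
  assert(MinElts != 0 && "VF must have at least one element");
  unsigned VScaleBound = maxVScale(MaxVectorBits);
  if (VScaleBound == 0)
    return false; // No bound known, so safety cannot be established.
  return MinElts * VScaleBound <= MaxSafeElements;
}
```

Under this model, -riscv-vector-bits-max=512 bounds vscale at 8, so a hint of `<vscale x 2 x i32>` covers at most 16 elements per vector iteration.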

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

HsiangKai created this revision. Jan 28 2021, 7:08 PM

Herald added subscribers: NickHung, luismarques, apazos and 23 others. Jan 28 2021, 7:08 PM

HsiangKai requested review of this revision. Jan 28 2021, 7:08 PM

Herald added a project: Restricted Project. Jan 28 2021, 7:08 PM

Herald added a subscriber: MaskRay.

Harbormaster completed remote builds in B87110: Diff 320027. Jan 28 2021, 7:09 PM

Rebase.

Harbormaster completed remote builds in B87114: Diff 320034. Jan 28 2021, 8:44 PM

craig.topper added inline comments. Jan 28 2021, 11:17 PM

llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
330 ↗	(On Diff #320034)	I'm not sure this should be in this file. This file belongs to the MC layer, but this isn't an MC layer property or a property of the V extension. It's a property of our CodeGen implementation. I'm not sure where a better place to put it is.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
5075	You can probably just get the element VTs and then check that Align == ElemVT.getStoreSize() rather than spelling out all of the alignments. What is considered misaligned for scalable vectors? Should we be checking that the alignment is >= the element size?
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
55	Is this used with scalable vectors? AArch64 seems to base their return here for SVE on a command line, but I expected them to require specific scalable vector types in IR for the backend to work.
llvm/test/Transforms/LoopVectorize/RISCV/scalable-vf-hint.ll
2	Is this directory new? If so it needs a lit.local.cfg to mark that all tests in it require the RISCV target to be compiled

Hi Kai, are we OK with having a test that goes from IR to assembly in the Transforms component?

I'd expect a vectorized IR test here. Then we can add CodeGen tests for those inputs so they generate sensible RVV instructions.

In D95659#2530358, @rogfer01 wrote:

> Hi Kai, are we OK with having a test that goes from IR to assembly in the Transforms component?
>
> I'd expect a vectorized IR test here. Then we can add CodeGen tests for those inputs so they generate sensible RVV instructions.

I'd prefer this too.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
5075	Yeah, I agree. For RVV we just need to check that the vectors are at least aligned to the element size, don't we? I reckon `Align >= EltVT.getStoreSize()` is sufficient.
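
To make the suggested condition concrete, here is a minimal sketch in plain C++, outside LLVM. The function name `isElementAligned` and the byte-count parameters are assumptions for illustration only; the patch itself compares an `Align` against the element type's store size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the check suggested above. AlignBytes is
// the known alignment of the address; ElemStoreBytes is the store size of one
// vector element (for example, 4 for i32).
bool isElementAligned(uint64_t AlignBytes, uint64_t ElemStoreBytes) {
  // Alignments are modelled as non-zero powers of two, as with llvm::Align.
  assert(AlignBytes != 0 && (AlignBytes & (AlignBytes - 1)) == 0 &&
         "alignment must be a non-zero power of two");
  return AlignBytes >= ElemStoreBytes;
}
```

With this rule, an i32 vector access aligned to 4 or 8 bytes is accepted, while one aligned to only 2 bytes is rejected.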

vkmr added a subscriber: vkmr. Jan 29 2021, 4:28 AM

vkmr added inline comments. Jan 29 2021, 5:15 AM

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
20	Minor nit: Reword the description for more clarity, maybe something like "Maximum vector register size in bits"?
100	Nit: Use a call to `getRegisterBitWidth()` here instead of `RISCVVType::RVVBitsPerBlock`. (Or implement and use `getMinVectorRegisterBitWidth()`.)
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
52–55	If I understand correctly, the assumption behind this code is that a single vector register is of size `vscale x RVVBitsPerBlock`, and it ignores the idea (for now?) of register groups, i.e. LMUL>1. Unless we are ignoring register grouping for now, from the Loop Vectorizer's perspective it would make sense to view the register group size as the real register size, especially for computing a feasible VF based on register usage. Since the documentation of `getRegisterBitWidth()` defines it to be "The width of the largest scalar or vector register type", it might be more accurate to have `getMinVectorRegisterBitWidth()` return `RISCVVType::RVVBitsPerBlock` and `getRegisterBitWidth()` return `getMinVectorRegisterBitWidth() * MAX_LMUL`. (I am not considering fractional LMUL here.)
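
The proposal in the comment above can be sketched as follows. This is only an illustration of the suggestion, which the revision did not adopt (the callback was later removed from the patch). The free functions are hypothetical stand-ins for the two TTI queries, and `MaxLMUL = 8` assumes the largest integer LMUL that RVV defines.

```cpp
#include <cassert>

constexpr unsigned RVVBitsPerBlock = 64; // known minimum bits per register
constexpr unsigned MaxLMUL = 8;          // assumed largest integer LMUL

// Hypothetical stand-in for getMinVectorRegisterBitWidth(): one register.
unsigned minVectorRegisterBitWidth() { return RVVBitsPerBlock; }

// Known minimum width of a register group with the given integer LMUL.
// Fractional LMUL is ignored, as in the comment above.
unsigned registerGroupBitWidth(unsigned LMUL) {
  assert((LMUL == 1 || LMUL == 2 || LMUL == 4 || LMUL == 8) &&
         "integer LMUL must be 1, 2, 4 or 8");
  return minVectorRegisterBitWidth() * LMUL;
}

// Hypothetical stand-in for getRegisterBitWidth(): the largest group.
unsigned registerBitWidth() { return registerGroupBitWidth(MaxLMUL); }
```

These are known-minimum widths; the actual width at run time is the value multiplied by vscale.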

craig.topper added inline comments. Jan 29 2021, 10:14 AM

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
100	getRegisterBitWidth() is likely going to be updated to be similar to AArch64 and be controlled by a command line option for minimum width. So it won't be the right thing.
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
52–55	Returning a non-zero value seems to at least partially enable the vectorizer to generate fixed vectors, which aren't supported by the backend yet. It looks like something else stopped it in my testing, but it at least queried the cost model. Not sure what stopped it. I do plan to support fixed vectors in the RVV backend, but it will probably be a couple of weeks away. The register width here will probably need to be a command line controlled value like AArch64. And it should be at least 128 bits per the 0.10 spec. So I don't think it's connected to RVVBitsPerBlock.

craig.topper added a reviewer: sdesmalen. Jan 29 2021, 10:14 AM

vkmr added inline comments. Jan 29 2021, 10:45 AM

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
52–55	Perhaps I misunderstood something. My concern here is more about how to encapsulate the idea of register grouping for scalable vectors in the TTI methods that query register widths. Having a command line option to control register width would still only reflect the width of a single register, right? Perhaps we can add another command line option to specify a max group multiplier (essentially the maximum LMUL value). IIRC, the TTI method `getMinVectorRegisterBitWidth()` was introduced in addition to `getRegisterBitWidth()` to handle similar concerns with NEON. With scalable vectors, things are a little more complicated.

craig.topper added inline comments. Jan 29 2021, 10:58 AM

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
52–55	I don't think I understand how this interface works for scalable vector vectorization. AArch64 has it connected to a command line option, which means it can be larger than 128 bits. But I thought the backend needed specific types like <vscale x 4 x i32>. Does this interface affect the fixed portion of the scalable type for scalable vector vectorization?

HsiangKai updated this revision to Diff 320261. Jan 29 2021, 7:29 PM

HsiangKai added inline comments.

llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
330 ↗	(On Diff #320034)	I will move it to RISCVISelLowering.h.
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
52–55	I don't think I dug into how the callback is used. I have removed it in this patch. We can add it back once we have a clear idea of how to do it.

Harbormaster completed remote builds in B87239: Diff 320261. Jan 29 2021, 8:23 PM

HsiangKai added inline comments. Jan 30 2021, 2:47 PM

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
52–55	Craig is right. getRegisterBitWidth() is not related to scalable vector vectorization. It is reasonable to remove it in this patch.

vkmr added inline comments. Feb 1 2021, 5:08 AM

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
52–55	> Does this interface affect the fixed portion of the scalable type for scalable vector vectorization?

Yes, the auto vectorizer uses this interface to compute the VF (for scalable vectors this is the fixed part of the VF) with the lowest cost.

> Craig is right. getRegisterBitWidth() is not related to scalable vector vectorization. It is reasonable to remove it in this patch.

Agreed.

Herald added a subscriber: StephenFan. Feb 1 2021, 5:08 AM

LGTM

This revision is now accepted and ready to land. Feb 4 2021, 5:04 PM

frasercrmck added inline comments. Feb 5 2021, 1:52 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
5063	/// .... If true, it also returns
/// whether the unaligned memory access is "fast" in the last argument by
/// reference.

I think this suggests that setting `Fast` is part of the contract. It could theoretically read an uninitialized variable if we don't set it but return true.
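
The hazard described in this comment can be reproduced in a few lines of plain C++. The sketch below is not LLVM code; `allowsAccess` and `isAllowedAndFast` are hypothetical names, and the alignment rule is reduced to a byte comparison.

```cpp
#include <cassert>

// Hypothetical callee that honours the documented contract: whenever it
// returns true and the caller passed a non-null pointer, *Fast is written.
bool allowsAccess(unsigned AlignBytes, unsigned ElemStoreBytes, bool *Fast) {
  assert(ElemStoreBytes != 0 && "element store size must be non-zero");
  if (AlignBytes < ElemStoreBytes)
    return false;
  if (Fast)
    *Fast = true; // Dropping this store would leave the caller's flag unset.
  return true;
}

// Hypothetical caller in the style the comment worries about: IsFast is left
// uninitialized on purpose and is only read after a true result.
bool isAllowedAndFast(unsigned AlignBytes, unsigned ElemStoreBytes) {
  bool IsFast;
  return allowsAccess(AlignBytes, ElemStoreBytes, &IsFast) && IsFast;
}
```

Because the caller reads `IsFast` only when the callee returns true, the callee must write it on every path that returns true; otherwise the read is of an indeterminate value.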

Address @frasercrmck's comments.

HsiangKai marked an inline comment as done. Feb 5 2021, 6:39 PM

Harbormaster completed remote builds in B88162: Diff 321911. Feb 5 2021, 7:18 PM

LGTM

This revision was landed with ongoing or failed builds. Feb 8 2021, 2:51 PM

Closed by commit rGa5b07a221a57: [RISCV] Initial support of LoopVectorizer for RISC-V Vector. (authored by HsiangKai).

This revision was automatically updated to reflect the committed changes.

HsiangKai added a commit: rGa5b07a221a57: [RISCV] Initial support of LoopVectorizer for RISC-V Vector.

Diff 322222

llvm/lib/Target/RISCV/RISCVISelLowering.h

[334 lines not shown; context: public:]
   TargetLowering::AtomicExpansionKind
   shouldExpandAtomicCmpXchgInIR(AtomicCmpXchgInst *CI) const override;
   Value *emitMaskedAtomicCmpXchgIntrinsic(IRBuilder<> &Builder,
                                           AtomicCmpXchgInst *CI,
                                           Value *AlignedAddr, Value *CmpVal,
                                           Value *NewVal, Value *Mask,
                                           AtomicOrdering Ord) const override;
+
+  /// Returns true if the target allows unaligned memory accesses of the
+  /// specified type.
+  bool allowsMisalignedMemoryAccesses(
+      EVT VT, unsigned AddrSpace = 0, Align Alignment = Align(1),
+      MachineMemOperand::Flags Flags = MachineMemOperand::MONone,
+      bool *Fast = nullptr) const override;

 private:
   void analyzeInputArgs(MachineFunction &MF, CCState &CCInfo,
                         const SmallVectorImpl<ISD::InputArg> &Ins,
                         bool IsRet) const;
   void analyzeOutputArgs(MachineFunction &MF, CCState &CCInfo,
                          const SmallVectorImpl<ISD::OutputArg> &Outs,
                          bool IsRet, CallLoweringInfo *CLI) const;

[38 lines not shown; context: private:]
   /// reserved.
   void validateCCReservedRegs(
       const SmallVectorImpl<std::pair<llvm::Register, llvm::SDValue>> &Regs,
       MachineFunction &MF) const;

   bool useRVVForFixedLengthVectorVT(MVT VT) const;
 };

+namespace RISCV {
+// We use 64 bits as the known part in the scalable vector types.
+static constexpr unsigned RVVBitsPerBlock = 64;
+}; // namespace RISCV
+
 namespace RISCVVIntrinsicsTable {

 struct RISCVVIntrinsicInfo {
   unsigned IntrinsicID;
   uint8_t ExtendedOperand;
 };

 using namespace RISCV;
[25 lines not shown]

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

[5,054 lines not shown; context: if (VT.isScalarInteger()) {]
     }
   }

   return false;
 }

 bool RISCVTargetLowering::useRVVForFixedLengthVectorVT(MVT VT) const {
   if (!Subtarget.useRVVForFixedLengthVectors())
     return false;

   if (!VT.isFixedLengthVector())
     return false;

   // Don't use RVV for vectors we cannot scalarize if required.
   switch (VT.getVectorElementType().SimpleTy) {
   default:
     return false;
   case MVT::i1:
   case MVT::i8:
   case MVT::i16:
   case MVT::i32:
   case MVT::i64:
     break;
   case MVT::f16:
     if (!Subtarget.hasStdExtZfh())
       return false;
     break;
   case MVT::f32:
     if (!Subtarget.hasStdExtF())
[13 lines not shown; context: bool RISCVTargetLowering::useRVVForFixedLengthVectorVT(MVT VT) const {]
   // TODO: Perhaps an artificial restriction, but worth having whilst getting
   // the base fixed length RVV support in place.
   if (!VT.isPow2VectorType())
     return false;

   return true;
 }

+bool RISCVTargetLowering::allowsMisalignedMemoryAccesses(
+    EVT VT, unsigned AddrSpace, Align Alignment, MachineMemOperand::Flags Flags,
+    bool *Fast) const {
+  if (!VT.isScalableVector())
+    return false;
+
+  EVT ElemVT = VT.getVectorElementType();
+  if (Alignment >= ElemVT.getStoreSize()) {
+    if (Fast)
+      *Fast = true;
+    return true;
+  }
+
+  return false;
+}
+
 #define GET_REGISTER_MATCHER
 #include "RISCVGenAsmMatcher.inc"

 Register
 RISCVTargetLowering::getRegisterByName(const char *RegName, LLT VT,
                                        const MachineFunction &MF) const {
   Register Reg = MatchRegisterAltName(RegName);
   if (Reg == RISCV::NoRegister)
[26 lines not shown]

llvm/lib/Target/RISCV/RISCVSubtarget.h

[128 lines not shown; context: public:]
   bool enableSaveRestore() const { return EnableSaveRestore; }
   MVT getXLenVT() const { return XLenVT; }
   unsigned getXLen() const { return XLen; }
   RISCVABI::ABI getTargetABI() const { return TargetABI; }
   bool isRegisterReservedByUser(Register i) const {
     assert(i < RISCV::NUM_TARGET_REGS && "Register out of range");
     return UserReservedRegister[i];
   }
+  unsigned getMaxVectorSizeInBits() const;

 protected:
   // GlobalISel related APIs.
   std::unique_ptr<CallLowering> CallLoweringInfo;
   std::unique_ptr<InstructionSelector> InstSelector;
   std::unique_ptr<LegalizerInfo> Legalizer;
   std::unique_ptr<RegisterBankInfo> RegBankInfo;

[17 lines not shown]

llvm/lib/Target/RISCV/RISCVSubtarget.cpp

[33 lines not shown; context: static cl::opt<unsigned> RVVVectorBitsMin(]
     cl::init(0), cl::Hidden);

 static cl::opt<unsigned> RVVVectorLMULMax(
     "riscv-v-fixed-length-vector-lmul-max",
     cl::desc("The maximum LMUL value to use for fixed length vectors. "
              "Fractional LMUL values are not supported."),
     cl::init(8), cl::Hidden);

+static cl::opt<unsigned> VectorBitsMax(
+    "riscv-vector-bits-max",
+    cl::desc("Assume RISC-V vector registers are at most this big"),
+    cl::init(0), cl::Hidden);
+
 void RISCVSubtarget::anchor() {}

 RISCVSubtarget &RISCVSubtarget::initializeSubtargetDependencies(
     const Triple &TT, StringRef CPU, StringRef TuneCPU, StringRef FS, StringRef ABIName) {
   // Determine default and user-specified characteristics
   bool Is64Bit = TT.isArch64Bit();
   std::string CPUName = std::string(CPU);
   std::string TuneCPUName = std::string(TuneCPU);
   if (CPUName.empty())
     CPUName = Is64Bit ? "generic-rv64" : "generic-rv32";
   if (TuneCPUName.empty())
     TuneCPUName = CPUName;
   ParseSubtargetFeatures(CPUName, TuneCPUName, FS);
   if (Is64Bit) {
     XLenVT = MVT::i64;
     XLen = 64;
   }

   TargetABI = RISCVABI::computeTargetABI(TT, getFeatureBits(), ABIName);
   RISCVFeatures::validate(TT, getFeatureBits());
   return *this;
 }

+unsigned RISCVSubtarget::getMaxVectorSizeInBits() const {
+  assert(HasStdExtV && "Tried to get vector length without V support!");
+  return VectorBitsMax;
+}
+
 RISCVSubtarget::RISCVSubtarget(const Triple &TT, StringRef CPU,
                                StringRef TuneCPU, StringRef FS,
                                StringRef ABIName, const TargetMachine &TM)
     : RISCVGenSubtargetInfo(TT, CPU, TuneCPU, FS),
       UserReservedRegister(RISCV::NUM_TARGET_REGS),
       FrameLowering(initializeSubtargetDependencies(TT, CPU, TuneCPU, FS, ABIName)),
       InstrInfo(this), RegInfo(getHwMode()), TLInfo(TM, this) {
   CallLoweringInfo.reset(new RISCVCallLowering(*getTargetLowering()));
[50 lines not shown]

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

[43 lines not shown; context: public:]
   int getIntImmCost(const APInt &Imm, Type *Ty, TTI::TargetCostKind CostKind);
   int getIntImmCostInst(unsigned Opcode, unsigned Idx, const APInt &Imm,
                         Type *Ty, TTI::TargetCostKind CostKind,
                         Instruction *Inst = nullptr);
   int getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                           Type *Ty, TTI::TargetCostKind CostKind);

   bool shouldExpandReduction(const IntrinsicInst *II) const;
+  bool supportsScalableVectors() const { return ST->hasStdExtV(); }
+  Optional<unsigned> getMaxVScale() const;
 };

 } // end namespace llvm

 #endif // LLVM_LIB_TARGET_RISCV_RISCVTARGETTRANSFORMINFO_H

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

[11 lines not shown]
 #include "llvm/CodeGen/BasicTTIImpl.h"
 #include "llvm/CodeGen/TargetLowering.h"
 using namespace llvm;

 #define DEBUG_TYPE "riscvtti"

 int RISCVTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,
                                 TTI::TargetCostKind CostKind) {
   assert(Ty->isIntegerTy() &&
          "getIntImmCost can only estimate cost of materialising integers");

   // We have a Zero register, so 0 is always free.
   if (Imm == 0)
     return TTI::TCC_Free;

   // Otherwise, we check how many instructions it will take to materialise.
   const DataLayout &DL = getDataLayout();
[63 lines not shown; context: int RISCVTTIImpl::getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,]
                                       const APInt &Imm, Type *Ty,
                                       TTI::TargetCostKind CostKind) {
   // Prevent hoisting in unknown cases.
   return TTI::TCC_Free;
 }

 bool RISCVTTIImpl::shouldExpandReduction(const IntrinsicInst *II) const {
   // Currently, the ExpandReductions pass can't expand scalable-vector
   // reductions, but we still request expansion as RVV doesn't support certain
   // reductions and the SelectionDAG can't legalize them either.
   switch (II->getIntrinsicID()) {
   default:
     return false;
   // These reductions have no equivalent in RVV
   case Intrinsic::vector_reduce_mul:
   case Intrinsic::vector_reduce_fmul:
   // The fmin and fmax intrinsics are not currently supported due to a
   // discrepancy between the LLVM semantics and the RVV 0.10 ISA behaviour with
   // regards to signaling NaNs: the vector fmin/fmax reduction intrinsics match
   // the behaviour minnum/maxnum intrinsics, whereas the vfredmin/vfredmax
   // instructions match the vfmin/vfmax instructions which match the equivalent
   // scalar fmin/fmax instructions as defined in 2.2 F/D/Q extension (see
   // https://bugs.llvm.org/show_bug.cgi?id=27363).
   // This behaviour is likely fixed in version 2.3 of the RISC-V F/D/Q
   // extension, where fmin/fmax behave like minnum/maxnum, but until then the
   // intrinsics are left unsupported.
   case Intrinsic::vector_reduce_fmax:
   case Intrinsic::vector_reduce_fmin:
     return true;
   }
 }
+
+Optional<unsigned> RISCVTTIImpl::getMaxVScale() const {
+  // There is no assumption of the maximum vector length in V specification.
+  // We use the value specified by users as the maximum vector length.
+  // This function will use the assumed maximum vector length to get the
+  // maximum vscale for LoopVectorizer.
+  // If users do not specify the maximum vector length, we have no way to
+  // know whether the LoopVectorizer is safe to do or not.
+  // We only consider to use single vector register (LMUL = 1) to vectorize.
+  unsigned MaxVectorSizeInBits = ST->getMaxVectorSizeInBits();
+  if (ST->hasStdExtV() && MaxVectorSizeInBits != 0)
+    return MaxVectorSizeInBits / RISCV::RVVBitsPerBlock;
+  return BaseT::getMaxVScale();
+}

llvm/test/Transforms/LoopVectorize/RISCV/lit.local.cfg

This file was added.

config.suffixes = ['.ll']

if not 'RISCV' in config.root.targets:
    config.unsupported = True

llvm/test/Transforms/LoopVectorize/RISCV/scalable-vf-hint.ll

This file was added.

; RUN: opt -mtriple=riscv64 -mattr=+m,+experimental-v -loop-vectorize \
; RUN: -riscv-vector-bits-max=512 -S < %s 2>&1 \
; RUN: | FileCheck %s

; void test(int *a, int *b, int N) {
;   #pragma clang loop vectorize(enable) vectorize_width(2, scalable)
;   for (int i=0; i<N; ++i) {
;     a[i + 64] = a[i] + b[i];
;   }
; }
;
; CHECK: <vscale x 2 x i32>
define void @test(i32* %a, i32* %b) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
  %0 = load i32, i32* %arrayidx, align 4
  %arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
  %1 = load i32, i32* %arrayidx2, align 4
  %add = add nsw i32 %1, %0
  %2 = add nuw nsw i64 %iv, 64
  %arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
  store i32 %add, i32* %arrayidx5, align 4
  %iv.next = add nuw nsw i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv.next, 1024
  br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !6

exit:
  ret void
}

!6 = !{!6, !7, !8}
!7 = !{!"llvm.loop.vectorize.width", i32 2}
!8 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Initial support of LoopVectorizer for RISC-V Vector.
Closed · Public

