This is an archive of the discontinued LLVM Phabricator instance.

Differential D137699

[RISCV] Don't use zero-stride vector load if there's no optimized u-arch
ClosedPublic

Authored by • pcwang-thead on Nov 9 2022, 1:42 AM.

Details

Reviewers

jacquesguan
craig.topper
frasercrmck
reames

Commits

rGc66b69777cc9: [RISCV] Don't use zero-stride vector load if there's no optimized u-arch

Summary

For vector strided instructions, as the RVV spec says:

When rs2=x0, then an implementation is allowed, but not required, to
perform fewer memory operations than the number of active elements, and
may perform different numbers of memory operations across different
dynamic executions of the same static instruction.

So the compiler shouldn't assume that fewer memory operations will be
performed when rs2=x0.

We add a target feature to specify whether the u-arch supports an optimized
zero-stride vector load, and we perform the vector splat optimization if and
only if this feature is supported.

This feature is enabled by default since most designs implement this
optimization.
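
To make the spec wording concrete, here is a minimal C++ sketch of why the two kinds of implementation produce the same vector but may differ in cost. It is a toy model, not LLVM or hardware code: `stridedLoad`, `StridedLoadResult`, and the `CoalesceZeroStride` flag are hypothetical names, the stride is counted in elements rather than bytes for simplicity, and the per-lane read count is an assumption about a non-optimizing design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of a modelled strided load: the loaded elements plus the number of
// memory reads the (hypothetical) implementation issued.
struct StridedLoadResult {
  std::vector<float> Elements;
  std::size_t MemoryReads;
};

// Toy model of a strided vector load of VL elements from Memory, starting at
// index Base and advancing Stride elements per lane. CoalesceZeroStride stands
// in for a microarchitecture that optimizes the zero-stride case.
StridedLoadResult stridedLoad(const std::vector<float> &Memory,
                              std::size_t Base, std::size_t Stride,
                              std::size_t VL, bool CoalesceZeroStride) {
  StridedLoadResult R{{}, 0};
  if (VL == 0)
    return R;
  assert(Base + Stride * (VL - 1) < Memory.size() && "load out of bounds");

  if (Stride == 0 && CoalesceZeroStride) {
    // Optimized design: a single read is broadcast to every active element.
    R.Elements.assign(VL, Memory[Base]);
    R.MemoryReads = 1;
    return R;
  }

  // Non-optimized design (assumed): one read per active element, even when
  // every lane reads the same address.
  for (std::size_t I = 0; I < VL; ++I) {
    R.Elements.push_back(Memory[Base + I * Stride]);
    ++R.MemoryReads;
  }
  return R;
}
```

With a stride of zero and eight active elements, both variants return eight copies of the same value, but the model counts one read for the coalescing design and eight for the other. That cost difference is the reason the splat-to-`vlse` transformation is gated on a tuning feature.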

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

• pcwang-thead created this revision. Nov 9 2022, 1:42 AM

Herald added a project: Restricted Project. Nov 9 2022, 1:42 AM

Herald added subscribers: sunshaoce, VincentWu, StephenFan and 28 others.

• pcwang-thead requested review of this revision. Nov 9 2022, 1:42 AM

Herald added subscribers: llvm-commits, eopXD, MaskRay.

Harbormaster completed remote builds in B196853: Diff 474193. Nov 9 2022, 2:21 AM

I have no problem with adding a feature to disable this optimization, but there's two problems with this patch.

First, a bunch of existing CPUs do support this optimization. Your change doesn't update any of them, so this is strictly a regression.

Second, most designs I'm aware of do implement this optimization. As such, I really think this patch gets the default wrong. We should allow disabling optimizations for code targeting designs which don't optimize this case, but if we're targeting a generic riscv64 vector core, assuming this optimization should be the default.

Also, are you aware of a specific design which doesn't optimize this case? If not, the complexity doesn't seem worthwhile.

This revision now requires changes to proceed. Nov 10 2022, 9:38 AM

Enable this feature by default.

In D137699#3919799, @reames wrote:

I have no problem with adding a feature to disable this optimization, but there's two problems with this patch.

First, a bunch of existing CPUs do support this optimization. Your change doesn't update any of them, so this is strictly a regression.

Second, most designs I'm aware of do implement this optimization. As such, I really think this patch gets the default wrong. We should allow disabling optimizations for code targeting designs which don't optimize this case, but if we're targeting a generic riscv64 vector core, assuming this optimization should be the default.

Thanks! I do agree with you and I have made it default now.

Also, are you aware of a specific design which doesn't optimize this case? If not, the complexity doesn't seem worthwhile.

Yes, some of our taped-out low-end products didn't do this optimization.

Harbormaster completed remote builds in B197154: Diff 474647. Nov 10 2022, 7:51 PM

craig.topper added inline comments. Nov 10 2022, 10:28 PM

llvm/lib/Target/RISCV/RISCV.td
455	`FeatureNoOptimizedZeroStrideLoad` -> `TuneNoOptimizedZeroStrideLoad`.

Rename FeatureNoOptimizedZeroStrideLoad to TuneNoOptimizedZeroStrideLoad

• pcwang-thead marked an inline comment as done. Nov 10 2022, 11:20 PM

• pcwang-thead edited the summary of this revision. Nov 10 2022, 11:41 PM

Harbormaster completed remote builds in B197167: Diff 474666. Nov 11 2022, 12:01 AM

LGTM

If you're interested in optimizing for such a target, I'd suggest a follow up. We should probably be canonicalizing in the other direction (i.e. replace a zero stride load with a load and splat). A zero stride load can probably be matched during gather lowering.

This revision is now accepted and ready to land. Nov 11 2022, 9:16 AM

In D137699#3921845, @reames wrote:

LGTM

If you're interested in optimizing for such a target, I'd suggest a follow up. We should probably be canonicalizing in the other direction (i.e. replace a zero stride load with a load and splat). A zero stride load can probably be matched during gather lowering.

Thanks! I will have a try.

This revision was landed with ongoing or failed builds. Nov 13 2022, 9:52 PM

Closed by commit rGc66b69777cc9: [RISCV] Don't use zero-stride vector load if there's no optimized u-arch (authored by • pcwang-thead).

This revision was automatically updated to reflect the committed changes.

• pcwang-thead added a commit: rGc66b69777cc9: [RISCV] Don't use zero-stride vector load if there's no optimized u-arch.

• pcwang-thead mentioned this in D137931: [RISCV] Don't use zero-stride vector load for gather if not optimized. Nov 14 2022, 3:29 AM

• pcwang-thead mentioned this in rGa214c521f876: [RISCV] Don't use zero-stride vector load for gather if not optimized. Nov 15 2022, 6:44 PM

Revision Contents

Path                                           Size
llvm/lib/Target/RISCV/RISCV.td                 5 lines
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp    4 lines
llvm/lib/Target/RISCV/RISCVSubtarget.h         2 lines
llvm/test/CodeGen/RISCV/rvv/vsplats-fp.ll      25 lines

Diff 475049

llvm/lib/Target/RISCV/RISCV.td

(446 lines not shown)
  def FeatureSaveRestore : SubtargetFeature<"save-restore", "EnableSaveRestore",
                                            "true", "Enable save/restore.">;

  def FeatureUnalignedScalarMem
      : SubtargetFeature<"unaligned-scalar-mem", "EnableUnalignedScalarMem",
                         "true", "Has reasonably performant unaligned scalar "
                         "loads and stores">;

+ def TuneNoOptimizedZeroStrideLoad
+     : SubtargetFeature<"no-optimized-zero-stride-load", "HasOptimizedZeroStrideLoad",
+                        "false", "Hasn't optimized (perform fewer memory operations)"
+                        "zero-stride vector load">;
+
  def TuneLUIADDIFusion
      : SubtargetFeature<"lui-addi-fusion", "HasLUIADDIFusion",
                         "true", "Enable LUI+ADDI macrofusion">;

  def TuneNoDefaultUnroll
      : SubtargetFeature<"no-default-unroll", "EnableDefaultUnroll", "false",
                         "Disable default unroll preference.">;

(155 lines not shown)

Inline comment on `def TuneNoOptimizedZeroStrideLoad` (craig.topper, marked Done): `FeatureNoOptimizedZeroStrideLoad` -> `TuneNoOptimizedZeroStrideLoad`.

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp

(1,784 lines not shown; enclosing context: case ISD::EXTRACT_SUBVECTOR: {)
      SDValue Extract = CurDAG->getTargetExtractSubreg(SubRegIdx, DL, VT, V);
      ReplaceNode(Node, Extract.getNode());
      return;
    }
    case RISCVISD::VMV_S_X_VL:
    case RISCVISD::VFMV_S_F_VL:
    case RISCVISD::VMV_V_X_VL:
    case RISCVISD::VFMV_V_F_VL: {
+     // Only if we have optimized zero-stride vector load.
+     if (!Subtarget->hasOptimizedZeroStrideLoad())
+       break;
+
      // Try to match splat of a scalar load to a strided load with stride of x0.
      bool IsScalarMove = Node->getOpcode() == RISCVISD::VMV_S_X_VL ||
                          Node->getOpcode() == RISCVISD::VFMV_S_F_VL;
      if (!Node->getOperand(0).isUndef())
        break;
      SDValue Src = Node->getOperand(1);
      auto *Ld = dyn_cast<LoadSDNode>(Src);
      if (!Ld)
(1,047 lines not shown)

llvm/lib/Target/RISCV/RISCVSubtarget.h

(95 lines not shown; enclosing context: private:)
    bool EnableLinkerRelax = false;
    bool EnableRVCHintInstrs = true;
    bool EnableDefaultUnroll = true;
    bool EnableSaveRestore = false;
    bool EnableUnalignedScalarMem = false;
    bool HasShortForwardBranchOpt = false;
    bool HasLUIADDIFusion = false;
    bool HasForcedAtomics = false;
+   bool HasOptimizedZeroStrideLoad = true;
    unsigned XLen = 32;
    unsigned ZvlLen = 0;
    MVT XLenVT = MVT::i32;
    uint8_t MaxInterleaveFactor = 2;
    RISCVABI::ABI TargetABI = RISCVABI::ABI_Unknown;
    std::bitset<RISCV::NUM_TARGET_REGS> UserReservedRegister;
    RISCVFrameLowering FrameLowering;
    RISCVInstrInfo InstrInfo;
(82 lines not shown; enclosing context: public:)
    bool enableLinkerRelax() const { return EnableLinkerRelax; }
    bool enableRVCHintInstrs() const { return EnableRVCHintInstrs; }
    bool enableDefaultUnroll() const { return EnableDefaultUnroll; }
    bool enableSaveRestore() const { return EnableSaveRestore; }
    bool hasShortForwardBranchOpt() const { return HasShortForwardBranchOpt; }
    bool enableUnalignedScalarMem() const { return EnableUnalignedScalarMem; }
    bool hasLUIADDIFusion() const { return HasLUIADDIFusion; }
    bool hasForcedAtomics() const { return HasForcedAtomics; }
+   bool hasOptimizedZeroStrideLoad() const { return HasOptimizedZeroStrideLoad; }
    MVT getXLenVT() const { return XLenVT; }
    unsigned getXLen() const { return XLen; }
    unsigned getFLen() const {
      if (HasStdExtD)
        return 64;

      if (HasStdExtF)
        return 32;
(79 lines not shown)

llvm/test/CodeGen/RISCV/rvv/vsplats-fp.ll

  ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
  ; RUN: llc -mtriple=riscv32 -mattr=+f,+d,+zfh,+experimental-zvfh,+v -target-abi ilp32d -verify-machineinstrs < %s \
- ; RUN: | FileCheck %s
+ ; RUN: | FileCheck %s --check-prefixes=CHECK,OPTIMIZED
  ; RUN: llc -mtriple=riscv64 -mattr=+f,+d,+zfh,+experimental-zvfh,+v -target-abi lp64d -verify-machineinstrs < %s \
- ; RUN: | FileCheck %s
+ ; RUN: | FileCheck %s --check-prefixes=CHECK,OPTIMIZED
+ ; RUN: llc -mtriple=riscv32 -mattr=+f,+d,+zfh,+experimental-zvfh,+v,+no-optimized-zero-stride-load -target-abi ilp32d -verify-machineinstrs < %s \
+ ; RUN: | FileCheck %s --check-prefixes=CHECK,NOT-OPTIMIZED
+ ; RUN: llc -mtriple=riscv64 -mattr=+f,+d,+zfh,+experimental-zvfh,+v,+no-optimized-zero-stride-load -target-abi lp64d -verify-machineinstrs < %s \
+ ; RUN: | FileCheck %s --check-prefixes=CHECK,NOT-OPTIMIZED

  define <vscale x 8 x half> @vsplat_nxv8f16(half %f) {
  ; CHECK-LABEL: vsplat_nxv8f16:
  ; CHECK: # %bb.0:
  ; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, ma
  ; CHECK-NEXT: vfmv.v.f v8, fa0
  ; CHECK-NEXT: ret
    %head = insertelement <vscale x 8 x half> poison, half %f, i32 0
(53 lines not shown)
  ; CHECK-NEXT: ret
    %head = insertelement <vscale x 8 x double> poison, double zeroinitializer, i32 0
    %splat = shufflevector <vscale x 8 x double> %head, <vscale x 8 x double> poison, <vscale x 8 x i32> zeroinitializer
    ret <vscale x 8 x double> %splat
  }

  ; Test that we fold this to a vlse with 0 stride.
  define <vscale x 8 x float> @vsplat_load_nxv8f32(float* %ptr) {
- ; CHECK-LABEL: vsplat_load_nxv8f32:
- ; CHECK: # %bb.0:
- ; CHECK-NEXT: vsetvli a1, zero, e32, m4, ta, ma
- ; CHECK-NEXT: vlse32.v v8, (a0), zero
- ; CHECK-NEXT: ret
+ ; OPTIMIZED-LABEL: vsplat_load_nxv8f32:
+ ; OPTIMIZED: # %bb.0:
+ ; OPTIMIZED-NEXT: vsetvli a1, zero, e32, m4, ta, ma
+ ; OPTIMIZED-NEXT: vlse32.v v8, (a0), zero
+ ; OPTIMIZED-NEXT: ret
+ ;
+ ; NOT-OPTIMIZED-LABEL: vsplat_load_nxv8f32:
+ ; NOT-OPTIMIZED: # %bb.0:
+ ; NOT-OPTIMIZED-NEXT: flw ft0, 0(a0)
+ ; NOT-OPTIMIZED-NEXT: vsetvli a0, zero, e32, m4, ta, ma
+ ; NOT-OPTIMIZED-NEXT: vfmv.v.f v8, ft0
+ ; NOT-OPTIMIZED-NEXT: ret
    %f = load float, float* %ptr
    %head = insertelement <vscale x 8 x float> poison, float %f, i32 0
    %splat = shufflevector <vscale x 8 x float> %head, <vscale x 8 x float> poison, <vscale x 8 x i32> zeroinitializer
    ret <vscale x 8 x float> %splat
  }
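
The behaviour this test pins down can be summarised with a small standalone C++ sketch. It only models the polarity of the flag and is not LLVM code: `ToySubtarget`, `parseFeatures`, and `lowerSplatOfLoad` are hypothetical names, the parser recognises only the `+no-optimized-zero-stride-load` attribute, and the instruction strings are copied from the CHECK lines of `vsplat_load_nxv8f32`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the part of RISCVSubtarget touched by the patch.
struct ToySubtarget {
  // Defaults to true, so a generic target keeps the zero-stride fold.
  bool HasOptimizedZeroStrideLoad = true;
  bool hasOptimizedZeroStrideLoad() const { return HasOptimizedZeroStrideLoad; }
};

// Parse a comma-separated, -mattr style attribute string. Only the single
// negative tuning feature is modelled; every other attribute is ignored.
ToySubtarget parseFeatures(const std::string &Attrs) {
  ToySubtarget ST;
  std::stringstream SS(Attrs);
  std::string Feature;
  while (std::getline(SS, Feature, ',')) {
    if (Feature == "+no-optimized-zero-stride-load")
      ST.HasOptimizedZeroStrideLoad = false;
  }
  return ST;
}

// Return the instruction sequence the test expects for a splat of a loaded
// float, depending on the tuning flag (the trailing ret is omitted).
std::vector<std::string> lowerSplatOfLoad(const ToySubtarget &ST) {
  if (ST.hasOptimizedZeroStrideLoad())
    return {"vsetvli a1, zero, e32, m4, ta, ma", "vlse32.v v8, (a0), zero"};
  return {"flw ft0, 0(a0)", "vsetvli a0, zero, e32, m4, ta, ma",
          "vfmv.v.f v8, ft0"};
}
```

Because the feature is phrased negatively, existing CPU definitions need no update: they keep the default-true field and therefore the `vlse32.v` form, while a core that lacks the optimization opts out by adding the one attribute.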