This is an archive of the discontinued LLVM Phabricator instance.


Differential D105130

[RISCV] Enable interleaved access vectorization
Needs Review · Public

Authored by luke957 on Jun 29 2021, 9:31 AM.


Details

Reviewers

craig.topper
frasercrmck

Summary

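Enable interleaved access vectorization for RISC-V by returning true from the TTI hook enableInterleavedAccessVectorization() and adding a cost model for interleaved memory operations.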

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time      Test
2,760 ms  x64 debian > libarcher.critical::critical.c
2,800 ms  x64 debian > libarcher.critical::lock-nested.c
2,950 ms  x64 debian > libarcher.races::critical-unrelated.c
2,930 ms  x64 debian > libarcher.races::lock-nested-unrelated.c
3,030 ms  x64 debian > libarcher.races::lock-unrelated.c
Full test results: 19 failed.

Event Timeline

luke957 created this revision. Jun 29 2021, 9:31 AM

Herald added subscribers: vkmr, evandro, luismarques and 24 others. Jun 29 2021, 9:31 AM

luke957 requested review of this revision. Jun 29 2021, 9:31 AM

Herald added a project: Restricted Project. Jun 29 2021, 9:31 AM

Herald added subscribers: llvm-commits, MaskRay.

Please upload patches with full context using -U999999 as documented here: https://releases.llvm.org/11.0.0/docs/Phabricator.html#requesting-a-review-via-the-web-interface

Do you plan to map these to segment load/store instructions in the future?

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll, line 5:
Is `-enable-interleaved-mem-accesses=true` needed if TTI enableInterleavedAccessVectorization() returns true?

Harbormaster completed remote builds in B111554: Diff 355272. Jun 29 2021, 10:16 AM

Update

In D105130#2847617, @craig.topper wrote:

Please upload patches with full context using -U999999 as documented here: https://releases.llvm.org/11.0.0/docs/Phabricator.html#requesting-a-review-via-the-web-interface

Do you plan to map these to segment load/store instructions in the future?

Yeah, segment load/store instructions are naturally suitable for mapping these. Do we need to create a new RISCVISD?
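For reference, the data movement of a unit-stride segment load/store can be written down as a scalar model. The sketch below is illustrative C++ only: `segmentLoad` and `segmentStore` are hypothetical helpers, and masking, tail policy, and LMUL register grouping are ignored. With NF = 2, `segmentLoad` yields the same two vectors as the `strided.vec`/`strided.vec1` shuffles the vectorizer emits, which is why a factor-2 group maps naturally onto an instruction such as `vlseg2e32.v`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a unit-stride segment load (e.g. vlseg2e32.v):
// memory holds Vl segments of NF consecutive fields, and field F of segment I
// is written to element I of the F-th destination register.
std::vector<std::vector<int32_t>> segmentLoad(const std::vector<int32_t> &Mem,
                                              std::size_t NF, std::size_t Vl) {
  assert(Mem.size() >= NF * Vl && "not enough memory for Vl segments");
  std::vector<std::vector<int32_t>> Regs(NF, std::vector<int32_t>(Vl));
  for (std::size_t I = 0; I < Vl; ++I)
    for (std::size_t F = 0; F < NF; ++F)
      Regs[F][I] = Mem[I * NF + F];
  return Regs;
}

// Hypothetical model of the matching segment store (e.g. vsseg2e32.v): the
// inverse operation, writing element I of every register back into segment I.
// Assumes all registers hold the same number of elements.
std::vector<int32_t> segmentStore(const std::vector<std::vector<int32_t>> &Regs) {
  assert(!Regs.empty() && "need at least one field");
  std::size_t NF = Regs.size(), Vl = Regs[0].size();
  std::vector<int32_t> Mem(NF * Vl);
  for (std::size_t F = 0; F < NF; ++F) {
    assert(Regs[F].size() == Vl && "registers must have equal length");
    for (std::size_t I = 0; I < Vl; ++I)
      Mem[I * NF + F] = Regs[F][I];
  }
  return Mem;
}
```

For memory `{0, 1, 2, 3, 4, 5, 6, 7}`, `segmentLoad(Mem, 2, 4)` gives `{0, 2, 4, 6}` and `{1, 3, 5, 7}`, and `segmentStore` of that pair restores the original order.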

luke957 added inline comments. Jul 24 2021, 1:52 AM

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll, line 5:
Right, `-enable-interleaved-mem-accesses=true` is no longer needed.
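Why the flag becomes redundant can be sketched as follows, assuming the usual override pattern in the loop vectorizer: an explicitly passed `-enable-interleaved-mem-accesses` value takes precedence, and otherwise the target's `enableInterleavedAccessVectorization()` answer decides. `useInterleavedAccesses` is a hypothetical simplification, not the code in LoopVectorize.cpp.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model: CommandLineFlag is empty when the option was not passed
// on the command line; TargetEnablesInterleaving stands in for the TTI hook.
bool useInterleavedAccesses(std::optional<bool> CommandLineFlag,
                            bool TargetEnablesInterleaving) {
  // An explicit option wins; otherwise fall back to the target's default.
  return CommandLineFlag.value_or(TargetEnablesInterleaving);
}
```

Under this model, once the RISC-V hook returns true, passing `=true` only restates the default, while passing `=false` would still switch interleaved-access analysis off.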

Harbormaster completed remote builds in B116006: Diff 361420. Jul 24 2021, 2:15 AM

ping

If we aren't using segment load/store, what does the backend codegen for this look like?

In D105130#2902226, @luke957 wrote:

Yeah, segment load/store instructions are naturally suitable for mapping these. Do we need to create a new RISCVISD?

I believe we need to run the InterleavedAccessPass and implement TargetLowering::lowerInterleavedLoad/Store to create IR intrinsics. That's how it is done on ARM for their vldX and vstX instructions.
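The first step of that approach is recognizing the pattern: the InterleavedAccessPass looks for a wide load whose users are strided `shufflevector`s and hands such a group to the target's `lowerInterleavedLoad` hook. The sketch below models only the mask-matching step in standalone C++. `matchDeInterleaveMask` is a hypothetical name, it is not the pass's actual implementation, and it does not attempt to handle undef (-1) mask elements.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical matcher: if Mask selects every Factor-th element of a
// WideNumElts-wide vector starting at Index, for some Factor in
// [2, MaxFactor], return {Factor, Index}; otherwise return nullopt.
std::optional<std::pair<unsigned, unsigned>>
matchDeInterleaveMask(const std::vector<int> &Mask, unsigned WideNumElts,
                      unsigned MaxFactor) {
  for (unsigned Factor = 2; Factor <= MaxFactor; ++Factor) {
    // The narrow result must cover exactly 1/Factor of the wide vector.
    if (Mask.empty() || Mask.size() * Factor != WideNumElts)
      continue;
    // The first lane picks the field index; it must lie within one segment.
    if (Mask[0] < 0 || static_cast<unsigned>(Mask[0]) >= Factor)
      continue;
    unsigned Index = static_cast<unsigned>(Mask[0]);
    bool Strided = true;
    for (std::size_t I = 0; I < Mask.size() && Strided; ++I)
      Strided = Mask[I] == static_cast<int>(Index + I * Factor);
    if (Strided)
      return std::make_pair(Factor, Index);
  }
  return std::nullopt;
}
```

Applied to the masks in the vectorized IR from this thread, `<0, 2, 4, 6>` on an 8-element load matches factor 2, index 0, and `<1, 3, 5, 7>` matches factor 2, index 1.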

In D105130#2909217, @craig.topper wrote:

If we aren't using segment load/store, what does the backend codegen for this look like?

It looks like this:

%wide.vec = load <8 x i32>, <8 x i32>* %1, align 4
%strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%strided.vec1 = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
...
%interleaved.vec = shufflevector <4 x i32> %3, <4 x i32> %4, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %7, align 4

InnerLoopVectorizer::vectorizeInterleaveGroup() will generate shufflevector instructions for interleaved accesses.
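The two mask shapes in that IR follow a simple rule. A minimal standalone sketch, assuming plain `std::vector<int>` masks rather than LLVM types: `stridedMask(0, 2, 4)` reproduces `<0, 2, 4, 6>`, `stridedMask(1, 2, 4)` reproduces `<1, 3, 5, 7>`, and `interleaveMask(4, 2)` reproduces `<0, 4, 1, 5, 2, 6, 3, 7>`. The function names are hypothetical; LLVM's VectorUtils has helpers for the same purpose (`createStrideMask`, `createInterleaveMask`), but this code does not use them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// VF lanes with the given stride: Start, Start + Stride, Start + 2*Stride, ...
std::vector<int> stridedMask(int Start, int Stride, int VF) {
  std::vector<int> Mask;
  for (int I = 0; I < VF; ++I)
    Mask.push_back(Start + I * Stride);
  return Mask;
}

// Interleave NumVecs concatenated vectors of VF lanes each:
// lane I of vector J ends up at position I * NumVecs + J.
std::vector<int> interleaveMask(int VF, int NumVecs) {
  std::vector<int> Mask;
  for (int I = 0; I < VF; ++I)
    for (int J = 0; J < NumVecs; ++J)
      Mask.push_back(J * VF + I);
  return Mask;
}

// Model of shufflevector A, B, Mask: indices below A.size() select from A,
// the rest from B. Undef/poison lanes are not modelled.
std::vector<int> shuffle(const std::vector<int> &A, const std::vector<int> &B,
                         const std::vector<int> &Mask) {
  std::vector<int> Out;
  Out.reserve(Mask.size());
  for (int M : Mask) {
    assert(M >= 0 && "undef/poison lanes are not modelled");
    auto Idx = static_cast<std::size_t>(M);
    Out.push_back(Idx < A.size() ? A[Idx] : B.at(Idx - A.size()));
  }
  return Out;
}
```

Splitting a wide vector with the two strided masks and recombining the halves with the interleave mask returns the original lane order, which is the round trip the load and store sides of the loop rely on.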

In D105130#2909219, @craig.topper wrote:

I believe we need to run the InterleavedAccessPass and implement TargetLowering::lowerInterleavedLoad/Store to create IR intrinsics. That's how it is done on ARM for their vldX and vstX instructions.

Yeah, I think that is the right direction. Thanks. It seems I should submit a patch implementing TargetLowering::lowerInterleavedLoad/Store before this one.

In D105130#2913765, @luke957 wrote:

It looks like this:
%wide.vec = load <8 x i32>, <8 x i32>* %1, align 4
%strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%strided.vec1 = shufflevector <8 x i32> %wide.vec, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
...
%interleaved.vec = shufflevector <4 x i32> %3, <4 x i32> %4, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %7, align 4
InnerLoopVectorizer::vectorizeInterleaveGroup() will generate shufflevector instructions for interleaved accesses.

I was asking what the RISC-V assembly looks like. We don't have a two-input shuffle instruction, so this has to be broken down into something like two vrgathers and a vmerge, but I'm not sure.
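The decomposition mentioned here can be modelled in scalar C++ to show that one gather per source plus a masked merge covers an arbitrary two-input shuffle. This is a sketch of the data movement only, not the sequence the RISC-V backend actually emits: `twoInputShuffle` is a hypothetical helper, the out-of-range rule of `vrgather.vv` is simplified to compare against the source length instead of VLMAX, and LMUL register grouping is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int32_t>;

// Model of vrgather.vv vd, vs2, vs1: vd[i] = vs2[vs1[i]], or 0 when the index
// is out of range (simplified here to the length of the source).
Vec vrgather(const Vec &Src, const std::vector<uint32_t> &Idx) {
  Vec Out(Idx.size());
  for (std::size_t I = 0; I < Idx.size(); ++I)
    Out[I] = Idx[I] < Src.size() ? Src[Idx[I]] : 0;
  return Out;
}

// Model of vmerge.vvm vd, vs2, vs1, v0: vd[i] = v0[i] ? vs1[i] : vs2[i].
Vec vmerge(const Vec &IfClear, const Vec &IfSet, const std::vector<bool> &Mask) {
  assert(IfClear.size() == Mask.size() && IfSet.size() == Mask.size());
  Vec Out(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Out[I] = Mask[I] ? IfSet[I] : IfClear[I];
  return Out;
}

// Hypothetical lowering of shufflevector A, B, Mask: gather from A, gather
// from B, then merge using a mask that is set where the lane comes from B.
Vec twoInputShuffle(const Vec &A, const Vec &B, const std::vector<int> &Mask) {
  std::vector<uint32_t> IdxA(Mask.size(), 0), IdxB(Mask.size(), 0);
  std::vector<bool> FromB(Mask.size(), false);
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    assert(Mask[I] >= 0 && "undef/poison lanes are not modelled");
    auto M = static_cast<std::size_t>(Mask[I]);
    FromB[I] = M >= A.size();
    if (FromB[I])
      IdxB[I] = static_cast<uint32_t>(M - A.size());
    else
      IdxA[I] = static_cast<uint32_t>(M);
  }
  // Lanes not chosen by the merge hold don't-care gather results.
  return vmerge(vrgather(A, IdxA), vrgather(B, IdxB), FromB);
}
```

With `A = {1, 2, 3, 4}`, `B = {5, 6, 7, 8}` and the `%interleaved.vec` mask `<0, 4, 1, 5, 2, 6, 3, 7>`, the model produces `{1, 5, 2, 6, 3, 7, 4, 8}`.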

Revision Contents

Path                                                               Size
llvm/lib/Target/RISCV/RISCVISelLowering.h                          3 lines
llvm/lib/Target/RISCV/RISCVISelLowering.cpp                        16 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h                   27 lines
llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll   87 lines

Diff 361420

llvm/lib/Target/RISCV/RISCVISelLowering.h

[312 unchanged lines not shown; context: public:]
   unsigned getNumRegistersForCallingConv(LLVMContext &Context,
                                          CallingConv::ID CC,
                                          EVT VT) const override;

   /// Return true if the given shuffle mask can be codegen'd directly, or if it
   /// should be stack expanded.
   bool isShuffleMaskLegal(ArrayRef<int> M, EVT VT) const override;

+  bool isLegalInterleavedAccessType(VectorType *VecTy,
+                                    const DataLayout &DL) const;
+
   bool hasBitPreservingFPLogic(EVT VT) const override;
   bool
   shouldExpandBuildVectorWithShuffles(EVT VT,
                                       unsigned DefinedValues) const override;

   // Provide custom lowering hooks for some operations.
   SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;
   void ReplaceNodeResults(SDNode *N, SmallVectorImpl<SDValue> &Results,
[290 unchanged lines not shown]

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

[8,957 unchanged lines not shown; context: if (PartVTBitSize % ValueVTBitSize == 0) {]
                         Val, DAG.getConstant(0, DL, Subtarget.getXLenVT()));
         Parts[0] = Val;
         return true;
       }
     }
   return false;
 }

+bool RISCVTargetLowering::isLegalInterleavedAccessType(
+    VectorType *VecTy, const DataLayout &DL) const {
+
+  unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
+
+  // Ensure the number of vector elements is greater than 1.
+  if (cast<FixedVectorType>(VecTy)->getNumElements() < 2)
+    return false;
+
+  // Ensure the element type is legal.
+  if (ElSize != 8 && ElSize != 16 && ElSize != 32 && ElSize != 64)
+    return false;
+
+  return true;
+}
+
 SDValue RISCVTargetLowering::joinRegisterPartsIntoValue(
     SelectionDAG &DAG, const SDLoc &DL, const SDValue *Parts, unsigned NumParts,
     MVT PartVT, EVT ValueVT, Optional<CallingConv::ID> CC) const {
   bool IsABIRegCopy = CC.hasValue();
   if (IsABIRegCopy && ValueVT == MVT::f16 && PartVT == MVT::f32) {
     SDValue Val = Parts[0];

     // Cast the f32 to i32, truncate to i16, and cast back to f16.
[61 unchanged lines not shown]

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

[175 unchanged lines not shown; context: bool isLegalToVectorizeReduction(const RecurrenceDescriptor &RdxDesc,]
     default:
       return false;
     }
   }

   unsigned getMaxInterleaveFactor(unsigned VF) {
     return ST->getMaxInterleaveFactor();
   }
+
+  bool enableInterleavedAccessVectorization() { return true; }
+
+  InstructionCost getInterleavedMemoryOpCost(
+      unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
+      Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
+      bool UseMaskForCond, bool UseMaskForGaps) {
+    assert(Factor >= 2 && "Invalid interleave factor");
+    auto *VecVTy = cast<FixedVectorType>(VecTy);
+
+    if (ST->hasStdExtV() && isLegalElementTypeForRVV(VecTy->getScalarType()) &&
+        !UseMaskForCond && !UseMaskForGaps &&
+        Factor <= TLI->getMaxSupportedInterleaveFactor()) {
+      unsigned NumElts = VecVTy->getNumElements();
+      auto *SubVecTy =
+          FixedVectorType::get(VecTy->getScalarType(), NumElts / Factor);
+
+      if (NumElts % Factor == 0 &&
+          TLI->isLegalInterleavedAccessType(SubVecTy, DL))
+        return Factor * getMemoryOpCost(Opcode, SubVecTy->getElementType(),
+                                        Alignment, 0, CostKind);
+    }
+
+    return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
+                                             Alignment, AddressSpace, CostKind,
+                                             UseMaskForCond, UseMaskForGaps);
+  }
 };

 } // end namespace llvm

 #endif // LLVM_LIB_TARGET_RISCV_RISCVTARGETTRANSFORMINFO_H
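The RISC-V branch of `getInterleavedMemoryOpCost` above reduces to a small decision, restated here as standalone C++ under simplifying assumptions: `ScalarMemOpCost` stands in for the `getMemoryOpCost` call on the element type, `FallbackCost` for the `BaseT::getInterleavedMemoryOpCost` result, and `MaxFactor` for `TLI->getMaxSupportedInterleaveFactor()`; the `hasStdExtV()` and `isLegalElementTypeForRVV` checks are assumed to pass. All names are hypothetical. For the `<8 x i32>` factor-2 group in the test below, the sub-vector is `<4 x i32>`, so the modelled cost is 2 * ScalarMemOpCost.

```cpp
#include <cassert>

// Hypothetical description of one interleaved access group.
struct InterleaveQuery {
  unsigned NumElts;      // lanes in the wide vector, e.g. 8 for <8 x i32>
  unsigned ElSizeInBits; // e.g. 32
  unsigned Factor;       // interleave factor, >= 2
  bool UseMask;          // UseMaskForCond || UseMaskForGaps
};

// Mirrors isLegalInterleavedAccessType: at least two lanes per sub-vector and
// an 8/16/32/64-bit element type.
bool isLegalSubVector(unsigned SubNumElts, unsigned ElSize) {
  if (SubNumElts < 2)
    return false;
  return ElSize == 8 || ElSize == 16 || ElSize == 32 || ElSize == 64;
}

// Factor * per-element memory cost when the group qualifies; otherwise the
// generic fallback.
unsigned interleavedCost(const InterleaveQuery &Q, unsigned MaxFactor,
                         unsigned ScalarMemOpCost, unsigned FallbackCost) {
  assert(Q.Factor >= 2 && "Invalid interleave factor");
  if (!Q.UseMask && Q.Factor <= MaxFactor && Q.NumElts % Q.Factor == 0 &&
      isLegalSubVector(Q.NumElts / Q.Factor, Q.ElSizeInBits))
    return Q.Factor * ScalarMemOpCost;
  return FallbackCost;
}
```

The model makes the gating visible: a masked group, a factor above `MaxFactor`, a lane count not divisible by the factor, or a one-lane sub-vector all fall through to the generic cost.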

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -S -mtriple riscv64-linux-gnu -mattr=+experimental-v \
; RUN:   -loop-vectorize -instcombine -force-vector-width=4 \
; RUN:   -force-vector-interleave=1 -runtime-memory-check-threshold=24 < %s \
; RUN:   | FileCheck %s

target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

; Check vectorization on an interleaved load group of factor 2 and an
; interleaved store group of factor 2.

; int AB[1024];
; int CD[1024];
; void test_array_load2_store2(int C, int D) {
;   for (int i = 0; i < 1024; i+=2) {
;     int A = AB[i];
;     int B = AB[i+1];
;     CD[i] = A + C;
;     CD[i+1] = B * D;
;   }
; }


@AB = common global [1024 x i32] zeroinitializer, align 4
@CD = common global [1024 x i32] zeroinitializer, align 4

define void @test_array_load2_store2(i32 %C, i32 %D) {
; CHECK-LABEL: @test_array_load2_store2(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK:       vector.ph:
; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
; CHECK-NEXT:    [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <4 x i32> poison, i32 [[D:%.*]], i32 0
; CHECK-NEXT:    [[BROADCAST_SPLAT3:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT2]], <4 x i32> poison, <4 x i32> zeroinitializer
; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
; CHECK:       vector.body:
; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds [1024 x i32], [1024 x i32]* @AB, i64 0, i64 [[OFFSET_IDX]]
; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <8 x i32>*
; CHECK-NEXT:    [[WIDE_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[TMP1]], align 4
; CHECK-NEXT:    [[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
; CHECK-NEXT:    [[STRIDED_VEC1:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
; CHECK-NEXT:    [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 1
; CHECK-NEXT:    [[TMP3:%.*]] = add nsw <4 x i32> [[STRIDED_VEC]], [[BROADCAST_SPLAT]]
; CHECK-NEXT:    [[TMP4:%.*]] = mul nsw <4 x i32> [[STRIDED_VEC1]], [[BROADCAST_SPLAT3]]
; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds [1024 x i32], [1024 x i32]* @CD, i64 0, i64 [[TMP2]]
; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i64 -1
; CHECK-NEXT:    [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*
; CHECK-NEXT:    [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
; CHECK-NEXT:    store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[TMP7]], align 4
; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT:    [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
; CHECK-NEXT:    br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK:       middle.block:
; CHECK-NEXT:    br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
; CHECK:       scalar.ph:
; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
; CHECK:       for.body:
; CHECK-NEXT:    br i1 undef, label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP2:![0-9]+]]
; CHECK:       for.end:
; CHECK-NEXT:    ret void
;
entry:
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx0 = getelementptr inbounds [1024 x i32], [1024 x i32]* @AB, i64 0, i64 %indvars.iv
  %tmp = load i32, i32* %arrayidx0, align 4
  %tmp1 = or i64 %indvars.iv, 1
  %arrayidx1 = getelementptr inbounds [1024 x i32], [1024 x i32]* @AB, i64 0, i64 %tmp1
  %tmp2 = load i32, i32* %arrayidx1, align 4
  %add = add nsw i32 %tmp, %C
  %mul = mul nsw i32 %tmp2, %D
  %arrayidx2 = getelementptr inbounds [1024 x i32], [1024 x i32]* @CD, i64 0, i64 %indvars.iv
  store i32 %add, i32* %arrayidx2, align 4
  %arrayidx3 = getelementptr inbounds [1024 x i32], [1024 x i32]* @CD, i64 0, i64 %tmp1
  store i32 %mul, i32* %arrayidx3, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
  %cmp = icmp slt i64 %indvars.iv.next, 1024
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}
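To see that the CHECK lines describe the same computation as the C loop in the test comment, the vectorized loop can be modelled in scalar C++ and compared against a direct translation of that loop. This is an illustrative sketch: `scalarLoop` and `vectorizedLoop` are hypothetical names, `AB` and `CD` are passed in and returned rather than being globals, and the vector width is fixed at 4 to match `-force-vector-width=4`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Arr = std::array<int32_t, 1024>;

// Direct translation of the C loop in the test comment.
Arr scalarLoop(const Arr &AB, int32_t C, int32_t D) {
  Arr CD{};
  for (std::size_t I = 0; I < 1024; I += 2) {
    CD[I] = AB[I] + C;
    CD[I + 1] = AB[I + 1] * D;
  }
  return CD;
}

// Model of vector.body: each of the 128 iterations (index 0, 4, ..., 508)
// reads 8 consecutive ints (%wide.vec), splits even/odd lanes (%strided.vec,
// %strided.vec1), does 4 adds and 4 muls, and writes the results back
// interleaved (%interleaved.vec) as one 8-int store.
Arr vectorizedLoop(const Arr &AB, int32_t C, int32_t D) {
  constexpr std::size_t VF = 4;
  Arr CD{};
  for (std::size_t Index = 0; Index < 512; Index += VF) {
    std::size_t Offset = Index * 2; // %offset.idx = shl i64 %index, 1
    int32_t Even[VF], Odd[VF];
    for (std::size_t L = 0; L < VF; ++L) {
      Even[L] = AB[Offset + 2 * L] + C;    // add nsw on the even lanes
      Odd[L] = AB[Offset + 2 * L + 1] * D; // mul nsw on the odd lanes
    }
    for (std::size_t L = 0; L < VF; ++L) {
      CD[Offset + 2 * L] = Even[L];
      CD[Offset + 2 * L + 1] = Odd[L];
    }
  }
  return CD;
}
```

Because 1024 is a multiple of 2 * VF, the model needs no scalar epilogue, consistent with the `br i1 true` out of `middle.block` in the CHECK lines.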