This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Cost model for general case of dual vector permute
ClosedPublic

Authored by reames on Mar 28 2023, 9:52 AM.

Download Raw Diff

Details

Reviewers

luke
craig.topper
asb

Commits

rG57492b1eeb99: [RISCV] Cost model for general case of dual vector permute

Summary

The cost model was not accounting for the fact that we can generate a dual vrgather + an index expression sequence instead of scalarizing.

A couple cases to call out:

I did not model the difference between vrgather and vrgatherei16. The result is the constant pool cost can be slightly understated on RV32. I don't think we care, but if someone disagrees, this would be easy to add.
Our current codegen for i8 vectors longer than 256 (which is the limit of what this costs) has some room for improvement.
As indicated by the *regression* in reported cost for <2 x iN> vectors, our current vector lowering is missing support for a sub-case where scalarize-and-insert is actually faster than the generic fallback path.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision.Mar 28 2023, 9:52 AM

Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 9:52 AM

Herald added subscribers: jobnoorman, VincentWu, vkmr and 29 others. · View Herald Transcript

reames requested review of this revision.Mar 28 2023, 9:52 AM

Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 9:52 AM

Herald added subscribers: • pcwang-thead, eopXD, MaskRay. · View Herald Transcript

Harbormaster completed remote builds in B222283: Diff 509055.Mar 28 2023, 4:51 PM

LGTM. Not sure why clang-format is failing in the pre-merge checks

This revision is now accepted and ready to land.Mar 29 2023, 1:48 AM

This revision was landed with ongoing or failed builds.Mar 29 2023, 7:37 AM

Closed by commit rG57492b1eeb99: [RISCV] Cost model for general case of dual vector permute (authored by reames). · Explain Why

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG57492b1eeb99: [RISCV] Cost model for general case of dual vector permute.

Revision Contents

Path

Size

llvm/

lib/

Target/

RISCV/

RISCVTargetTransformInfo.cpp

20 lines

test/

Analysis/

CostModel/

RISCV/

shuffle-permute.ll

56 lines

Diff 509364

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

Show First 20 Lines • Show All 296 Lines • ▼ Show 20 Lines	case TTI::SK_PermuteSingleSrc: {
(LT.second.getScalarSizeInBits() != 8 \|\|		(LT.second.getScalarSizeInBits() != 8 \|\|
LT.second.getVectorNumElements() <= 256)) {		LT.second.getVectorNumElements() <= 256)) {
VectorType *IdxTy = VectorType::get(IntegerType::getInt8Ty(Tp->getContext()),		VectorType *IdxTy = VectorType::get(IntegerType::getInt8Ty(Tp->getContext()),
Tp->getElementCount());		Tp->getElementCount());
InstructionCost IndexCost = getConstantPoolLoadCost(IdxTy, CostKind);		InstructionCost IndexCost = getConstantPoolLoadCost(IdxTy, CostKind);
return IndexCost + getLMULCost(LT.second);		return IndexCost + getLMULCost(LT.second);
}		}
}		}
		break;
		}
		case TTI::SK_PermuteTwoSrc: {
		if (Mask.size() >= 2 && LT.second.isFixedLengthVector()) {
		// 2 x (vrgather + cost of generating the mask constant) + cost of mask
		// register for the second vrgather. We model this for an unknown
		// (shuffle) mask.
		if (LT.first == 1 &&
		(LT.second.getScalarSizeInBits() != 8 \|\|
		LT.second.getVectorNumElements() <= 256)) {
		auto &C = Tp->getContext();
		auto EC = Tp->getElementCount();
		VectorType *IdxTy = VectorType::get(IntegerType::getInt8Ty(C), EC);
		VectorType *MaskTy = VectorType::get(IntegerType::getInt1Ty(C), EC);
		InstructionCost IndexCost = getConstantPoolLoadCost(IdxTy, CostKind);
		InstructionCost MaskCost = getConstantPoolLoadCost(MaskTy, CostKind);
		return 2 * IndexCost + 2 * getLMULCost(LT.second) + MaskCost;
		}
		}
		break;
}		}
}		}
};		};

// Handle scalable vectors (and fixed vectors legalized to scalable vectors).		// Handle scalable vectors (and fixed vectors legalized to scalable vectors).
switch (Kind) {		switch (Kind) {
default:		default:
// Fallthrough to generic handling.		// Fallthrough to generic handling.
▲ Show 20 Lines • Show All 1,291 Lines • Show Last 20 Lines

llvm/test/Analysis/CostModel/RISCV/shuffle-permute.ll

Show First 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	;
%v4f64 = shufflevector <4 x double> undef, <4 x double> undef, <4 x i32> <i32 3, i32 2, i32 3, i32 0>		%v4f64 = shufflevector <4 x double> undef, <4 x double> undef, <4 x i32> <i32 3, i32 2, i32 3, i32 0>

ret void		ret void
}		}

define void @general_permute_two_source() {		define void @general_permute_two_source() {
;		;
; CHECK-LABEL: 'general_permute_two_source'		; CHECK-LABEL: 'general_permute_two_source'
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v2i8 = shufflevector <2 x i8> undef, <2 x i8> undef, <2 x i32> <i32 3, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2i8 = shufflevector <2 x i8> undef, <2 x i8> undef, <2 x i32> <i32 3, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v4i8 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v4i8 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %v8i8 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v8i8 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %v16i8 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v16i8 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v2i16 = shufflevector <2 x i16> undef, <2 x i16> undef, <2 x i32> <i32 3, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2i16 = shufflevector <2 x i16> undef, <2 x i16> undef, <2 x i32> <i32 3, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v4i16 = shufflevector <4 x i16> undef, <4 x i16> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v4i16 = shufflevector <4 x i16> undef, <4 x i16> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %v8i16 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v8i16 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %v16i16 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %v16i16 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v2i32 = shufflevector <2 x i32> undef, <2 x i32> undef, <2 x i32> <i32 3, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2i32 = shufflevector <2 x i32> undef, <2 x i32> undef, <2 x i32> <i32 3, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v4i32 = shufflevector <4 x i32> undef, <4 x i32> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v4i32 = shufflevector <4 x i32> undef, <4 x i32> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %v8i32 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %v8i32 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %v16i32 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16i32 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v2i64 = shufflevector <2 x i64> undef, <2 x i64> undef, <2 x i32> <i32 3, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2i64 = shufflevector <2 x i64> undef, <2 x i64> undef, <2 x i32> <i32 3, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %v4i64 = shufflevector <4 x i64> undef, <4 x i64> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %v4i64 = shufflevector <4 x i64> undef, <4 x i64> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 70 for instruction: %v8i64 = shufflevector <8 x i64> undef, <8 x i64> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v8i64 = shufflevector <8 x i64> undef, <8 x i64> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 142 for instruction: %v16i64 = shufflevector <16 x i64> undef, <16 x i64> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %v16i64 = shufflevector <16 x i64> undef, <16 x i64> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v2half = shufflevector <2 x half> undef, <2 x half> undef, <2 x i32> <i32 3, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2half = shufflevector <2 x half> undef, <2 x half> undef, <2 x i32> <i32 3, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v4half = shufflevector <4 x half> undef, <4 x half> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v4half = shufflevector <4 x half> undef, <4 x half> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %v8half = shufflevector <8 x half> undef, <8 x half> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v8half = shufflevector <8 x half> undef, <8 x half> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %v16half = shufflevector <16 x half> undef, <16 x half> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %v16half = shufflevector <16 x half> undef, <16 x half> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v2float = shufflevector <2 x float> undef, <2 x float> undef, <2 x i32> <i32 3, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2float = shufflevector <2 x float> undef, <2 x float> undef, <2 x i32> <i32 3, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v4float = shufflevector <4 x float> undef, <4 x float> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v4float = shufflevector <4 x float> undef, <4 x float> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %v8float = shufflevector <8 x float> undef, <8 x float> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %v8float = shufflevector <8 x float> undef, <8 x float> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %v16float = shufflevector <16 x float> undef, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v16float = shufflevector <16 x float> undef, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v2double = shufflevector <2 x double> undef, <2 x double> undef, <2 x i32> <i32 3, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2double = shufflevector <2 x double> undef, <2 x double> undef, <2 x i32> <i32 3, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v4double = shufflevector <4 x double> undef, <4 x double> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %v4double = shufflevector <4 x double> undef, <4 x double> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %v8double = shufflevector <8 x double> undef, <8 x double> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v8double = shufflevector <8 x double> undef, <8 x double> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %v16double = shufflevector <16 x double> undef, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %v16double = shufflevector <16 x double> undef, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;		;
%v2i8 = shufflevector <2 x i8> undef, <2 x i8> undef, <2 x i32> <i32 3, i32 0>		%v2i8 = shufflevector <2 x i8> undef, <2 x i8> undef, <2 x i32> <i32 3, i32 0>
%v4i8 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>		%v4i8 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 5, i32 7, i32 1, i32 0>
%v8i8 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>		%v8i8 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 14, i32 6, i32 5, i32 4, i32 13, i32 2, i32 1, i32 0>
%v16i8 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>		%v16i8 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 17, i32 11, i32 20, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

%v2i16 = shufflevector <2 x i16> undef, <2 x i16> undef, <2 x i32> <i32 3, i32 0>		%v2i16 = shufflevector <2 x i16> undef, <2 x i16> undef, <2 x i32> <i32 3, i32 0>
Show All 31 Lines