This is an archive of the discontinued LLVM Phabricator instance.

Differential D151184

[AArch64] Adjust costs of i1 and/or/xor reductions
Closed · Public

Authored by dmgreen on May 23 2023, 12:48 AM.

Details

Reviewers

SjoerdMeijer
samtebbs
Sp00ph
jaykang10

Commits

rGe79fac2968dc: [AArch64] Adjust costs of i1 and/or/xor reductions

Summary

This extends the reduction costs of i1 and/or/xor so that larger vector sizes are handled by the existing code. For i1 reductions, and will use maxv, or will use minv and xor will use addv. For larger vectors, the cost of the extra and/or/xor operations needed to legalize the type is added on top. The i1 vectors will be legalized to wider integer elements (for example v16i8), and this patch overrides the cost for those types. As with all i1 vectors, there is a chance that the type the i1 vector is created with and the way it is used will not match, introducing extra extends that are not necessarily cost-modelled.
https://godbolt.org/z/6Gc9K6b7T
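The lane identities behind that mapping can be checked in scalar code. The sketch below is not LLVM code: it assumes each i1 lane is sign-extended to a byte holding 0 (false) or -1 (true), and it reads the summary's maxv/minv as signed reductions. Under that assumption, and holds exactly when the signed maximum is -1, or holds exactly when the signed minimum is -1, and xor is the low bit of the wrapping byte sum. The helper names (toLanes, reduceAnd, reduceOr, reduceXor) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: widen each i1 lane to a byte, sign-extended so that
// true becomes -1 (0xFF) and false becomes 0 (an assumption about lowering).
std::vector<int8_t> toLanes(const std::vector<bool> &Bits) {
  std::vector<int8_t> Lanes;
  for (bool B : Bits)
    Lanes.push_back(B ? int8_t(-1) : int8_t(0));
  return Lanes;
}

// and-reduction: every lane is -1 exactly when the signed maximum is -1.
// Lanes must be non-empty.
bool reduceAnd(const std::vector<int8_t> &Lanes) {
  return *std::max_element(Lanes.begin(), Lanes.end()) == -1;
}

// or-reduction: some lane is -1 exactly when the signed minimum is -1.
// Lanes must be non-empty.
bool reduceOr(const std::vector<int8_t> &Lanes) {
  return *std::min_element(Lanes.begin(), Lanes.end()) == -1;
}

// xor-reduction: add the lanes as wrapping bytes; each 0xFF lane is odd and
// flips the low bit of the sum, so that bit is the parity of the true lanes.
bool reduceXor(const std::vector<int8_t> &Lanes) {
  uint8_t Sum = 0;
  for (int8_t L : Lanes)
    Sum = static_cast<uint8_t>(Sum + static_cast<uint8_t>(L));
  return (Sum & 1u) != 0;
}
```

If the lanes are instead treated as unsigned (0x00/0xFF), the same checks work with min and max swapped.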

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. May 23 2023, 12:48 AM

Herald added a project: Restricted Project. May 23 2023, 12:48 AM

Herald added subscribers: hiraditya, kristof.beyls.

dmgreen requested review of this revision. May 23 2023, 12:48 AM

Herald added a project: Restricted Project. May 23 2023, 12:48 AM

david-arm added a subscriber: david-arm. May 23 2023, 1:29 AM

david-arm added inline comments.

llvm/test/Analysis/CostModel/AArch64/reduce-xor.ll
Line 20: Interestingly, we can also do much better for xor reductions like v16i8, v8i16, etc. by using SVE if available too. For a v8i16 xor reduction we can just do:

    ptrue p0.h, vl8
    eorv h0, p0, z0.h
    fmov w0, s0

whereas I see we currently do:

    ext v1.16b, v0.16b, v0.16b, #8
    eor v0.8b, v0.8b, v1.8b
    fmov x8, d0
    eor x8, x8, x8, lsr #32
    lsr x9, x8, #16
    eor w0, w8, w9
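The current NEON sequence folds the vector in halves: ext + eor combines the two 64-bit halves, then the scalar eor/lsr steps fold 64 bits to 32 and 32 bits to 16. A scalar sketch of that folding, checked against a plain lane-by-lane xor (the value a single eorv produces), might look like this. The function names and the little-endian lane packing are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the NEON v8i16 xor reduction. Assumption:
// lane I sits at bits [16*I, 16*I + 16) of the 128-bit register, which is
// split here into two 64-bit halves.
uint16_t xorReduceV8i16Folded(const uint16_t V[8]) {
  uint64_t Lo = 0, Hi = 0;
  for (int I = 0; I < 4; ++I) {
    Lo |= uint64_t(V[I]) << (16 * I);
    Hi |= uint64_t(V[I + 4]) << (16 * I);
  }
  uint64_t X = Lo ^ Hi;   // ext #8 + eor: fold 128 bits to 64
  X ^= X >> 32;           // eor x8, x8, x8, lsr #32: fold 64 bits to 32
  uint32_t W = uint32_t(X);
  W ^= W >> 16;           // lsr #16 + eor: fold 32 bits to 16
  return uint16_t(W);     // only the low 16 bits hold the result
}

// Reference: xor every lane in order, as one eorv over all lanes would.
uint16_t xorReduceV8i16Linear(const uint16_t V[8]) {
  uint16_t R = 0;
  for (int I = 0; I < 8; ++I)
    R ^= V[I];
  return R;
}
```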

dmgreen added a child revision: D151189: [AArch64] Increase the cost of i1 inserts / extracts. May 23 2023, 1:35 AM

Harbormaster completed remote builds in B233790: Diff 524587. May 23 2023, 1:43 AM

ping

llvm/test/Analysis/CostModel/AArch64/reduce-xor.ll
Line 20: OK cool.

samtebbs accepted this revision. May 31 2023, 8:54 AM

This revision is now accepted and ready to land. May 31 2023, 8:54 AM

This revision was landed with ongoing or failed builds. Jun 1 2023, 1:28 AM

Closed by commit rGe79fac2968dc: [AArch64] Adjust costs of i1 and/or/xor reductions (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGe79fac2968dc: [AArch64] Adjust costs of i1 and/or/xor reductions.

Revision Contents

Path                                                      Size
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp    7 lines
llvm/test/Analysis/CostModel/AArch64/reduce-and.ll        6 lines
llvm/test/Analysis/CostModel/AArch64/reduce-or.ll         6 lines
llvm/test/Analysis/CostModel/AArch64/reduce-xor.ll        12 lines

Diff 527319

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

[…]
   case ISD::ADD:
     break;
   case ISD::XOR:
   case ISD::AND:
   case ISD::OR:
     const auto *Entry = CostTableLookup(CostTblNoPairwise, ISD, MTy);
     if (!Entry)
       break;
     auto *ValVTy = cast<FixedVectorType>(ValTy);
-    if (!ValVTy->getElementType()->isIntegerTy(1) &&
-        MTy.getVectorNumElements() <= ValVTy->getNumElements() &&
+    if (MTy.getVectorNumElements() <= ValVTy->getNumElements() &&
         isPowerOf2_32(ValVTy->getNumElements())) {
       InstructionCost ExtraCost = 0;
       if (LT.first != 1) {
         // Type needs to be split, so there is an extra cost of LT.first - 1
         // arithmetic ops.
         auto *Ty = FixedVectorType::get(ValTy->getElementType(),
                                         MTy.getVectorNumElements());
         ExtraCost = getArithmeticInstrCost(Opcode, Ty, CostKind);
         ExtraCost *= LT.first - 1;
       }
-      return Entry->Cost + ExtraCost;
+      // All and/or/xor of i1 will be lowered with maxv/minv/addv + fmov
+      auto Cost = ValVTy->getElementType()->isIntegerTy(1) ? 2 : Entry->Cost;
+      return Cost + ExtraCost;
     }
     break;
   }
   return BaseT::getArithmeticReductionCost(Opcode, ValTy, FMF, CostKind);
 }

 InstructionCost AArch64TTIImpl::getSpliceCost(VectorType *Tp, int Index) {
   static const CostTblEntry ShuffleTbl[] = {
[…]
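For i1 element types, the patched path returns a fixed cost of 2 plus ExtraCost, which is (LT.first - 1) times the cost of one and/or/xor on the legal type. A minimal sketch of that arithmetic, assuming the i1 vector legalizes to v16i8 (so LT.first is NumElts / 16 for wider vectors) and that each extra vector op costs 1, reproduces the new CHECK values for v32i1, v64i1 and v128i1 (3, 5 and 9). The function name is hypothetical, and the assumed costs are not read from LLVM's tables.

```cpp
#include <cassert>

// Hypothetical model of the cost returned for i1 and/or/xor reductions after
// this patch. Assumptions (not taken from LLVM): the vector legalizes to
// v16i8, and one and/or/xor on v16i8 costs 1.
unsigned i1LogicalReductionCost(unsigned NumElts) {
  const unsigned LegalLanes = 16; // lanes in the assumed legal type v16i8
  const unsigned BaseCost = 2;    // maxv/minv/addv + fmov, as in the patch
  const unsigned OpCost = 1;      // assumed cost of one v16i8 and/or/xor
  // Number of legal-sized parts; stands in for LT.first.
  unsigned NumParts = NumElts > LegalLanes ? NumElts / LegalLanes : 1;
  // Combining NumParts parts into one takes NumParts - 1 vector ops.
  unsigned ExtraCost = (NumParts - 1) * OpCost;
  return BaseCost + ExtraCost;
}
```

The sketch only covers power-of-two lane counts of 2 or more; the tests show v1i1 at cost 0, which it does not model.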

llvm/test/Analysis/CostModel/AArch64/reduce-and.ll

 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
 ; RUN: opt < %s -mtriple=aarch64-unknown-linux-gnu -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck %s

 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

 define void @reduce() {
 ; CHECK-LABEL: 'reduce'
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i1 @llvm.vector.reduce.and.v1i1(<1 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2 = call i1 @llvm.vector.reduce.and.v2i1(<2 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4 = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8 = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16 = call i1 @llvm.vector.reduce.and.v16i1(<16 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 91 for instruction: %V32 = call i1 @llvm.vector.reduce.and.v32i1(<32 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 181 for instruction: %V64 = call i1 @llvm.vector.reduce.and.v64i1(<64 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 362 for instruction: %V128 = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V32 = call i1 @llvm.vector.reduce.and.v32i1(<32 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V64 = call i1 @llvm.vector.reduce.and.v64i1(<64 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V128 = call i1 @llvm.vector.reduce.and.v128i1(<128 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V1i8 = call i8 @llvm.vector.reduce.and.v1i8(<1 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V3i8 = call i8 @llvm.vector.reduce.and.v3i8(<3 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i8 = call i8 @llvm.vector.reduce.and.v4i8(<4 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %V8i8 = call i8 @llvm.vector.reduce.and.v8i8(<8 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %V16i8 = call i8 @llvm.vector.reduce.and.v16i8(<16 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V32i8 = call i8 @llvm.vector.reduce.and.v32i8(<32 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V64i8 = call i8 @llvm.vector.reduce.and.v64i8(<64 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i16 = call i16 @llvm.vector.reduce.and.v4i16(<4 x i16> undef)
[…]

llvm/test/Analysis/CostModel/AArch64/reduce-or.ll

 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
 ; RUN: opt < %s -mtriple=aarch64-unknown-linux-gnu -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck %s

 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

 define void @reduce() {
 ; CHECK-LABEL: 'reduce'
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i1 @llvm.vector.reduce.or.v1i1(<1 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2 = call i1 @llvm.vector.reduce.or.v2i1(<2 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8 = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16 = call i1 @llvm.vector.reduce.or.v16i1(<16 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 91 for instruction: %V32 = call i1 @llvm.vector.reduce.or.v32i1(<32 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 181 for instruction: %V64 = call i1 @llvm.vector.reduce.or.v64i1(<64 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 362 for instruction: %V128 = call i1 @llvm.vector.reduce.or.v128i1(<128 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V32 = call i1 @llvm.vector.reduce.or.v32i1(<32 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V64 = call i1 @llvm.vector.reduce.or.v64i1(<64 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V128 = call i1 @llvm.vector.reduce.or.v128i1(<128 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V1i8 = call i8 @llvm.vector.reduce.or.v1i8(<1 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V3i8 = call i8 @llvm.vector.reduce.or.v3i8(<3 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i8 = call i8 @llvm.vector.reduce.or.v4i8(<4 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %V8i8 = call i8 @llvm.vector.reduce.or.v8i8(<8 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %V16i8 = call i8 @llvm.vector.reduce.or.v16i8(<16 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V32i8 = call i8 @llvm.vector.reduce.or.v32i8(<32 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V64i8 = call i8 @llvm.vector.reduce.or.v64i8(<64 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i16 = call i16 @llvm.vector.reduce.or.v4i16(<4 x i16> undef)
[…]

llvm/test/Analysis/CostModel/AArch64/reduce-xor.ll

 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
 ; RUN: opt < %s -mtriple=aarch64-unknown-linux-gnu -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck %s

 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

 define void @reduce() {
 ; CHECK-LABEL: 'reduce'
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i1 @llvm.vector.reduce.xor.v1i1(<1 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2 = call i1 @llvm.vector.reduce.xor.v2i1(<2 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4 = call i1 @llvm.vector.reduce.xor.v4i1(<4 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %V8 = call i1 @llvm.vector.reduce.xor.v8i1(<8 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V16 = call i1 @llvm.vector.reduce.xor.v16i1(<16 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 127 for instruction: %V32 = call i1 @llvm.vector.reduce.xor.v32i1(<32 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 309 for instruction: %V64 = call i1 @llvm.vector.reduce.xor.v64i1(<64 x i1> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 673 for instruction: %V128 = call i1 @llvm.vector.reduce.xor.v128i1(<128 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4 = call i1 @llvm.vector.reduce.xor.v4i1(<4 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8 = call i1 @llvm.vector.reduce.xor.v8i1(<8 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16 = call i1 @llvm.vector.reduce.xor.v16i1(<16 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V32 = call i1 @llvm.vector.reduce.xor.v32i1(<32 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V64 = call i1 @llvm.vector.reduce.xor.v64i1(<64 x i1> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V128 = call i1 @llvm.vector.reduce.xor.v128i1(<128 x i1> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %V1i8 = call i8 @llvm.vector.reduce.xor.v1i8(<1 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V3i8 = call i8 @llvm.vector.reduce.xor.v3i8(<3 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i8 = call i8 @llvm.vector.reduce.xor.v4i8(<4 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %V8i8 = call i8 @llvm.vector.reduce.xor.v8i8(<8 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %V16i8 = call i8 @llvm.vector.reduce.xor.v16i8(<16 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V32i8 = call i8 @llvm.vector.reduce.xor.v32i8(<32 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V64i8 = call i8 @llvm.vector.reduce.xor.v64i8(<64 x i8> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i16 = call i16 @llvm.vector.reduce.xor.v4i16(<4 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %V8i16 = call i16 @llvm.vector.reduce.xor.v8i16(<8 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V16i16 = call i16 @llvm.vector.reduce.xor.v16i16(<16 x i16> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V2i32 = call i32 @llvm.vector.reduce.xor.v2i32(<2 x i32> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V4i32 = call i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> undef)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V8i32 = call i32 @llvm.vector.reduce.xor.v8i32(<8 x i32> undef)
[…]